Codepoint

A code point is a numeric value that uniquely identifies a character in a character set or encoding system. In Unicode, code points are abstract identifiers for characters and are typically written as a value with the prefix U+, followed by hexadecimal digits. For example, U+0041 designates the Latin capital letter A, and U+1F600 designates the grinning face emoji. A code point is not the character itself and not a specific byte sequence; it is an abstract reference used by software to identify a character.
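
The U+ notation described above can be illustrated with a minimal Python sketch. The helper names `format_code_point` and `parse_code_point` are hypothetical, chosen for this illustration only; they rely on the built-in `ord` and `chr` functions, which convert between a one-character string and its code point.

```python
def format_code_point(ch: str) -> str:
    """Return the U+ notation for a string holding one code point."""
    if len(ch) != 1:
        raise ValueError("expected exactly one code point")
    # Uppercase hexadecimal, padded to at least four digits (U+0041).
    return f"U+{ord(ch):04X}"


def parse_code_point(notation: str) -> str:
    """Return the character named by a string such as 'U+1F600'."""
    if not notation.upper().startswith("U+"):
        raise ValueError("expected a 'U+' prefix")
    # The digits after the prefix are hexadecimal.
    return chr(int(notation[2:], 16))


print(format_code_point("A"))           # U+0041
print(format_code_point("\U0001F600"))  # U+1F600
print(parse_code_point("U+0041"))       # A
```

The sketch only converts between notations; it does not check whether a given code point is actually assigned to a character.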

Code points are mapped to bytes by encodings. Unicode defines several encoding forms, such as UTF-8, UTF-16, and UTF-32, which specify how code points are represented in storage or transmission. For instance, the code point U+0041 is encoded as the single byte 0x41 in UTF-8, as the 16-bit code unit 0x0041 in UTF-16, and as the 32-bit code unit 0x00000041 in UTF-32. The encoding form determines the number of bytes and their arrangement, while the code point remains the universal identifier for the character.

Unicode assigns code points in the range U+0000 to U+10FFFF, organized into planes, with the Basic Multilingual Plane spanning U+0000 to U+FFFF and the supplementary planes spanning U+10000 to U+10FFFF. Some ranges are reserved for private use, and the surrogate range U+D800 to U+DFFF is reserved for forming surrogate pairs in UTF-16 and is not assigned to characters. Certain code points are designated as noncharacters, intended for internal use.

Code points are distinct from glyphs, the shapes produced by fonts. A single code point may render differently across fonts, and multiple code points can combine to form a single visual symbol. Code points are central to text processing, storage, search, and normalization in modern computing.
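
As a rough illustration of the points above, the following Python sketch shows one code point under the three encoding forms, and a pair of code points that combine into a single visual symbol. The helper name `encoded_hex` is hypothetical; the big-endian codec names are chosen so that Python does not prepend a byte order mark.

```python
import unicodedata


def encoded_hex(ch: str) -> dict:
    """Return the bytes of ch, as hex, under three Unicode encoding forms."""
    forms = ("utf-8", "utf-16-be", "utf-32-be")
    return {form: ch.encode(form).hex() for form in forms}


# U+0041 needs one byte in UTF-8, two in UTF-16, and four in UTF-32.
print(encoded_hex("A"))
# {'utf-8': '41', 'utf-16-be': '0041', 'utf-32-be': '00000041'}

# U+1F600 lies in a supplementary plane, so UTF-16 uses a surrogate pair.
print(encoded_hex("\U0001F600"))
# {'utf-8': 'f09f9880', 'utf-16-be': 'd83dde00', 'utf-32-be': '0001f600'}

# "e" followed by U+0301 (combining acute accent) is two code points
# that render as one symbol; NFC normalization composes them into U+00E9.
decomposed = "e\u0301"
composed = unicodedata.normalize("NFC", decomposed)
print(len(decomposed), len(composed))  # 2 1
```

Because Python strings are sequences of code points, `len` counts code points here, not bytes or visible symbols.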