Home

Unicode

Unicode is a character encoding standard designed to support the representation and processing of text from all writing systems. It assigns a unique code point to each character, symbol, punctuation, and control function, enabling consistent encoding, storage, and interchange across software and platforms.

Unicode was developed by the Unicode Consortium and is coordinated with ISO/IEC 10646. It defines a repertoire

Encoding forms: UTF-8, UTF-16, and UTF-32. UTF-8 is variable-length and ASCII-compatible, using 1 to 4 bytes per

Within Unicode, characters may be combined with diacritics or other marks, leading to normalization forms such

Impact and usage: Unicode is the dominant standard for text encoding in modern software, the web, and

that
includes
more
than
100,000
characters
across
multiple
scripts,
symbols,
and
emoji.
The
code
space
ranges
from
U+0000
to
U+10FFFF,
organized
into
17
planes,
with
the
Basic
Multilingual
Plane
containing
the
most
commonly
used
characters.
code
point.
UTF-16
uses
2
bytes
for
most
characters
and
4
bytes
for
supplementary
characters
via
surrogate
pairs.
UTF-32
uses
a
fixed
4
bytes
per
code
point.
Endianness
and
byte
order
marks
apply
to
UTF-16
and
UTF-32
as
needed.
as
NFC,
NFD,
NFKC,
and
NFKD
that
define
canonical
or
compatibility
equivalence
of
text.
Rendering
requires
fonts
and
rendering
engines
to
map
code
points
to
glyph
shapes.
data
interchange.
It
supports
internationalization
and
multilingual
text,
but
practical
limits
include
font
coverage
and
the
complexity
of
emoji
selection
and
variation
selectors.