UTF16LE

UTF-16 Little Endian (UTF-16LE) is a Unicode encoding scheme that uses 16-bit code units to represent Unicode scalar values. It is the variant of UTF-16 that stores the least significant byte of each 16-bit unit first, i.e., in little-endian order.

In UTF-16LE, code points in the Basic Multilingual Plane (U+0000 to U+FFFF, excluding the reserved surrogate range U+D800 to U+DFFF) are encoded as a single 16-bit unit. Code points above U+FFFF (up to U+10FFFF) require a surrogate pair, consisting of a high surrogate in the range U+D800 to U+DBFF followed by a low surrogate in the range U+DC00 to U+DFFF. For example, the character 'A' (U+0041) becomes 41 00 in UTF-16LE. The character U+1D11E (musical symbol G clef) is encoded as the surrogate pair 0xD834 0xDD1E, yielding the byte sequence 34 D8 1E DD in UTF-16LE.

The bytes within each 16-bit unit are stored in little-endian order, so the low byte comes first.

A Byte Order Mark (BOM) can be used to signal endianness when UTF-16 is used in contexts where endianness might be ambiguous. For UTF-16LE, the BOM is the byte sequence FF FE. The BOM is optional; when the encoding is explicitly specified as UTF-16LE, the BOM may be omitted.

UTF-16LE is widely used in Windows environments and in various programming APIs for internal string representation. It can be more compact than UTF-8 for many non-ASCII scripts (for example CJK text, where most characters take two bytes in UTF-16 but three in UTF-8), but it introduces complexity due to surrogate pairs and makes random access by code points more involved compared to fixed-width encodings such as UTF-32.
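
The surrogate-pair rule and the low-byte-first ordering described in this article can be sketched in a few lines of Python. The helper name `encode_utf16le` is hypothetical and exists only for illustration; in practice Python's built-in `"utf-16-le"` codec does the same job.

```python
import struct


def encode_utf16le(text: str) -> bytes:
    """Encode text as UTF-16LE by hand, without a BOM (illustrative helper)."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if 0xD800 <= cp <= 0xDFFF:
            # Lone surrogates are not Unicode scalar values.
            raise ValueError(f"U+{cp:04X} is a surrogate, not a scalar value")
        if cp <= 0xFFFF:
            units = [cp]  # BMP: one 16-bit code unit
        else:
            v = cp - 0x10000  # 20-bit value split across two code units
            units = [0xD800 | (v >> 10), 0xDC00 | (v & 0x3FF)]
        for unit in units:
            out += struct.pack("<H", unit)  # "<H": little-endian unsigned 16-bit
    return bytes(out)


print(encode_utf16le("A").hex(" "))           # 41 00
print(encode_utf16le("\U0001D11E").hex(" "))  # 34 d8 1e dd
```

Both printed results match the byte sequences given for 'A' and U+1D11E in the text.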
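
BOM handling can be illustrated with a small decoder that checks the first two bytes and falls back to a caller-supplied default when no BOM is present. This is a minimal sketch: the function name `decode_utf16` and its `default` parameter are assumptions made for the example, and it does not try to distinguish the UTF-16LE BOM from the UTF-32LE BOM (FF FE 00 00).

```python
import codecs


def decode_utf16(data: bytes, default: str = "utf-16-le") -> str:
    """Decode UTF-16 bytes, using a leading BOM to pick the byte order if present."""
    if data.startswith(codecs.BOM_UTF16_LE):  # FF FE
        return data[2:].decode("utf-16-le")
    if data.startswith(codecs.BOM_UTF16_BE):  # FE FF
        return data[2:].decode("utf-16-be")
    # No BOM: the byte order must come from outside, e.g. an explicit label.
    return data.decode(default)


print(decode_utf16(b"\xff\xfeA\x00"))  # A
print(decode_utf16(b"A\x00"))          # A
```

Python's `"utf-16-le"` codec never writes a BOM when encoding, which corresponds to the case where the encoding is explicitly specified and the BOM is omitted.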
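
The size trade-off against UTF-8 and the cost of indexing by code point can both be seen in a short experiment. The function `code_point_at` below is a hypothetical helper, not a standard API; it assumes a byte string with an even length and scans from the start because a surrogate pair occupies four bytes while other characters occupy two.

```python
def code_point_at(data: bytes, index: int) -> int:
    """Return the code point at position `index`, counted in code points."""
    pos = 0   # byte offset into data
    seen = 0  # code points passed so far
    while pos + 1 < len(data):
        unit = int.from_bytes(data[pos:pos + 2], "little")
        cp, width = unit, 2
        if 0xD800 <= unit <= 0xDBFF and pos + 3 < len(data):
            low = int.from_bytes(data[pos + 2:pos + 4], "little")
            if 0xDC00 <= low <= 0xDFFF:
                # Combine high and low surrogate into one code point.
                cp = 0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00)
                width = 4
        if seen == index:
            return cp
        seen += 1
        pos += width
    raise IndexError("code point index out of range")


# Byte counts in UTF-8 versus UTF-16LE for an ASCII and a Japanese sample.
for sample in ("hello", "\u65e5\u672c\u8a9e"):
    print(len(sample.encode("utf-8")), len(sample.encode("utf-16-le")))
# prints "5 10", then "9 6"

data = "a\U0001D11Eb".encode("utf-16-le")
print(hex(code_point_at(data, 1)))  # 0x1d11e
```

The linear scan is what "more involved" means in practice: byte offset `2 * index` is only correct when no surrogate pair precedes the requested position.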