UTF-8

UTF-8 is a variable-length character encoding for Unicode that has become the dominant encoding for text on the Internet. It encodes every Unicode code point using one to four bytes, with the first 128 code points identical to ASCII to preserve backward compatibility with existing text.

In UTF-8, ASCII characters use a single byte (0x00 to 0x7F). Multibyte sequences use leading bit patterns: two-byte sequences start with 110, three-byte sequences start with 1110, and four-byte sequences start with 11110, with all continuation bytes beginning 10xxxxxx. This design makes UTF-8 self-synchronizing, allows streaming processing, and avoids the byte order issues that affect fixed-width encodings.

History and usage: UTF-8 was developed in the early 1990s and adopted as part of the Unicode standard. It was designed to be compatible with ASCII, to support all Unicode code points, and to facilitate interchange across systems. It has become the de facto standard encoding for web pages, emails, databases, and programming environments, and is widely supported across platforms and languages.

Advantages and considerations: UTF-8 offers ASCII compatibility, variable length that stores English text compactly, self-synchronization, and no mandatory Byte Order Mark. However, the length of a string in bytes may differ from its number of characters, and validating or sanitizing input is important to detect and handle invalid sequences. Some languages and security contexts require strict validation to prevent certain classes of errors.

See also: Unicode, UTF-16, UTF-32, ASCII.
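The bit patterns above can be made concrete with a short sketch. The `utf8_encode` helper below is hypothetical (not part of any standard library); it encodes a single code point by hand and is checked against Python's built-in encoder:

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one Unicode code point to UTF-8 using the leading-bit patterns."""
    if cp <= 0x7F:            # 1 byte:  0xxxxxxx (identical to ASCII)
        return bytes([cp])
    if cp <= 0x7FF:           # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6),
                      0x80 | (cp & 0x3F)])
    if cp <= 0xFFFF:          # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    if cp <= 0x10FFFF:        # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        return bytes([0xF0 | (cp >> 18),
                      0x80 | ((cp >> 12) & 0x3F),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    raise ValueError("code point out of Unicode range")

# Matches the built-in encoder for each sequence length:
assert utf8_encode(ord("A")) == "A".encode("utf-8")          # 1 byte
assert utf8_encode(ord("é")) == "é".encode("utf-8")          # 2 bytes
assert utf8_encode(0x20AC) == "€".encode("utf-8")            # 3 bytes
assert utf8_encode(0x1F600) == chr(0x1F600).encode("utf-8")  # 4 bytes
```

Because every continuation byte starts with 10, a decoder that lands in the middle of a sequence can skip forward to the next byte that does not start with 10 and resynchronize — the self-synchronization property noted above.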
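The byte-length versus character-count distinction is easy to demonstrate; the sample string here is arbitrary:

```python
s = "naïve 🌍"               # 7 characters (code points)
b = s.encode("utf-8")        # 'ï' takes 2 bytes, '🌍' takes 4, the rest 1 each

print(len(s))                # number of code points: 7
print(len(b))                # number of bytes: 11
```

Code that mixes up the two — for example, truncating a UTF-8 byte buffer at a fixed byte offset — can split a multibyte sequence and produce exactly the kind of invalid input that validation must catch.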
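A minimal sketch of strict versus lenient validation, using Python's built-in error handlers (the two-byte input is deliberately malformed):

```python
data = b"\xc3\x29"   # 0xC3 announces a 2-byte sequence, but 0x29 is not a continuation byte

# Strict decoding (the default) rejects the input outright:
try:
    data.decode("utf-8")
except UnicodeDecodeError as exc:
    print("rejected:", exc.reason)

# A lenient alternative substitutes U+FFFD for the bad byte and continues:
print(data.decode("utf-8", errors="replace"))
```

Strict rejection is the safer default in security-sensitive paths, since silently repaired input can smuggle altered byte sequences past later checks.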