u8p
u8p, also known as 8-bit Unicode Transformation Format, is a character encoding scheme used in computing to represent Unicode characters. It is a variable-width encoding that uses one to four bytes per character, depending on the character's Unicode code point. The first 128 code points (U+0000 to U+007F) are each represented by a single byte with the same value as in ASCII. Code points from U+0080 to U+07FF take two bytes, code points from U+0800 to U+FFFF take three bytes, and code points from U+10000 to U+10FFFF take four bytes.
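The byte counts described above can be sketched as a simple range lookup. This is a minimal illustration that assumes only the four code point ranges stated in the text; the function name `u8p_encoded_length` is hypothetical and not taken from any specification or library, and the sketch does not attempt to show how the bytes themselves are laid out.

```python
def u8p_encoded_length(code_point: int) -> int:
    """Return how many bytes a code point occupies, per the ranges in the text.

    Hypothetical helper for illustration only.
    """
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError(f"not a Unicode code point: {code_point:#x}")
    if code_point <= 0x7F:      # U+0000..U+007F: one byte, same value as ASCII
        return 1
    if code_point <= 0x7FF:     # U+0080..U+07FF: two bytes
        return 2
    if code_point <= 0xFFFF:    # U+0800..U+FFFF: three bytes
        return 3
    return 4                    # U+10000..U+10FFFF: four bytes


if __name__ == "__main__":
    # One sample code point from each of the four ranges.
    for cp in (0x41, 0xE9, 0x20AC, 0x1F600):
        print(f"U+{cp:04X} -> {u8p_encoded_length(cp)} byte(s)")
```

Run as a script, this prints one line per sample code point, showing lengths of 1, 2, 3, and 4 bytes respectively. Values outside U+0000 to U+10FFFF are rejected because the text defines no encoding for them.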
u8p is designed to be backward compatible with ASCII and UTF-8, making it suitable for use in
The u8p encoding scheme is defined in the Unicode Standard, which is maintained by the Unicode Consortium.
u8p is not to be confused with UTF-8, which is a different variable-width character encoding scheme used