charsets

Charsets, short for character sets, are systems that map characters used in written language to numeric codes. They enable computers to store and transmit text by representing each character as a number, typically a byte or a sequence of bytes.
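
As a small illustration (the sketches on this page use Python, but any language with Unicode support behaves the same way), the built-in ord and chr functions expose this character-to-number mapping directly:

    # Each character has an assigned number; ord looks it up, chr reverses it.
    assert ord("A") == 65          # 'A' maps to 65 in ASCII and Unicode alike
    assert ord("€") == 0x20AC      # '€' maps to 8364 (U+20AC) in Unicode
    assert chr(65) == "A"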

A charset is not the same thing as an encoding, though the two terms are often used interchangeably. An encoding specifies how those numeric codes are converted to bytes and back, while a charset defines the repertoire of characters and their assigned numbers.
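
A short sketch of the distinction: the character 'é' has one assigned number (code point U+00E9), but different encodings turn that number into different byte sequences:

    # One code point, several byte representations.
    assert ord("é") == 0xE9                        # the charset assigns the number
    assert "é".encode("latin-1")   == b"\xe9"      # one byte
    assert "é".encode("utf-8")     == b"\xc3\xa9"  # two bytes
    assert "é".encode("utf-16-be") == b"\x00\xe9"  # two bytes, big-endian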

Unicode is a universal character set that assigns a unique code point to each character across the world's scripts. To store Unicode text, encodings such as UTF-8, UTF-16, or UTF-32 are used. UTF-8 is widely adopted on the web for its ASCII compatibility and efficiency.
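
A sketch of the size trade-offs: the same text takes a different number of bytes under each Unicode encoding, and pure ASCII text is byte-for-byte identical in UTF-8:

    s = "héllo"
    assert len(s.encode("utf-8"))  == 6    # 1 byte per ASCII char, 2 for 'é'
    assert len(s.encode("utf-16")) == 12   # 2 bytes per char + 2-byte BOM
    assert len(s.encode("utf-32")) == 24   # 4 bytes per char + 4-byte BOM

    # ASCII compatibility: ASCII text encodes to the same bytes in UTF-8.
    assert "hello".encode("utf-8") == "hello".encode("ascii")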

Common legacy and regional charsets include ASCII (7-bit), ISO-8859-1 (Latin-1), Windows-1252, Shift JIS, EUC-KR, GB2312, Big5, and KOI8-R. These 8-bit or multi-byte schemes were designed for particular languages or regions and are increasingly supplanted by Unicode.
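
Most runtimes still ship codecs for these charsets. A small sketch of round-tripping text through two of them, using Python's spelling of the codec names:

    # Region-specific charsets handle their own repertoire...
    jp = "日本語".encode("shift_jis")
    ru = "Привет".encode("koi8_r")
    assert jp.decode("shift_jis") == "日本語"
    assert ru.decode("koi8_r") == "Привет"

    # ...but not each other's: Japanese has no codes in KOI8-R.
    try:
        "日本語".encode("koi8_r")
    except UnicodeEncodeError:
        pass    # raised: these characters are outside the charset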

Standards organizations maintain charsets and encodings. Unicode is defined by the Unicode Consortium and ISO/IEC 10646. The IANA registry lists charset names, and software typically selects an encoding via HTTP headers, file metadata, or content-type declarations.
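
For example, an HTTP response might declare Content-Type: text/html; charset=ISO-8859-1, and the receiver must decode the body with the named codec. A minimal sketch of extracting that parameter (the helper name is illustrative, not a standard API):

    def charset_from_content_type(header, default="utf-8"):
        # "text/html; charset=ISO-8859-1" -> "iso-8859-1"
        for param in header.split(";")[1:]:
            name, _, value = param.strip().partition("=")
            if name.strip().lower() == "charset":
                return value.strip().strip('"').lower()
        return default

    assert charset_from_content_type("text/html; charset=ISO-8859-1") == "iso-8859-1"
    assert charset_from_content_type("application/json") == "utf-8"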

Practical issues include mojibake from decoding with the wrong charset, endianness differences in multi-byte encodings, and byte-order marks. Normalization and combining characters can affect text comparison and rendering across systems.
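
Each of these is easy to reproduce; a short sketch:

    import unicodedata

    # Mojibake: UTF-8 bytes decoded with the wrong charset.
    assert "é".encode("utf-8").decode("latin-1") == "Ã©"

    # Endianness: UTF-16 code units can be stored either way round;
    # a byte-order mark (U+FEFF) at the start records which was used.
    assert "A".encode("utf-16-le") == b"A\x00"
    assert "A".encode("utf-16-be") == b"\x00A"
    assert "A".encode("utf-16").startswith((b"\xff\xfe", b"\xfe\xff"))  # BOM first

    # Combining characters: two sequences that render identically
    # compare unequal until normalized to the same form.
    precomposed = "\u00e9"    # é as a single code point
    decomposed  = "e\u0301"   # 'e' + COMBINING ACUTE ACCENT
    assert precomposed != decomposed
    assert unicodedata.normalize("NFC", decomposed) == precomposed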

Best practice is to use Unicode encodings (prefer UTF-8) for new data, declare the encoding in interfaces and documents, and normalize input to a consistent form. This improves interoperability and reduces encoding-related errors.
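
A minimal sketch of that practice, assuming UTF-8 files and NFC as the chosen normalization form (the function names are illustrative):

    import unicodedata

    def read_text(path):
        # Name the encoding explicitly rather than relying on a platform
        # default, and normalize on the way in so comparisons behave
        # consistently.
        with open(path, encoding="utf-8") as f:
            return unicodedata.normalize("NFC", f.read())

    def write_text(path, text):
        with open(path, "w", encoding="utf-8") as f:
            f.write(unicodedata.normalize("NFC", text))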
