Codepages

Codepages, also called code pages or character maps, are mappings between 8-bit byte values and the characters a computing system uses to represent and display text. Each code page defines up to 256 characters, though some values may be reserved for control codes rather than printable characters. Code pages are often identified by a number, especially in IBM and Microsoft ecosystems (for example, code page 437).
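
The idea of a code page as a byte-to-character table can be sketched in a few lines of Python. This is only an illustration: the helper name decode_byte and the choice of byte 0xE9 are assumptions made for this example, while the codec names are the aliases Python's standard library uses for these code pages.

```python
import unicodedata

# Code pages to compare, using Python's standard codec aliases.
CODE_PAGES = ["cp437", "latin-1", "cp1252", "cp1251"]


def decode_byte(value: int, code_page: str) -> str:
    """Return the character that `code_page` assigns to a single byte value.

    `decode_byte` is a hypothetical helper written for this example.
    """
    return bytes([value]).decode(code_page)


# The same byte value maps to different characters in different code pages.
for name in CODE_PAGES:
    ch = decode_byte(0xE9, name)
    # Print the code point and official Unicode name (ASCII-only output).
    print(f"0xE9 in {name}: U+{ord(ch):04X} {unicodedata.name(ch)}")
```

Under these codecs, byte 0xE9 comes out as a Greek capital theta in code page 437, an e with acute accent in ISO-8859-1 and Windows-1252, and a Cyrillic short i in Windows-1251, while byte values below 128 decode to the same ASCII characters in all four.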

Historically, there was no universal standard for representing characters beyond the basic ASCII set. The IBM PC introduced code page 437, which provided printable characters for English and some additional symbols. Over time, other code pages added support for more languages, leading to a proliferation of encodings. ISO/IEC 8859 standards offered several 8-bit encodings for different language groups, such as ISO-8859-1 (Latin-1). Windows developed its own family of code pages (for instance Windows-1252, Windows-1251). Some encodings, like Shift JIS and GB2312, use multiple bytes per character, so they are not strictly single-byte code pages, but they are still commonly referred to as code pages.

Common examples include ASCII (0–127), ISO-8859-1 (Latin-1) for Western European languages, Windows-1252, Windows-1251 for Cyrillic, Shift JIS for Japanese, GB2312 and its successors for Chinese, and EUC-KR for Korean. MacRoman is another historical example.

Issues with codepages arise when text encoded in one code page is misinterpreted as another, producing mojibake. With the widespread adoption of Unicode and UTF-8, new data often avoids code pages, but legacy data, software, and systems continue to rely on them. Modern software typically supports conversion between code pages and Unicode to maintain compatibility.
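
Both the mojibake problem and the conversion to Unicode can be illustrated with a minimal Python sketch. The function names to_utf8 and misread and the sample strings are hypothetical, chosen only for this example. The sketch assumes the source code page of the data is already known, which in practice is often the hard part.

```python
def to_utf8(data: bytes, source_code_page: str) -> bytes:
    """Convert legacy bytes to UTF-8 by decoding to Unicode text first.

    Hypothetical helper: the caller must already know the source code page.
    """
    return data.decode(source_code_page).encode("utf-8")


def misread(text: str, written_as: str, read_as: str) -> str:
    """Encode text with one encoding and decode it with another.

    This reproduces the mismatch that causes mojibake.
    """
    return text.encode(written_as).decode(read_as)


original = "café"

# UTF-8 bytes read as Windows-1252: the two-byte sequence for "é"
# turns into two unrelated characters.
garbled = misread(original, "utf-8", "cp1252")
print(ascii(garbled))  # 'caf\xc3\xa9', i.e. "caf" followed by U+00C3 and U+00A9

# Reversing the wrong step recovers the text in this case.
repaired = garbled.encode("cp1252").decode("utf-8")
print(repaired == original)  # True

# Converting legacy Windows-1251 data to UTF-8.
legacy = "Привет".encode("cp1251")  # 6 bytes, one per Cyrillic letter
converted = to_utf8(legacy, "cp1251")  # 12 bytes, two per letter in UTF-8
print(len(legacy), len(converted))  # 6 12
```

The repair step works here only because every byte involved has a mapping in Windows-1252. When the wrong code page leaves some byte values undefined, or when text has passed through several mismatched conversions, the round trip can fail or lose information.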