characterset

A characterset, also written as character set, is a defined collection of characters that a system can recognize, display, or process. Each character in the set is associated with a code point, a numeric value used by software to identify the character.
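The relationship between a character and its code point can be illustrated with a short Python sketch. It assumes Unicode as the character set, which is what Python's built-in `ord` and `chr` use; the sample characters are arbitrary choices for illustration.

```python
# Sample characters, written as escapes so the source file stays ASCII-only:
# "A", "e with acute accent", and the euro sign.
chars = ["A", "\u00e9", "\u20ac"]

# ord() returns the Unicode code point of a single character.
code_points = [ord(c) for c in chars]

# chr() is the inverse: it returns the character for a code point.
round_trip = [chr(cp) for cp in code_points]

for c, cp in zip(chars, code_points):
    # !a prints an ASCII-safe representation of the character.
    print(f"{c!a} -> U+{cp:04X} (decimal {cp})")
```

A code point here is only a number. Nothing in this sketch says how that number is stored as bytes, which is the job of an encoding.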

A characterset is distinct from a character encoding. The encoding is the method by which code points are translated into bytes for storage or transmission. Fonts, in turn, determine how characters are visually rendered, but they do not define the set of characters or their encoded representations.

Common examples include ASCII, a 7-bit set of 128 characters; the ISO/IEC 8859 series and Windows-1252, which extend ASCII with 8-bit code points for various languages; and Unicode, a large, universal set designed to cover almost all written languages. While Unicode provides code points for characters, actual data interchange relies on encodings such as UTF-8, UTF-16, or UTF-32 to map those code points to bytes.

UTF-8 is the most widely used encoding on the web and is backward compatible with ASCII, using one to four bytes per code point. UTF-16 uses one or two 2-byte code units (2 or 4 bytes) per code point, and UTF-32 uses a fixed 4 bytes per code point. Endianness, byte order marks, and normalization can affect how encoded text is interpreted and compared, and decoding bytes with the wrong encoding produces the garbled output known as mojibake.

In modern computing, Unicode is commonly adopted as the character set, with UTF-8 as the preferred encoding for interoperability. Clear specification of the encoding used in files, networks, and databases remains essential to ensure correct text handling across systems.
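The points about encoded sizes, byte order, mojibake, and normalization can be sketched in Python using only the standard library. This is a minimal illustration, not a reference implementation: the sample characters, the variable names, and the choice of Windows-1252 (`cp1252`) as the "wrong" decoder are assumptions made for the example.

```python
import codecs
import unicodedata

# Sample characters: ASCII, Latin-1 range, rest of the BMP, and beyond the BMP.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]

def encoded_lengths(ch):
    """Return the byte length of one character in each UTF encoding (no BOM)."""
    return {enc: len(ch.encode(enc)) for enc in ("utf-8", "utf-16-le", "utf-32-le")}

lengths = {ch: encoded_lengths(ch) for ch in samples}

# Endianness changes the byte order inside each code unit.
utf16_le = "A".encode("utf-16-le")
utf16_be = "A".encode("utf-16-be")

# The generic "utf-16" codec prepends a byte order mark in the platform's byte order.
bom = "A".encode("utf-16")[:2]

# Mojibake: UTF-8 bytes decoded with the wrong encoding (cp1252 is an assumed example).
original = "caf\u00e9"
garbled = original.encode("utf-8").decode("cp1252")
# In this case the damage is reversible by undoing the wrong step.
repaired = garbled.encode("cp1252").decode("utf-8")

# Normalization: "e" + combining acute accent versus the precomposed character.
decomposed = "e\u0301"
composed = unicodedata.normalize("NFC", decomposed)

for ch, sizes in lengths.items():
    print(f"U+{ord(ch):04X}: {sizes}")
print(ascii(garbled), ascii(repaired), decomposed == composed)
```

Declaring the encoding explicitly on every boundary, for example `open(path, encoding="utf-8")` rather than relying on a platform default, is one common way to avoid the mismatch shown in the mojibake step.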