tekensets

A tekenset is a formal concept in typography and information processing: a defined collection of characters and symbols used to represent text in a particular language, domain, or system. A tekenset specifies which characters are available, their ordering, and the rules for combining them. The term is used in academic discussions of writing systems as well as in software design, where it helps plan character repertoires for fonts, input methods, or data encodings.
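As a rough illustration of "which characters are available" and "their ordering", a tekenset can be modelled as an ordered sequence of characters. The sketch below is only one possible model, not a standard API: the names `SAMPLE_TEKENSET`, `missing_characters`, and `collate` are hypothetical, and the sample repertoire is an arbitrary choice made for the example.

```python
import string

# Hypothetical sample repertoire (an assumption for illustration only):
# ASCII letters and digits, a few punctuation marks, and six accented letters.
# The position of each character in the tuple defines the ordering.
SAMPLE_TEKENSET = tuple(
    string.ascii_letters
    + string.digits
    + " .,;:!?'-"
    + "\u00e9\u00eb\u00ef\u00f3\u00f6\u00fc"  # é ë ï ó ö ü
)


def missing_characters(text, tekenset):
    """Return the characters of `text` that are not in `tekenset`,
    each listed once, in order of first appearance."""
    available = set(tekenset)
    missing = []
    for ch in text:
        if ch not in available and ch not in missing:
            missing.append(ch)
    return missing


def collate(words, tekenset):
    """Sort words by the ordering the tekenset defines.
    Every character of every word must belong to the tekenset."""
    position = {ch: i for i, ch in enumerate(tekenset)}
    return sorted(words, key=lambda word: [position[ch] for ch in word])


print(missing_characters("na\u00efve caf\u00e9 \u20ac5", SAMPLE_TEKENSET))  # ['€']
print(collate(["B", "a", "b"], SAMPLE_TEKENSET))  # ['a', 'b', 'B']
```

In this sample ordering, lowercase letters come before uppercase ones, so the result differs from Python's default code-point sort, which would put "B" first.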

A tekenset typically includes base letters, digits, punctuation, diacritics, ligatures, numerals, and special symbols; it may also include control or formatting codes in some contexts. Tekensets can be fixed, as in a specific encoding, or extensible, as standards allow for additional symbols through staged updates.

Relation to existing standards: Tekensets are conceptually distinct from encodings but are closely tied to them. In practice, tekensets are often implemented by mapping them to a universal encoding such as Unicode, which assigns unique code points to the symbols. Historically, several character repertoires emerged before Unicode, such as ASCII, ISO/IEC 646, and various national sets. Modern typography and software design commonly treat a tekenset as a subset of Unicode or as a bridge between legacy encodings and Unicode.

Applications and challenges: Tekensets guide font design, input method editors, localization, and data interchange. Challenges include normalization, compatibility, and rendering of composite characters, diacritics, and ligatures across platforms and languages.

See also: character set, Unicode, ASCII, ligature, typography.
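The normalization and compatibility challenges named above can be made concrete with Python's standard `unicodedata` module. This is a minimal sketch under stated assumptions: the repertoire `REPERTOIRE` and the helper `is_covered` are hypothetical names introduced for the example, and Latin-1 stands in for a legacy encoding.

```python
import unicodedata

# Hypothetical repertoire, treated as a subset of Unicode:
# lowercase ASCII letters plus precomposed é (U+00E9).
REPERTOIRE = frozenset("abcdefghijklmnopqrstuvwxyz" + "\u00e9")


def is_covered(text, repertoire, form="NFC"):
    """Normalize `text` to the given Unicode form, then check that
    every resulting character belongs to the repertoire."""
    normalized = unicodedata.normalize(form, text)
    return all(ch in repertoire for ch in normalized)


# "e" followed by a combining acute accent (U+0301) looks like é but is
# two code points; NFC composes the pair into the single code point U+00E9.
decomposed = "cafe\u0301"
print(all(ch in REPERTOIRE for ch in decomposed))  # False
print(is_covered(decomposed, REPERTOIRE))          # True

# The ligature U+FB01 (fi) is outside the repertoire; compatibility
# normalization (NFKC) replaces it with the letters "f" and "i".
print(is_covered("\ufb01n", REPERTOIRE, form="NFC"))   # False
print(is_covered("\ufb01n", REPERTOIRE, form="NFKC"))  # True

# Bridging from a legacy encoding: the Latin-1 byte 0xE9 decodes to U+00E9.
print(b"caf\xe9".decode("latin-1") == "caf\u00e9")  # True
```

Whether to apply NFC or NFKC before a coverage check is a design decision: NFKC discards distinctions such as ligatures that a typographic tekenset may want to keep.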