Home

casefolds

Casefolds refer to the process of converting text to a form suitable for case-insensitive comparison by applying Unicode case folding rules. Case folding is designed to be language- and locale-independent, enabling consistent caseless matching across scripts and alphabets. It is defined in the Unicode Standard as a form of mapping from code points to a caseless representation, and includes not only simple lowercase mappings but also expansions where a single character becomes multiple letters, as well as ligature expansions.

In practice, casefolding often differs from simple lowercasing. For example, the German character ß lowercases to

Implementation and usage: Many programming environments provide a casefold function or method. In Python, strings expose

Relationship to normalization: Case folding is distinct from Unicode normalization; both may be applied, but for

“ß”
but
casefolding
produces
“ss”
(and
uppercase
ß
is
seldom
used);
ligature
characters
such
as
“fi”
may
be
expanded
to
“fi”
or
“ffi”
depending
on
the
mapping.
In
general,
casefold
aims
to
maximize
the
odds
that
two
strings
compare
equal
under
a
case-insensitive
comparison.
casefold():
“ß”.casefold()
->
“ss”.
In
ICU-based
libraries,
there
is
a
CaseFolding
operation
derived
from
Unicode.
Case
folding
is
intended
for
comparisons,
not
for
display
or
normalization
of
text;
it
is
not
a
locale-aware
transformation.
comparison
purposes,
case
folding
should
precede
or
accompany
normalization
depending
on
the
pipeline.
See
Unicode
Standard
Annex
21
for
the
exact
mappings.