collations

A collation is a set of rules that determine how strings are compared and ordered. Collations reflect linguistic conventions and depend on locale. They govern which characters are considered equal, how accents and case are treated, and how punctuation and digits influence ordering.
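As a small illustration of how the choice of comparison rules changes ordering, the following Python sketch sorts the same words twice: once by raw Unicode code point, and once with a simple case-insensitive key. The word list and the key function are made up for this example, and the case-insensitive key is only a stand-in for a real locale-aware collation.

```python
# Hypothetical sample data for illustration.
words = ["banana", "Apple", "cherry", "apple", "Banana"]

# Python's default string comparison uses Unicode code points, so every
# uppercase ASCII letter sorts before every lowercase one.
by_code_point = sorted(words)

# A simple case-insensitive ordering: compare casefolded text first and
# fall back to the original string so that ties are broken consistently.
case_insensitive = sorted(words, key=lambda w: (w.casefold(), w))

print(by_code_point)     # ['Apple', 'Banana', 'apple', 'banana', 'cherry']
print(case_insensitive)  # ['Apple', 'apple', 'Banana', 'banana', 'cherry']
```

Neither ordering is linguistically complete; they only show that "sorted" depends on which rules are in force.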

Collations define strength or sensitivity levels that describe how strictly differences are treated. At the primary level, differences such as case and diacritics are ignored; secondary and tertiary levels distinguish accents and case; higher levels may also consider punctuation and spacing. Unicode normalization and canonical equivalence influence comparisons across character forms, especially for characters that have multiple valid representations.

Standards include the Unicode Collation Algorithm (UCA), which provides a language-independent framework and uses data from CLDR. Implementations exist in ICU and in many programming libraries; databases expose collations as locale- or charset-specific rules. Binary or raw collations compare bytes directly and are deterministic but do not reflect natural language ordering.

In databases and software, collations affect sorting (ORDER BY), indexing, and string comparisons for equality. They enable locale-aware search and consistent results across applications, but can complicate cross-language data and introduce performance considerations. Before applying a collation, normalization to a canonical Unicode form (for example NFC) is often advisable to ensure stable comparisons.

Common database examples include utf8_general_ci and utf8_unicode_ci in MySQL, locale-based collations in PostgreSQL, COLLATE clauses in SQL Server, and Oracle NLS_SORT options.
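
The effect of normalization and of different sensitivity levels can be sketched with Python's standard unicodedata module. This is not the Unicode Collation Algorithm: the helper names nfc, primary_key, and secondary_key are hypothetical, and they only approximate the idea of ignoring or respecting accents and case. A production system would normally rely on ICU or on the database's own collations.

```python
import unicodedata


def nfc(text: str) -> str:
    """Return the NFC (canonical composed) form of the text."""
    return unicodedata.normalize("NFC", text)


def primary_key(text: str) -> str:
    """Rough stand-in for primary strength: ignore accents and case."""
    decomposed = unicodedata.normalize("NFD", text)
    # Drop combining marks (accents) left over after decomposition.
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return base.casefold()


def secondary_key(text: str) -> str:
    """Rough stand-in for secondary strength: accents matter, case does not."""
    return unicodedata.normalize("NFC", text.casefold())


# Two canonically equivalent spellings of the same word.
precomposed = "caf\u00e9"   # 'é' as a single code point
decomposed = "cafe\u0301"   # 'e' followed by a combining acute accent

# A binary comparison sees different byte sequences.
binary_equal = precomposed.encode("utf-8") == decomposed.encode("utf-8")

# After NFC normalization the two forms compare equal.
normalized_equal = nfc(precomposed) == nfc(decomposed)

print(binary_equal)      # False
print(normalized_equal)  # True
print(primary_key("Café") == primary_key("cafe"))      # True
print(secondary_key("Café") == secondary_key("café"))  # True
print(secondary_key("café") == secondary_key("cafe"))  # False
```

The keys here only decide equality at each level; a real collation also defines the relative order of unequal strings according to locale-specific rules.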