Dediacritized

Dediacritized is an adjective describing text from which diacritical marks (such as accents, tildes, diaereses, umlauts, and cedillas) have been removed. The result consists of base letters without their distinguishing marks, often referred to as unaccented or ASCII equivalents. The term combines de- (remove) with diacritic and is used in linguistics, data processing, and digital typography to indicate the absence of diacritics in a given form.

In digital processing, dediacritization is typically achieved through Unicode normalization and diacritic-stripping algorithms. A common approach decomposes characters into a base letter plus combining diacritical marks and then discards the combining marks (for example, café → cafe). Some workflows map to ASCII using transliteration rules (naïve → naive, São → Sao). Challenges arise because some languages rely on diacritics to distinguish meanings or pronunciations, and not all characters have unambiguous base-letter equivalents (for example, Turkish i/İ or Scandinavian ø).

Applications include search and indexing, where diacritics can impede matching; data migration and interoperability between systems with limited character support; and user interfaces that require plain-letter input. Dediacritization can improve compatibility and consistency but may obscure orthographic information and alter interpretation. Contextual use often pairs dediacritized forms with their diacritized originals to balance searchability and readability.
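
As a minimal sketch of the decompose-and-discard approach described above, the following Python uses the standard library's unicodedata module. The names strip_diacritics, fold, and build_index, and the FALLBACK table, are hypothetical and chosen only for illustration. The sketch assumes that dropping nonspacing marks (general category Mn) is sufficient, which is reasonable for Latin-script text but not for every writing system.

```python
import unicodedata

# Hypothetical fallback table: these letters have no canonical decomposition,
# so normalization alone leaves them unchanged. The mappings are illustrative
# choices, not a standard.
FALLBACK = {"ø": "o", "Ø": "O", "ł": "l", "Ł": "L", "đ": "d", "Đ": "D"}


def strip_diacritics(text: str) -> str:
    """Decompose to NFD, then discard nonspacing combining marks (Mn)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def fold(text: str) -> str:
    """Strip diacritics, then apply the fallback table to remaining letters."""
    return "".join(FALLBACK.get(ch, ch) for ch in strip_diacritics(text))


def build_index(words):
    """Map each folded, case-insensitive key to the original spellings."""
    index = {}
    for word in words:
        index.setdefault(fold(word).casefold(), []).append(word)
    return index
```

Under these assumptions, strip_diacritics("São") yields "Sao", while "ø" passes through unchanged because it has no canonical decomposition; this is why fold consults an explicit table. The index keeps the diacritized originals alongside the folded key, so a search for "cafe" can still display "café".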