Diacritic removal
Diacritic removal, in computing and linguistics, is the transformation of characters that carry diacritical marks into their base letters without those marks. It is often used to normalize text for searching, indexing, and data exchange. The operation is most commonly applied to Latin-script text, although many other scripts use diacritics as well.
A typical approach is to decompose characters into base letters and combining diacritics (Unicode normalization form NFD
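The decomposition-based approach can be sketched as follows. This is a minimal illustration, assuming Python's standard `unicodedata` module; the function name `strip_diacritics` is hypothetical, and dropping every character with a nonzero canonical combining class is one common choice rather than the only possible one.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Hypothetical helper: remove combining marks after NFD decomposition."""
    # NFD splits precomposed characters such as "é" into "e" + U+0301.
    decomposed = unicodedata.normalize("NFD", text)
    # unicodedata.combining() returns the canonical combining class,
    # which is nonzero for combining diacritics such as U+0301.
    kept = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Recompose whatever remains so the output is in a single, stable form.
    return unicodedata.normalize("NFC", kept)


print(strip_diacritics("Crème brûlée"))  # Creme brulee
print(strip_diacritics("Łódź"))          # Łodz
```

The second call shows a gap in this technique: letters such as "Ł" have no canonical decomposition in Unicode, so they pass through unchanged unless an explicit mapping table is added.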
Limitations: removing diacritics can alter pronunciation, meaning, or orthography. Some languages rely on diacritics to distinguish
Applications: text search and normalization; URL slugs and file-name generation; data deduplication and indexing; OCR post-processing
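As an illustration of the URL-slug application, the following sketch combines diacritic removal with lowercasing and separator cleanup. It assumes Python's standard library only; the name `slugify` and the choice of `-` as separator are hypothetical, and real systems may apply additional language-specific rules.

```python
import re
import unicodedata


def slugify(text: str) -> str:
    """Hypothetical helper: build an ASCII slug from arbitrary text."""
    # Decompose, then drop combining marks (nonzero combining class).
    decomposed = unicodedata.normalize("NFD", text)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Lowercase and collapse every run of non-alphanumeric ASCII into "-".
    slug = re.sub(r"[^a-z0-9]+", "-", base.lower())
    # Trim separators left over at either end.
    return slug.strip("-")


print(slugify("Crème Brûlée: à la carte!"))  # creme-brulee-a-la-carte
```

Characters without a canonical decomposition are replaced by the separator in this sketch, so a production slug generator would typically add a transliteration table for them.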