Grayt
Grayt is an open-source software library for data cleaning and text processing, built around string normalization and fuzzy matching. It provides a lightweight toolkit for deduplication, record linkage, and preprocessing in data pipelines, particularly where exact string matches are unreliable.
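The general technique of normalizing strings before fuzzy comparison can be illustrated with a short Python sketch. It does not use Grayt's API. It relies only on the standard library (`unicodedata`, `re`, `difflib.SequenceMatcher`), and the function names and the 0.9 similarity threshold are illustrative assumptions.

```python
import re
import unicodedata
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lowercase, strip accents, drop punctuation, and collapse whitespace."""
    # Decompose accented characters (e.g. "é" -> "e" + combining mark), then drop the marks.
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Replace anything that is not a word character or whitespace with a space.
    cleaned = re.sub(r"[^\w\s]", " ", no_accents.lower())
    return " ".join(cleaned.split())


def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between the normalized forms of a and b."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def deduplicate(records: list[str], threshold: float = 0.9) -> list[str]:
    """Keep the first record of each group of near-duplicates (hypothetical helper)."""
    kept: list[str] = []
    for record in records:
        # Keep the record only if it is not too similar to anything already kept.
        if all(similarity(record, existing) < threshold for existing in kept):
            kept.append(record)
    return kept


if __name__ == "__main__":
    names = ["José García", "Jose Garcia", "jose  garcia.", "Maria Lopez"]
    print(deduplicate(names))  # ['José García', 'Maria Lopez']
```

Comparing each record against every kept record is quadratic in the number of records; record-linkage pipelines at scale typically add a blocking step so that only plausible candidate pairs are compared.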
The library uses a modular architecture with a core engine, a plugin API, and a command-line interface.
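Grayt's actual plugin API is not described in this article. As a hypothetical illustration of how a core engine, a plugin registry, and a thin command-line layer can fit together, the following Python sketch uses invented names (`Engine`, `register`, `lower`, `squash_spaces`, the `--plugins` flag) that are assumptions, not Grayt's interface.

```python
import argparse
import sys
from typing import Callable, Dict, List, Optional, TextIO

# Hypothetical plugin signature: a plugin takes a string and returns a transformed string.
Plugin = Callable[[str], str]


class Engine:
    """Minimal core engine that applies registered plugins in a chosen order."""

    def __init__(self) -> None:
        self._plugins: Dict[str, Plugin] = {}

    def register(self, name: str) -> Callable[[Plugin], Plugin]:
        """Decorator that registers a plugin under a unique name."""
        def decorator(func: Plugin) -> Plugin:
            if name in self._plugins:
                raise ValueError(f"plugin {name!r} already registered")
            self._plugins[name] = func
            return func
        return decorator

    def run(self, text: str, enabled: List[str]) -> str:
        """Apply the named plugins to text, in the order given."""
        for name in enabled:
            text = self._plugins[name](text)
        return text

    @property
    def plugin_names(self) -> List[str]:
        return list(self._plugins)


engine = Engine()


@engine.register("lower")
def lower(text: str) -> str:
    return text.lower()


@engine.register("squash_spaces")
def squash_spaces(text: str) -> str:
    return " ".join(text.split())


def main(argv: Optional[List[str]] = None, stream: Optional[TextIO] = None) -> None:
    """Thin CLI layer: read lines from a stream (stdin by default), print processed lines."""
    parser = argparse.ArgumentParser(description="Apply text plugins line by line.")
    parser.add_argument("--plugins", nargs="+", default=engine.plugin_names,
                        choices=engine.plugin_names)
    args = parser.parse_args(argv)
    source = stream if stream is not None else sys.stdin
    for line in source:
        print(engine.run(line.rstrip("\n"), args.plugins))


if __name__ == "__main__":
    main()
```

Keeping the plugins as plain functions registered on the engine means the command-line layer only has to parse options and delegate, which is one common way to keep a core library usable both programmatically and from a shell.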
Grayt was created in 2019 by a distributed team of developers and researchers to provide an accessible toolkit for text normalization and fuzzy matching.
In practice, Grayt has been adopted by startups, academic labs, and government data portals for record deduplication.