Zdteks
zdteks is a lightweight, open‑source toolkit designed for extracting structured data from unstructured text documents. Developed in 2022 by a consortium of research laboratories at the University of Grenoble and the Institute of Data Science in São Paulo, the project aims to provide a unified interface for natural language processing, regular expression matching, and machine learning‑based entity recognition. The name “zdteks” derives from the French phrase “Zéro Dépendance–Texte” (“Zero Dependency–Text”), reflecting the library’s emphasis on minimal external dependencies and its focus on text processing.
The core of zdteks revolves around a modular architecture that allows users to chain processing steps—tokenization,
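The idea of chaining processing steps can be illustrated with a minimal Python sketch. This sketch does not use zdteks itself. `Doc`, `tokenize`, `regex_matcher`, `gazetteer`, and `run_pipeline` are hypothetical names invented for illustration and are not the library's API. The `gazetteer` step is a plain dictionary lookup that stands in for the machine learning‑based entity recognizer mentioned above.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List, Set, Tuple

# Hypothetical document container passed from step to step (not a zdteks type).
@dataclass
class Doc:
    text: str
    tokens: List[str] = field(default_factory=list)
    entities: List[Tuple[str, str]] = field(default_factory=list)  # (label, matched text)

# Every step takes a Doc and returns a Doc, so steps compose in any order.
Step = Callable[[Doc], Doc]

def tokenize(doc: Doc) -> Doc:
    # Split into runs of word characters and single punctuation marks.
    doc.tokens = re.findall(r"\w+|[^\w\s]", doc.text)
    return doc

def regex_matcher(label: str, pattern: str) -> Step:
    # Build a step that records every regex match in the raw text under `label`.
    compiled = re.compile(pattern)
    def step(doc: Doc) -> Doc:
        for m in compiled.finditer(doc.text):
            doc.entities.append((label, m.group()))
        return doc
    return step

def gazetteer(label: str, names: Set[str]) -> Step:
    # Stand-in for an ML recognizer: exact lookup of tokens in a fixed name list.
    def step(doc: Doc) -> Doc:
        for tok in doc.tokens:
            if tok in names:
                doc.entities.append((label, tok))
        return doc
    return step

def run_pipeline(steps: List[Step], text: str) -> Doc:
    # Feed a fresh Doc through each step in sequence.
    doc = Doc(text)
    for step in steps:
        doc = step(doc)
    return doc

pipeline = [
    tokenize,
    regex_matcher("DATE", r"\b\d{4}-\d{2}-\d{2}\b"),
    gazetteer("CITY", {"Grenoble", "Lyon"}),
]

doc = run_pipeline(pipeline, "Meeting in Grenoble on 2022-05-17.")
print(doc.entities)  # [('DATE', '2022-05-17'), ('CITY', 'Grenoble')]
```

Every step shares the `Doc -> Doc` signature, so a step can be added, removed, or reordered without touching the others. This is the property a modular chain of processing steps relies on.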
In practice, zdteks is employed in domains that require rapid prototyping of text analytics workflows, including
Future development plans for zdteks include support for multilingual processing, incorporation of transformer‑based models for higher‑accuracy