textssuch
Textssuch is a modular text processing and search framework designed to enable rapid pattern matching and semantic querying across large text collections. It provides a command-line interface, an embeddable library, and a lightweight query language intended for log analysis, data cleaning, and exploratory text research.
Originally conceived by a small team of developers, textssuch emphasizes speed and composability. The project is
Key features include tokenization with language detection, optional stemming and lemmatization, stopword handling, and optional indexing
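The analysis steps listed above (tokenization, stopword handling, stemming, and indexing) can be sketched in a few lines of Python. This is not textssuch's implementation. The stopword list, the naive suffix-stripping stemmer (a stand-in for a real stemmer such as Porter or Snowball), and the names tokenize, stem, analyze, and build_index are all hypothetical. Language detection and lemmatization are left out.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately tiny stopword list for illustration only.
STOPWORDS = {"a", "an", "the", "and", "or", "of", "to", "in", "is"}

# Suffixes tried in order by the naive stemmer below (not a real Porter/Snowball stemmer).
SUFFIXES = ("ing", "ed", "es", "s")


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def stem(token):
    """Strip the first matching suffix, keeping a stem of at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def analyze(text, use_stemming=True):
    """Tokenize, drop stopwords, and optionally stem each remaining token."""
    tokens = [t for t in tokenize(text) if t not in STOPWORDS]
    return [stem(t) for t in tokens] if use_stemming else tokens


def build_index(docs, use_stemming=True):
    """Build an inverted index: each term maps to the sorted ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for term in analyze(text, use_stemming):
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}
```

Because the same analyze step runs on documents and on queries, "servers" and "server" land on the same index entry. A production stemmer also handles irregular forms, which this sketch does not.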
The query language uses a simple syntax with quoted phrases, operators such as AND, OR, NOT, and
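A small recursive-descent evaluator shows how a query of this kind can be compiled into a predicate over a line of text. The details below are assumptions rather than documented textssuch behavior. Phrases use double quotes, operators must be uppercase, and matching is case-insensitive substring search. NOT binds tighter than AND, which binds tighter than OR. Parentheses are not supported. The names lex and parse are hypothetical.

```python
import re

# A double-quoted phrase (group 1), or a run of non-space, non-quote characters (group 2).
TOKEN_RE = re.compile(r'"([^"]*)"|([^\s"]+)')
OPERATORS = {"AND", "OR", "NOT"}


def lex(query):
    """Split a query into ("OP", name) and ("TERM", text) tokens."""
    tokens = []
    for phrase, word in TOKEN_RE.findall(query):
        if word in OPERATORS:
            tokens.append(("OP", word))
        else:
            # Bare words fill the second group, quoted phrases the first.
            tokens.append(("TERM", word or phrase))
    return tokens


def _contains(term):
    needle = term.lower()
    return lambda text: needle in text.lower()


def _and(left, right):
    return lambda text: left(text) and right(text)


def _or(left, right):
    return lambda text: left(text) or right(text)


def _not(inner):
    return lambda text: not inner(text)


def parse(query):
    """Compile a query string into a function that tests one line of text."""
    tokens = lex(query)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        tok = peek()
        if tok is None:
            raise ValueError("unexpected end of query")
        pos += 1
        return tok

    def parse_or():  # lowest precedence
        node = parse_and()
        while peek() == ("OP", "OR"):
            take()
            node = _or(node, parse_and())
        return node

    def parse_and():
        node = parse_not()
        while peek() == ("OP", "AND"):
            take()
            node = _and(node, parse_not())
        return node

    def parse_not():  # highest precedence
        if peek() == ("OP", "NOT"):
            take()
            return _not(parse_not())
        kind, value = take()
        if kind != "TERM":
            raise ValueError(f"expected a term, got operator {value}")
        return _contains(value)

    predicate = parse_or()
    if pos != len(tokens):
        raise ValueError(f"unexpected token {tokens[pos][1]!r}")
    return predicate
```

Under these rules, '"disk full" OR timeout AND NOT retry' reads as "disk full" OR (timeout AND NOT retry). A line such as "timeout, will retry" is therefore rejected, while "timeout after 30s" matches.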
Usage examples emphasize clarity and reproducibility: textssuch -q 'error AND timeout' logs/*.log -o results.json, or textssuch
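The first command can be approximated in plain Python to make its intent concrete. The sketch reads every file matched by logs/*.log, keeps the lines that contain both error and timeout, and writes the matches to results.json. It is a hypothetical sketch, not textssuch's code. It handles only an AND of plain terms, and the JSON record layout (file, line, text) is an assumed format, because the structure of textssuch's actual output is not described here. The functions match_lines, search_files, and write_results are illustrative names.

```python
import glob
import json


def match_lines(lines, terms):
    """Yield (line_number, text) for each line containing every term.

    Matching is case-insensitive substring search, standing in for an
    AND-only query such as 'error AND timeout' (an assumption of this sketch).
    """
    needles = [term.lower() for term in terms]
    for lineno, line in enumerate(lines, start=1):
        lowered = line.lower()
        if all(needle in lowered for needle in needles):
            yield lineno, line.rstrip("\n")


def search_files(pattern, terms):
    """Search every file matching a glob pattern and return a list of match records."""
    results = []
    # Sorting the paths keeps the output order identical across runs.
    for path in sorted(glob.glob(pattern)):
        with open(path, encoding="utf-8", errors="replace") as fh:
            for lineno, text in match_lines(fh, terms):
                # Hypothetical record layout; textssuch's real JSON schema is not shown.
                results.append({"file": path, "line": lineno, "text": text})
    return results


def write_results(results, out_path):
    """Write match records to a JSON file."""
    with open(out_path, "w", encoding="utf-8") as fh:
        json.dump(results, fh, indent=2)
```

Calling write_results(search_files("logs/*.log", ["error", "timeout"]), "results.json") plays roughly the role of the command above. Sorting the file list is what makes repeated runs over the same logs produce byte-identical output, which fits the emphasis on reproducibility.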