langidpy
Langidpy is a Python library for automatic language identification. It implements a character n-gram-based classifier that determines the language of a given text and typically returns a language code together with a confidence score. The library ships with a pre-trained model that covers a broad set of languages and performs well on short to moderately long text snippets. Languages are reported as ISO language codes, commonly ISO 639-1 (for example, "en" for English).
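The general idea of a character n-gram classifier that returns a language code with a confidence score can be sketched as follows. This is a toy multinomial naive Bayes model over character trigrams with add-one smoothing, trained on a few made-up sentences. The class and method names (NgramLanguageClassifier, train, classify) are hypothetical and are not langidpy's API. Langidpy's actual model, features and training data may differ.

```python
import math
from collections import Counter


def char_ngrams(text: str, n: int = 3) -> list[str]:
    """Return overlapping character n-grams, padding the text with spaces."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NgramLanguageClassifier:
    """Hypothetical toy classifier: multinomial naive Bayes over character n-grams."""

    def __init__(self, n: int = 3, alpha: float = 1.0) -> None:
        self.n = n
        self.alpha = alpha  # add-alpha (Laplace) smoothing constant
        self.counts: dict[str, Counter] = {}
        self.vocab: set[str] = set()

    def train(self, samples: dict[str, list[str]]) -> None:
        """samples maps an ISO 639-1 code to example texts in that language."""
        for lang, texts in samples.items():
            counter = self.counts.setdefault(lang, Counter())
            for text in texts:
                grams = char_ngrams(text, self.n)
                counter.update(grams)
                self.vocab.update(grams)

    def classify(self, text: str) -> tuple[str, float]:
        """Return (language code, posterior probability) of the best language."""
        grams = char_ngrams(text, self.n)
        vocab_size = len(self.vocab) + 1  # +1 reserves mass for unseen n-grams
        log_scores = {}
        for lang, counter in self.counts.items():
            total = sum(counter.values())
            denom = math.log(total + self.alpha * vocab_size)
            # Sum of smoothed log-probabilities of every n-gram in the input.
            log_scores[lang] = sum(
                math.log(counter[g] + self.alpha) - denom for g in grams
            )
        # Turn log scores into probabilities, assuming a uniform language prior.
        best = max(log_scores.values())
        exp_scores = {lang: math.exp(s - best) for lang, s in log_scores.items()}
        z = sum(exp_scores.values())
        lang = max(exp_scores, key=exp_scores.get)
        return lang, exp_scores[lang] / z


clf = NgramLanguageClassifier()
clf.train({
    "en": ["the quick brown fox jumps over the lazy dog",
           "this is a short english sentence"],
    "de": ["der schnelle braune fuchs springt über den faulen hund",
           "dies ist ein kurzer deutscher satz"],
})
lang, confidence = clf.classify("the dog is lazy")
print(lang, round(confidence, 3))
```

Scoring in log space avoids numeric underflow when many small n-gram probabilities are multiplied. The smoothing term prevents an n-gram that never occurred in one language's training data from ruling that language out entirely.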
Langidpy provides both a Python API and a command-line interface. In Python, the typical usage involves calling
A notable strength of langidpy is its ability to be retrained with user-provided data. This allows domain
Limitations include reduced accuracy on very short texts and on texts that mix multiple languages, such as those containing code-switching.
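A common way to cope with the short-text limitation is to abstain rather than trust a weak prediction. The sketch below wraps any function that returns a (language code, confidence) pair. It returns None when the input is too short or the confidence is below a threshold. The helper name identify_or_abstain, the stand-in fake_classify, and the thresholds of 20 characters and 0.9 are hypothetical choices for illustration, not part of langidpy. The wrapper does not address code-switching.

```python
from typing import Callable, Optional, Tuple

# Any callable mapping text to (language code, confidence) fits this type.
Classifier = Callable[[str], Tuple[str, float]]


def identify_or_abstain(
    classify: Classifier,
    text: str,
    min_chars: int = 20,         # hypothetical minimum input length
    min_confidence: float = 0.9, # hypothetical confidence threshold
) -> Optional[str]:
    """Return a language code, or None if the input is too short or the
    classifier's confidence falls below the threshold."""
    stripped = text.strip()
    if len(stripped) < min_chars:
        return None  # too little evidence for a reliable n-gram decision
    lang, confidence = classify(stripped)
    return lang if confidence >= min_confidence else None


# Hypothetical stand-in classifier that always answers "en" with confidence 0.95.
def fake_classify(text: str) -> Tuple[str, float]:
    return "en", 0.95


print(identify_or_abstain(fake_classify, "ok"))  # None
print(identify_or_abstain(fake_classify, "this sentence is long enough to classify"))  # en
```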