Ngrammit
ngrammit is a software toolkit for extracting and analyzing n-grams from text data. It provides utilities to generate n-grams from unigrams up to higher orders, count their frequencies, and compute statistics in support of natural language processing, corpus linguistics, and text mining. The library has a modular design, so it can be integrated into preprocessing pipelines for research or production tasks. Key features include tokenization options, case normalization, punctuation handling, optional stopword filtering, and support for skip-gram configurations. It can process large corpora as a stream or work on in-memory datasets, and it reads plain text as well as structured inputs such as JSON or CSV.
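The general technique behind these features can be illustrated with a short Python sketch that uses only the standard library. It is not ngrammit's own interface: the function names (`tokenize`, `ngrams`, `skipgrams`, `count_ngrams`), their parameters, and the regular-expression tokenizer are all assumptions made for illustration.

```python
import re
from collections import Counter
from itertools import combinations


def tokenize(text, lowercase=True, stopwords=frozenset()):
    """Hypothetical tokenizer: optional case folding, punctuation dropped,
    optional stopword filtering."""
    if lowercase:
        text = text.lower()
    tokens = re.findall(r"\w+", text)  # keeps runs of word characters only
    return [t for t in tokens if t not in stopwords]


def ngrams(tokens, n):
    """Contiguous n-grams, returned as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def skipgrams(tokens, n, k):
    """k-skip-n-grams: n tokens in order, with at most k tokens skipped."""
    result = []
    for i in range(len(tokens)):
        # The remaining n-1 tokens are drawn from the next n-1+k positions.
        window = tokens[i + 1:i + n + k]
        for rest in combinations(window, n - 1):
            result.append((tokens[i],) + rest)
    return result


def count_ngrams(lines, n, **tokenizer_options):
    """Count n-grams over any iterable of lines, for example an open file,
    so the whole corpus never has to be held in memory at once.
    N-grams do not cross line boundaries in this sketch."""
    counts = Counter()
    for line in lines:
        counts.update(ngrams(tokenize(line, **tokenizer_options), n))
    return counts


if __name__ == "__main__":
    corpus = ["The cat sat.", "The cat ran!"]
    bigram_counts = count_ngrams(corpus, 2)
    print(bigram_counts.most_common(3))
    print(skipgrams(["a", "b", "c", "d"], n=2, k=1))
```

Because `count_ngrams` accepts any iterable of strings, the same function works on a list already in memory and on a file object read line by line, which is one simple way to realize the streaming and in-memory modes described above.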
Output from ngrammit includes frequency tables, relative frequencies, and ranking of n-grams by measures such as
ngrammit is released as open-source software with a permissive license. It is maintained by a community of