ngramanalyser
ngramanalyser is a software tool or library for extracting and analyzing n-grams from text. It supports both word-level and character-level n-grams and is commonly used in natural language processing, information retrieval, and linguistic research. The tool tokenizes input text, applies optional normalization such as lowercasing, accent removal, and punctuation handling, and can perform stopword removal or stemming. Users can specify a range of n values (for example, 1 to 2 or 2 to 5), and the analyzer computes every n-gram in that range by sliding a window of n tokens over the token sequence. The output typically includes a frequency count and a relative frequency for each n-gram, and the top-N most frequent n-grams can be extracted from it. Some implementations offer additional features such as confidence weighting, minimum-support thresholds, or filtering by length.
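The pipeline described above (normalize, tokenize, slide a window, count) can be illustrated with a minimal Python sketch. It is not the API of any particular ngramanalyser implementation: the function names `normalize`, `tokenize`, `ngram_counts`, and `relative_frequencies`, their parameters, and the choice to compute relative frequencies per n-gram length are all assumptions made for illustration. Stemming, confidence weighting, and support thresholds are omitted.

```python
import re
import unicodedata
from collections import Counter


def normalize(text):
    """Lowercase the text and strip accents (one possible normalization)."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def tokenize(text, level="word"):
    """Return word tokens, or single characters for character-level n-grams."""
    text = normalize(text)
    if level == "char":
        return list(text)
    # \w+ keeps runs of letters, digits and underscores, dropping punctuation.
    return re.findall(r"\w+", text)


def ngram_counts(tokens, n_min=1, n_max=2, stopwords=frozenset()):
    """Count every n-gram with n_min <= n <= n_max using a sliding window."""
    tokens = [t for t in tokens if t not in stopwords]
    counts = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def relative_frequencies(counts):
    """Divide each count by the total number of n-grams of the same length."""
    totals = Counter()
    for gram, count in counts.items():
        totals[len(gram)] += count
    return {gram: count / totals[len(gram)] for gram, count in counts.items()}


text = "The cat sat on the mat. The cat slept."
counts = ngram_counts(tokenize(text), n_min=1, n_max=2)
rel = relative_frequencies(counts)
top = counts.most_common(3)  # top-N most frequent n-grams, here N = 3
```

In this example the text yields 9 word tokens, so the unigram `("the",)` has a count of 3, and the bigram `("the", "cat")` has a count of 2 out of 8 bigrams, giving a relative frequency of 0.25. Passing `level="char"` to `tokenize` produces character-level n-grams from the same counting function.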
ngramanalyser output is designed to integrate with data pipelines and can be exported to common formats.
Implementations vary across languages, but typical offerings are available in Python, Java, and JavaScript, with APIs