variablegram

A variablegram is a unit of sequence data that generalizes the idea of an n-gram by allowing the length of the unit to vary within a predefined range. Unlike fixed-length n-grams, which are limited to a single window size, a variablegram can capture motifs that occur at different scales, from short collocations to longer phrases. The concept is used in fields such as natural language processing, text mining, and bioinformatics to model patterns that do not conform to a single fixed length.

Construction typically involves selecting a minimum length m and a maximum length M, then enumerating all substrings whose length lies between m and M.
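As an illustration, the construction can be sketched in Python. This is a minimal sketch, not a reference implementation: it assumes the units are contiguous runs of tokens and that both bounds m and M are inclusive. The function name `variablegrams` and the sample sentence are hypothetical, chosen only for this example.

```python
from collections import Counter
from typing import Iterator, Sequence, Tuple


def variablegrams(tokens: Sequence[str], m: int, M: int) -> Iterator[Tuple[str, ...]]:
    """Yield every contiguous run of tokens whose length is between m and M.

    Assumption for this sketch: both bounds are inclusive.
    """
    if m < 1 or M < m:
        raise ValueError("expected 1 <= m <= M")
    for start in range(len(tokens)):
        # Cap the longest unit so it never runs past the end of the sequence.
        longest = min(M, len(tokens) - start)
        for length in range(m, longest + 1):
            yield tuple(tokens[start:start + length])


# Count how often each unit of length 1 or 2 occurs in a sample sentence.
tokens = "the cat sat on the mat".split()
counts = Counter(variablegrams(tokens, 1, 2))
```

With m = M the function reduces to ordinary fixed-length n-gram extraction. The number of units grows roughly with the sequence length times (M - m + 1), so wide ranges on long inputs can become expensive to enumerate and store.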

Relation to related concepts: variablegrams extend n-grams and relate to variable-length motifs, substrings, and k-grams with variable-length informativeness.