kSubstrat
kSubstrat is a framework and set of techniques for processing k-length substrings, or k-substrings, within strings, sequences, and logs. It generalizes the concept of k-grams by focusing on scalable extraction, indexing, and querying of all substrings of fixed length k.
The core components include a k-substring extractor, which slides a window of length k across a sequence
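The sliding-window extraction can be illustrated with a minimal sketch, assuming the input is an ordinary Python string. The function name `k_substrings` is hypothetical, chosen for illustration only, and is not a documented kSubstrat interface.

```python
from typing import Iterator


def k_substrings(sequence: str, k: int) -> Iterator[str]:
    """Yield every length-k substring of `sequence`, left to right.

    Hypothetical helper for illustration; not an actual kSubstrat function.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    # A sequence of length n contains max(n - k + 1, 0) windows of length k.
    for start in range(len(sequence) - k + 1):
        yield sequence[start:start + k]
```

For example, `list(k_substrings("GATTACA", 3))` produces the five windows GAT, ATT, TTA, TAC, and ACA, while an input shorter than k produces no windows at all.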
Operations supported by kSubstrat typically include exact substring matching, frequency counting, and existence queries, with extensions
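Frequency counting and existence queries can be sketched with a simple in-memory hash-based index. This is an assumption made for illustration, using Python's standard `collections.Counter`. The helper name `build_k_index` is hypothetical, and an actual implementation might use a different index structure.

```python
from collections import Counter


def build_k_index(sequence: str, k: int) -> Counter:
    """Map each length-k substring of `sequence` to its occurrence count.

    Hypothetical helper for illustration; not an actual kSubstrat function.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


index = build_k_index("abracadabra", 3)
count_abr = index["abr"]   # frequency query: how often "abr" occurs
has_cad = "cad" in index   # existence query for a substring that occurs
has_xyz = "xyz" in index   # existence query for a substring that does not
```

An exact match for a pattern of length k then reduces to a single dictionary lookup. The trade-off is that the index holds one entry per distinct k-substring.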
Applications span multiple domains. In genome analysis, k-substrings correspond to k-mers used in assembly and comparison.
Performance and limitations: extraction is generally linear in the input length, while memory usage depends on
Related concepts include the k-mer, n-gram, suffix array, and de Bruijn graph. See also: substring.