STRmerkit
STRmerkit is a software library developed by the research group at the Institute of Computational Science for efficient string-based pattern recognition. The library has a modular architecture that separates core string manipulation routines from domain‑specific machine learning components. The core components include fast suffix tree construction, substring search, and edit‑distance calculation. These routines are exposed through a C++ API and are also accessible from Python through a lightweight wrapper built with pybind11.
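To illustrate the kind of computation the edit‑distance component performs, the following is a minimal sketch of the classic Levenshtein distance in plain Python. It is a generic textbook formulation, not STRmerkit's own implementation or API; the function name `edit_distance` is hypothetical and chosen only for this example.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between a and b (illustrative, not STRmerkit code)."""
    # Keep the shorter string in the inner loop so the rows stay small.
    if len(a) < len(b):
        a, b = b, a

    # previous[j] holds the distance between the processed prefix of a
    # and the first j characters of b.
    previous = list(range(len(b) + 1))

    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty prefix of b
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or match at no cost
            ))
        previous = current

    return previous[-1]
```

For example, `edit_distance("kitten", "sitting")` returns 3. The sketch keeps only two rows of the dynamic-programming table, so memory use grows with the length of the shorter string rather than with the product of both lengths.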
The design of STRmerkit emphasizes performance and scalability. Internally, the library uses memory‑mapped data structures and
STRmerkit also offers a set of high‑level utilities for motif discovery, clustering of string patterns, and
Numerous research articles, including a 2022 publication in the Journal of Bioinformatics, have cited STRmerkit for