gensim
Gensim is an open-source Python library for topic modeling, document similarity, and vector space modeling. It is designed to process large text corpora using memory-efficient streaming and incremental algorithms, enabling scalable exploration of semantic structure in text.
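The streaming idea can be illustrated with a minimal sketch in plain Python. It does not use Gensim's own API: the `StreamingCorpus` class and `build_vocabulary` function are hypothetical names chosen for illustration, and tokenization is assumed to be simple whitespace splitting. The sketch shows a corpus object that yields one sparse bag-of-words vector at a time, as a list of `(token_id, count)` pairs, so the full corpus never has to be held in memory as a matrix.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, List, Tuple

# A sparse bag-of-words vector: (token_id, count) pairs for one document.
BowVector = List[Tuple[int, int]]


def build_vocabulary(lines: Iterable[str]) -> Dict[str, int]:
    """Hypothetical helper: assign an integer id to each token in one pass."""
    token2id: Dict[str, int] = {}
    for line in lines:
        for tok in line.lower().split():
            # Ids are assigned in order of first appearance.
            token2id.setdefault(tok, len(token2id))
    return token2id


class StreamingCorpus:
    """Hypothetical helper: yields one bag-of-words vector per document."""

    def __init__(self, lines: Iterable[str], token2id: Dict[str, int]) -> None:
        self.lines = lines
        self.token2id = token2id

    def __iter__(self) -> Iterator[BowVector]:
        # Only the current document is materialized at any time.
        for line in self.lines:
            counts = Counter(
                self.token2id[tok]
                for tok in line.lower().split()
                if tok in self.token2id
            )
            yield sorted(counts.items())


# A tiny in-memory stand-in for a large on-disk corpus (assumed data).
documents = [
    "human machine interface",
    "graph of trees",
    "graph minors and trees",
]

token2id = build_vocabulary(documents)
corpus = StreamingCorpus(documents, token2id)
vectors = list(corpus)  # materialized here only to inspect the result
```

In practice the `lines` argument would be something re-iterable that reads from disk, such as an object that reopens a file on each pass, because a training algorithm may need to iterate over the corpus more than once.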
The library provides implementations of core topic modeling algorithms such as Latent Dirichlet Allocation (LDA), Latent
Gensim supports offline and online training, including multi-core parallelization via LdaMulticore, and streaming interfaces that process
The project was started by Radim Řehůřek and is maintained by a community of contributors. It is