Gensim

Gensim is an open-source Python library for topic modeling, document similarity, and vector space modeling. It is designed to process large text corpora using memory-efficient streaming and incremental algorithms, enabling scalable exploration of semantic structure in text.
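The streaming idea can be illustrated with a minimal sketch in plain Python. This is not Gensim's own API: the names `StreamedCorpus` and `build_vocab` are hypothetical and were chosen only for this example. The point is that a corpus can be an iterable that produces one sparse bag-of-words vector at a time, so the full collection never has to be held in memory.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, List, Tuple


def build_vocab(lines: Iterable[str]) -> Dict[str, int]:
    """Assign an integer id to each distinct token, in order of first appearance."""
    vocab: Dict[str, int] = {}
    for line in lines:
        for token in line.lower().split():
            vocab.setdefault(token, len(vocab))
    return vocab


class StreamedCorpus:
    """Hypothetical corpus wrapper: yields one bag-of-words vector per document."""

    def __init__(self, lines: Iterable[str], vocab: Dict[str, int]) -> None:
        self.lines = lines  # any re-iterable source, e.g. a list or a file-reading object
        self.vocab = vocab

    def __iter__(self) -> Iterator[List[Tuple[int, int]]]:
        for line in self.lines:
            # Count only tokens that are in the vocabulary.
            counts = Counter(t for t in line.lower().split() if t in self.vocab)
            # Emit a sparse vector as sorted (token_id, count) pairs.
            yield sorted((self.vocab[t], n) for t, n in counts.items())


# Toy in-memory documents; a real source would read from disk one line at a time.
documents = [
    "human machine interface",
    "graph of trees",
    "graph minors and trees trees",
]

vocab = build_vocab(documents)
corpus = StreamedCorpus(documents, vocab)
vectors = list(corpus)  # materialized here only so the result can be inspected
```

Because `StreamedCorpus` only defines `__iter__`, a consumer that makes several passes over the data can simply iterate over it again. In this toy example the source is a list, but the same class would work with any object that re-reads its input on each iteration.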

The library provides implementations of core topic modeling algorithms such as Latent Dirichlet Allocation (LDA), Latent

Gensim supports offline and online training, including multi-core parallelization via LdaMulticore, and streaming interfaces that process

The project was started by Radim Řehůřek and is maintained by a community of contributors. It is