BERTopic

BERTopic is a topic modeling approach and open-source Python library designed to discover topics in large collections of text. It combines modern neural sentence embeddings with clustering to produce coherent topics, even from short texts where classical bag-of-words topic models often struggle.

At its core, BERTopic transforms documents into dense vector embeddings using transformer models (for example, sentence-transformers). It then reduces the dimensionality of the embeddings with UMAP and clusters them with HDBSCAN to form topic groups. For each cluster, it derives a representative term list using c-TF-IDF, a class-based adaptation of TF-IDF weighting, to produce human-interpretable topic labels.

The workflow typically involves computing embeddings for documents, reducing their dimensionality and clustering them, extracting topic representations, and assigning documents to topics. The library offers utilities for labeling topics, checking topic coherence, and visualizing relationships among topics and documents.

BERTopic is language-agnostic when used with suitable multilingual transformer models, and it integrates with common Python data science stacks. It is designed to scale to large corpora and to provide interpretable topics without extensive preprocessing.

Applications include analyzing large text collections in business, research, journalism, and social media to discover themes, track topics over time, or summarize content.
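The c-TF-IDF weighting mentioned above can be sketched with the standard library alone. This is a simplified illustration of the class-based idea (concatenate each cluster's documents into one pseudo-document, then weight each term by its in-cluster frequency scaled by its rarity across clusters), not the library's exact implementation; the function name `ctfidf` and the whitespace tokenizer are illustrative assumptions.

```python
import math
from collections import Counter

def ctfidf(clusters):
    """Simplified class-based TF-IDF over pre-clustered documents.

    `clusters` maps a cluster id to its list of documents. Each term t in
    cluster c is scored as tf_{t,c} * log(1 + A / f_t), where A is the
    average number of words per cluster and f_t is the frequency of t
    across all clusters.
    """
    # One bag-of-words per cluster, from the concatenated documents.
    tf = {c: Counter(" ".join(docs).lower().split())
          for c, docs in clusters.items()}
    # f_t: total frequency of each term across all clusters.
    f = Counter()
    for counts in tf.values():
        f.update(counts)
    # A: average number of words per cluster.
    a = sum(f.values()) / len(tf)
    return {
        c: {t: freq * math.log(1 + a / f[t]) for t, freq in counts.items()}
        for c, counts in tf.items()
    }

clusters = {
    0: ["the cat sat", "the cat and the dog"],
    1: ["the stocks fell", "the markets and the stocks rallied"],
}
scores = ctfidf(clusters)
# The shared word "the" is down-weighted, so the top term per cluster
# is a cluster-specific word.
top = max(scores[0], key=scores[0].get)  # → "cat"
```

The cross-cluster term frequency `f_t` is what separates this from plain TF-IDF: terms common to every cluster (like "the" here) are penalized, so each cluster's top-scoring terms are the ones that distinguish it.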
Limitations include computational demands and sensitivity to model choice and hyperparameters; topic quality depends on embedding quality and clustering results, and tuning may be required for noisy or highly diverse corpora.
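As an illustration of the "assigning documents to topics" step: in BERTopic itself assignment comes from the clustering model, but the underlying idea of matching a document's embedding to the most similar topic representation can be sketched as a stdlib-only nearest-centroid lookup. The function names and the toy 2-D vectors below are hypothetical; real sentence embeddings have hundreds of dimensions.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def assign_topic(embedding, centroids):
    """Return the id of the topic whose centroid is most similar."""
    return max(centroids, key=lambda t: cosine(embedding, centroids[t]))

# Toy 2-D "embeddings": one centroid per topic, plus a new document.
centroids = {0: [1.0, 0.1], 1: [0.0, 1.0]}
doc_embedding = [0.9, 0.2]
topic = assign_topic(doc_embedding, centroids)  # → 0 (nearest centroid)
```

Because cosine similarity ignores vector magnitude, this assignment depends only on the direction of the embeddings, which is the usual convention when comparing sentence embeddings.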