documenttopic
Documenttopic is a term used to describe the thematic composition assigned to a document by topic modeling methods. In typical formulations, each document is represented by a topic distribution, often denoted theta_d, which gives the probability mass across a set of latent topics. The topics themselves are distributions over words and are learned from a corpus.
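As a minimal sketch of this representation, the following Python snippet stores a hypothetical theta_d and two hypothetical topic-word distributions, then computes the probability of each word in the document as the topic-weighted mixture. The vocabulary, the numbers, and the names phi and word_probabilities are illustrative assumptions, not the output of a fitted model.

```python
# Illustrative values only: a document-topic distribution over K = 2 topics
# and topic-word distributions over a 3-word vocabulary.
vocab = ["gene", "cell", "market"]
theta_d = [0.75, 0.25]            # P(topic k | document d), sums to 1
phi = [
    [0.5, 0.5, 0.0],              # topic 0: P(word | topic 0)
    [0.0, 0.2, 0.8],              # topic 1: P(word | topic 1)
]

def word_probabilities(theta, phi):
    """Return P(word | document) = sum_k theta[k] * phi[k][w] for each word."""
    n_words = len(phi[0])
    return [sum(theta[k] * phi[k][w] for k in range(len(theta)))
            for w in range(n_words)]

p_w_given_d = word_probabilities(theta_d, phi)
for word, p in zip(vocab, p_w_given_d):
    print(f"{word}: {p:.3f}")
```

Because theta_d and every row of phi each sum to 1, the resulting word probabilities also sum to 1.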
The most common approach is Latent Dirichlet Allocation (LDA), where documents are mixtures of topics and topics are distributions over words.
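The mixture view behind LDA can be illustrated with its generative story: draw a document-topic distribution from a Dirichlet prior, then for each word position draw a topic and then a word from that topic. The sketch below uses only the Python standard library. The topic-word matrix phi, the prior alpha, and the helper names are hypothetical choices for illustration, and the code generates synthetic data rather than fitting a model.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw from a Dirichlet by normalizing independent Gamma(alpha_i, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [x / total for x in draws]

def generate_document(phi, alpha, n_words, rng):
    """Generate word ids for one document under the LDA generative story."""
    theta_d = sample_dirichlet(alpha, rng)      # document-topic distribution
    topics = range(len(phi))
    word_ids = range(len(phi[0]))
    words = []
    for _ in range(n_words):
        z = rng.choices(topics, weights=theta_d)[0]    # pick a topic for this position
        w = rng.choices(word_ids, weights=phi[z])[0]   # pick a word from that topic
        words.append(w)
    return theta_d, words

rng = random.Random(0)                          # fixed seed for repeatability
phi = [[0.5, 0.5, 0.0], [0.0, 0.2, 0.8]]        # hypothetical topic-word distributions
theta_d, words = generate_document(phi, alpha=[0.5, 0.5], n_words=20, rng=rng)
```

Inference in LDA runs this story in reverse: given only the observed words, it estimates theta_d for each document along with the topic-word distributions.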
Interpreting documenttopic requires inspecting the topic-word distributions to label topics; the documenttopic vectors can be used
Evaluation uses perplexity on held-out data and topic coherence measures such as UMass, C_V, or C_P, with
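Two of these quantities are simple enough to sketch directly. The snippet below computes the UMass coherence of one topic from document co-occurrence counts, and perplexity from per-token log-probabilities. The toy corpus, the function names, and the assumption that every top word appears in at least one document are illustrative. In practice the log-probabilities would come from a fitted model evaluated on held-out text.

```python
import math

def umass_coherence(top_words, documents):
    """UMass coherence for one topic's top words (ordered most probable first).

    Assumes every word in top_words occurs in at least one document.
    """
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        # Number of documents containing all of the given words.
        return sum(1 for d in doc_sets if all(w in d for w in words))

    score = 0.0
    for m in range(1, len(top_words)):
        for l in range(m):
            # +1 smoothing avoids log(0) when the pair never co-occurs.
            score += math.log((doc_freq(top_words[m], top_words[l]) + 1)
                              / doc_freq(top_words[l]))
    return score

def perplexity(token_log_probs):
    """exp of the negative mean log-likelihood per held-out token."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))

documents = [["gene", "cell"], ["gene", "cell", "dna"], ["market", "stock"]]
coherence = umass_coherence(["gene", "cell"], documents)
ppl = perplexity([math.log(0.25)] * 4)   # every token assigned probability 0.25
```

Higher (less negative) UMass scores indicate top words that co-occur more often, while lower perplexity indicates a better fit to the held-out tokens.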
See also: topic modeling, LDA, NMF, neural topic models, topic coherence.