NPMI
Normalized pointwise mutual information (NPMI) is a measure of association between two discrete events, commonly words. It quantifies how far their joint probability deviates from what independence would predict, rescaled to a fixed range. It is defined as NPMI(w1,w2) = PMI(w1,w2) / (-log p(w1,w2)), where PMI(w1,w2) = log [ p(w1,w2) / (p(w1) p(w2)) ]. The logarithm is typically natural or base 2; because the same base appears in the numerator and the denominator, the choice does not change the NPMI value. The normalization yields a range from -1 to 1. A value of 1 means the two words occur only together, 0 indicates independence, and values approaching -1 indicate that the words almost never co-occur (the limit -1 is reached as p(w1,w2) tends to 0 while p(w1) and p(w2) stay fixed).
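The definition can be sketched directly in Python. This is a minimal illustration rather than a reference implementation: the function name `npmi` and its argument names are hypothetical, and returning -1 when the joint probability is zero is a convention assumed here (it is the limiting value, not something the formula produces on its own).

```python
import math

def npmi(p_xy: float, p_x: float, p_y: float) -> float:
    """NPMI from a joint probability p_xy and the two marginals p_x, p_y."""
    if not (0.0 <= p_xy < 1.0) or p_x <= 0.0 or p_y <= 0.0:
        # p_xy == 1 makes the denominator -log(1) = 0, so NPMI is undefined there.
        raise ValueError("need 0 <= p_xy < 1 and positive marginals")
    if p_xy == 0.0:
        # Assumed convention: use the limiting value for pairs that never co-occur.
        return -1.0
    pmi = math.log(p_xy / (p_x * p_y))
    return pmi / -math.log(p_xy)

always_together = npmi(0.1, 0.1, 0.1)   # close to 1
independent = npmi(0.06, 0.2, 0.3)      # close to 0
never_together = npmi(0.0, 0.2, 0.3)    # -1 by the convention above
```

The first two results are only approximately 1 and 0 because of floating-point rounding, so comparisons against these reference values are best done with a tolerance.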
Estimation is usually performed from a text corpus. p(w) is approximated by the frequency of w divided
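One way to estimate NPMI from a corpus is sketched below. It rests on an assumption not taken from the text above: each context (for example a sentence or a fixed-size window) is treated as one co-occurrence unit, and both p(w) and p(w1,w2) are estimated as the fraction of contexts containing the word or the pair. The function name `npmi_from_contexts` and the toy corpus are hypothetical, and no smoothing is applied.

```python
import math
from collections import Counter
from itertools import combinations

def npmi_from_contexts(contexts):
    """Return {(w1, w2): NPMI} for every word pair that co-occurs at least once.

    `contexts` is an iterable of token lists; each list counts as one
    co-occurrence unit (an assumption of this sketch).
    """
    word_counts = Counter()
    pair_counts = Counter()
    n = 0
    for tokens in contexts:
        n += 1
        vocab = sorted(set(tokens))                 # count each word once per context
        word_counts.update(vocab)
        pair_counts.update(combinations(vocab, 2))  # pair keys are alphabetically ordered

    scores = {}
    for (w1, w2), c12 in pair_counts.items():
        p12 = c12 / n
        if p12 == 1.0:
            continue  # NPMI is 0/0 when the pair appears in every context
        p1 = word_counts[w1] / n
        p2 = word_counts[w2] / n
        scores[(w1, w2)] = math.log(p12 / (p1 * p2)) / -math.log(p12)
    return scores

# Hypothetical toy corpus: four contexts.
corpus = [
    ["new", "york", "city"],
    ["new", "york", "times"],
    ["new", "car"],
    ["old", "city"],
]
scores = npmi_from_contexts(corpus)
```

With this toy corpus, `scores[("new", "york")]` equals log(4/3) / log 2, roughly 0.415, since the pair appears in 2 of 4 contexts while the words appear in 3 and 2 contexts respectively. Pairs that never co-occur are simply absent from the result; a caller that needs them could assign -1 by convention.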
NPMI is used to identify collocations and informative word pairs, and serves as a feature in various
Limitations include dependence on corpus size, windowing strategy, and smoothing choices; rare-word bias can persist. Variants