Keyness

Keyness is a term used in corpus linguistics to describe how characteristic a word or lexical item is of a target corpus relative to a reference corpus. A word is said to have high keyness if it occurs more frequently in the target corpus than would be expected given the reference distribution; conversely, a word with low or negative keyness is underrepresented in the target corpus.

Calculation typically involves comparing observed word frequencies in the target and reference corpora with an expected distribution. The standard approach uses a 2x2 contingency table: word present or absent versus target or reference corpus. Statistical measures such as the log-likelihood ratio (G-test) or Pearson chi-square quantify the deviation from expectation. Other variants include standardized log odds of frequency, log odds ratio, and permutation-based methods. The result is a keyness score, often used to rank words to form a keyword list for the target corpus.

Keyness analysis is widely used for exploratory discourse analysis, genre description, authorial studies, historical linguistics, and media analysis. It helps identify terms that signal topics, styles, or sociolinguistic features distinctive of the target corpus. Positive keyness highlights items that define the material, while negative keyness highlights items relatively suppressed.

Limitations and cautions include dependence on the quality and comparability of the corpora, sample size effects, and genre or topic differences that may drive results rather than intrinsic characteristics. Statistical significance does not in itself determine interpretive importance; qualitative analysis is usually required to explain why a term is keyworded in a given context.
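
As an illustration of the 2x2 contingency-table calculation described above, the following is a minimal Python sketch of a log-likelihood (G-test) keyness score and the resulting keyword ranking. The function names `signed_g2` and `keyword_list` are hypothetical, chosen for this example. Attaching a sign to the score so that underrepresented words rank at the bottom is an assumed convention, not something the statistic itself provides. Both corpora are assumed to be non-empty, pre-tokenized lists of words.

```python
import math
from collections import Counter


def signed_g2(freq_target, size_target, freq_ref, size_ref):
    """Log-likelihood ratio (G2) for one word, signed by direction.

    Rows of the 2x2 table are the corpora (target, reference);
    columns are word present vs. all other tokens.
    """
    observed = [
        [freq_target, size_target - freq_target],
        [freq_ref, size_ref - freq_ref],
    ]
    total = size_target + size_ref
    row_totals = [size_target, size_ref]
    col_totals = [freq_target + freq_ref, total - freq_target - freq_ref]

    g2 = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # a zero cell contributes nothing to the sum
                e = row_totals[i] * col_totals[j] / total
                g2 += o * math.log(o / e)
    g2 *= 2

    # Assumed convention: negative score when the word is relatively
    # less frequent in the target corpus than in the reference corpus.
    overused = freq_target / size_target >= freq_ref / size_ref
    return g2 if overused else -g2


def keyword_list(target_tokens, reference_tokens):
    """Rank every word in either corpus by signed G2, highest first."""
    target_counts = Counter(target_tokens)
    ref_counts = Counter(reference_tokens)
    n_target = sum(target_counts.values())
    n_ref = sum(ref_counts.values())
    vocab = set(target_counts) | set(ref_counts)
    scores = {
        w: signed_g2(target_counts[w], n_target, ref_counts[w], n_ref)
        for w in vocab
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    target = "the cat sat on the mat cat cat".split()
    reference = "the dog sat on the log dog dog".split()
    for word, score in keyword_list(target, reference):
        print(f"{word}\t{score:.2f}")
```

Words with the same relative frequency in both corpora score zero, so on real data a minimum-frequency or significance threshold is usually applied before reading the list. The ranking on its own says nothing about why a word is key.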