corpusdriven

Corpusdriven, often written as corpus-driven, is a term in linguistics describing approaches in which evidence from large text corpora guides linguistic description and theory. In corpusdriven research, patterns and generalizations are expected to emerge from actual usage, with hypotheses revised to fit observed frequencies, collocations, and constructions, rather than being imposed from preconceived categories.

Methodology and scope: Researchers analyze large corpora (written, spoken, or mixed) using statistical and computational tools to uncover frequency patterns, collocations and multiword expressions, syntactic constructions, and variation across registers or genres. Outputs include distributional profiles and evidence of semantic prosody, which reflect evaluative associations in context. Data sources range from balanced general corpora to domain-specific or web-derived corpora.

Relation to theory and limitations: Some scholars treat corpusdriven work as distinct from corpus-based studies, emphasizing discovery over verification; in practice the distinction is blurred. Benefits include empirical grounding, discovery of usage patterns beyond intuition, and relevance for lexicography and language teaching. Limitations include corpus representativeness, annotation biases, and the risk of drawing broad conclusions from limited data; results are typically interpreted alongside qualitative analysis and existing theory.
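
As a rough illustration of the frequency and collocation analysis mentioned under Methodology and scope, the sketch below counts adjacent word pairs in a toy corpus and ranks them by pointwise mutual information (PMI). PMI is only one possible association measure and is chosen here as an assumption, not as the method any particular study uses. The function name `collocations`, the `min_count` threshold, and the sample sentences are hypothetical.

```python
import math
from collections import Counter


def collocations(tokens, min_count=2):
    """Rank adjacent word pairs by pointwise mutual information (PMI).

    Hypothetical helper for illustration; real studies typically use
    much larger corpora and may prefer other association measures.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)

    scored = []
    for (w1, w2), count in bigrams.items():
        # Skip rare pairs: PMI is unreliable for very low counts.
        if count < min_count:
            continue
        p_pair = count / n_bi
        p_w1 = unigrams[w1] / n_uni
        p_w2 = unigrams[w2] / n_uni
        scored.append(((w1, w2), count, math.log2(p_pair / (p_w1 * p_w2))))

    # Highest PMI first: pairs that co-occur more often than chance predicts.
    return sorted(scored, key=lambda item: item[2], reverse=True)


# Toy corpus (invented sentences, whitespace-tokenized for simplicity).
text = (
    "strong tea and strong coffee . she drinks strong tea . "
    "he made a powerful argument . a powerful argument wins ."
)
tokens = text.lower().split()

for pair, count, pmi in collocations(tokens):
    print(pair, count, round(pmi, 2))
```

On a corpus this small the scores mean little, which echoes the limitation noted above about drawing broad conclusions from limited data. The point is only to show how recurring pairs can surface from the data without being specified in advance.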