Multiword

Multiword refers to sequences of two or more words that function as a single unit within a language. In linguistic and natural language processing contexts, this concept is most often discussed under the heading of multiword expressions (MWEs). MWEs encompass a wide range of word combinations, including idioms, phrasal verbs, fixed collocations, and compound nouns. Many MWEs have meanings or usages that cannot be reliably inferred from their individual components, though some are more transparent or compositional.

Common subtypes include idioms, where the whole expression has a figurative meaning (for example, kick the bucket meaning "to die"); phrasal verbs, where a verb combines with a particle or preposition to yield a distinct meaning (look up, break down); collocations, which are habitual word pairings with strong co-occurrence tendencies (strong tea, heavy rain); and compound nouns or proper names (New York, coffee table). Light-verb constructions, such as take a walk or make a decision, are another frequently discussed category. MWEs can be fixed or semi-fixed, and they may vary in their degree of syntactic flexibility.

MWEs pose particular challenges for language technologies. They can affect tokenization, parsing, machine translation, and information retrieval, especially when their meanings are non-literal or when they translate poorly across languages. Detection and treatment of MWEs rely on linguistic analysis, statistical methods, and, increasingly, neural models, often aided by specialized lexicons and annotated corpora. Cross-linguistic variation means MWEs behave differently across languages, requiring language-specific resources and approaches.
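The following Python sketch is a rough illustration of how a lexicon can feed into tokenization. It merges contiguous multiword entries into single tokens using greedy longest match. The lexicon contents, the lowercased underscore-joined output form, and the function names build_lexicon and merge_mwes are all illustrative assumptions, not a standard tool or API.

```python
# Hypothetical sketch: lexicon-based MWE merging via greedy longest match.
# Names and the toy lexicon below are illustrative assumptions.


def build_lexicon(entries: list[str]) -> dict[tuple[str, ...], str]:
    """Map each multiword entry's lowercased token tuple to a joined form."""
    lexicon = {}
    for entry in entries:
        tokens = tuple(entry.lower().split())
        if len(tokens) >= 2:  # keep only genuine multiword entries
            lexicon[tokens] = "_".join(tokens)
    return lexicon


def merge_mwes(tokens: list[str], lexicon: dict[tuple[str, ...], str]) -> list[str]:
    """Replace the longest lexicon match starting at each position with one token."""
    max_len = max((len(key) for key in lexicon), default=1)
    lowered = [t.lower() for t in tokens]  # case-insensitive matching
    out = []
    i = 0
    while i < len(tokens):
        match = None
        # Try the longest candidate span first, shrinking down to two tokens.
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            span = tuple(lowered[i:i + n])
            if span in lexicon:
                match = (lexicon[span], n)
                break
        if match:
            out.append(match[0])
            i += match[1]  # skip past the whole matched expression
        else:
            out.append(tokens[i])
            i += 1
    return out


# Hypothetical toy lexicon; real systems would draw on a curated MWE lexicon.
lexicon = build_lexicon(["kick the bucket", "New York", "coffee table", "look up"])
sentence = "I will look up the coffee table prices in New York".split()
print(merge_mwes(sentence, lexicon))
```

The matcher only sees contiguous, exactly matching word sequences. It therefore misses inflected or discontinuous uses such as he kicked the bucket or look it up. Handling semi-fixed MWEs like these typically requires lemmatization or syntactic analysis on top of a lexicon lookup.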
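Statistical detection of collocations is often based on association measures computed from corpus counts. The sketch below uses pointwise mutual information (PMI) over adjacent word pairs as one such measure. The text above does not name a specific method, so PMI is a choice made here for illustration. The min_count threshold, the toy corpus, and the function name pmi_bigrams are also assumptions.

```python
import math
from collections import Counter

# Hypothetical sketch: rank adjacent word pairs by PMI as collocation candidates.


def pmi_bigrams(tokens: list[str], min_count: int = 2) -> list[tuple[tuple[str, str], float]]:
    """Score adjacent word pairs by pointwise mutual information.

    PMI(x, y) = log2(P(x, y) / (P(x) * P(y))), with probabilities
    estimated from raw unigram and bigram counts.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)
    scores = []
    for (x, y), c_xy in bigrams.items():
        if c_xy < min_count:  # PMI overrates rare pairs, so skip them
            continue
        p_xy = c_xy / n_bi
        p_x = unigrams[x] / n_uni
        p_y = unigrams[y] / n_uni
        scores.append(((x, y), math.log2(p_xy / (p_x * p_y))))
    # Highest PMI first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda item: (-item[1], item[0]))


# Toy corpus, invented for illustration only.
corpus = ("strong tea is good . heavy rain is bad . i like strong tea . "
          "the rain is cold . heavy rain fell . the tea is good .")
tokens = corpus.split()
for pair, score in pmi_bigrams(tokens)[:3]:
    print(pair, round(score, 3))
```

On this toy corpus, heavy rain and strong tea receive the highest scores. With so little data the ranking is noisy. Practical collocation extraction uses far larger corpora and sometimes other association measures.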