ordlister

Ordlister are lists of words used in linguistics, education, lexicography, and natural language processing. They can be monolingual or multilingual and are commonly organized alphabetically, by frequency, by semantic domain, or by morphological properties. The term ordlister is Danish in origin and translates to "word lists." In English the same concept is referred to as a word list or a lexicon, depending on scope and usage.

Ordlister are typically constructed from existing linguistic resources such as corpora, dictionaries, or teacher-curated inventories. Common variants include frequency lists that rank words by usage, orthographic lists that group items by spelling patterns, morphological or lemmatized lists that reduce inflected forms to base lemmas, and domain-specific lists that collect technical terms or proper names. In computational contexts, word lists may serve as lexicons for tagging, as stopword sets for filtering, or as input for spell-checkers and search engines.

Applications include language learning and teaching, dictionary compilation, text analysis, search optimization, and various natural language processing tasks. Word lists enable researchers to study vocabulary size and distribution, support vocabulary acquisition, and provide resources for managing low-resource languages. They are also used to annotate corpora and to benchmark NLP systems.

Limitations and considerations: word lists reflect source data and time of compilation; dialect, register, and orthographic conventions influence content. They may omit multiword expressions or proper nouns, and their usefulness depends on size, granularity, and intended domain.
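
As a rough illustration of the frequency-list and stopword ideas described in this article, the following Python sketch builds a ranked word list from a short text and measures how much of the text a given word list covers. The function names (tokenize, frequency_list, coverage), the tiny stopword set, and the sample sentence are all assumptions made for this example; real word lists are built from much larger corpora and usually involve more careful tokenization and lemmatization.

```python
import re
from collections import Counter

# Hypothetical stopword set, chosen only for this example.
STOPWORDS = {"the", "a", "of", "and", "on"}


def tokenize(text):
    """Lowercase the text and return runs of letters (Unicode-aware)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def frequency_list(text, stopwords=frozenset()):
    """Return (word, count) pairs, most frequent first, ties alphabetical."""
    counts = Counter(t for t in tokenize(text) if t not in stopwords)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def coverage(text, word_list):
    """Return the fraction of tokens in text that appear in word_list."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    known = set(word_list)
    return sum(t in known for t in tokens) / len(tokens)


sample = "The cat sat on the mat and the dog sat on the cat"
ranked = frequency_list(sample, STOPWORDS)
print(ranked)
print(coverage(sample, ["cat", "sat"]))
```

The coverage figure depends heavily on the choices the limitations above mention: a list that omits function words or proper nouns will cover less of a running text, regardless of how many content words it contains.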
See also frequency lists, lexicons, and corpora.