Wordlists

Wordlists are compilations of words or strings used as input, reference data, or training material for software, linguistics, and security tasks. They vary by language, domain, and purpose, and can be general-purpose or specialized, single-language or multilingual.
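
As a rough illustration of a wordlist serving as reference data, the following is a minimal sketch in Python. It assumes a plain-text list with one entry per line; the function names (`normalize`, `load_wordlist`) and the sample words are hypothetical, chosen only for this example. It normalizes entries, removes duplicates, and uses the result for a simple membership check.

```python
import unicodedata


def normalize(word: str) -> str:
    """Trim whitespace, apply Unicode NFC normalization, and case-fold."""
    return unicodedata.normalize("NFC", word.strip()).casefold()


def load_wordlist(lines):
    """Return normalized, de-duplicated entries in first-seen order."""
    seen = set()
    words = []
    for line in lines:
        word = normalize(line)
        if word and word not in seen:  # skip blank lines and repeats
            seen.add(word)
            words.append(word)
    return words


# Hypothetical sample input; in practice this could be an open file object.
sample = ["Apple\n", "apple\n", "\n", "  Banana \n", "cherry\n"]
words = load_wordlist(sample)
lookup = set(words)  # a set gives fast membership tests

print(words)
print("banana" in lookup)
```

Case-folding and NFC normalization are one possible policy; a list meant for case-sensitive use, such as named entities, would likely need a different one.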

Common categories include dictionary or lexicon lists used in spell checking and language modeling; frequency lists that rank words by usage in large text corpora; stemmer or lemmatizer dictionaries that map inflected forms to canonical lemmas; and domain-specific lists such as technical vocabularies, named-entity lists, or multiword expressions. In information security, password wordlists are large compilations of candidate passwords used for auditing password strength or security assessments; their use raises ethical and legal considerations and should comply with applicable rules.

Wordlists are typically generated from corpora, dictionaries, or public resources, and may be augmented with morphological variants, synonyms, or transliterations. They are stored in formats such as plain text (one word per line) or structured formats like JSON or CSV, and may include metadata such as frequency counts or language tags. They support applications including search indexing, spell checking, autocomplete, data cleaning, and language research.

Important considerations include coverage and quality (how well the list represents the target language or domain), normalization and de-duplication, licensing and attribution, and storage efficiency. Wordlists can reflect biases in the source data and may require updates to remain current.

See also: dictionaries, corpora, lexical databases, natural language processing resources.