UDPipe

UDPipe is an open-source natural language processing toolkit that processes text according to the Universal Dependencies (UD) framework. It provides a trainable pipeline for tokenization, sentence segmentation, part-of-speech tagging, lemmatization, morphological analysis, and dependency parsing. The resulting annotations are compatible with the CoNLL-U format used by UD treebanks, enabling standardized linguistic annotation across languages.
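Because the annotations come out as CoNLL-U, a short reader is often the first thing needed to work with them. The following is a minimal sketch, assuming the standard ten tab-separated CoNLL-U columns; the names `Token`, `parse_conllu`, and `SAMPLE` are hypothetical, and the sample sentence is hand-written for illustration rather than actual UDPipe output.

```python
from __future__ import annotations

from typing import NamedTuple


class Token(NamedTuple):
    """One CoNLL-U row; field order follows the ten standard columns."""
    id: str
    form: str
    lemma: str
    upos: str
    xpos: str
    feats: str
    head: str
    deprel: str
    deps: str
    misc: str


def parse_conllu(text: str) -> list[list[Token]]:
    """Split CoNLL-U text into sentences, each a list of Token rows."""
    sentences: list[list[Token]] = []
    current: list[Token] = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line terminates the current sentence.
            if current:
                sentences.append(current)
                current = []
        elif line.startswith("#"):
            # Comment lines carry sentence-level metadata; skipped here.
            continue
        else:
            cols = line.split("\t")
            if len(cols) != 10:
                raise ValueError(f"expected 10 columns, got {len(cols)}: {line!r}")
            current.append(Token(*cols))
    if current:
        sentences.append(current)
    return sentences


# Hand-written illustrative sentence (an assumption, not real tool output).
SAMPLE = (
    "# text = Dogs bark.\n"
    "1\tDogs\tdog\tNOUN\t_\tNumber=Plur\t2\tnsubj\t_\t_\n"
    "2\tbark\tbark\tVERB\t_\t_\t0\troot\t_\tSpaceAfter=No\n"
    "3\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_\n"
    "\n"
)

if __name__ == "__main__":
    for sentence in parse_conllu(SAMPLE):
        for tok in sentence:
            print(tok.id, tok.form, tok.upos, tok.head, tok.deprel)
```

All fields are kept as strings so that multiword token ranges such as `1-2` and the `_` placeholder pass through unchanged; callers can convert `head` to an integer where needed.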

The tool combines language-specific models with a general architecture that supports training on user-provided data. Users can apply pre-trained models to annotate raw text or train new models on UD treebanks to create language-specific parsers. UDPipe is implemented in C++ and offers bindings for Python and Java, making it usable as a command-line utility or as a library within larger NLP workflows.

UDPipe provides models for many languages covered by the Universal Dependencies project, allowing researchers and developers to annotate text out of the box for a wide range of languages. It also supports training custom models from UD data, facilitating experiments in language-specific morphology and syntax. The output can be used directly for downstream tasks such as parsing, corpus annotation, or linguistic research, and can be converted or integrated into other UD processing pipelines.

Availability and licensing are governed by its open-source status, with distribution through the official UDPipe resources and the Universal Dependencies project. The toolkit is commonly used in academic and research contexts for linguistic annotation, treebank construction, and evaluation of parsing and tagging models.
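Evaluating parsing and tagging models usually means comparing predicted annotations against a gold treebank. The sketch below computes UPOS accuracy, unlabeled attachment score (UAS), and labeled attachment score (LAS). It assumes both inputs are CoNLL-U text with identical tokenization and compares full relation labels. The function names `read_rows` and `score` are hypothetical, the sample data is hand-written, and this is not the official UD evaluation script, which also handles tokenization mismatches.

```python
def read_rows(conllu: str):
    """Yield (upos, head, deprel) for each basic token row in CoNLL-U text."""
    for line in conllu.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip sentence breaks and comment lines
        cols = line.split("\t")
        # Skip multiword token ranges (e.g. "1-2") and empty nodes (e.g. "1.1").
        if "-" in cols[0] or "." in cols[0]:
            continue
        yield cols[3], cols[6], cols[7]


def score(gold: str, pred: str) -> dict:
    """Return UPOS accuracy, UAS, and LAS as fractions of gold tokens."""
    g, p = list(read_rows(gold)), list(read_rows(pred))
    if not g:
        raise ValueError("gold data contains no tokens")
    if len(g) != len(p):
        raise ValueError("this sketch requires identical tokenization")
    n = len(g)
    upos = sum(a[0] == b[0] for a, b in zip(g, p))      # same tag
    uas = sum(a[1] == b[1] for a, b in zip(g, p))       # same head
    las = sum(a[1:] == b[1:] for a, b in zip(g, p))     # same head and label
    return {"UPOS": upos / n, "UAS": uas / n, "LAS": las / n}


# Hand-written illustrative data (an assumption, not real tool output).
GOLD = (
    "1\tDogs\tdog\tNOUN\t_\t_\t2\tnsubj\t_\t_\n"
    "2\tbark\tbark\tVERB\t_\t_\t0\troot\t_\t_\n"
    "3\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_\n"
)
PRED = (
    "1\tDogs\tdog\tNOUN\t_\t_\t2\tobj\t_\t_\n"     # right head, wrong label
    "2\tbark\tbark\tVERB\t_\t_\t0\troot\t_\t_\n"   # fully correct
    "3\t.\t.\tPUNCT\t_\t_\t1\tpunct\t_\t_\n"       # wrong head
)

if __name__ == "__main__":
    print(score(GOLD, PRED))
```

In the sample, all three tags match, two of three heads match, and only one token has both the correct head and the correct label, so LAS is lower than UAS.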