UDcorpora

UDcorpora is the collection of annotated corpora produced under the Universal Dependencies (UD) project. It provides multilingual corpora annotated with universal part-of-speech tags, lemmas, and dependency relations, all following the UD annotation guidelines. The aim is to enable cross-linguistic comparison and support research in natural language processing, corpus linguistics, and language typology.
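As an illustration of how the three annotation layers named above (part-of-speech tags, lemmas, dependency relations) can be read programmatically, the following is a minimal sketch. It assumes the treebank is stored in the CoNLL-U format, in which each token occupies one line of ten tab-separated fields. The sample sentence, the `FIELDS` list, and the `parse_conllu` helper are hypothetical names introduced here for illustration and are not part of any official UD tooling.

```python
from collections import Counter

# The ten tab-separated fields of a CoNLL-U token line, in order.
FIELDS = ["id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc"]

# Hypothetical two-word sentence written inline; not taken from a real treebank.
SAMPLE = (
    "# text = Dogs bark\n"
    "1\tDogs\tdog\tNOUN\t_\tNumber=Plur\t2\tnsubj\t_\t_\n"
    "2\tbark\tbark\tVERB\t_\t_\t0\troot\t_\t_\n"
    "\n"
)


def parse_conllu(text):
    """Return a list of sentences, each a list of token dicts."""
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            # Sentence-level comment such as "# text = ...".
            continue
        cols = line.split("\t")
        if len(cols) != len(FIELDS):
            # Skip malformed lines instead of guessing at their content.
            continue
        token = dict(zip(FIELDS, cols))
        if not token["id"].isdigit():
            # Skip multiword-token ranges (e.g. "1-2") and empty nodes (e.g. "1.1").
            continue
        current.append(token)
    if current:
        sentences.append(current)
    return sentences


sentences = parse_conllu(SAMPLE)

# Count universal part-of-speech tags across all parsed tokens.
upos_counts = Counter(tok["upos"] for sent in sentences for tok in sent)

for tok in sentences[0]:
    print(tok["form"], tok["lemma"], tok["upos"], tok["head"], tok["deprel"])
```

In this sketch the `head` field holds the ID of the governing token, with `0` marking the root of the sentence. To process a real treebank file, its contents could be read into a string and passed to `parse_conllu` in place of `SAMPLE`.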

Content and format: UDcorpora encompasses a wide range of treebanks across many languages and dialects, including

Access and licensing: UDcorpora is released under an open license and freely accessible through the UD website

Usage and significance: The corpora are used to train and evaluate dependency parsers, perform cross-linguistic analyses,

Relation to UD resources: UDcorpora is integral to the UD project, designed to be compatible with UD
