Treebank
A treebank is a linguistic resource: a corpus of text annotated with syntactic structure. In most treebanks, each sentence is annotated with either a constituency parse tree or a dependency graph, often accompanied by part-of-speech tags, named entities, morphological features, and sometimes semantic roles. Treebanks are used to study syntax and to train and evaluate parsing algorithms.
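To make the idea of a constituency annotation concrete, the following is a minimal sketch of how one annotated sentence might be read from a bracketed string into a nested structure. The example sentence, the labels (chosen in the style of common English part-of-speech tags), and the function names `parse_brackets` and `tagged_words` are all illustrative assumptions, not part of any particular treebank's tooling.

```python
def parse_brackets(text):
    """Parse one bracketed tree string into nested (label, children) tuples.

    Illustrative helper only; real treebank readers handle many more cases
    (empty elements, traces, function tags, multiple trees per file).
    """
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def parse_node():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError("expected '(' at token %d" % pos)
        pos += 1
        label = tokens[pos]  # constituent label or part-of-speech tag
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(parse_node())  # nested constituent
            else:
                children.append(tokens[pos])  # a word (leaf)
                pos += 1
        pos += 1  # consume the closing ")"
        return (label, children)

    tree = parse_node()
    if pos != len(tokens):
        raise ValueError("unexpected tokens after the tree")
    return tree


def tagged_words(tree):
    """Collect (word, tag) pairs from the leaves, left to right."""
    label, children = tree
    pairs = []
    for child in children:
        if isinstance(child, str):
            pairs.append((child, label))
        else:
            pairs.extend(tagged_words(child))
    return pairs


# A made-up example sentence, not taken from any corpus.
example = "(S (NP (DT the) (NN cat)) (VP (VBD sat)))"
tree = parse_brackets(example)
print(tagged_words(tree))
```

The nested tuples keep the full phrase structure, while `tagged_words` shows how the part-of-speech layer can be recovered from the same annotation.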
The best-known example is the Penn Treebank (PTB), created at the University of Pennsylvania and the
Treebanks differ in their annotation schemes, domains, sizes, and licensing. PTB uses bracketed constituency representations; PDT
Treebanks are created through manual annotation guided by formal guidelines, often aided by automatic pre-annotation followed