Treebanks
Treebanks are linguistically annotated corpora in which sentences are paired with syntactic structure representations. Broadly, they come in two traditions: constituency treebanks, which mark hierarchical phrase structure (such as NP, VP, and S), and dependency treebanks, which encode head–dependent relations between words. Some projects provide both views for the same data. Treebanks are foundational resources in computational linguistics and natural language processing, enabling systematic study of syntax and training of parsing models.
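To make the contrast between the two traditions concrete, the following is a minimal sketch of how one sentence might be represented in each. The sentence, the bracketed notation, the Penn-style tags (DT, NN, VBZ), the relation labels (det, nsubj, root), and the helper names `Node`, `parse_brackets`, `leaves`, and `dependents_of` are all illustrative assumptions, not the format of any particular treebank.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A phrase-structure node: a label plus child nodes or word strings."""
    label: str
    children: list = field(default_factory=list)


def parse_brackets(text):
    """Parse a bracketed string such as "(S (NP ...) (VP ...))" into Nodes."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        assert tokens[pos] == "(", "expected an opening bracket"
        pos += 1
        node = Node(tokens[pos])  # first token after "(" is the label
        pos += 1
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                node.children.append(read())       # nested phrase
            else:
                node.children.append(tokens[pos])  # terminal word
                pos += 1
        pos += 1  # consume the closing bracket
        return node

    return read()


def leaves(node):
    """Return the words of a constituency tree in left-to-right order."""
    words = []
    for child in node.children:
        if isinstance(child, Node):
            words.extend(leaves(child))
        else:
            words.append(child)
    return words


# Constituency view: hierarchical phrases (hypothetical example sentence).
constituency = parse_brackets("(S (NP (DT the) (NN dog)) (VP (VBZ barks)))")

# Dependency view of the same sentence: one row per word, given as
# (index, word, head index, relation). Head index 0 marks the root.
dependency = [
    (1, "the", 2, "det"),
    (2, "dog", 3, "nsubj"),
    (3, "barks", 0, "root"),
]


def dependents_of(head_index, arcs):
    """Return the words whose head is the word at head_index."""
    return [word for _, word, head, _ in arcs if head == head_index]
```

In this sketch the constituency tree groups "the dog" under an NP node, while the dependency rows record only word-to-word links: "dog" depends on "barks", and "the" depends on "dog". Calling `leaves(constituency)` recovers the word sequence, and `dependents_of(3, dependency)` lists the direct dependents of the verb.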
The creation of a treebank typically involves manual annotation guided by formal schemes or guidelines. Annotators
Treebanks serve multiple purposes. They provide training data for syntactic parsers, serve as benchmarks for evaluating
Notable examples include the Penn Treebank for English constituency syntax, the Chinese Treebank, the Prague Dependency