Korpusteadust
Korpusteadust (corpus linguistics) is a branch of linguistics that studies language by analyzing large collections of authentic written texts and transcribed speech, known as corpora. It aims to reveal patterns of usage, the frequency of forms, and the distribution of linguistic features across genres, registers, and time periods.
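As a rough illustration of what comparing the frequency of a form across genres can look like in practice, the sketch below counts a word in each genre and normalizes by genre size, since raw counts are not comparable between subcorpora of different lengths. The toy corpus, the crude tokenizer, and the names `TOY_CORPUS`, `tokenize`, and `relative_frequency` are all assumptions made for this example, not part of any particular corpus tool.

```python
import re
from collections import Counter

# Hypothetical toy corpus invented for illustration: genre -> list of texts.
TOY_CORPUS = {
    "news": ["The council approved the budget.", "The vote was close."],
    "fiction": ["She opened the door and the cold wind came in."],
}


def tokenize(text):
    """Lowercase the text and keep runs of letters or apostrophes (a crude tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def relative_frequency(corpus, word, per=1_000_000):
    """Return {genre: occurrences of `word` per `per` tokens in that genre}."""
    result = {}
    for genre, texts in corpus.items():
        tokens = [tok for text in texts for tok in tokenize(text)]
        counts = Counter(tokens)
        # Normalize by genre size; an empty genre gets 0.0 instead of dividing by zero.
        result[genre] = counts[word] * per / len(tokens) if tokens else 0.0
    return result


if __name__ == "__main__":
    for genre, freq in relative_frequency(TOY_CORPUS, "the", per=1000).items():
        print(f"{genre}: {freq:.1f} per 1,000 tokens")
```

Real corpora are far larger and are usually reported per million tokens, which is why `per` defaults to 1,000,000. The example passes a smaller base only because the toy data is a handful of sentences.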
Core methods involve designing and building corpora with explicit sampling criteria, annotating texts with part-of-speech tags,
Historically, korpusteadust emerged with early computer-assisted text analysis in the mid-20th century and grew through large,
Applications span lexicography, language teaching, sociolinguistics, and natural language processing, including language model evaluation and search