Korpusbias
Korpusbias (corpus bias) refers to systematic distortions or imbalances in a language corpus. A corpus is a large, structured collection of texts, commonly used in linguistic research, natural language processing, and language model training. Korpusbias arises when the texts selected for a corpus do not reflect the full diversity of a language, whether as it is actually used or as it ought to be represented.
These biases can manifest in various ways. For instance, a corpus might overrepresent formal written language while underrepresenting informal or spoken registers.
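One simple way to make overrepresentation concrete is to compare each register's share of the corpus with its share in some reference distribution. The sketch below assumes every document already carries a register label; the function `representation_ratios`, the labels `formal_written` and `informal_spoken`, and the 50/50 reference distribution are hypothetical choices for illustration, not measured data or an established method.

```python
from collections import Counter


def representation_ratios(doc_registers, reference_shares):
    """Return observed share / reference share for each register.

    doc_registers: list of register labels, one per corpus document.
    reference_shares: dict mapping register -> expected share (assumed to sum to 1).
    A ratio above 1 suggests overrepresentation; below 1, underrepresentation.
    """
    counts = Counter(doc_registers)
    total = len(doc_registers)
    return {
        register: (counts.get(register, 0) / total) / share
        for register, share in reference_shares.items()
    }


# Hypothetical toy corpus: 8 formal written documents, 2 informal spoken ones.
corpus = ["formal_written"] * 8 + ["informal_spoken"] * 2
# Hypothetical reference distribution (illustrative only, not real data).
reference = {"formal_written": 0.5, "informal_spoken": 0.5}

ratios = representation_ratios(corpus, reference)
for register, ratio in sorted(ratios.items()):
    print(f"{register}: {ratio:.2f}")
```

With these toy numbers the script prints `formal_written: 1.60` and `informal_spoken: 0.40`. The result is only as meaningful as the reference distribution, which is itself an assumption about how the language "should" be represented.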
The consequences of korpusbias can be significant. When language models are trained on biased corpora, they tend to reproduce the corpus's imbalances in their own output.