Subcorpora
Subcorpora are subsets of a larger corpus, selected according to explicit criteria to address specific research questions. By focusing on a defined portion of the data, subcorpora allow researchers to study language use within particular genres, time periods, registers, or linguistic features without the noise of the full corpus.
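As a rough illustration, selecting a subcorpus by genre and time period can be sketched in Python as a metadata filter over the parent corpus. Everything in this sketch is an assumption made for the example rather than part of any particular corpus tool: the `Document` record, its `genre` and `year` fields, the helper names `select_subcorpus` and `relative_frequency`, the whitespace tokenization, and the toy data.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    """Hypothetical corpus record: raw text plus two metadata fields."""
    text: str
    genre: str
    year: int


def select_subcorpus(corpus, genre=None, year_range=None):
    """Return the documents that satisfy every criterion that was given.

    A criterion left as None is not applied. year_range is an inclusive
    (first_year, last_year) pair.
    """
    selected = []
    for doc in corpus:
        if genre is not None and doc.genre != genre:
            continue
        if year_range is not None:
            first, last = year_range
            if not (first <= doc.year <= last):
                continue
        selected.append(doc)
    return selected


def relative_frequency(docs, word):
    """Occurrences of `word` per 1,000 tokens.

    Tokenization is a naive lowercase whitespace split, chosen only to
    keep the sketch short.
    """
    tokens = [tok for doc in docs for tok in doc.text.lower().split()]
    if not tokens:
        return 0.0
    return 1000 * Counter(tokens)[word] / len(tokens)


# Toy parent corpus (invented data, for illustration only).
parent = [
    Document("the court ruled that the appeal failed", "legal", 1995),
    Document("the striker scored in the final minute", "sports", 1998),
    Document("the judge dismissed the claim", "legal", 2010),
    Document("the team won the cup", "sports", 2012),
]

# Subcorpus: legal texts from the 1990s.
legal_1990s = select_subcorpus(parent, genre="legal", year_range=(1990, 1999))
court_rate = relative_frequency(legal_1990s, "court")
```

Normalizing counts per 1,000 tokens, rather than reporting raw counts, keeps figures comparable between a small subcorpus and its much larger parent.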
Subcorpora are created by applying filters or queries to the parent corpus. Selection can be based on
Common uses include estimating frequencies and collocations within a domain, comparing linguistic patterns across genres or
Practitioners must consider representativeness, size, metadata quality, and the potential for selection bias. Subcorpora should be