minicorpus
A minicorpus is a small, representative subset of a larger text corpus designed for rapid experimentation, teaching, and benchmarking. It aims to preserve the essential linguistic properties and thematic coverage of the full collection while remaining small enough to process quickly.
Size and scope characteristics vary by field, but a minicorpus typically contains tens of thousands to a
Construction and use: Define the research or teaching objective, select candidate sources, sample texts, and remove
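The source-selection and sampling steps named above can be illustrated with a minimal sketch. It assumes the full corpus is available as a list of (source, text) pairs and that size is measured in whitespace-separated tokens; the function name `build_minicorpus`, its parameters, and the round-robin strategy are hypothetical choices for illustration, not a standard procedure.

```python
import random
from collections import defaultdict


def build_minicorpus(documents, token_budget, seed=0):
    """Sample documents round-robin across sources until a token budget is met.

    documents: list of (source, text) pairs (an assumed input format).
    token_budget: target size in whitespace-separated tokens (an assumption;
        a real project may count tokens differently).
    seed: fixed seed so the same minicorpus can be rebuilt later.
    """
    rng = random.Random(seed)

    # Group texts by source so every source gets a chance to contribute.
    by_source = defaultdict(list)
    for source, text in documents:
        by_source[source].append(text)

    # Shuffle within each source; sorted order keeps the result deterministic.
    sources = sorted(by_source)
    for source in sources:
        rng.shuffle(by_source[source])

    sample, total = [], 0
    # Take one text per source per round until the budget is reached
    # or every source is exhausted. The last text may overshoot the budget.
    while total < token_budget and any(by_source[s] for s in sources):
        for source in sources:
            if total >= token_budget:
                break
            if by_source[source]:
                text = by_source[source].pop()
                sample.append((source, text))
                total += len(text.split())
    return sample


if __name__ == "__main__":
    corpus = [("news", "a b c")] * 5 + [("fiction", "d e f g")] * 5
    mini = build_minicorpus(corpus, token_budget=10)
    print(len(mini), sum(len(text.split()) for _, text in mini))
```

Drawing one text per source in each round keeps small sources from being crowded out by large ones, which is one simple way to approximate thematic coverage. Sampling in proportion to source size is an equally reasonable alternative when the minicorpus should mirror the full collection's distribution.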
Limitations and caveats: Because of its small size, a minicorpus may introduce sampling bias and may not