ClbQs
ClbQs is a benchmark framework for evaluating AI systems on questions that require multi-step reasoning across diverse domains. The term denotes a family of clustered question sets designed to test generalization and problem-solving beyond single-fact recall. Each collection is organized into clusters that target specific domains or reasoning patterns.
Structure and content: Within a cluster, questions are grouped by topic and ordered by difficulty. Many items require chaining several reasoning steps or combining facts drawn from more than one domain, in line with the benchmark's focus on going beyond single-fact recall; a representation sketch follows below.
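The following Python sketch illustrates one way a ClbQs-style cluster could be represented in code. It is a minimal, hypothetical schema under the assumptions above (clusters keyed by domain, items carrying a difficulty rank and a reasoning-step count); the field names are not part of any official ClbQs format.

```python
# Hypothetical schema for a ClbQs-style cluster; field names are illustrative.
from dataclasses import dataclass, field
from typing import List


@dataclass
class Question:
    question_id: str
    text: str
    answer: str
    difficulty: int        # e.g. 1 (easy) to 5 (hard); assumed scale
    reasoning_steps: int   # number of inference steps the item targets


@dataclass
class Cluster:
    cluster_id: str
    domain: str            # e.g. "chemistry" or "commonsense physics"
    questions: List[Question] = field(default_factory=list)

    def ordered_by_difficulty(self) -> List[Question]:
        """Return the cluster's items in ascending difficulty order."""
        return sorted(self.questions, key=lambda q: q.difficulty)
```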
Creation and curation: ClbQs is typically assembled from a mix of public-domain sources and vetted textbooks, with items reviewed and grouped into domain- or reasoning-pattern clusters before release.
Evaluation and usage: Researchers benchmark QA systems with ClbQs, reporting metrics such as overall accuracy and, when a finer-grained view is needed, per-cluster breakdowns that expose which domains or reasoning patterns a system handles poorly; a scoring sketch appears below.
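The sketch below shows one plausible way to compute overall and per-cluster accuracy for a ClbQs-style evaluation. It assumes predictions keyed by question_id, gold items carrying a cluster_id, and simple exact-match grading; none of these details are prescribed by ClbQs itself.

```python
# Sketch of per-cluster accuracy scoring; exact match stands in for the grader.
from collections import defaultdict
from typing import Dict, List


def score(predictions: Dict[str, str], gold: List[dict]) -> Dict[str, float]:
    """Return overall accuracy plus an accuracy breakdown per cluster."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for item in gold:
        cid = item["cluster_id"]
        total[cid] += 1
        pred = predictions.get(item["question_id"], "").strip().lower()
        if pred == item["answer"].strip().lower():
            correct[cid] += 1

    report = {f"accuracy/{cid}": correct[cid] / total[cid] for cid in total}
    report["accuracy/overall"] = sum(correct.values()) / sum(total.values())
    return report
```

In practice the exact-match comparison is only a placeholder; an actual evaluation might substitute whatever grading rule the task requires while keeping the same per-cluster aggregation.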
History and impact: The concept emerged in AI evaluation literature in the early 2020s and has since been used to compare the multi-step reasoning abilities of QA systems across domains.
Limitations and related benchmarks: Like other datasets, ClbQs faces biases, coverage gaps, and risks of overfitting when systems are tuned against its public items, so results are best interpreted alongside related multi-step reasoning benchmarks.