BenchmarkingBias
BenchmarkingBias is a form of evaluation bias that arises when the benchmarking process (the selection of benchmarks, datasets, metrics, and evaluation procedures) systematically distorts judgments about a system's capabilities. It can cause performance to be overestimated or underestimated relative to real-world tasks, and it can make comparisons between competing approaches unfair.
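One way such overestimation can arise is sketched below as a small simulation. It is an illustration under stated assumptions, not a description of any particular benchmark: every candidate variant is assumed to have the same true accuracy, each benchmark item is scored independently, and the function names and parameter values (`simulate_selection_bias`, `average_over_trials`, 50 variants, 200 items, accuracy 0.70) are hypothetical choices made for the example. Picking the variant with the best score on a fixed benchmark and then reporting that same score tends to overstate performance, while a fresh held-out set tends to give an estimate closer to the true accuracy.

```python
import random


def simulate_selection_bias(n_variants=50, n_items=200, true_acc=0.70, seed=0):
    """Return (benchmark score of the selected variant, its held-out score).

    Assumption: all variants share the same true accuracy, so any
    difference in benchmark scores is sampling noise only.
    """
    rng = random.Random(seed)

    def score(n):
        # Fraction of n independent items answered correctly.
        return sum(rng.random() < true_acc for _ in range(n)) / n

    # Score every variant on the same fixed-size benchmark.
    benchmark_scores = [score(n_items) for _ in range(n_variants)]

    # Select the variant that looks best on the benchmark.
    best = max(range(n_variants), key=benchmark_scores.__getitem__)

    # Re-evaluate the selected variant on fresh, held-out items.
    heldout_score = score(n_items)
    return benchmark_scores[best], heldout_score


def average_over_trials(trials=100, **kwargs):
    """Average the two scores over several independent simulations."""
    results = [simulate_selection_bias(seed=t, **kwargs) for t in range(trials)]
    mean_benchmark = sum(b for b, _ in results) / trials
    mean_heldout = sum(h for _, h in results) / trials
    return mean_benchmark, mean_heldout


if __name__ == "__main__":
    mean_benchmark, mean_heldout = average_over_trials()
    print(f"selected variant, benchmark score: {mean_benchmark:.3f}")
    print(f"selected variant, held-out score:  {mean_heldout:.3f}")
```

In this setup the gap between the two averages comes entirely from selecting on noisy scores, since no variant is actually better than another. With real systems the variants usually do differ, so the gap mixes genuine improvement with selection effects.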
Causes include selective benchmark choice that favors certain methods, overfitting to benchmarks through extensive tuning, data
Consequences include distorted conclusions about relative strengths, poor generalization to new data or domains, reduced reproducibility,
Examples: in machine learning, models tuned to maximize performance on a fixed benchmark may underperform on
Mitigation strategies include using diverse and representative benchmarks, preregistering evaluation plans, holding out true test sets,