accuracywhether
Accuracywhether is a proposed evaluation metric for systems that give yes/no answers, or yes/no components within broader queries. It measures how often a model's predicted binary response matches the ground truth, giving a straightforward measure of binary decision accuracy.
Computation and variants: For a test set of N instances, each with a true label t_i in
Origins and usage: The term accuracywhether has appeared in informal discussions and some methodological write-ups as
Applications: Accuracywhether is applicable to evaluating chatbots, information retrieval systems that return binary decisions, medical decision-support
Limitations: As a binary-only metric, it does not capture calibration, confidence, or the distribution of errors
Example: A model answers 85 of 100 yes/no questions correctly, yielding an accuracywhether of 0.85.
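The arithmetic in this example can be sketched in Python. This is a minimal illustration rather than a reference implementation: the function name `accuracywhether`, the "yes"/"no" string encoding, the case-insensitive comparison, and the 60/40 split of true labels are all assumptions chosen only to reproduce the 85-of-100 figure.

```python
from typing import Sequence


def accuracywhether(predictions: Sequence[str], truths: Sequence[str]) -> float:
    """Return the fraction of yes/no predictions that match the true labels."""
    if len(predictions) != len(truths):
        raise ValueError("predictions and truths must have the same length")
    if not truths:
        raise ValueError("at least one instance is required")
    # Assumption: labels are compared case-insensitively, ignoring whitespace.
    matches = sum(
        p.strip().lower() == t.strip().lower()
        for p, t in zip(predictions, truths)
    )
    return matches / len(truths)


# Hypothetical data: 100 yes/no questions, 85 of them answered correctly.
truths = ["yes"] * 60 + ["no"] * 40
predictions = (
    ["yes"] * 50   # correct "yes" answers
    + ["no"] * 10  # wrong: the true label is "yes"
    + ["no"] * 35  # correct "no" answers
    + ["yes"] * 5  # wrong: the true label is "no"
)

score = accuracywhether(predictions, truths)
print(score)  # 0.85
```

The sketch raises an error on empty or mismatched inputs instead of returning 0, so that a missing test set is not mistaken for a model that answered everything incorrectly.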