Macro F1

Macro F1, often written as macro F1 score, is an evaluation metric used to assess the performance of multi-class and multilabel classifiers. It measures the harmonic mean of precision and recall for each class, and then averages these per-class F1 scores to provide a single overall score that treats all classes equally.

To compute macro F1, for each class c, compute true positives, false positives, and false negatives using a one-vs-rest approach. Precision_c = TP_c / (TP_c + FP_c) and Recall_c = TP_c / (TP_c + FN_c); if the denominator is zero, the corresponding value is defined as zero. The F1 for class c is F1_c = 2 * Precision_c * Recall_c / (Precision_c + Recall_c) when (Precision_c + Recall_c) > 0; otherwise F1_c = 0. Macro F1 is the average of F1_c over all classes: MacroF1 = (1/K) sum_c F1_c, where K is the number of classes.
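
A minimal sketch of this computation, assuming single-label predictions; the function name macro_f1 and the label lists y_true and y_pred are hypothetical, and the class set is inferred from the labels:

    from collections import Counter

    def macro_f1(y_true, y_pred):
        # One-vs-rest counts per class: a correct prediction is a true positive,
        # each error is a false positive for the predicted class and a false
        # negative for the true class.
        tp, fp, fn = Counter(), Counter(), Counter()
        for t, p in zip(y_true, y_pred):
            if t == p:
                tp[t] += 1
            else:
                fp[p] += 1
                fn[t] += 1
        classes = set(y_true) | set(y_pred)
        f1_per_class = []
        for c in classes:
            precision = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
            recall = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
            f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
            f1_per_class.append(f1)
        return sum(f1_per_class) / len(f1_per_class)

    # Toy example with a rare class "c" that is never predicted correctly.
    y_true = ["a", "a", "a", "b", "b", "c"]
    y_pred = ["a", "a", "b", "b", "b", "a"]
    print(macro_f1(y_true, y_pred))  # (0.667 + 0.8 + 0.0) / 3 ≈ 0.489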

Macro F1 is particularly useful when class distributions are imbalanced, as it gives equal weight to each class rather than to each example. It complements other metrics such as micro F1, which aggregates contributions across all classes before computing the F1 score, and weighted F1, which weights class F1 scores by class support.
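
A brief sketch of how the three averaging schemes differ; the per-class counts and the helper f1 are made up for illustration:

    # Hypothetical per-class one-vs-rest counts: {class: (TP, FP, FN)}.
    counts = {"a": (2, 1, 1), "b": (2, 1, 0), "c": (0, 0, 1)}

    def f1(tp, fp, fn):
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

    per_class = {c: f1(*v) for c, v in counts.items()}
    support = {c: tp + fn for c, (tp, fp, fn) in counts.items()}  # true examples per class

    macro = sum(per_class.values()) / len(per_class)                     # equal weight per class
    micro = f1(*[sum(v[i] for v in counts.values()) for i in range(3)])  # pool counts, then score
    weighted = sum(per_class[c] * support[c] for c in counts) / sum(support.values())

    print(round(macro, 3), round(micro, 3), round(weighted, 3))  # 0.489 0.667 0.6

The rare class "c" pulls macro F1 down because it counts fully, while micro and weighted F1 are dominated by the well-predicted, frequent classes.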

Implementation and interpretation notes: In common libraries, macro F1 is produced by average='macro' in scikit-learn, or similar options in other frameworks. While easy to compute and interpretable, macro F1 can be influenced by rare classes with unstable estimates and does not capture the cost or importance of different misclassifications unless explicitly weighted.
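
For example, scikit-learn's f1_score exposes these averaging options; the labels below are illustrative, and zero_division=0 mirrors the convention above of defining undefined values as zero:

    from sklearn.metrics import f1_score

    y_true = ["a", "a", "a", "b", "b", "c"]  # illustrative labels
    y_pred = ["a", "a", "b", "b", "b", "a"]

    print(f1_score(y_true, y_pred, average="macro", zero_division=0))     # mean of per-class F1
    print(f1_score(y_true, y_pred, average="micro", zero_division=0))     # pools TP/FP/FN first
    print(f1_score(y_true, y_pred, average="weighted", zero_division=0))  # weights by class support
    print(f1_score(y_true, y_pred, average=None, zero_division=0))        # per-class F1 scores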
