Macro F1

Macro F1, often written as macro F1 score, is an evaluation metric used to assess the performance of multi-class and multilabel classifiers. It measures the harmonic mean of precision and recall for each class, and then averages these per-class F1 scores to provide a single overall score that treats all classes equally.

To compute macro F1, for each class c, compute true positives, false positives, and false negatives using a one-vs-rest approach. Precision_c = TP_c / (TP_c + FP_c) and Recall_c = TP_c / (TP_c + FN_c); if the denominator is zero, the corresponding value is defined as zero. The F1 for class c is F1_c = 2 * Precision_c * Recall_c / (Precision_c + Recall_c) when (Precision_c + Recall_c) > 0; otherwise F1_c = 0. Macro F1 is the average of F1_c over all classes: MacroF1 = (1/K) sum_c F1_c, where K is the number of classes.
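
A minimal sketch of this computation, assuming single-label predictions; the function name macro_f1 and the label lists y_true and y_pred are hypothetical, and the class set is inferred from the labels:

    from collections import Counter

    def macro_f1(y_true, y_pred):
        # One-vs-rest counts per class: a correct prediction is a true positive,
        # each error is a false positive for the predicted class and a false
        # negative for the true class.
        tp, fp, fn = Counter(), Counter(), Counter()
        for t, p in zip(y_true, y_pred):
            if t == p:
                tp[t] += 1
            else:
                fp[p] += 1
                fn[t] += 1
        classes = set(y_true) | set(y_pred)
        f1_per_class = []
        for c in classes:
            precision = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
            recall = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
            f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
            f1_per_class.append(f1)
        return sum(f1_per_class) / len(f1_per_class)

    # Toy example with a rare class "c" that is never predicted correctly.
    y_true = ["a", "a", "a", "b", "b", "c"]
    y_pred = ["a", "a", "b", "b", "b", "a"]
    print(macro_f1(y_true, y_pred))  # (0.667 + 0.8 + 0.0) / 3 ≈ 0.489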

Macro F1 is particularly useful when class distributions are imbalanced, as it gives equal weight to each class rather than to each example. It complements other metrics such as micro F1, which aggregates contributions across all classes before computing the F1 score, and weighted F1, which weights class F1 scores by class support.
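
A brief sketch of how the three averaging schemes differ; the per-class counts and the helper f1 are made up for illustration:

    # Hypothetical per-class one-vs-rest counts: {class: (TP, FP, FN)}.
    counts = {"a": (2, 1, 1), "b": (2, 1, 0), "c": (0, 0, 1)}

    def f1(tp, fp, fn):
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

    per_class = {c: f1(*v) for c, v in counts.items()}
    support = {c: tp + fn for c, (tp, fp, fn) in counts.items()}  # true examples per class

    macro = sum(per_class.values()) / len(per_class)                     # equal weight per class
    micro = f1(*[sum(v[i] for v in counts.values()) for i in range(3)])  # pool counts, then score
    weighted = sum(per_class[c] * support[c] for c in counts) / sum(support.values())

    print(round(macro, 3), round(micro, 3), round(weighted, 3))  # 0.489 0.667 0.6

The rare class "c" pulls macro F1 down because it counts fully, while micro and weighted F1 are dominated by the well-predicted, frequent classes.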

Implementation and interpretation notes: In common libraries, macro F1 is produced by average='macro' in scikit-learn, or similar options in other frameworks. While easy to compute and interpretable, macro F1 can be influenced by rare classes with unstable estimates and does not capture the cost or importance of different misclassifications unless explicitly weighted.
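
For example, scikit-learn's f1_score exposes these averaging options; the labels below are illustrative, and zero_division=0 mirrors the convention above of defining undefined values as zero:

    from sklearn.metrics import f1_score

    y_true = ["a", "a", "a", "b", "b", "c"]  # illustrative labels
    y_pred = ["a", "a", "b", "b", "b", "a"]

    print(f1_score(y_true, y_pred, average="macro", zero_division=0))     # mean of per-class F1
    print(f1_score(y_true, y_pred, average="micro", zero_division=0))     # pools TP/FP/FN first
    print(f1_score(y_true, y_pred, average="weighted", zero_division=0))  # weights by class support
    print(f1_score(y_true, y_pred, average=None, zero_division=0))        # per-class F1 scores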
