F-Measure

F-measure, also known as the F-score, is a metric that combines precision and recall into a single value to evaluate the accuracy of a classifier. The general form is Fβ = (1 + β^2) · (P · R) / (β^2 · P + R), where P is precision and R is recall. The parameter β determines the relative importance of precision versus recall.
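The general formula can be sketched directly in Python (the function name `f_beta` is illustrative, not from the text):

```python
def f_beta(precision: float, recall: float, beta: float = 1.0) -> float:
    """Fβ = (1 + β²)·P·R / (β²·P + R); returns 0.0 when both inputs are 0."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# β < 1 weights precision more heavily; β > 1 weights recall more heavily.
print(f_beta(0.5, 1.0, beta=1.0))  # F1 = 2·0.5·1.0 / (0.5 + 1.0) ≈ 0.667
print(f_beta(0.5, 1.0, beta=2.0))  # F2 favors recall, so the score rises
```

Guarding the zero-denominator case is a common convention; the formula itself is undefined when precision and recall are both zero.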

Precision and recall are defined as P = TP / (TP + FP) and R = TP / (TP + FN), using true positives (TP), false positives (FP), and false negatives (FN). For β = 1, the F1 score, or harmonic mean of precision and recall, is F1 = 2PR / (P + R). The F-measure emphasizes a balance between correctly identified positive cases and the completeness of those identifications.

In multiclass and multilabel settings, F-measures are computed using averaging schemes. Micro-averaged F aggregates TP, FP, and FN across all classes to compute a global precision and recall, then derives Fβ from those values. Macro-averaged F computes the Fβ score for each class in a one-vs-rest fashion and then averages the results; weighted macro averaging uses class support as weights.

Applications and interpretation: F-measure is widely used in information retrieval and machine learning for model comparison,
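The micro and macro schemes can be sketched from per-class confusion counts; the identity F1 = 2TP / (2TP + FP + FN) follows from substituting the definitions of P and R. The counts below are made-up illustrations:

```python
def f1(tp: int, fp: int, fn: int) -> float:
    denom = 2 * tp + fp + fn  # 2PR/(P+R) simplifies to 2TP/(2TP+FP+FN)
    return 2 * tp / denom if denom else 0.0

# Hypothetical one-vs-rest confusion counts (TP, FP, FN) for three classes.
counts = [(50, 5, 10), (8, 2, 12), (3, 1, 7)]

# Micro: pool the counts across all classes, then compute a single F1.
tp, fp, fn = (sum(c[i] for c in counts) for i in range(3))
micro_f1 = f1(tp, fp, fn)

# Macro: compute per-class F1 one-vs-rest, then take the unweighted mean.
macro_f1 = sum(f1(*c) for c in counts) / len(counts)

print(micro_f1, macro_f1)
```

Here the micro average exceeds the macro average because the largest class performs well: micro-averaging lets frequent classes dominate, while macro-averaging treats every class equally.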
especially when class distribution is imbalanced or when precision and recall have different costs. It is threshold-dependent for binary classifiers and does not inherently account for true negatives. While useful for balancing extraction quality, it is one of several metrics and should be considered alongside other measures such as accuracy, AUC, or the precision-recall curve. The F-measure originated in information retrieval, with the F1 score commonly used as a standard benchmark.
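The threshold dependence noted above can be illustrated with a toy binary classifier: the same scores yield different F1 values as the decision threshold moves (the scores and labels below are made up):

```python
def f1_at_threshold(scores, labels, threshold):
    """F1 of the classifier that predicts positive when score >= threshold."""
    preds = [s >= threshold for s in scores]
    tp = sum(p and y for p, y in zip(preds, labels))
    fp = sum(p and not y for p, y in zip(preds, labels))
    fn = sum((not p) and y for p, y in zip(preds, labels))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

# Toy classifier scores and ground-truth labels (illustrative only).
scores = [0.95, 0.80, 0.60, 0.40, 0.20]
labels = [True, True, False, True, False]

for t in (0.3, 0.5, 0.7):
    print(t, f1_at_threshold(scores, labels, t))
```

Sweeping the threshold like this is essentially tracing the precision-recall curve mentioned above, with F1 summarizing each point as a single number.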