Outcomepurity

Outcomepurity is a metric used in data analysis to quantify the homogeneity of outcomes within predefined groups, such as clusters, bins, or leaves of a decision tree. The premise is that a group with a single dominant outcome is more predictable, whereas mixed outcomes reduce reliability.

Definition and calculation: For a dataset of N items partitioned into K groups G_1 to G_K, with m possible outcome classes, let p_{i,c} be the proportion of items in group i with outcome c. The purity of group i is max_c p_{i,c}, the share of its majority outcome. Outcomepurity is the size-weighted average of the group purities:

OC = sum_{i=1}^K (|G_i| / N) * max_c p_{i,c}

The value ranges from 1/m (outcomes completely mixed within every group) up to 1 (each group contains only one outcome).

Interpretation and relation to other measures: OC is the average probability that a randomly chosen item from a group belongs to that group's majority outcome, so higher values indicate greater within-group predictability. Outcomepurity is related to traditional purity measures used in clustering and classification, but it emphasizes outcome homogeneity within partitions rather than overall error rates.

Applications and limitations: It is used to evaluate and compare partitions produced by clustering, binning, or decision-tree pruning, and it can guide feature engineering by favoring splits that increase within-group outcome consistency. Limitations include sensitivity to group size, potential bias under imbalanced classes, and a focus on within-group homogeneity that may ignore global performance. The metric also depends on how the data are partitioned, so it should be used alongside other performance indicators.

Related metrics: purity, Gini impurity, and entropy. Outcomepurity is a conceptual measure with varying informal definitions in practice.
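The formula above can be sketched in a few lines of Python (a minimal illustration; the function name, toy data, and group labels are my own, not from any standard library):

```python
from collections import Counter, defaultdict

def outcome_purity(groups, outcomes):
    """Outcomepurity (OC): size-weighted majority-outcome share of a partition.

    groups[j] is the group assignment of item j; outcomes[j] is its outcome class.
    """
    if not groups or len(groups) != len(outcomes):
        raise ValueError("groups and outcomes must be non-empty and equal length")
    by_group = defaultdict(list)
    for g, c in zip(groups, outcomes):
        by_group[g].append(c)
    n = len(groups)
    # Sum over groups: (|G_i| / N) * max_c p_{i,c}
    return sum(
        (len(members) / n) * (Counter(members).most_common(1)[0][1] / len(members))
        for members in by_group.values()
    )

# Two equal-sized groups: "a" is pure, "b" is evenly mixed between two classes,
# so OC = 0.5 * 1.0 + 0.5 * 0.5 = 0.75
groups = ["a", "a", "b", "b"]
outcomes = ["x", "x", "x", "y"]
print(outcome_purity(groups, outcomes))  # 0.75
```

With m = 2 classes, this example sits between the bounds given above: 1/m = 0.5 for fully mixed groups and 1.0 for fully pure ones.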
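The related group-level measures named above can be compared on the same class-proportion vector; this sketch (function name mine) contrasts majority-class purity with Gini impurity and entropy for a pure group and a fully mixed one:

```python
import math

def impurity_measures(proportions):
    """Return (purity, Gini impurity, entropy) for a class-proportion vector.

    `proportions` holds p_{i,c} for one group i and must sum to 1.
    """
    purity = max(proportions)                        # majority-class share, max_c p_{i,c}
    gini = 1.0 - sum(p * p for p in proportions)     # Gini impurity: 1 - sum_c p^2
    entropy = sum(-p * math.log2(p) for p in proportions if p > 0)  # in bits
    return purity, gini, entropy

# A pure group versus an evenly mixed two-class group
print(impurity_measures([1.0, 0.0]))  # (1.0, 0.0, 0.0)
print(impurity_measures([0.5, 0.5]))  # (0.5, 0.5, 1.0)
```

Purity rises toward 1 as a group becomes homogeneous, while Gini impurity and entropy fall toward 0, which is why Outcomepurity rewards the same splits that impurity-based criteria penalize.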