
Randindex

Randindex, or Rand index, is a measure of the similarity between two data clusterings of the same dataset. It assesses how well two partitions agree on the grouping of items, by considering all pairs of items and counting those that are either clustered together in both partitions or clustered apart in both partitions. The index ranges from 0 to 1, with 1 indicating perfect agreement.

To compute the Rand index, consider all unordered pairs of items. Classify each pair into one of four categories:

a = pairs placed in the same cluster in both partitions;
b = pairs placed in different clusters in both partitions;
c = pairs placed in the same cluster in the first partition but in different clusters in the second;
d = pairs placed in different clusters in the first partition but in the same cluster in the second.

Let N be the total number of pairs, N = C(n, 2), where n is the number of items. The Rand index is R = (a + b) / (a + b + c + d) = (a + b) / N. A related computation can be performed using the contingency table of the two clusterings, but the pair-counting formulation is most straightforward.

Limitations include its sensitivity to chance: even random labelings can yield nonzero values, especially with many clusters. Therefore, the Adjusted Rand Index (ARI) is often preferred, as it corrects for chance: it yields a value of 1 for perfect agreement and values near 0 for random labeling, and it can be negative in some cases.

The Rand index was introduced by William M. Rand in 1971 and remains a foundational tool for evaluating clustering results. It is widely used in machine learning, bioinformatics, image analysis, and related fields to compare clustering outputs to ground truth or to another clustering.
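
As an illustration of the pair-counting definition of the Rand index (the counts a, b, c, and d), here is a minimal Python sketch. The function name `rand_index` and the choice to represent each partition as a list of cluster labels are assumptions made for this example and are not taken from any particular library. The loop visits every pair directly, so it is meant for small inputs only.

```python
from itertools import combinations


def rand_index(labels_first, labels_second):
    """Rand index of two labelings of the same items, by direct pair counting.

    labels_first[i] and labels_second[i] are the cluster labels that the
    two partitions assign to item i (hypothetical representation).
    """
    if len(labels_first) != len(labels_second):
        raise ValueError("both labelings must cover the same items")
    if len(labels_first) < 2:
        raise ValueError("at least two items are needed to form a pair")

    a = b = c = d = 0
    for i, j in combinations(range(len(labels_first)), 2):
        same_first = labels_first[i] == labels_first[j]
        same_second = labels_second[i] == labels_second[j]
        if same_first and same_second:
            a += 1  # together in both partitions
        elif not same_first and not same_second:
            b += 1  # apart in both partitions
        elif same_first:
            c += 1  # together in the first, apart in the second
        else:
            d += 1  # apart in the first, together in the second

    # a + b + c + d equals N = C(n, 2), the total number of pairs.
    return (a + b) / (a + b + c + d)


first = [0, 0, 1, 1]
second = [0, 0, 1, 2]
print(rand_index(first, second))
```

For these two labelings of four items there are six pairs, with a = 1, b = 4, c = 1, and d = 0, so R = 5/6. Because only pairwise co-membership is compared, renaming the cluster labels in either partition does not change the result.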