betweencluster

Betweencluster refers to the concepts and measures that quantify the separation between clusters in a dataset. In cluster analysis, these measures assess how distinct clusters are from one another, in contrast to within-cluster measures, which assess how cohesive each cluster is.

In standard partitioning methods, the between-cluster dispersion is formalized through the between-cluster sum of squares (SSB). If a dataset is partitioned into K clusters with centroids c_1, …, c_K and an overall mean m, then SSB = sum_{k=1}^K n_k ||c_k − m||^2, where n_k is the number of observations in cluster k. The within-cluster sum of squares (SSW) is SSW = sum_{k=1}^K sum_{x in cluster k} ||x − c_k||^2, and SST = SSB + SSW, where SST is the total sum of squares. Distances between cluster centroids, such as ||c_i − c_j||, also quantify between-cluster separation. In linear discriminant analysis, a related concept appears as the between-class scatter matrix SB, which captures separation between classes or clusters in a feature space.

Applications of betweencluster measures include evaluating cluster separation, guiding the choice of the number of clusters (e.g., favoring partitions with larger SSB relative to SST), and informing visualization and interpretation of clustering results. They are commonly used alongside within-cluster metrics like WCSS or silhouette scores.

Limitations include sensitivity to the chosen distance metric, scaling, and cluster sizes; high-dimensional data can diminish the informativeness of distances, and non-spherical or overlapping clusters may yield misleading between-cluster assessments.
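
The decomposition SST = SSB + SSW can be checked numerically. The following is a minimal sketch in plain Python, assuming squared Euclidean distance and that each centroid c_k is the mean of its cluster (the identity need not hold otherwise). The function names and the four toy points are hypothetical, chosen only for illustration.

```python
def mean_point(points):
    """Component-wise mean of a non-empty list of equal-length points."""
    n = len(points)
    return [sum(coords) / n for coords in zip(*points)]


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def sum_of_squares(points, labels):
    """Return (SSB, SSW, SST) for points grouped by cluster label."""
    m = mean_point(points)  # overall mean
    clusters = {}
    for x, k in zip(points, labels):
        clusters.setdefault(k, []).append(x)

    ssb = 0.0
    ssw = 0.0
    for members in clusters.values():
        c_k = mean_point(members)  # assumption: centroid is the cluster mean
        ssb += len(members) * sq_dist(c_k, m)
        ssw += sum(sq_dist(x, c_k) for x in members)
    sst = sum(sq_dist(x, m) for x in points)
    return ssb, ssw, sst


# Toy data, made up for illustration: two separated groups in 2-D.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
labels = [0, 0, 1, 1]

ssb, ssw, sst = sum_of_squares(points, labels)
print(ssb, ssw, sst)  # SSB + SSW should equal SST up to rounding
print(ssb / sst)      # share of total dispersion lying between clusters
```

On this toy data, grouping the points by their first coordinate puts almost all of the dispersion between clusters, while grouping them by their second coordinate (labels 0, 1, 0, 1) puts almost all of it within clusters. This is the sense in which a larger SSB relative to SST indicates better separation.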

See also within-cluster dispersion, cluster validity indices, and centroid-based distance measures.