SilhouetteScore

Silhouette score is a metric used to evaluate the quality of a clustering result. For a given labeling of data into clusters, it assigns each sample a silhouette coefficient between -1 and 1. The coefficient reflects how similar the sample is to its own cluster compared with points in the nearest neighboring cluster. A higher score indicates better clustering, while negative values suggest possible misassignment.

For a sample i in cluster A, a(i) is the average distance from i to all other points in A. For each other cluster B, compute the average distance from i to all points in B; b(i) is the minimum of these averages across all such clusters. The silhouette coefficient for i is s(i) = (b(i) - a(i)) / max(a(i), b(i)). The overall silhouette score is the mean of s(i) over all samples. Values close to 1 indicate well-separated clusters, around 0 indicate overlapping clusters or ambiguous assignments, and negative values indicate potential misclassifications.

Computation typically uses a distance metric, most commonly Euclidean distance, but any metric supported by the clustering method can be used. The silhouette score can be computed for a fixed number of clusters, and is often used to select a suitable k by comparing scores across different k values. Per-sample silhouette values can be visualized with a silhouette plot to assess cluster separation and cohesion.

Limitations include sensitivity to density variations and to clusters of different sizes, and it may be less informative for non-convex or irregularly shaped clusters. The calculation is generally O(n^2) in time complexity due to pairwise distances, making it expensive for large datasets. Implementations exist in many libraries, such as scikit-learn, as silhouette_score and silhouette_samples.
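
As an illustration of the definitions of a(i), b(i), and s(i), the following is a minimal sketch in plain Python that computes the coefficients directly from pairwise Euclidean distances. The function names silhouette_samples_naive and silhouette_score_naive are hypothetical and chosen only for this example; they are not the scikit-learn functions. The sketch assumes the common convention that a sample in a single-member cluster receives s(i) = 0, and it makes the O(n^2) cost visible as a double loop over samples.

```python
import math


def silhouette_samples_naive(points, labels):
    """Return s(i) for every sample, computed straight from the definition.

    points: sequence of equal-length coordinate tuples
    labels: sequence of cluster labels, one per point
    """
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")

    n = len(points)
    scores = []
    for i in range(n):
        # Accumulate distance sums and member counts per cluster,
        # excluding the sample itself. This inner loop is the O(n^2) part.
        sums = {c: 0.0 for c in clusters}
        counts = {c: 0 for c in clusters}
        for j in range(n):
            if j == i:
                continue
            sums[labels[j]] += math.dist(points[i], points[j])
            counts[labels[j]] += 1

        own = labels[i]
        if counts[own] == 0:
            # Assumption: a singleton cluster gets s(i) = 0 by convention.
            scores.append(0.0)
            continue

        a = sums[own] / counts[own]  # mean distance within own cluster
        # Smallest mean distance to any other cluster.
        b = min(sums[c] / counts[c] for c in clusters if c != own)
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores


def silhouette_score_naive(points, labels):
    """Mean of s(i) over all samples."""
    scores = silhouette_samples_naive(points, labels)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (10, 0), (10, 1)]
    print(silhouette_score_naive(pts, [0, 0, 1, 1]))    # two tight, distant pairs
    print(silhouette_samples_naive(pts, [0, 1, 1, 1]))  # second point mislabeled
```

With the two well-separated pairs, every sample has a(i) = 1 and b(i) of about 10.02, so the score comes out at roughly 0.90. With the second point assigned to the far cluster, its coefficient turns negative, which matches the interpretation of negative values as potential misassignments. For real workloads a library implementation such as scikit-learn's silhouette_score is the more practical choice.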