tSNE

tSNE, short for t-Distributed Stochastic Neighbor Embedding, is a nonlinear dimensionality reduction technique used primarily for data visualization. Developed by Laurens van der Maaten and Geoffrey Hinton in 2008, it aims to preserve local structure by modeling pairwise similarities between points in high-dimensional space and in a low-dimensional embedding.
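The pairwise-similarity model can be written explicitly. The following is a sketch of the standard formulation from the original paper, where the x_i are the high-dimensional points, the y_i their low-dimensional images, and sigma_i a per-point Gaussian bandwidth:

```latex
% High-dimensional conditional similarity of x_j given x_i,
% using a Gaussian with per-point bandwidth sigma_i, then symmetrized:
p_{j|i} = \frac{\exp\!\left(-\lVert x_i - x_j\rVert^2 / 2\sigma_i^2\right)}
               {\sum_{k \neq i} \exp\!\left(-\lVert x_i - x_k\rVert^2 / 2\sigma_i^2\right)},
\qquad
p_{ij} = \frac{p_{j|i} + p_{i|j}}{2n}

% Low-dimensional similarity under a Student t-distribution
% with one degree of freedom:
q_{ij} = \frac{\left(1 + \lVert y_i - y_j\rVert^2\right)^{-1}}
              {\sum_{k \neq l} \left(1 + \lVert y_k - y_l\rVert^2\right)^{-1}}

% Cost minimized over the y_i by gradient descent:
C = \mathrm{KL}(P \,\Vert\, Q) = \sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}}
```

Each sigma_i is chosen so that the conditional distribution p_{·|i} has a fixed perplexity, the user-set hyperparameter.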

In the high-dimensional space, the algorithm converts the distances between data points into conditional probabilities that reflect neighborhood relationships, using a Gaussian distribution with a per-point bandwidth tuned by a parameter called perplexity. The low-dimensional embedding uses a Student t-distribution to compute analogous pairwise similarities, which helps alleviate the crowding problem by allowing moderate-to-large distances to be represented more effectively. The two sets of similarities are compared via a cost function based on the Kullback–Leibler divergence, and the embedding is found by gradient descent with momentum and optional early exaggeration, which encourages tight clusters early in optimization.

tSNE is commonly preceded by dimensionality reduction (often PCA) to reduce noise and improve speed. The method is computationally intensive for large datasets, but scalable variants such as Barnes–Hut tSNE reduce complexity substantially. Other extensions include parametric tSNE, which learns a neural network to map new data into the embedding space, and faster implementations such as FIt-SNE.

Applications of tSNE span diverse domains, including visualization of image feature vectors, word or document representations, and single-cell RNA sequencing data. Limitations include sensitivity to hyperparameters (notably perplexity and initialization), non-deterministic results, difficulty in interpreting global structure, and the need to re-compute embeddings for new data in non-parametric variants.
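The common PCA-then-tSNE workflow can be sketched with scikit-learn; the toy dataset and parameter values below are illustrative assumptions, not canonical choices:

```python
# Minimal sketch of a PCA-then-tSNE pipeline, assuming scikit-learn is available.
# The two-blob dataset and all parameter values are illustrative assumptions.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
# Toy high-dimensional data: two well-separated Gaussian blobs in 50 dimensions.
X = np.vstack([
    rng.normal(loc=0.0, scale=1.0, size=(100, 50)),
    rng.normal(loc=8.0, scale=1.0, size=(100, 50)),
])

# Step 1: PCA to a moderate dimensionality to reduce noise and speed up tSNE.
X_reduced = PCA(n_components=20, random_state=0).fit_transform(X)

# Step 2: tSNE down to 2D. Perplexity is the key hyperparameter; fixing
# random_state makes the otherwise stochastic result reproducible, and
# init="pca" gives a deterministic, globally informed starting layout.
embedding = TSNE(
    n_components=2,
    perplexity=30.0,
    init="pca",
    random_state=0,
).fit_transform(X_reduced)

print(embedding.shape)  # one 2D coordinate per input point
```

Because results vary with perplexity and initialization, it is common practice to run tSNE at several perplexity values and compare the resulting plots before drawing conclusions.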