Gumbel-softmax

The Gumbel-softmax, also known as the Concrete distribution in some contexts, is a differentiable relaxation of a categorical distribution that enables gradient-based optimization when working with discrete variables. It provides a continuous approximation to sampling from a finite set of categories, which is useful for neural networks trained with backpropagation.

Mechanism and formulation: Given a vector of unnormalized log-probabilities (logits) z = (z_1, ..., z_k) for a categorical variable, one samples independent Gumbel noise g_i ~ Gumbel(0, 1) for each category and computes y = softmax((z + g) / tau), i.e. y_i = exp((z_i + g_i) / tau) / sum_j exp((z_j + g_j) / tau), where tau > 0 is a temperature parameter. A Gumbel(0, 1) sample can be generated as g = -log(-log(u)) with u ~ Uniform(0, 1). The resulting vector y lies in the probability simplex and is differentiable with respect to z. As tau approaches zero, samples of y become increasingly peaked and approach one-hot vectors, approximating a discrete sample; for larger tau, the output is smoother and more diffuse.
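
A minimal NumPy sketch of this sampling step, assuming a NumPy environment; the function name and example logits are illustrative, not from any particular library:

```python
import numpy as np

def gumbel_softmax_sample(logits, tau=1.0, rng=None):
    """Draw one Gumbel-softmax sample y on the simplex from unnormalized logits z."""
    if rng is None:
        rng = np.random.default_rng()
    # Gumbel(0, 1) noise via the inverse-CDF trick: g = -log(-log(u)), u ~ Uniform(0, 1).
    u = rng.uniform(low=np.finfo(float).tiny, high=1.0, size=np.shape(logits))
    g = -np.log(-np.log(u))
    # Temperature-scaled softmax over the perturbed logits.
    x = (np.asarray(logits) + g) / tau
    x = x - x.max()            # shifting by the max does not change the softmax; avoids overflow
    y = np.exp(x)
    return y / y.sum()

logits = np.log([0.2, 0.5, 0.3])                 # logits of an example 3-way categorical
print(gumbel_softmax_sample(logits, tau=0.5))    # a point on the simplex, near one-hot for small tau
```

Because softmax with tau > 0 preserves the argmax of z + g, taking the argmax of y recovers an exact categorical sample (the Gumbel-Max trick discussed below).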

Relation to sampling tricks: The approach is closely related to the Gumbel-Max trick, which takes the argmax of z_i + g_i to obtain a discrete category. The Gumbel-softmax replaces the non-differentiable argmax with a differentiable softmax, allowing gradients to flow during training. A common variant is the straight-through estimator, which uses a discrete one-hot sample in the forward pass but preserves the differentiable relaxed path in the backward pass.
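
A short sketch of the straight-through variant, assuming PyTorch so that the backward pass is meaningful; the helper name is hypothetical (recent PyTorch versions also expose torch.nn.functional.gumbel_softmax with a hard flag for the same purpose):

```python
import torch
import torch.nn.functional as F

def straight_through_gumbel_softmax(logits, tau=1.0):
    """One-hot sample in the forward pass, soft-relaxation gradients in the backward pass."""
    # Gumbel(0, 1) noise: if E ~ Exponential(1), then -log(E) ~ Gumbel(0, 1).
    gumbels = -torch.empty_like(logits).exponential_().log()
    y_soft = F.softmax((logits + gumbels) / tau, dim=-1)      # relaxed, differentiable sample
    # Discrete one-hot sample chosen by argmax (non-differentiable on its own).
    index = y_soft.argmax(dim=-1, keepdim=True)
    y_hard = torch.zeros_like(y_soft).scatter_(-1, index, 1.0)
    # Straight-through trick: forward value is y_hard, gradient flows through y_soft.
    return y_hard - y_soft.detach() + y_soft

logits = torch.tensor([[0.2, 0.5, 0.3]]).log().requires_grad_()
sample = straight_through_gumbel_softmax(logits, tau=0.5)     # exactly one-hot, yet backpropagable
```

In the forward pass the downstream network sees an exact one-hot vector, while gradients with respect to the logits are those of the relaxed sample, trading discreteness for some gradient bias.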

Applications and limitations: It is used for training models with discrete latent variables, variational autoencoders with categorical latents, differentiable neural architecture search, and reinforcement learning with discrete actions. Limitations include bias from the relaxation when tau is not very small, sensitivity to the temperature schedule, and increased gradient variance as tau is lowered. Related concepts include the Gumbel distribution, the reparameterization trick, and the Concrete distribution.
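
Because a small tau reduces relaxation bias but tends to increase gradient variance, a common practice is to anneal the temperature from a moderate value toward a small floor over training. A rough sketch of such a schedule, with purely illustrative constants:

```python
import math

def annealed_tau(step, tau_start=1.0, tau_min=0.1, decay_rate=1e-4):
    """Exponentially decay the Gumbel-softmax temperature toward a minimum value."""
    return max(tau_min, tau_start * math.exp(-decay_rate * step))

for step in (0, 10_000, 50_000):
    print(step, round(annealed_tau(step), 3))   # 1.0, then ~0.368, then clipped at 0.1
```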
