VAE

A variational autoencoder (VAE) is a generative model that combines neural networks with variational Bayesian methods to learn a latent representation of data. It models the underlying distribution of observable data by introducing latent variables, and new samples can be drawn from the learned distribution.

In a VAE, an encoder network maps an input x to a distribution over latent variables z, typically q(z|x) parameterized as a diagonal Gaussian with learned mean and variance. A latent sample z is drawn from q(z|x), and a decoder network reconstructs x via p(x|z). The prior over latent variables, p(z), is usually a standard normal.
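
To make this structure concrete, here is a minimal sketch in PyTorch. The layer sizes (784-dimensional inputs, 400 hidden units, a 20-dimensional latent space) are illustrative assumptions, not part of the definition.

    import torch
    import torch.nn as nn

    class VAE(nn.Module):
        # Minimal fully connected encoder/decoder pair. All sizes are
        # illustrative assumptions.
        def __init__(self, x_dim=784, h_dim=400, z_dim=20):
            super().__init__()
            self.enc = nn.Sequential(nn.Linear(x_dim, h_dim), nn.ReLU())
            self.mu = nn.Linear(h_dim, z_dim)      # mean of q(z|x)
            self.logvar = nn.Linear(h_dim, z_dim)  # log-variance of q(z|x)
            self.dec = nn.Sequential(
                nn.Linear(z_dim, h_dim), nn.ReLU(),
                nn.Linear(h_dim, x_dim), nn.Sigmoid(),  # Bernoulli mean for p(x|z)
            )

        def encode(self, x):
            h = self.enc(x)
            return self.mu(h), self.logvar(h)

        def decode(self, z):
            return self.dec(z)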

Training maximizes the evidence lower bound (ELBO) on the marginal log-likelihood log p(x). The ELBO comprises a reconstruction term E_{q(z|x)}[log p(x|z)] and a regularization term -KL(q(z|x) || p(z)). Because sampling z is not a differentiable operation, the reparameterization trick expresses z as z = mu + sigma * epsilon with epsilon ~ N(0, I), enabling gradient-based optimization through the sampling step.
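
The sketch below computes a single-sample Monte Carlo estimate of the negative ELBO for the VAE class above, assuming a Bernoulli decoder (so the reconstruction term is a negative binary cross-entropy) and using the closed-form KL between a diagonal Gaussian posterior and a standard normal prior.

    import torch
    import torch.nn.functional as F

    def negative_elbo(model, x):
        mu, logvar = model.encode(x)
        # Reparameterization trick: z = mu + sigma * epsilon with
        # epsilon ~ N(0, I), so gradients flow through the sampling step.
        std = torch.exp(0.5 * logvar)
        z = mu + std * torch.randn_like(std)
        x_hat = model.decode(z)
        # Reconstruction term E_{q(z|x)}[log p(x|z)] for a Bernoulli decoder:
        # negative binary cross-entropy, summed over dimensions and batch.
        recon = -F.binary_cross_entropy(x_hat, x, reduction='sum')
        # KL(q(z|x) || p(z)) in closed form for diagonal Gaussians.
        kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
        return -recon + kl  # minimizing this maximizes the ELBO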

Background: Variational autoencoders were introduced by Kingma and Welling in 2013 as a scalable approach to variational inference with neural networks.

Variants and extensions include beta-VAE, which improves disentanglement by up-weighting the KL term; conditional VAE (CVAE), which conditions generation on labels; and various architectural tweaks such as more powerful decoders. Discrete-latent VAEs and VQ-VAE are related approaches.
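
As a sketch of how beta-VAE modifies the objective, the only change relative to the loss above is a scalar weight on the KL term; beta = 4.0 below is an illustrative value, not a recommendation.

    def beta_vae_negative_elbo(recon, kl, beta=4.0):
        # beta > 1 strengthens the pull of q(z|x) toward the prior,
        # which can encourage disentangled latent dimensions.
        return -recon + beta * kl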

Applications include image and audio generation, unsupervised representation learning, and semi-supervised tasks. Limitations include blurry samples with simple decoders and issues like posterior collapse, where q(z|x) collapses to the prior and the decoder ignores the latent code. Mitigation strategies include model design choices, KL annealing, and more expressive priors and decoders.
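
As one concrete example of a mitigation, KL annealing can be as simple as a linear warm-up on the KL weight; the schedule shape and warm-up length below are illustrative assumptions.

    def kl_weight(step, warmup_steps=10000):
        # Ramp the weight on the KL term from 0 to 1 over the first
        # warmup_steps updates, giving the decoder time to learn to use z
        # before the regularizer can drive q(z|x) to the prior.
        return min(1.0, step / warmup_steps)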
