PixelRNN

PixelRNN is an autoregressive generative model for images introduced in 2016 by Aaron van den Oord, Nal Kalchbrenner, and Koray Kavukcuoglu. It treats an image as a sequence of pixels and models the joint distribution P(X) as the product of conditional distributions P(x_i | x_1, x_2, ..., x_{i-1}) taken in raster-scan order. The approach aims to capture long-range dependencies in both spatial directions, enabling the generation of high-fidelity samples and providing a tractable density estimate for natural images.
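
The raster-scan factorization above can be illustrated with a minimal sketch on tiny binary images. The function toy_conditional is a hypothetical stand-in for the network, not part of PixelRNN itself: it looks only at the left and above neighbours and uses arbitrary weights, whereas the real model summarizes the whole prefix with recurrent state. The point is only that a product of valid conditionals yields a normalized, exactly computable density.

```python
import itertools
import math


def toy_conditional(prefix, width):
    """Hypothetical stand-in for the network: P(x_i = 1 | x_1, ..., x_{i-1}).

    Uses only the left and above neighbours of the next pixel. The weights
    are arbitrary illustrative values, not learned parameters.
    """
    i = len(prefix)                      # index of the pixel being predicted
    row, col = divmod(i, width)
    left = prefix[i - 1] if col > 0 else 0
    above = prefix[i - width] if row > 0 else 0
    score = -0.5 + 1.0 * left + 1.0 * above
    return 1.0 / (1.0 + math.exp(-score))


def log_likelihood(pixels, width):
    """Sum of log P(x_i | x_<i) over pixels in raster-scan order."""
    total = 0.0
    for i, x in enumerate(pixels):
        p_one = toy_conditional(pixels[:i], width)
        total += math.log(p_one if x == 1 else 1.0 - p_one)
    return total


def total_probability(height, width):
    """Sum P(X) over every binary image of the given size."""
    n = height * width
    return sum(
        math.exp(log_likelihood(list(bits), width))
        for bits in itertools.product([0, 1], repeat=n)
    )
```

Summing the probabilities of all 2x2 binary images with total_probability(2, 2) gives 1 up to floating-point error, which is the sense in which the density estimate is tractable: each image's likelihood is an exact sum of per-pixel log-probabilities.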

The architecture relies on two-dimensional recurrent networks to propagate information across the image grid. It uses vertical and horizontal recurrent streams (often implemented as LSTM-based units) that move along columns and rows, respectively, so that each pixel is conditioned on all previously visited pixels to its left and above. The model outputs a discrete distribution over possible pixel values at each location, and the conditioning can also be applied across color channels. For color images, the conditional distribution is typically factorized across channels or modeled jointly in a multi-channel output.

The model is trained by maximum likelihood estimation, using backpropagation through time to train the recurrent connections. Because sampling proceeds pixel by pixel in raster order, generation can be relatively slow compared to non-autoregressive models. PixelRNN has inspired subsequent variants and improvements, including PixelCNN++, which refines the architecture and training for better efficiency and sample quality, and it stands alongside PixelCNN as a foundational approach in autoregressive image modeling.

Impact and scope: PixelRNN contributed to the development of autoregressive density estimators for images, influencing later work in PixelCNN, PixelCNN++, and related density-based generative models for natural images.
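
The pixel-by-pixel sampling procedure and the discrete per-channel output described in this article can be sketched as follows. This is a minimal illustration under stated assumptions, not the published implementation: toy_logits is a hypothetical placeholder for the network's output layer (a real PixelRNN would compute logits from LSTM state), and the choice of 256 intensity levels with channels sampled one after another is an assumption made for the example.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]


def toy_logits(left, above, prev_channels, levels):
    """Hypothetical stand-in for the network's output layer.

    Favors intensities near the mean of the already-generated context:
    the left pixel, the pixel above, and earlier channels of this pixel.
    """
    context = [v for v in (left, above) if v is not None] + prev_channels
    center = sum(context) / len(context) if context else (levels - 1) / 2
    return [-abs(v - center) for v in range(levels)]


def sample_image(height, width, channels=3, levels=256, seed=0):
    """Draw one image in raster order; returns (image, sequential steps)."""
    rng = random.Random(seed)
    image = [[[0] * channels for _ in range(width)] for _ in range(height)]
    steps = 0
    for r in range(height):                 # rows, top to bottom
        for c in range(width):              # columns, left to right
            for ch in range(channels):      # channels, one after another
                left = image[r][c - 1][ch] if c > 0 else None
                above = image[r - 1][c][ch] if r > 0 else None
                prev = image[r][c][:ch]     # channels already sampled here
                probs = softmax(toy_logits(left, above, prev, levels))
                # Categorical draw over the discrete intensity levels.
                image[r][c][ch] = rng.choices(range(levels), weights=probs)[0]
                steps += 1
    return image, steps
```

Each value must be drawn before the next conditional can be evaluated, so the number of sequential steps equals height x width x channels. That dependency chain is the reason generation is slow relative to models that produce all pixels in one pass.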