RNNs

Recurrent neural networks (RNNs) are a class of neural networks designed to process sequential data by maintaining a hidden state that carries information across time. At each time step t, the network receives an input x_t and updates its hidden state h_t based on the previous state h_{t-1} and the current input. The same parameters are applied at every time step, enabling the network to handle sequences of varying length and to model temporal dependencies.

The standard or vanilla RNN computes h_t = φ(W_hh h_{t-1} + W_xh x_t + b_h), where φ is an elementwise activation such as tanh or ReLU. The output at step t is typically y_t = W_hy h_t + b_y. During training, the network is unrolled in time and optimized with backpropagation through time (BPTT), which propagates gradients across many time steps.
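
As a concrete illustration, the recurrence above can be written as a short loop. The NumPy sketch below is illustrative only: the function name rnn_forward and the toy dimensions are assumptions, not part of the text, but the update matches the equations h_t = tanh(W_hh h_{t-1} + W_xh x_t + b_h) and y_t = W_hy h_t + b_y, with the same weights applied at every step.

```python
import numpy as np

def rnn_forward(xs, h0, W_hh, W_xh, W_hy, b_h, b_y):
    """Unroll a vanilla RNN over a sequence.

    xs : array of shape (T, input_dim), one input vector per time step
    h0 : initial hidden state of shape (hidden_dim,)
    Returns the hidden states and outputs for every time step.
    """
    h = h0
    hs, ys = [], []
    for x_t in xs:
        # h_t = tanh(W_hh h_{t-1} + W_xh x_t + b_h)  -- same parameters at every step
        h = np.tanh(W_hh @ h + W_xh @ x_t + b_h)
        # y_t = W_hy h_t + b_y
        y = W_hy @ h + b_y
        hs.append(h)
        ys.append(y)
    return np.stack(hs), np.stack(ys)

# Toy usage with illustrative dimensions: 10 steps, 4-dim inputs, 8-dim hidden state
rng = np.random.default_rng(0)
T, input_dim, hidden_dim, output_dim = 10, 4, 8, 3
xs = rng.normal(size=(T, input_dim))
params = dict(
    W_hh=rng.normal(size=(hidden_dim, hidden_dim)) * 0.1,
    W_xh=rng.normal(size=(hidden_dim, input_dim)) * 0.1,
    W_hy=rng.normal(size=(output_dim, hidden_dim)) * 0.1,
    b_h=np.zeros(hidden_dim),
    b_y=np.zeros(output_dim),
)
hs, ys = rnn_forward(xs, np.zeros(hidden_dim), **params)
```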

Plain RNNs face challenges with long-range dependencies due to vanishing and exploding gradients, limiting their ability to capture information from distant inputs. To mitigate this, gated architectures such as the Long Short-Term Memory (LSTM) and the Gated Recurrent Unit (GRU) were developed. They introduce memory cells and gates to regulate the flow of information.
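
To show what "gates" mean in practice, here is a minimal sketch of a single GRU update step, assuming the standard gate formulation (the parameter names W_z, U_z, etc. and the toy dimensions are placeholders, not from the text).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, p):
    """One GRU update.

    z (update gate) controls how much of the previous state is kept;
    r (reset gate) controls how much of the previous state feeds the candidate.
    """
    z = sigmoid(p["W_z"] @ x_t + p["U_z"] @ h_prev + p["b_z"])
    r = sigmoid(p["W_r"] @ x_t + p["U_r"] @ h_prev + p["b_r"])
    h_tilde = np.tanh(p["W_h"] @ x_t + p["U_h"] @ (r * h_prev) + p["b_h"])
    # Interpolate between the previous state and the candidate state.
    return z * h_prev + (1.0 - z) * h_tilde

# Toy usage with illustrative dimensions
rng = np.random.default_rng(1)
d_in, d_h = 4, 8
p = {k: rng.normal(size=(d_h, d_h if k.startswith("U") else d_in)) * 0.1
     for k in ("W_z", "U_z", "W_r", "U_r", "W_h", "U_h")}
p.update(b_z=np.zeros(d_h), b_r=np.zeros(d_h), b_h=np.zeros(d_h))
h = gru_step(rng.normal(size=d_in), np.zeros(d_h), p)
```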

Variants include bidirectional RNNs, which process sequences in both forward and backward directions, and encoder–decoder or sequence-to-sequence models, where an encoder maps an input sequence to a fixed representation and a decoder generates an output sequence. These configurations underpin many applications in natural language processing, speech recognition, time-series forecasting, and video analysis.
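
To make the bidirectional idea concrete, the sketch below runs one pass left-to-right and one right-to-left and concatenates the two hidden states at each position. It reuses the simple tanh recurrence from above; the function name, parameter names, and dimensions are illustrative assumptions.

```python
import numpy as np

def bidirectional_rnn(xs, params_f, params_b, hidden_dim):
    """Run two independent vanilla RNNs, one over the sequence as given and one
    over its reverse, then concatenate the two states at each position."""
    def run(seq, p):
        h = np.zeros(hidden_dim)
        out = []
        for x_t in seq:
            h = np.tanh(p["W_hh"] @ h + p["W_xh"] @ x_t + p["b_h"])
            out.append(h)
        return out

    forward = run(xs, params_f)
    backward = run(xs[::-1], params_b)[::-1]  # re-align with the original order
    return [np.concatenate([f, b]) for f, b in zip(forward, backward)]

# Illustrative dimensions: 5 steps, 3-dim inputs, 6-dim hidden state per direction
rng = np.random.default_rng(2)
make = lambda: dict(W_hh=rng.normal(size=(6, 6)) * 0.1,
                    W_xh=rng.normal(size=(6, 3)) * 0.1,
                    b_h=np.zeros(6))
states = bidirectional_rnn(rng.normal(size=(5, 3)), make(), make(), hidden_dim=6)
```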

RNNs emerged from work in the late 1980s and early 1990s, including Elman's simple recurrent networks and the development of backpropagation through time. They remain a foundational tool for sequence modeling, though transformers have largely supplanted them in large-scale tasks due to better parallelization and performance.
