
ReLU

ReLU, short for rectified linear unit, is a widely used activation function in artificial neural networks. It is defined as f(x) = max(0, x), meaning it outputs x for positive inputs and 0 for negative inputs. The function is piecewise linear, and its derivative is 1 for x > 0 and 0 for x < 0; at x = 0 the derivative is not uniquely defined, but subgradients are typically used in training.
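
As a concrete illustration, the following minimal Python sketch (using NumPy) implements the function and a common subgradient convention; taking the value at x = 0 to be 0 is an assumption here, and frameworks differ on this choice.

    import numpy as np

    def relu(x):
        # f(x) = max(0, x), applied elementwise
        return np.maximum(0.0, x)

    def relu_subgradient(x):
        # 1 for x > 0, 0 for x < 0; the value at x == 0 is a convention
        # (here 0), since the derivative is not uniquely defined there
        return (x > 0).astype(float)

    x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
    print(relu(x))              # [0.  0.  0.  0.5 2. ]
    print(relu_subgradient(x))  # [0. 0. 0. 1. 1.]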

ReLU offers several practical advantages. It is computationally simple and fast to evaluate, which helps training efficiency. It introduces sparsity in activations, as many neurons output zero for negative inputs. For positive inputs, it preserves the gradient and acts like the identity function, helping to mitigate the vanishing gradient problem in deep networks. ReLU has contributed to faster convergence in many architectures, including convolutional neural networks and deep feedforward networks, and is commonly paired with initialization schemes such as He initialization.
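
A rough way to see the sparsity effect is to pass random pre-activations through ReLU and count the zeros. The sketch below is purely illustrative; the standard-normal inputs are an assumption, and the fraction of zeroed activations depends on the input distribution (about half for symmetric inputs).

    import numpy as np

    rng = np.random.default_rng(0)
    pre_activations = rng.standard_normal(10_000)   # hypothetical pre-activations
    activations = np.maximum(0.0, pre_activations)  # ReLU

    sparsity = np.mean(activations == 0.0)
    print(f"fraction of zero activations: {sparsity:.2f}")  # roughly 0.5 here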

Despite its benefits, ReLU has limitations. Neurons can become “dead” if they consistently receive negative inputs, yielding zero output and zero gradient, which can hinder learning. Because it is not differentiable at zero, training relies on subgradients. In some settings, the nonlinearity can be less robust to certain data distributions, prompting exploration of alternative activations.
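
The dying-ReLU problem can be sketched for a single neuron: if its pre-activation is negative on every example, the subgradient is zero everywhere, so gradient descent never updates its weights. The inputs, weights, and bias below are made up for illustration.

    import numpy as np

    rng = np.random.default_rng(1)
    X = rng.standard_normal((100, 3))   # hypothetical inputs
    w = np.array([0.1, -0.2, 0.3])
    b = -10.0                           # large negative bias: pre-activation is always negative

    z = X @ w + b                       # pre-activations, all < 0 here
    grad_mask = (z > 0).astype(float)   # ReLU subgradient with respect to z
    grad_w = (grad_mask[:, None] * X).mean(axis=0)  # gradient of the mean ReLU output w.r.t. w

    print(np.all(z < 0))   # True: the neuron never fires
    print(grad_w)          # [0. 0. 0.]: no learning signal reaches the weights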

Variants of ReLU address these issues. Leaky ReLU introduces a small slope for negative inputs; Parametric ReLU (PReLU) learns the slope during training; Randomized ReLU (RReLU) uses stochastic slopes during training. Other related activations, such as ELU and SELU, offer different trade-offs but are not ReLU variants per se.
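
These variants differ mainly in how the negative-side slope is chosen. The sketch below illustrates one possible formulation; the slope value 0.01 and the RReLU sampling range are common defaults assumed for illustration, not values prescribed above.

    import numpy as np

    def leaky_relu(x, negative_slope=0.01):
        # fixed small slope for x < 0
        return np.where(x > 0, x, negative_slope * x)

    def prelu(x, a):
        # PReLU: `a` is a learnable parameter, updated during training
        return np.where(x > 0, x, a * x)

    def rrelu(x, lower=1/8, upper=1/3, rng=None, training=True):
        # RReLU: slope sampled at random during training,
        # fixed to the midpoint of the range at inference time
        rng = rng or np.random.default_rng()
        a = rng.uniform(lower, upper) if training else (lower + upper) / 2
        return np.where(x > 0, x, a * x)

    x = np.array([-2.0, -0.5, 1.5])
    print(leaky_relu(x))      # [-0.02  -0.005  1.5]
    print(prelu(x, a=0.25))   # [-0.5   -0.125  1.5]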

ReLU remains a standard choice in many neural network designs and continues to be a baseline activation in research and applications.