Activations

Activation functions are mathematical functions applied to the output of a neuron in a neural network. They introduce nonlinearity, enabling networks to model complex, nonlinear relationships. An activation is computed by applying a nonlinear function to the weighted sum of a neuron's inputs plus a bias. The function's properties influence training dynamics, gradient flow, and the network's ability to approximate complex functions.
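
As a minimal sketch of that computation, assuming a single neuron with an illustrative weight vector `w`, input vector `x`, and bias `b` (these names and values are chosen here for the example, not taken from the text), the activation is just a nonlinear function applied to the weighted sum:

```python
import numpy as np

def sigmoid(z):
    """Example nonlinearity: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative values for this sketch.
w = np.array([0.5, -1.2, 0.3])   # weights
x = np.array([1.0, 0.7, -0.4])   # inputs
b = 0.1                          # bias

z = np.dot(w, x) + b             # weighted sum of inputs plus bias (pre-activation)
a = sigmoid(z)                   # activation: nonlinear function applied to z
```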

Common activation functions include the following; a brief code sketch after the list illustrates several of them:

- Sigmoid: maps inputs to the range (0, 1); smooth but can suffer from vanishing gradients for extreme values.

- Tanh: maps to (-1, 1); zero-centered but also susceptible to gradient vanishing in some regimes.

- ReLU (Rectified Linear Unit): max(0, x); fast and simple but can produce dead neurons for negative inputs.

- Leaky ReLU and PReLU: allow a small, nonzero gradient for negative inputs to mitigate dead neurons.

- ELU and SELU: provide a smooth negative region, helping gradient flow and self-normalization in some networks.

- GELU and Swish: smooth activations that often improve performance in deep models.
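
A rough NumPy sketch of several of the functions above, using their commonly cited formulas (the GELU shown uses the widely used tanh approximation rather than the exact erf-based form):

```python
import numpy as np

def tanh(x):
    return np.tanh(x)                                       # maps to (-1, 1), zero-centered

def relu(x):
    return np.maximum(0.0, x)                               # max(0, x)

def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x)                    # small slope for negative inputs

def elu(x, alpha=1.0):
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))    # smooth negative region

def swish(x):
    return x / (1.0 + np.exp(-x))                           # x * sigmoid(x)

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))
```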

Softmax is a special case used in the output layer for multi-class classification, converting a vector of scores into a probability distribution.
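
A minimal softmax sketch; subtracting the maximum score before exponentiating is a standard trick (an addition here, not from the text) to avoid numerical overflow:

```python
import numpy as np

def softmax(scores):
    """Convert a vector of raw scores into a probability distribution."""
    shifted = scores - np.max(scores)   # improves numerical stability
    exps = np.exp(shifted)
    return exps / np.sum(exps)

# Example: three class scores become probabilities that sum to 1.
print(softmax(np.array([2.0, 1.0, 0.1])))
```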

The choice of activation affects learning dynamics, including gradient propagation and convergence speed. Practical considerations include weight initialization, normalization, and architecture design (e.g., residual connections) that help stabilize activations across layers. There is also interest in adaptive or learnable activation functions, whose parameters are tuned during training for potentially better performance.
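
As one example of a learnable activation, here is a sketch using PyTorch's nn.PReLU, whose negative-slope parameter is trained alongside the layer weights (the layer sizes are arbitrary, chosen only for illustration):

```python
import torch.nn as nn

# The PReLU slope is a parameter updated by the optimizer along with the linear weights.
model = nn.Sequential(
    nn.Linear(16, 32),
    nn.PReLU(),        # one learnable negative slope, initialized to 0.25 by default
    nn.Linear(32, 4),
)
```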

In summary, activation functions are a core component of neural networks, shaping how information flows and how well the model can learn complex mappings from data.
