Activations
Activation functions are mathematical functions applied to the output of a neuron in a neural network. They introduce nonlinearity; without them, a stack of layers would collapse into a single linear transformation and the network could only model linear relationships. An activation is computed by applying a nonlinear function to the weighted sum of a neuron's inputs plus a bias. The function's properties influence training dynamics, gradient flow, and the network's capacity to approximate complex functions.
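As a concrete illustration, here is a minimal sketch (assuming NumPy) of a single neuron: a weighted sum plus bias, followed by a nonlinearity. The input, weight, and bias values and the choice of ReLU are purely illustrative.

```python
import numpy as np

def neuron_output(x, w, b, activation):
    """Apply an activation to the weighted sum of inputs plus a bias."""
    z = np.dot(w, x) + b     # pre-activation: weighted sum plus bias
    return activation(z)     # nonlinearity applied to the pre-activation

# Hypothetical 3-input neuron with a ReLU activation (values are made up).
x = np.array([0.5, -1.2, 2.0])    # inputs
w = np.array([0.4, 0.3, -0.6])    # weights
b = 0.1                           # bias
relu = lambda z: np.maximum(0.0, z)
print(neuron_output(x, w, b, relu))   # pre-activation is -1.26, so ReLU outputs 0.0
```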
Common activation functions include the following (a short reference sketch of each appears after the list):
- Sigmoid: maps inputs to the range (0, 1); smooth, but saturates and can suffer from vanishing gradients for large positive or negative inputs.
- Tanh: maps to (-1, 1); zero-centered but also susceptible to gradient vanishing for large-magnitude inputs.
- ReLU (Rectified Linear Unit): max(0, x); fast and simple, but can produce dead neurons when inputs stay negative, since the gradient there is zero.
- Leaky ReLU and PReLU: allow a small, nonzero gradient for negative inputs to mitigate dead neurons.
- ELU and SELU: provide a smooth negative region, helping gradient flow and self-normalization in some networks.
- GELU and Swish: smooth, often providing improved performance on deep models.
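The sketch below gives minimal NumPy implementations of the functions listed above. The GELU uses the common tanh-based approximation, and the constants (for example, alpha = 0.01 in Leaky ReLU) are typical defaults rather than values taken from this text.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    return np.tanh(z)

def relu(z):
    return np.maximum(0.0, z)

def leaky_relu(z, alpha=0.01):
    # small slope alpha for negative inputs instead of a hard zero
    return np.where(z > 0, z, alpha * z)

def elu(z, alpha=1.0):
    # smooth exponential region for negative inputs
    return np.where(z > 0, z, alpha * (np.exp(z) - 1.0))

def gelu(z):
    # common tanh-based approximation of GELU
    return 0.5 * z * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (z + 0.044715 * z**3)))

def swish(z, beta=1.0):
    # Swish (equivalent to SiLU when beta = 1)
    return z * sigmoid(beta * z)

# Evaluate each function on a small grid of made-up pre-activations.
z = np.linspace(-3, 3, 7)
for name, f in [("sigmoid", sigmoid), ("tanh", tanh), ("relu", relu),
                ("leaky_relu", leaky_relu), ("elu", elu),
                ("gelu", gelu), ("swish", swish)]:
    print(f"{name:11s}", np.round(f(z), 3))
```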
Softmax is a special case used in the output layer for multi-class classification, converting a vector of real-valued scores (logits) into a probability distribution over classes.
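In practice, softmax is usually computed in a numerically stable way by subtracting the maximum logit before exponentiating, which leaves the result unchanged. A minimal sketch, with made-up logits:

```python
import numpy as np

def softmax(logits):
    # subtract the max for numerical stability; mathematically equivalent
    shifted = logits - np.max(logits)
    exps = np.exp(shifted)
    return exps / np.sum(exps)

logits = np.array([2.0, 1.0, 0.1])   # hypothetical class scores
probs = softmax(logits)
print(np.round(probs, 3), probs.sum())   # probabilities summing to 1
```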
The choice of activation affects learning dynamics, including gradient propagation and convergence speed. Practical considerations include computational cost, susceptibility to saturation or dead units, whether outputs are zero-centered, and how the activation interacts with weight initialization and normalization.
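As a rough illustration of these dynamics, the sketch below compares gradient magnitudes of sigmoid and ReLU at a few made-up pre-activation values: the sigmoid derivative shrinks toward zero as inputs grow in magnitude, which is the saturation behavior noted above, while ReLU keeps a gradient of 1 for positive inputs.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Gradient magnitudes at increasingly large pre-activations (illustrative values).
z = np.array([0.0, 2.0, 5.0, 10.0])
sigmoid_grad = sigmoid(z) * (1.0 - sigmoid(z))   # derivative of sigmoid
relu_grad = (z > 0).astype(float)                # derivative of ReLU (0 at z = 0 by convention)
print("sigmoid':", np.round(sigmoid_grad, 6))    # shrinks toward zero
print("relu'   :", relu_grad)                    # stays at 1 for positive inputs
```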
In summary, activation functions are a core component of neural networks, shaping how information flows forward and how gradients propagate backward during training.