Activations
Activation functions are mathematical functions applied to the output of a neuron in a neural network. They introduce nonlinearity; without them, a stack of layers would collapse into a single linear transformation and the network could only model linear relationships. An activation is computed by applying a nonlinear function to the weighted sum of a neuron's inputs plus a bias. The function's properties influence training dynamics, gradient flow, and the network's capacity to approximate complex functions.
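As a concrete illustration, here is a minimal sketch (assuming NumPy) of a single neuron: a weighted sum plus bias, followed by a nonlinearity. The input, weight, and bias values and the choice of ReLU are purely illustrative.

```python
import numpy as np

def neuron_output(x, w, b, activation):
    """Apply an activation to the weighted sum of inputs plus a bias."""
    z = np.dot(w, x) + b     # pre-activation: weighted sum plus bias
    return activation(z)     # nonlinearity applied to the pre-activation

# Hypothetical 3-input neuron with a ReLU activation (values are made up).
x = np.array([0.5, -1.2, 2.0])    # inputs
w = np.array([0.4, 0.3, -0.6])    # weights
b = 0.1                           # bias
relu = lambda z: np.maximum(0.0, z)
print(neuron_output(x, w, b, relu))   # pre-activation is -1.26, so ReLU outputs 0.0
```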
Common activation functions include the following (a short reference sketch of each appears after the list):
- Sigmoid: maps inputs to the range (0, 1); smooth, but saturates and can suffer from vanishing gradients for large positive or negative inputs.
- Tanh: maps to (-1, 1); zero-centered but also susceptible to gradient vanishing for large-magnitude inputs.
- ReLU (Rectified Linear Unit): max(0, x); fast and simple, but can produce dead neurons when inputs stay negative, since the gradient there is zero.
- Leaky ReLU and PReLU: allow a small, nonzero gradient for negative inputs to mitigate dead neurons.
- ELU and SELU: provide a smooth negative region, helping gradient flow and self-normalization in some networks.
- GELU and Swish: smooth, often providing improved performance on deep models.
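The sketch below gives minimal NumPy implementations of the functions listed above. The GELU uses the common tanh-based approximation, and the constants (for example, alpha = 0.01 in Leaky ReLU) are typical defaults rather than values taken from this text.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    return np.tanh(z)

def relu(z):
    return np.maximum(0.0, z)

def leaky_relu(z, alpha=0.01):
    # small slope alpha for negative inputs instead of a hard zero
    return np.where(z > 0, z, alpha * z)

def elu(z, alpha=1.0):
    # smooth exponential region for negative inputs
    return np.where(z > 0, z, alpha * (np.exp(z) - 1.0))

def gelu(z):
    # common tanh-based approximation of GELU
    return 0.5 * z * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (z + 0.044715 * z**3)))

def swish(z, beta=1.0):
    # Swish (equivalent to SiLU when beta = 1)
    return z * sigmoid(beta * z)

# Evaluate each function on a small grid of made-up pre-activations.
z = np.linspace(-3, 3, 7)
for name, f in [("sigmoid", sigmoid), ("tanh", tanh), ("relu", relu),
                ("leaky_relu", leaky_relu), ("elu", elu),
                ("gelu", gelu), ("swish", swish)]:
    print(f"{name:11s}", np.round(f(z), 3))
```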
Softmax is a special case used in the output layer for multi-class classification, converting a vector of real-valued scores (logits) into a probability distribution over classes.
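In practice, softmax is usually computed in a numerically stable way by subtracting the maximum logit before exponentiating, which leaves the result unchanged. A minimal sketch, with made-up logits:

```python
import numpy as np

def softmax(logits):
    # subtract the max for numerical stability; mathematically equivalent
    shifted = logits - np.max(logits)
    exps = np.exp(shifted)
    return exps / np.sum(exps)

logits = np.array([2.0, 1.0, 0.1])   # hypothetical class scores
probs = softmax(logits)
print(np.round(probs, 3), probs.sum())   # probabilities summing to 1
```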
The choice of activation affects learning dynamics, including gradient propagation and convergence speed. Practical considerations include computational cost, susceptibility to saturation or dead units, whether outputs are zero-centered, and how the activation interacts with weight initialization and normalization.
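As a rough illustration of these dynamics, the sketch below compares gradient magnitudes of sigmoid and ReLU at a few made-up pre-activation values: the sigmoid derivative shrinks toward zero as inputs grow in magnitude, which is the saturation behavior noted above, while ReLU keeps a gradient of 1 for positive inputs.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Gradient magnitudes at increasingly large pre-activations (illustrative values).
z = np.array([0.0, 2.0, 5.0, 10.0])
sigmoid_grad = sigmoid(z) * (1.0 - sigmoid(z))   # derivative of sigmoid
relu_grad = (z > 0).astype(float)                # derivative of ReLU (0 at z = 0 by convention)
print("sigmoid':", np.round(sigmoid_grad, 6))    # shrinks toward zero
print("relu'   :", relu_grad)                    # stays at 1 for positive inputs
```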
In summary, activation functions are a core component of neural networks, shaping how information flows forward and how gradients propagate backward during training.