Softmax
Softmax is a function that converts a vector of real numbers into a probability distribution over discrete classes. For a K-dimensional input z, the i-th output is sigma_i(z) = exp(z_i) / sum_{k=1}^K exp(z_k). The resulting values are nonnegative and sum to 1, making softmax a common final activation in multi-class classification. In the two-class case it reduces to the logistic (sigmoid) function, of which softmax is the multi-class generalization.
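To make the definition concrete, here is a minimal Python sketch that transcribes the formula directly (the function name softmax and the example inputs are illustrative, not from the text; a numerically stable variant appears below):

    import math

    def softmax(z):
        # Direct transcription of sigma_i(z) = exp(z_i) / sum_k exp(z_k).
        exps = [math.exp(zi) for zi in z]
        total = sum(exps)
        return [e / total for e in exps]

    probs = softmax([1.0, 2.0, 3.0])
    print(probs)       # approximately [0.090, 0.245, 0.665]
    print(sum(probs))  # 1.0, up to floating-point rounding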
A common variant introduces a temperature parameter T: sigma_i(z; T) = exp(z_i / T) / sum_k exp(z_k / T). Higher temperatures (T > 1) flatten the distribution toward uniform, while lower temperatures (T < 1) sharpen it toward a one-hot vector on the largest input; T = 1 recovers the standard softmax.
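A short sketch of the temperature variant under the same assumptions as above (softmax_t is an illustrative name):

    import math

    def softmax_t(z, T=1.0):
        # Temperature-scaled softmax: sigma_i(z; T) = exp(z_i / T) / sum_k exp(z_k / T).
        exps = [math.exp(zi / T) for zi in z]
        total = sum(exps)
        return [e / total for e in exps]

    z = [1.0, 2.0, 3.0]
    print(softmax_t(z, T=0.5))  # sharper: roughly [0.016, 0.117, 0.867]
    print(softmax_t(z, T=5.0))  # flatter: roughly [0.269, 0.329, 0.402]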
Numerical stability is important in practice. A standard trick is to subtract the maximum input before exponentiating: sigma_i(z) = exp(z_i - m) / sum_k exp(z_k - m), where m = max_k z_k. This leaves the output unchanged while keeping every exponent at most 0, which prevents overflow.
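A sketch of the max-subtraction trick; the inputs below would overflow the naive formula, since exp(1000) exceeds the double-precision range:

    import math

    def softmax_stable(z):
        # Subtracting max(z) leaves the output unchanged (shift invariance,
        # see below) but keeps every exponent <= 0, so exp cannot overflow.
        m = max(z)
        exps = [math.exp(zi - m) for zi in z]
        total = sum(exps)
        return [e / total for e in exps]

    print(softmax_stable([1000.0, 1001.0, 1002.0]))  # same as softmax([0, 1, 2])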
Key properties include invariance to adding a constant to all inputs: since exp(z_i + c) = exp(c) exp(z_i), every term in the numerator and denominator scales by the same factor exp(c), which cancels, so sigma(z + c) = sigma(z). This invariance is exactly what makes the max-subtraction trick above safe.
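A small numeric check of the shift-invariance property, using the stable form (redefined here so the snippet runs on its own):

    import math

    def softmax(z):
        m = max(z)
        exps = [math.exp(zi - m) for zi in z]
        total = sum(exps)
        return [e / total for e in exps]

    z = [1.0, 2.0, 3.0]
    # Adding the same constant to every input leaves the output unchanged,
    # because the common factor exp(c) cancels between numerator and denominator.
    print(softmax(z))
    print(softmax([zi + 100.0 for zi in z]))  # identical, up to rounding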
Limitations include potential miscalibration of predicted probabilities and sensitivity to input scale; appropriate loss functions (typically cross-entropy, computed via log-softmax for stability) and post-hoc calibration methods such as temperature scaling can mitigate these issues.
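As one illustration of the loss-function pairing (an assumption on my part; the text above does not prescribe a specific loss), cross-entropy is commonly computed through log-softmax so the probabilities are never formed explicitly:

    import math

    def log_softmax(z):
        # log sigma_i(z) = (z_i - m) - log(sum_k exp(z_k - m)), with m = max(z).
        m = max(z)
        log_total = math.log(sum(math.exp(zi - m) for zi in z))
        return [zi - m - log_total for zi in z]

    def cross_entropy(z, target):
        # Negative log-probability assigned to the correct class index.
        return -log_softmax(z)[target]

    print(cross_entropy([1.0, 2.0, 3.0], target=2))  # ~0.408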