softmax

Softmax is a function that converts a vector of real numbers into a probability distribution over discrete classes. For a K-dimensional input z, the i-th output is sigma_i(z) = exp(z_i) / sum_{k=1}^K exp(z_k). The resulting values are nonnegative and sum to 1, making softmax a common final activation in multi-class classification. Softmax generalizes the logistic function to higher dimensions; the two-class case reduces to the logistic (sigmoid) function.
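
As a concrete illustration of the definition, the following is a minimal NumPy sketch of the formula above (the naive form; the numerically stable variant is discussed below):

```python
import numpy as np

def softmax_naive(z):
    """Direct transcription of sigma_i(z) = exp(z_i) / sum_k exp(z_k)."""
    z = np.asarray(z, dtype=float)
    e = np.exp(z)
    return e / e.sum()

z = [1.0, 2.0, 0.5]
p = softmax_naive(z)
print(p)        # approximately [0.231 0.629 0.140], all nonnegative
print(p.sum())  # 1.0 up to floating-point rounding
```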

A common variant introduces a temperature parameter T: sigma_i(z; T) = exp(z_i / T) / sum_k exp(z_k / T). Higher T produces a softer distribution, while lower T makes it more peaked.
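
A short sketch of the temperature variant, showing the softening and sharpening effect on the same logits (the function name is illustrative):

```python
import numpy as np

def softmax_temperature(z, T=1.0):
    """sigma_i(z; T) = exp(z_i / T) / sum_k exp(z_k / T)."""
    z = np.asarray(z, dtype=float) / T
    e = np.exp(z - z.max())           # subtract the max for numerical safety
    return e / e.sum()

z = [2.0, 1.0, 0.1]
print(softmax_temperature(z, T=0.5))  # low T: sharply peaked on the largest logit
print(softmax_temperature(z, T=1.0))  # standard softmax
print(softmax_temperature(z, T=5.0))  # high T: closer to uniform
```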

Softmax is differentiable and its Jacobian has the form ∂sigma_i/∂z_j = sigma_i(z) (δ_ij − sigma_j(z)). In conjunction with cross-entropy loss, this yields a convenient gradient: ∂L/∂z = p − y, where p = sigma(z) and y is the target distribution.

Numerical stability is important in practice. A standard trick is to subtract the maximum input: sigma_i(z) = exp(z_i − max_j z_j) / sum_k exp(z_k − max_j z_j). Implementations often use the log-sum-exp technique to maintain precision.
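
The following sketch ties the two preceding paragraphs together: a max-subtracted softmax, cross-entropy computed via log-sum-exp, and a finite-difference check that the gradient is p − y. The helper names are illustrative, not from any particular library:

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax: subtract the max before exponentiating."""
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max())
    return e / e.sum()

def cross_entropy(z, y):
    """Cross-entropy between softmax(z) and a target distribution y,
    computed via log-sum-exp so that large logits do not overflow."""
    z = np.asarray(z, dtype=float)
    log_sum_exp = z.max() + np.log(np.exp(z - z.max()).sum())
    log_p = z - log_sum_exp
    return -(y * log_p).sum()

z = np.array([5.0, 2.0, -1.0])
y = np.array([0.0, 1.0, 0.0])   # one-hot target

p = softmax(z)
grad = p - y                     # analytic gradient of the loss w.r.t. z

# Central-difference check of the analytic gradient.
eps = 1e-6
num_grad = np.array([
    (cross_entropy(z + eps * np.eye(3)[j], y)
     - cross_entropy(z - eps * np.eye(3)[j], y)) / (2 * eps)
    for j in range(3)
])
print(np.allclose(grad, num_grad, atol=1e-5))   # True
```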

Key properties include invariance to adding a constant to all inputs: since exp(z_i + c) = exp(c) exp(z_i), every term is scaled by the same factor, which cancels in the ratio.
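
A quick numerical check of this shift invariance, reusing the stable softmax from the sketch above:

```python
import numpy as np

def softmax(z):
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max())
    return e / e.sum()

z = np.array([0.3, -1.2, 2.5])
c = 100.0
print(np.allclose(softmax(z), softmax(z + c)))   # True: the constant cancels
```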

Softmax is widely used as the final activation in neural networks for multi-class classification, in attention mechanisms to produce probability weights, and wherever a probability distribution over categories is required.
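
As an illustration of the attention use mentioned above, the following sketches scaled dot-product attention, in which softmax converts similarity scores into probability weights over the values; the shapes and names are illustrative rather than a specific library's API:

```python
import numpy as np

def softmax(z, axis=-1):
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # similarity scores
    weights = softmax(scores, axis=-1)   # each row is a probability distribution
    return weights @ V, weights

rng = np.random.default_rng(0)
Q = rng.normal(size=(2, 4))   # 2 queries of dimension 4
K = rng.normal(size=(3, 4))   # 3 keys of dimension 4
V = rng.normal(size=(3, 5))   # 3 values of dimension 5

out, weights = scaled_dot_product_attention(Q, K, V)
print(weights.sum(axis=1))    # each row of attention weights sums to 1
```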

Limitations include potential miscalibration of predicted probabilities and sensitivity to input scale; appropriate loss functions and regularization help mitigate these issues.
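
One concrete example of such regularization, assuming the sentence above is taken to include target-side techniques, is label smoothing, which mixes a one-hot target with the uniform distribution before applying cross-entropy. The helper below is a hypothetical sketch:

```python
import numpy as np

def smooth_labels(y_onehot, epsilon=0.1):
    """Label smoothing: replace the one-hot target with
    (1 - epsilon) * y + epsilon / K, where K is the number of classes."""
    K = y_onehot.shape[-1]
    return (1.0 - epsilon) * y_onehot + epsilon / K

y = np.array([0.0, 1.0, 0.0])
print(smooth_labels(y))   # approximately [0.0333 0.9333 0.0333]
```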