sparsemax
Sparsemax is a nonlinear activation function and a sparse counterpart to softmax, introduced by André F. T. Martins and Ramón Fernandez Astudillo in 2016. Like softmax, it maps a vector of real scores to a probability distribution, but unlike softmax its output can be sparse: many components may be exactly zero. This makes sparsemax useful in neural networks and attention mechanisms where it is desirable to focus on a subset of elements.
Mathematically, for an input vector z in R^K, sparsemax(z) is defined as the Euclidean projection of z onto the (K − 1)-dimensional probability simplex:
sparsemax(z) = argmin_p ||p − z||^2 subject to p ≥ 0 and sum(p) = 1.
Equivalently, the result is a vector p with p_i = max(z_i − τ, 0) for all i, where τ is a threshold chosen so that the nonzero components sum to 1. The threshold can be computed in closed form after sorting the entries of z in decreasing order.
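The thresholding rule above can be sketched in NumPy using the standard sorting-based projection onto the simplex; the function name `sparsemax` is chosen here for illustration:

```python
import numpy as np

def sparsemax(z):
    """Euclidean projection of z onto the probability simplex.

    Returns p with p_i = max(z_i - tau, 0), where tau is chosen
    so that the nonzero components of p sum to 1.
    """
    z = np.asarray(z, dtype=float)
    z_sorted = np.sort(z)[::-1]          # sort in decreasing order
    cumsum = np.cumsum(z_sorted)
    k = np.arange(1, z.size + 1)
    # Support size: largest k with 1 + k * z_sorted[k-1] > cumsum[k-1]
    support = 1 + k * z_sorted > cumsum
    k_z = k[support][-1]
    tau = (cumsum[k_z - 1] - 1) / k_z    # threshold over the support
    return np.maximum(z - tau, 0.0)

# Example: sparsemax([1.0, 0.5, -1.0]) returns [0.75, 0.25, 0.0];
# the smallest score is truncated to an exact zero, and the output sums to 1.
```

Note how the low-scoring entry receives exactly zero probability, whereas softmax would assign it a small positive weight.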
Properties and usage: compared with softmax, sparsemax yields sparse outputs, which can enforce selectivity in attention mechanisms and make the resulting attention weights easier to interpret. Because its output still lies on the probability simplex, sparsemax can serve as a drop-in replacement for softmax wherever a probability distribution over choices is required.