sparsemax
Sparsemax is a nonlinear activation function and a sparse counterpart to softmax, introduced by André F. T. Martins and Ramón Fernandez Astudillo in 2016. Like softmax, it maps a vector of real scores to a probability distribution, but unlike softmax its output can be sparse: many components may be exactly zero. This makes sparsemax useful in neural networks and attention mechanisms where it is desirable to focus on a subset of elements.
Mathematically, for an input vector z in R^K, sparsemax(z) is defined as the Euclidean projection of z onto the (K − 1)-dimensional probability simplex:
sparsemax(z) = argmin_p ||p − z||^2 subject to p ≥ 0 and sum(p) = 1.
Equivalently, the result is a vector p with p_i = max(z_i − τ, 0) for all i, where τ is a threshold chosen so that the nonzero components sum to 1. The threshold can be computed in closed form after sorting the entries of z in decreasing order.
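The thresholding rule above can be sketched in NumPy using the standard sorting-based projection onto the simplex; the function name `sparsemax` is chosen here for illustration:

```python
import numpy as np

def sparsemax(z):
    """Euclidean projection of z onto the probability simplex.

    Returns p with p_i = max(z_i - tau, 0), where tau is chosen
    so that the nonzero components of p sum to 1.
    """
    z = np.asarray(z, dtype=float)
    z_sorted = np.sort(z)[::-1]          # sort in decreasing order
    cumsum = np.cumsum(z_sorted)
    k = np.arange(1, z.size + 1)
    # Support size: largest k with 1 + k * z_sorted[k-1] > cumsum[k-1]
    support = 1 + k * z_sorted > cumsum
    k_z = k[support][-1]
    tau = (cumsum[k_z - 1] - 1) / k_z    # threshold over the support
    return np.maximum(z - tau, 0.0)

# Example: sparsemax([1.0, 0.5, -1.0]) returns [0.75, 0.25, 0.0];
# the smallest score is truncated to an exact zero, and the output sums to 1.
```

Note how the low-scoring entry receives exactly zero probability, whereas softmax would assign it a small positive weight.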
Properties and usage: compared with softmax, sparsemax yields sparse outputs, which can enforce selectivity in attention mechanisms and make the resulting attention weights easier to interpret. Because its output still lies on the probability simplex, sparsemax can serve as a drop-in replacement for softmax wherever a probability distribution over choices is required.