Melspectrogram
A melspectrogram (also written mel spectrogram) is a time-frequency representation of audio in which the frequency axis is mapped to the mel scale, a perceptual scale that gives finer resolution at low frequencies and coarser resolution at high frequencies, reflecting human pitch perception. It is usually obtained by applying a mel filter bank to the magnitude or power spectrum of the signal, so that each mel bin aggregates energy from a range of Fourier components. The result is a two-dimensional array with time on one axis and mel bands on the other, showing how energy is distributed across perceptual frequency bands over time.
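As a rough illustration of how a mel filter bank aggregates Fourier components into mel bins, the sketch below builds triangular filters and applies them to a framed power spectrum using only NumPy. It is a minimal sketch under stated assumptions, not a reference implementation: the function names, the HTK-style mel formula (2595 * log10(1 + f / 700)), the Hann window, and the default values for n_fft, hop and n_mels are all choices made here for illustration. Audio libraries differ on these conventions and on normalization.

```python
import numpy as np

def hz_to_mel(f_hz):
    # HTK-style mel formula; one of several conventions (assumption).
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz, dtype=float) / 700.0)

def mel_to_hz(m):
    # Inverse of hz_to_mel above.
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filter_bank(sr, n_fft, n_mels):
    # Filter edges are equally spaced in mel between 0 Hz and Nyquist.
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    bank = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        lower, centre, upper = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - lower) / (centre - lower)
        falling = (upper - fft_freqs) / (upper - centre)
        # Triangle: rises to 1 at the centre, falls to 0 at the edges.
        bank[i] = np.maximum(0.0, np.minimum(rising, falling))
    return bank

def mel_spectrogram(signal, sr, n_fft=512, hop=256, n_mels=40):
    # Split the signal into overlapping Hann-windowed frames (no padding).
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    # Power spectrum of each frame: shape (n_frames, n_fft // 2 + 1).
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    # Each mel bin is a weighted sum of FFT bins: shape (n_mels, n_frames).
    return mel_filter_bank(sr, n_fft, n_mels) @ power.T

# Usage: one second of a 440 Hz tone sampled at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2.0 * np.pi * 440.0 * t)
mel_spec = mel_spectrogram(tone, sr)
print(mel_spec.shape)
```

The output has mel bands along the first axis and time frames along the second. For a pure tone, most of the energy should fall in the band whose triangular filter covers the tone's frequency.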
It is typically computed by first taking a short-time Fourier transform (STFT) of the audio, optionally applying
Variants include the magnitude or power mel-spectrogram and the log-mel spectrogram, with choices about the number
Limitations include dependence on sampling rate and filter-bank design, potential loss of spectral details not well