Logmel

Logmel, short for log-mel spectrogram, is a time-frequency representation used in audio processing and machine learning. It is derived from an audio waveform by computing a spectrogram with a mel frequency scale and applying logarithmic compression to the magnitude (or power) values. The result highlights perceptually relevant energy patterns across time and frequency while reducing dynamic range.
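The two ingredients named in the definition, the mel frequency scale and logarithmic compression, can be illustrated in isolation. The sketch below assumes the HTK-style mel formula m = 2595 · log10(1 + f/700). The text does not say which mel variant is meant, and some libraries use the Slaney formula instead. The `eps` guard value is also an illustrative choice.

```python
import math

def hz_to_mel(f_hz: float) -> float:
    """HTK-style mel scale (assumed convention): m = 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def power_to_db(power: float, eps: float = 1e-10) -> float:
    """Logarithmic compression in decibels; eps guards against log(0)."""
    return 10.0 * math.log10(power + eps)

if __name__ == "__main__":
    # High frequencies are compressed: the 4-8 kHz octave spans fewer mels than 0-1 kHz.
    for f in (0.0, 1000.0, 2000.0, 4000.0, 8000.0):
        print(f"{f:6.0f} Hz -> {hz_to_mel(f):6.0f} mel")
    # Six orders of magnitude in power map onto a range of about 60 dB.
    for p in (1.0, 1e-3, 1e-6):
        print(f"power {p:g} -> {power_to_db(p):6.1f} dB")
```

Decibels and the natural logarithm are both common choices. They differ only by a constant factor, since 10 · log10(x) = (10 / ln 10) · ln(x).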

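Computation typically proceeds as follows. First, compute the short-time Fourier transform to obtain a magnitude spectrum. Next, apply a bank of filters spaced on the mel scale to obtain a mel-spectrogram. Finally, take the logarithm of each value, usually as log(x + ε) with a small constant ε so that empty bins do not produce log(0).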
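The three steps can be sketched end to end in NumPy. This is a minimal illustration under several assumptions that are not from the text:

- It uses a Hann window.
- It takes the power (squared magnitude) spectrum.
- It uses the HTK mel formula with unnormalized triangular filters.
- It frames the signal without centering or padding.
- It applies the natural logarithm.
- The defaults `n_fft=400`, `hop=160`, `n_mels=40` and `eps=1e-10` are placeholders.

Libraries such as librosa and torchaudio provide tuned equivalents that differ in these details.

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style mel formula (assumed convention).
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def stft_power(x, n_fft, hop):
    """Power spectrogram |STFT|^2, shape (n_fft // 2 + 1, n_frames).

    Frames are taken without padding, so trailing samples that do not
    fill a whole frame are dropped.
    """
    if len(x) < n_fft:
        raise ValueError("signal shorter than one frame")
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    spec = np.fft.rfft(frames, n=n_fft, axis=1)  # (n_frames, n_fft // 2 + 1)
    return (np.abs(spec) ** 2).T

def mel_filterbank(sr, n_fft, n_mels, fmin=0.0, fmax=None):
    """Triangular mel filters, shape (n_mels, n_fft // 2 + 1)."""
    fmax = sr / 2.0 if fmax is None else fmax
    # n_mels + 2 edge points equally spaced in mels, converted back to Hz.
    hz_points = mel_to_hz(np.linspace(hz_to_mel(fmin), hz_to_mel(fmax), n_mels + 2))
    bin_freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    fb = np.zeros((n_mels, bin_freqs.size))
    for m in range(n_mels):
        left, center, right = hz_points[m : m + 3]
        rising = (bin_freqs - left) / (center - left)
        falling = (right - bin_freqs) / (right - center)
        fb[m] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def logmel(x, sr, n_fft=400, hop=160, n_mels=40, eps=1e-10):
    """Log-mel spectrogram, shape (n_mels, n_frames)."""
    power = stft_power(np.asarray(x, dtype=float), n_fft, hop)
    mel = mel_filterbank(sr, n_fft, n_mels) @ power  # mel-spectrogram
    return np.log(mel + eps)  # eps keeps log() finite for empty bands

if __name__ == "__main__":
    sr = 16000
    t = np.arange(sr) / sr
    x = np.sin(2 * np.pi * 440.0 * t)  # 1 s, 440 Hz tone
    print(logmel(x, sr).shape)  # (40, 98)
```

The filterbank is a plain matrix, so applying it to every frame at once is a single matrix product.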

Common parameter choices include a sampling rate of 16 kHz or 22.05 kHz, a window length of …
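The sampling rate, window length and hop together determine the frame sizes in samples and the number of frames produced. The window value in the sentence above is not given here, so the sketch uses hypothetical values of 25 ms for the window and 10 ms for the hop, purely for illustration. It also assumes the FFT size equals the window length when reporting the frequency-bin spacing.

```python
def frame_params(sr: int, win_ms: float, hop_ms: float) -> tuple[int, int]:
    """Convert window and hop durations (ms) to lengths in samples."""
    return int(round(sr * win_ms / 1000.0)), int(round(sr * hop_ms / 1000.0))

def num_frames(n_samples: int, win: int, hop: int) -> int:
    """Frame count without padding: 1 + floor((N - win) / hop)."""
    if n_samples < win:
        return 0
    return 1 + (n_samples - win) // hop

if __name__ == "__main__":
    # 25 ms / 10 ms are hypothetical values, not taken from the text.
    for sr in (16000, 22050):
        win, hop = frame_params(sr, 25.0, 10.0)
        # Bin spacing assumes an FFT size equal to the window length.
        print(f"{sr} Hz: win={win}, hop={hop}, "
              f"bin spacing={sr / win:.1f} Hz, frames in 1 s={num_frames(sr, win, hop)}")
```

At 16 kHz these hypothetical settings give a 400-sample window and a 160-sample hop. One second of audio then yields 98 frames.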

Logmel features are widely used as inputs to neural networks for tasks such as automatic speech recognition, …
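When logmel features feed a neural network, each (n_mels, n_frames) matrix is often handled like a single-channel image. The sketch below makes two assumptions that are not stated in the text. First, it standardizes each mel band to zero mean and unit variance over time, which is one common normalization among several. Second, it assumes all inputs share the same shape, so they can be stacked into a (batch, 1, n_mels, n_frames) array.

```python
import numpy as np

def normalize_per_band(logmel: np.ndarray) -> np.ndarray:
    """Standardize each mel band (row) to zero mean, unit variance over time."""
    mean = logmel.mean(axis=1, keepdims=True)
    std = logmel.std(axis=1, keepdims=True)
    return (logmel - mean) / (std + 1e-8)  # small constant avoids division by zero

def to_batch(features: list[np.ndarray]) -> np.ndarray:
    """Stack equal-shape (n_mels, n_frames) arrays into (batch, 1, n_mels, n_frames)."""
    return np.stack([normalize_per_band(f) for f in features])[:, np.newaxis, :, :]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = [rng.normal(size=(40, 98)) for _ in range(4)]
    print(to_batch(feats).shape)  # (4, 1, 40, 98)
```

Variable-length inputs would need padding or cropping before stacking. That step is omitted here.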