Logmel

Logmel, short for log-mel spectrogram, is a time-frequency representation used in audio processing and machine learning. It is derived from an audio waveform by computing a spectrogram with a mel frequency scale and applying logarithmic compression to the magnitude (or power) values. The result highlights perceptually relevant energy patterns across time and frequency while reducing dynamic range.

Computation typically proceeds as follows: compute the short-time Fourier transform to obtain a magnitude spectrum; apply a mel filter bank to warp frequencies onto the mel scale, producing a mel-spectrogram; apply log compression, often using log(x + epsilon), to obtain the log-mel spectrogram. The final output is a two-dimensional matrix with time frames along one axis and mel bands along the other.

Common parameter choices include a sampling rate of 16 kHz or 22.05 kHz, a window length of about 20–25 milliseconds, a hop length around 10 milliseconds, and a number of mel bands typically in the range of 40–128. The mel filter bank is often defined over a frequency range from 0 up to either a chosen max frequency or the Nyquist limit, depending on the application. The exact design influences perceptual fidelity and downstream performance.

Logmel features are widely used as inputs to neural networks for tasks such as automatic speech recognition, speaker identification, and music tagging. They are related to MFCCs, which are derived by applying a discrete cosine transform to the log-mel energies; thus logmel preserves more local spectral information suitable for end-to-end modeling. Common implementations are available in libraries such as Librosa and torchaudio.
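
The computation steps and parameter choices described above can be illustrated with a minimal NumPy-only sketch. It is not the implementation used by Librosa or torchaudio, whose defaults (padding, window type, mel formula, normalization) differ. The function names `hz_to_mel`, `mel_to_hz`, `mel_filter_bank`, and `log_mel_spectrogram` are hypothetical, and several choices are assumptions made for illustration: the HTK-style mel formula, a symmetric Hann window, a power (rather than magnitude) spectrum, an FFT size rounded up to the next power of two, no padding at the signal edges, and a natural logarithm with a small epsilon.

```python
import numpy as np


def hz_to_mel(hz):
    # HTK-style mel formula (an assumption; other variants exist).
    return 2595.0 * np.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filter_bank(sr, n_fft, n_mels, f_min=0.0, f_max=None):
    """Triangular mel filters, shape (n_mels, n_fft // 2 + 1)."""
    f_max = sr / 2.0 if f_max is None else f_max  # default: Nyquist limit
    # n_mels + 2 band edges, equally spaced on the mel scale.
    edges_hz = mel_to_hz(np.linspace(hz_to_mel(f_min), hz_to_mel(f_max), n_mels + 2))
    bin_hz = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)  # FFT bin frequencies
    lower = edges_hz[:-2, None]
    center = edges_hz[1:-1, None]
    upper = edges_hz[2:, None]
    rising = (bin_hz - lower) / (center - lower)
    falling = (upper - bin_hz) / (upper - center)
    return np.maximum(0.0, np.minimum(rising, falling))


def log_mel_spectrogram(y, sr=16000, win_ms=25.0, hop_ms=10.0, n_mels=40, eps=1e-10):
    """Return log-mel features of shape (n_frames, n_mels).

    Assumes len(y) >= one window; trailing samples that do not fill a
    whole frame are dropped (no padding).
    """
    y = np.asarray(y, dtype=float)
    win_length = int(round(sr * win_ms / 1000.0))  # 400 samples at 16 kHz
    hop_length = int(round(sr * hop_ms / 1000.0))  # 160 samples at 16 kHz
    n_fft = 1 << (win_length - 1).bit_length()     # next power of two (512)

    # Step 1: short-time Fourier transform of windowed frames.
    n_frames = 1 + (len(y) - win_length) // hop_length
    idx = np.arange(win_length)[None, :] + hop_length * np.arange(n_frames)[:, None]
    frames = y[idx] * np.hanning(win_length)
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2

    # Step 2: mel filter bank maps FFT bins onto mel bands.
    mel = power @ mel_filter_bank(sr, n_fft, n_mels).T

    # Step 3: log compression, log(x + epsilon).
    return np.log(mel + eps)


# Usage: one second of a 440 Hz tone sampled at 16 kHz.
sr = 16000
tone = np.sin(2 * np.pi * 440.0 * np.arange(sr) / sr)
features = log_mel_spectrogram(tone, sr=sr)
print(features.shape)  # (98, 40): time frames by mel bands
```

With a 25 ms window and 10 ms hop at 16 kHz, each frame covers 400 samples and consecutive frames start 160 samples apart, so one second of audio yields 98 full frames when no padding is applied. Libraries that center and pad frames will report a slightly different frame count for the same input.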