Mel spectrogram
A mel spectrogram, sometimes written as mel-spectrum or melspektrogram, is a time–frequency representation of an audio signal in which the frequency axis is mapped onto the mel scale. The mel scale is a perceptual scale that approximates how humans perceive pitch: it allocates finer resolution to lower frequencies, where listeners distinguish pitch differences more readily, and coarser resolution to higher frequencies. This makes the mel spectrogram a compact and perceptually relevant alternative to a conventional linear-frequency spectrogram.
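As a rough illustration of how the mel scale spends more resolution on low frequencies, the sketch below converts between hertz and mels. It assumes one widely used formula, m = 2595 · log10(1 + f / 700); other variants of the mel scale exist, and the text above does not specify which one is meant. The function names are hypothetical and chosen only for this example.

```python
import math


def hz_to_mel(f_hz: float) -> float:
    """Convert a frequency in Hz to mels (assumed 2595 * log10 variant)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m: float) -> float:
    """Inverse of hz_to_mel: convert mels back to Hz."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_spaced_frequencies(f_min: float, f_max: float, n_points: int) -> list[float]:
    """Return n_points frequencies in Hz that are evenly spaced in mels."""
    m_min, m_max = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (m_max - m_min) / (n_points - 1)
    return [mel_to_hz(m_min + i * step) for i in range(n_points)]


if __name__ == "__main__":
    # Equal steps on the mel axis correspond to widening steps in Hz,
    # so low frequencies are sampled more densely than high ones.
    freqs = mel_spaced_frequencies(0.0, 8000.0, 10)
    for lo, hi in zip(freqs, freqs[1:]):
        print(f"{lo:8.1f} Hz -> {hi:8.1f} Hz  (gap {hi - lo:7.1f} Hz)")
```

Running the script prints the gaps between neighbouring points; each gap is larger than the one before it, which is the non-uniform frequency resolution the mel scale is meant to provide.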
To compute a mel spectrogram, the audio signal is divided into short overlapping frames and transformed with
The mel axis is typically composed of a fixed number of bands, commonly ranging from about 40
Applications include speech recognition, speaker identification, music information retrieval, and general audio classification. They serve as