Mel spectrogram-based
A mel spectrogram is a representation of the spectrum of a signal, such as audio, as it changes over time, with the frequency axis scaled according to the mel scale. The mel scale is a perceptual scale of pitches that listeners judge to be equally spaced from one another. The scale is non-linear: equal distances on the mel scale correspond to increasingly large distances in actual frequency as frequency rises. This non-linearity is intended to approximate human auditory perception, in which listeners distinguish frequency changes less finely at higher frequencies.
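The non-linearity can be made concrete with a small sketch. The text above does not name a particular conversion formula, so the one used here, m = 2595 * log10(1 + f / 700), is an assumption: it is one widely used variant of the Hz-to-mel mapping, and other variants exist. The function names are illustrative only.

```python
import math

def hz_to_mel(f_hz: float) -> float:
    """Convert frequency in Hz to mels (assumed formula: 2595 * log10(1 + f/700))."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m: float) -> float:
    """Inverse of hz_to_mel under the same assumed formula."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_spaced_frequencies(f_min: float, f_max: float, n: int) -> list[float]:
    """Return n frequencies in Hz that are equally spaced on the mel scale."""
    m_min, m_max = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (m_max - m_min) / (n - 1)
    return [mel_to_hz(m_min + i * step) for i in range(n)]

# Ten points equally spaced in mels between 0 Hz and 8000 Hz.
freqs = mel_spaced_frequencies(0.0, 8000.0, 10)

# Gaps between neighbouring points, measured in Hz.
gaps = [b - a for a, b in zip(freqs, freqs[1:])]

for f, g in zip(freqs, gaps):
    print(f"{f:8.1f} Hz  next point is {g:7.1f} Hz higher")
```

Although the points are evenly spaced in mels, the gap in Hz between neighbours grows with frequency, which is the behaviour described in the paragraph above.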
The process of creating a mel spectrogram typically involves several steps. First, the audio signal is divided