MFCCs
MFCCs, or mel-frequency cepstral coefficients, are a widely used feature representation in speech and audio processing. They capture the spectral envelope of a sound by mapping the power spectrum onto a perceptually motivated mel scale, taking the logarithm of the resulting band energies, and then decorrelating the result with a discrete cosine transform.
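The mapping described above can be sketched for a single frame of audio. This is a minimal illustration under stated assumptions, not a reference implementation: the mel formula with constants 2595 and 700 is one common convention among several, the defaults of 26 filters and 13 coefficients are illustrative choices, and all function names (hz_to_mel, mel_filterbank, dct2_ortho, mfcc_frame) are hypothetical helpers introduced here. Pre-emphasis and framing are left out.

```python
import numpy as np

def hz_to_mel(f_hz):
    # One common mel-scale convention (assumed here); others exist.
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz, dtype=float) / 700.0)

def mel_to_hz(m):
    # Inverse of hz_to_mel above.
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    # Triangular filters whose edges are equally spaced on the mel scale.
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0),
                             n_filters + 2)
    hz_points = mel_to_hz(mel_points)
    fft_freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    fbank = np.zeros((n_filters, fft_freqs.size))
    for i in range(n_filters):
        left, center, right = hz_points[i:i + 3]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        fbank[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fbank

def dct2_ortho(x, n_coeffs):
    # First n_coeffs terms of the orthonormal DCT-II of a 1-D array.
    n = x.shape[-1]
    k = np.arange(n_coeffs)[:, None]
    m = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * m + 1) / (2 * n))
    scale = np.full(n_coeffs, np.sqrt(2.0 / n))
    scale[0] = np.sqrt(1.0 / n)
    return (basis * scale[:, None]) @ x

def mfcc_frame(frame, sample_rate, n_filters=26, n_coeffs=13):
    # Power spectrum -> mel band energies -> log -> DCT.
    n_fft = frame.size
    windowed = frame * np.hamming(n_fft)
    power = np.abs(np.fft.rfft(windowed)) ** 2 / n_fft
    mel_energies = mel_filterbank(n_filters, n_fft, sample_rate) @ power
    log_mel = np.log(mel_energies + 1e-10)  # small offset avoids log(0)
    return dct2_ortho(log_mel, n_coeffs)

# Usage: a 25 ms frame of a 440 Hz tone sampled at 16 kHz.
sample_rate = 16000
t = np.arange(400) / sample_rate
coeffs = mfcc_frame(np.sin(2 * np.pi * 440.0 * t), sample_rate)
```

The result coeffs is a vector of 13 coefficients for the one frame. Keeping only the lowest-order DCT terms is what limits the representation to the smooth spectral envelope and discards fine spectral detail.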
The computation of MFCCs typically involves several steps. The audio signal is pre-emphasized and divided into
Extensions and variants include adding delta and delta-delta (temporal derivative) features to capture dynamics, and applying
MFCCs are standard inputs for automatic speech recognition, speaker identification, and many audio classification tasks. They