WaveGlow

WaveGlow is a flow-based generative model for speech synthesis developed by NVIDIA. Introduced in 2018, it functions as a neural vocoder that converts mel-spectrograms into time-domain audio waveforms. The model combines ideas from Glow, a flow-based framework using invertible 1x1 convolutions and affine coupling layers, with concepts from WaveNet to model complex audio distributions. WaveGlow enables high-quality speech synthesis without autoregressive sampling during generation.

Technically, WaveGlow uses a sequence of invertible 1x1 convolutions to mix data channels and a stack of affine coupling layers, the two building blocks it inherits from Glow.
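The two Glow-style building blocks named in this article, the invertible 1x1 convolution and the affine coupling layer, can be illustrated with a minimal NumPy sketch. It is a toy under stated assumptions, not NVIDIA's implementation: the linear maps `w_s` and `w_t` are hypothetical stand-ins for the WaveNet-like network, and the channel counts and random conditioning features are chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1x1_forward(x, W):
    # A 1x1 convolution over channels is a matrix multiply at every time step.
    # Its log-determinant is counted once per time step.
    log_det = x.shape[1] * np.linalg.slogdet(W)[1]
    return W @ x, log_det

def conv1x1_inverse(y, W):
    # Invert by solving W x = y instead of forming W^-1 explicitly.
    return np.linalg.solve(W, y)

def coupling_forward(x, cond, w_s, w_t):
    # Keep the first half of the channels; scale and shift the second half
    # using parameters computed from the kept half and the conditioning input.
    xa, xb = np.split(x, 2, axis=0)
    h = np.concatenate([xa, cond], axis=0)
    log_s = np.tanh(w_s @ h)  # toy stand-in for the WaveNet-like network
    t = w_t @ h
    yb = xb * np.exp(log_s) + t
    return np.concatenate([xa, yb], axis=0), log_s.sum()

def coupling_inverse(y, cond, w_s, w_t):
    # The kept half is unchanged, so log_s and t can be recomputed exactly.
    ya, yb = np.split(y, 2, axis=0)
    h = np.concatenate([ya, cond], axis=0)
    log_s = np.tanh(w_s @ h)
    t = w_t @ h
    xb = (yb - t) * np.exp(-log_s)
    return np.concatenate([ya, xb], axis=0)

# Illustrative sizes (assumptions): 8 audio channels, 4 conditioning channels, 16 steps.
C, M, T = 8, 4, 16
x = rng.standard_normal((C, T))
cond = rng.standard_normal((M, T))
W = np.linalg.qr(rng.standard_normal((C, C)))[0]  # orthogonal, hence invertible
w_s = 0.1 * rng.standard_normal((C // 2, C // 2 + M))
w_t = 0.1 * rng.standard_normal((C // 2, C // 2 + M))

# One flow step: 1x1 convolution followed by affine coupling.
y, log_det_w = conv1x1_forward(x, W)
z, log_det_s = coupling_forward(y, cond, w_s, w_t)

# Run the step backwards to recover the input.
x_rec = conv1x1_inverse(coupling_inverse(z, cond, w_s, w_t), W)
```

Because every step can be inverted in closed form, generation can run the whole stack in one pass over all time steps instead of sampling one audio sample at a time.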

In a text-to-speech pipeline, WaveGlow serves as the vocoder component that converts predicted mel-spectrograms into audio waveforms.
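To make the vocoder's role concrete, the short sketch below estimates how many audio samples correspond to a given number of mel-spectrogram frames. The hop length of 256 samples and the sample rate of 22050 Hz are assumed, commonly used values rather than figures stated in this article, and the function names are hypothetical.

```python
def waveform_length(n_frames, hop_length=256):
    # Each mel frame covers roughly hop_length audio samples (assumed value).
    return n_frames * hop_length

def duration_seconds(n_frames, hop_length=256, sample_rate=22050):
    # Approximate duration of the synthesized audio (sample rate is an assumption).
    return waveform_length(n_frames, hop_length) / sample_rate

n_samples = waveform_length(100)   # 100 frames * 256 samples per frame
seconds = duration_seconds(100)    # about 1.16 seconds at 22050 Hz
```

The exact output length depends on the framing and padding used when the mel-spectrogram was computed, so these figures are approximate.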

Limitations include substantial computational and memory requirements for training, as well as potential artifacts or noise in the generated audio.
