WaveRNN
WaveRNN is a neural vocoder designed for speech synthesis and audio generation. It models raw audio waveforms autoregressively, conditioned on acoustic features such as mel-spectrograms. Unlike WaveNet, which relies on stacked dilated convolutions, WaveRNN uses a compact recurrent core, which reduces computation and memory use per generated sample and can enable real-time synthesis on commodity hardware. The model typically consists of a conditioning stage that upsamples the acoustic frames to the audio sample rate and a small recurrent network that, at each step, predicts a distribution over the next audio sample given the previous samples and the current conditioning features. The waveform is then drawn from that distribution one sample at a time.
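The generation loop described above can be sketched in a few lines of NumPy. This is only an illustrative toy under stated assumptions, not the published architecture: it uses a single standard GRU cell, nearest-neighbour upsampling of the conditioning frames, one 256-way softmax over quantized sample values, and random untrained weights. The names `init_params`, `gru_step`, `generate`, and `hop_length` are hypothetical.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def init_params(rng, in_dim, hidden, n_classes):
    """Random, untrained weights (assumption: a real model would load trained ones)."""
    def w(*shape):
        return rng.normal(scale=0.1, size=shape)
    gru = []
    for _ in range(3):  # update gate, reset gate, candidate state
        gru += [w(hidden, in_dim), w(hidden, hidden), np.zeros(hidden)]
    return gru, w(n_classes, hidden), np.zeros(n_classes)

def gru_step(x, h, gru):
    """One standard GRU update from input x and previous hidden state h."""
    Wz, Uz, bz, Wr, Ur, br, Wh, Uh, bh = gru
    z = sigmoid(Wz @ x + Uz @ h + bz)            # update gate
    r = sigmoid(Wr @ x + Ur @ h + br)            # reset gate
    h_new = np.tanh(Wh @ x + Uh @ (r * h) + bh)  # candidate state
    return (1.0 - z) * h + z * h_new

def generate(mel, hop_length=4, hidden=16, n_classes=256, seed=0):
    """Sample one quantized audio value per upsampled conditioning frame."""
    rng = np.random.default_rng(seed)
    gru, W_out, b_out = init_params(rng, 1 + mel.shape[1], hidden, n_classes)
    # Assumption: nearest-neighbour upsampling from frame rate to sample rate.
    cond = np.repeat(mel, hop_length, axis=0)
    h = np.zeros(hidden)
    prev = 0.0  # previous sample, scaled to [-1, 1]
    out = np.empty(len(cond), dtype=np.int64)
    for t, c in enumerate(cond):
        x = np.concatenate(([prev], c))      # previous sample + current features
        h = gru_step(x, h, gru)
        logits = W_out @ h + b_out
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()                 # softmax over quantization levels
        k = rng.choice(n_classes, p=probs)   # draw the next sample
        out[t] = k
        prev = 2.0 * k / (n_classes - 1) - 1.0  # feed the draw back in
    return out
```

For example, `generate(np.zeros((5, 8)), hop_length=4)` returns 20 integer class indices, one per output sample. Because every step depends on the sample drawn at the previous step, the loop cannot be parallelized across time, which is why the cost of each individual step dominates generation speed.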
During training, WaveRNN learns to maximize the likelihood of the observed waveform given the conditioning sequence.
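In an autoregressive model, the likelihood of the waveform factorizes into a product of per-sample conditional probabilities, so maximizing it is equivalent to minimizing the summed negative log-probability of each observed sample given the preceding samples and the conditioning. A minimal sketch of that loss follows, assuming the audio has been quantized to integer classes and the network emits one row of logits per time step. The function name `negative_log_likelihood` and the single categorical output are illustrative assumptions.

```python
import numpy as np

def negative_log_likelihood(logits, targets):
    """Mean negative log-likelihood of the observed samples.

    logits:  (T, n_classes) unnormalized scores, one row per time step,
             computed from the true previous samples (teacher forcing).
    targets: (T,) integer class index of the observed sample at each step.
    """
    # Numerically stable log-softmax over the class axis.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Pick the log-probability assigned to each observed sample.
    picked = log_probs[np.arange(len(targets)), targets]
    return -picked.mean()
```

With uniform logits over 256 classes the loss equals log(256), the cost of guessing at random, and it approaches zero as the model concentrates probability on the observed samples.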
In evaluations, WaveRNN has demonstrated audio quality competitive with WaveNet at substantially lower computational cost and with faster generation.