Tacotron

Tacotron is a neural network architecture for end-to-end speech synthesis developed by researchers at Google. Introduced in 2017, it aims to convert text input directly into natural-sounding speech by predicting intermediate acoustic representations, typically mel-spectrograms, that are then converted into waveforms by a vocoder.
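To make the "mel" in mel-spectrogram concrete, the sketch below converts between frequency in hertz and the mel scale. It assumes the HTK-style formula, which is one common convention among several and is not necessarily the one used in any particular Tacotron implementation. The function names are illustrative.

```python
import numpy as np

def hz_to_mel(freq_hz):
    """Map frequency in Hz to mels (HTK-style formula, assumed here)."""
    return 2595.0 * np.log10(1.0 + np.asarray(freq_hz, dtype=float) / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel under the same assumed formula."""
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=float) / 2595.0) - 1.0)

# Band edges spaced evenly in mels become progressively wider in Hz,
# which is why a mel-spectrogram resolves low frequencies more finely.
edges_hz = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(8000.0), 6))
```

A mel-spectrogram applies a bank of filters placed at edges like these to each short-time spectrum frame.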

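Tacotron is a sequence-to-sequence model with attention. The encoder converts input text (characters or phonemes) into a sequence of hidden representations, which an attention-based decoder uses to predict spectrogram frames.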
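As a rough illustration of the attention step in a model of this kind, the following NumPy sketch computes one decoder step of additive (tanh) attention over encoder outputs. The dimensions, the random weights, and the names `additive_attention`, `W_q`, `W_m`, and `v` are assumptions made for the example, not details taken from the Tacotron implementation.

```python
import numpy as np

def additive_attention(query, memory, W_q, W_m, v):
    """One step of additive attention.

    query:  (d_q,)    current decoder state
    memory: (T, d_m)  encoder outputs, one row per input symbol
    Returns the context vector (d_m,) and attention weights (T,).
    """
    # Score each encoder position against the decoder state.
    scores = np.tanh(query @ W_q + memory @ W_m) @ v
    # Numerically stable softmax over input positions.
    exp_scores = np.exp(scores - scores.max())
    weights = exp_scores / exp_scores.sum()
    # Context is the attention-weighted sum of encoder outputs.
    context = weights @ memory
    return context, weights

# Toy sizes and random parameters, purely illustrative.
rng = np.random.default_rng(0)
T, d_m, d_q, d_a = 6, 8, 4, 5
memory = rng.normal(size=(T, d_m))
query = rng.normal(size=d_q)
W_q = rng.normal(size=(d_q, d_a))
W_m = rng.normal(size=(d_m, d_a))
v = rng.normal(size=d_a)

context, weights = additive_attention(query, memory, W_q, W_m, v)
```

In a trained model the weights at each decoder step concentrate on the input symbols being spoken at that moment, which yields the text-to-audio alignment.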

Training uses paired text-audio data, optimizing a loss that includes the mel-spectrogram error and postnet refinement
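A loss of this shape can be sketched as the sum of two reconstruction errors: one on the decoder's raw mel-spectrogram prediction and one on the postnet-refined prediction. The choice of mean squared error and equal weighting is an assumption for illustration; implementations differ (some use L1 error), and `spectrogram_loss` is a hypothetical name.

```python
import numpy as np

def spectrogram_loss(mel_before, mel_after, mel_target):
    """Sum of errors before and after postnet refinement (MSE assumed)."""
    before = np.mean((mel_before - mel_target) ** 2)
    after = np.mean((mel_after - mel_target) ** 2)
    return float(before + after)

# Toy example: 3 frames x 4 mel bins.
mel_target = np.zeros((3, 4))
mel_before = np.ones((3, 4))    # raw decoder output, off by 1 everywhere
mel_after = mel_target.copy()   # refined output matches the target exactly
loss = spectrogram_loss(mel_before, mel_after, mel_target)
```

Penalizing both outputs keeps the decoder's own prediction useful while letting the postnet correct residual detail.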

Tacotron inspired subsequent developments, most notably Tacotron 2, which integrates a revised encoder–decoder with a more

Limitations include reliance on large datasets for training, potential mispronunciations or misalignments, and the computational intensity
