CBHG
CBHG is a neural network module used in sequence modeling, particularly in neural speech synthesis. The name derives from its three main components: a Convolutional Bank, Highway networks, and a Gated Recurrent Unit (GRU), often bidirectional. The module converts an input sequence, such as characters, phonemes, or acoustic features, into a sequence of higher-level representations for downstream decoding.
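As a rough illustration of the first two components, the following plain-Python sketch applies a bank of one-dimensional filters of different widths to a single-channel sequence and passes a feature vector through one highway layer. It is a simplified sketch under stated assumptions, not a reference implementation: the function names (`conv1d_same`, `conv_bank`, `highway`), the zero-padding scheme, the ReLU transform, and the toy weights are all illustrative choices, and the recurrent stage is omitted for brevity.

```python
import math


def conv1d_same(x, kernel):
    """Slide `kernel` over the 1-D sequence `x`, zero-padding so the
    output has the same length as the input (assumed padding scheme)."""
    k = len(kernel)
    left = (k - 1) // 2
    padded = [0.0] * left + list(x) + [0.0] * (k - 1 - left)
    return [
        sum(w * padded[t + i] for i, w in enumerate(kernel))
        for t in range(len(x))
    ]


def conv_bank(x, kernels):
    """Apply every kernel to the same input and stack the results,
    giving one output channel per kernel width."""
    return [conv1d_same(x, kernel) for kernel in kernels]


def _affine(weights, bias, x):
    """Compute weights @ x + bias for a list-of-rows weight matrix."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def highway(x, w_h, b_h, w_t, b_t):
    """One highway layer: y = T * H(x) + (1 - T) * x, where H is a
    ReLU transform and T is a sigmoid gate with values in (0, 1)."""
    h = [max(0.0, z) for z in _affine(w_h, b_h, x)]
    t = [1.0 / (1.0 + math.exp(-z)) for z in _affine(w_t, b_t, x)]
    return [ti * hi + (1.0 - ti) * xi for ti, hi, xi in zip(t, h, x)]


# Toy input: four time steps, one channel.
sequence = [1.0, 2.0, 3.0, 4.0]

# Hypothetical averaging filters of widths 1, 2, and 3.
kernels = [[1.0], [0.5, 0.5], [1 / 3, 1 / 3, 1 / 3]]
bank_output = conv_bank(sequence, kernels)  # 3 channels x 4 time steps

# Highway layer on the 3-channel feature vector at the first time step.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
features = [channel[0] for channel in bank_output]
gated = highway(features, identity, [0.0] * 3, identity, [0.0] * 3)
```

Each filter width sees a different amount of local context, which is why the bank's outputs are kept as separate channels. In the highway layer, a gate value near zero passes the input through almost unchanged, while a value near one replaces it with the transformed features.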
The Convolutional Bank consists of a set of one-dimensional convolutions with multiple kernel widths applied to
CBHG modules are used as feature extractors in Tacotron and related text-to-speech architectures. In Tacotron, a
Overall, the CBHG design combines multi-scale local feature extraction with gated information flow and temporal modeling