Stimmsynthese
Stimmsynthese, also known as speech synthesis or text-to-speech (TTS), is a technology that artificially generates human speech by converting written text into spoken words. The process typically begins with text normalization, in which abbreviations, numbers, and punctuation are converted into their spoken equivalents. This is followed by phonetization (grapheme-to-phoneme conversion), in which the normalized text is mapped to a phonetic representation. Finally, prosody generation assigns pitch, duration, and stress, and waveform synthesis produces the actual audio output.
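The text normalization step can be illustrated with a minimal Python sketch. The abbreviation table, the function names `normalize` and `number_to_words`, and the restriction to numbers from 0 to 99 are all assumptions made for this example; production systems rely on much larger lexicons and context-sensitive rules.

```python
import re

# Hypothetical, deliberately tiny abbreviation table (an assumption for this sketch).
ABBREVIATIONS = {"Dr.": "Doctor", "St.": "Street", "etc.": "et cetera"}

ONES = [
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
    "sixteen", "seventeen", "eighteen", "nineteen",
]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]


def number_to_words(n: int) -> str:
    """Spell out an integer in the range 0..99."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]}-{ONES[ones]}"


def normalize(text: str) -> str:
    """Expand known abbreviations and spell out one- or two-digit numbers."""
    # Plain substring replacement: simple, but blind to context
    # (for example, "St." could also mean "Saint").
    for abbreviation, expansion in ABBREVIATIONS.items():
        text = text.replace(abbreviation, expansion)
    # Only standalone runs of one or two digits are rewritten;
    # longer numbers are left untouched in this sketch.
    return re.sub(
        r"\b\d{1,2}\b",
        lambda match: number_to_words(int(match.group())),
        text,
    )


print(normalize("Dr. Smith lives at 42 Main St."))
```

The output of `normalize` would then be passed to the phonetization stage. Decimals, dates, currency amounts, and ambiguous abbreviations are not handled here and need dedicated rules in practice.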
There are two main approaches to Stimmsynthese: concatenative and parametric. Concatenative synthesis uses pre-recorded speech units,
Recent advances in deep learning have significantly improved the quality and naturalness of