Beszédkorpuszok
Beszédkorpuszok (Hungarian for "speech corpora") are structured collections of recorded spoken language used for linguistic research, language technology development, and education. Each corpus typically contains audio recordings accompanied by transcriptions, phonetic annotations, and metadata such as speaker identity, gender, age, and speaking context. The most common types are read speech, in which speakers read prepared texts; conversational speech, comprising spontaneous dialogues between participants; and broadcast speech, which documents radio, television, and online media content.
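The components listed above (audio, transcription, phonetic annotation, speaker metadata, and corpus type) can be pictured as a simple record structure. The following is a minimal sketch only: the class names, field names, file paths, and sample utterances are hypothetical illustrations and do not reflect the format of any particular Hungarian corpus.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class SpeechType(Enum):
    """The three common corpus types described in the text."""
    READ = "read"
    CONVERSATIONAL = "conversational"
    BROADCAST = "broadcast"


@dataclass
class Speaker:
    """Speaker metadata; all field names are illustrative assumptions."""
    speaker_id: str
    gender: Optional[str] = None
    age: Optional[int] = None


@dataclass
class Utterance:
    """One corpus entry: a recording plus its annotations and metadata."""
    audio_path: str                 # hypothetical path to the audio file
    transcription: str              # orthographic transcription
    speaker: Speaker
    speech_type: SpeechType
    context: str = ""               # speaking context, e.g. "studio"
    phonetic_annotation: List[str] = field(default_factory=list)


def filter_by_type(utterances: List[Utterance],
                   speech_type: SpeechType) -> List[Utterance]:
    """Return only the utterances of the requested speech type."""
    return [u for u in utterances if u.speech_type is speech_type]


# Hypothetical sample data, invented purely for illustration.
corpus = [
    Utterance("audio/0001.wav", "Jó napot kívánok.",
              Speaker("spk001", gender="female", age=34),
              SpeechType.READ, context="studio"),
    Utterance("audio/0002.wav", "Hát, nem tudom.",
              Speaker("spk002", gender="male", age=51),
              SpeechType.CONVERSATIONAL, context="informal dialogue"),
]

read_only = filter_by_type(corpus, SpeechType.READ)
```

Keeping the speaker metadata in its own record lets several utterances share one speaker entry, which is convenient when a corpus is later filtered by age, gender, or speaking context.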
The earliest Hungarian speech corpora emerged in the 1990s, with the Hungarian Speech Corpus (HSC) providing
Collection methods vary: some corpora are recorded in controlled studio environments using high‑fidelity microphones, while others
The continued development and refinement of beszédkorpuszok support advancements in speech technology, linguistic theory, and language