äänikorpora
Äänikorpora, literally “audio corpora” in Finnish, are collections of recorded speech and related annotations used in linguistics and speech technology. They combine sound recordings with time-aligned transcripts and often multiple annotation layers to enable phonetic, prosodic, and pragmatic analyses as well as the development of automatic speech recognition (ASR) and speech synthesis systems.
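As a rough illustration of what "time-aligned transcripts" with multiple annotation layers can look like in practice, the sketch below models each layer as a tier of labelled time intervals and looks up which labels are active at a given moment. The class names (`Interval`, `Tier`), the helper `labels_at`, and the example timings are all hypothetical and invented for illustration; they are not taken from any particular corpus or tool.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Interval:
    """One labelled stretch of the recording, with times in seconds."""
    start: float
    end: float
    label: str


@dataclass
class Tier:
    """One annotation layer, for example words or phones (illustrative names)."""
    name: str
    intervals: list = field(default_factory=list)

    def at(self, t: float) -> Optional[Interval]:
        """Return the interval covering time t, or None if t is unannotated."""
        for iv in self.intervals:
            if iv.start <= t < iv.end:
                return iv
        return None


def labels_at(tiers: list, t: float) -> dict:
    """Collect the label from every tier that covers time t."""
    found = {}
    for tier in tiers:
        iv = tier.at(t)
        if iv is not None:
            found[tier.name] = iv.label
    return found


# Invented example: the phrase "hyvää päivää" with made-up timings.
words = Tier("words", [
    Interval(0.00, 0.42, "hyvää"),
    Interval(0.42, 0.95, "päivää"),
])
# Only the first word is segmented into phones in this toy example.
phones = Tier("phones", [
    Interval(0.00, 0.08, "h"),
    Interval(0.08, 0.20, "y"),
    Interval(0.20, 0.27, "v"),
    Interval(0.27, 0.42, "ää"),
])
tiers = [words, phones]

print(labels_at(tiers, 0.10))
```

Keeping each layer as an independent tier means that layers can be added, corrected, or left partially annotated without touching the audio or the other layers, which is one plausible reason layered, time-stamped annotation is common for speech data.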
A typical corpus of this kind includes audio data (commonly WAV or FLAC), orthographic transcripts, and alignment information such
Creation and rights: materials are gathered with informed consent and managed under data protection and privacy
Uses: äänikorpora support research on Finnish phonetics and prosody, dialectology, language documentation, and the development and
Format and standards: common media are lossless audio alongside human- or machine-readable annotations. Alignments are time-stamped
Size and availability: äänikorpora range from tens of hours to thousands of hours of speech, depending on