Tõlkeandmed
Tõlkeandmed (Estonian for "translation data") refers to the data used in machine translation, particularly in neural machine translation (NMT). It covers the linguistic resources used to train and evaluate translation models. The most common type of tõlkeandmed is the parallel corpus, a collection of sentence or document pairs that are translations of each other. These parallel texts are essential for supervised NMT models, as they provide the input-output examples the model learns from.
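As a rough illustration of the parallel corpus described above, the following Python sketch aligns two line-per-sentence texts into (source, target) pairs. The function name `pair_sentences`, the one-sentence-per-line layout, and the Estonian-English sample sentences are assumptions made for this example; real corpora come in many formats and usually need more careful alignment and filtering.

```python
from typing import Iterable, List, Tuple


def pair_sentences(
    source_lines: Iterable[str], target_lines: Iterable[str]
) -> List[Tuple[str, str]]:
    """Align two line-per-sentence texts into (source, target) pairs.

    Assumes line i of the source is the translation of line i of the
    target. Pairs where either side is empty are dropped.
    """
    src = [line.strip() for line in source_lines]
    tgt = [line.strip() for line in target_lines]
    if len(src) != len(tgt):
        # A length mismatch means the two sides are not line-aligned.
        raise ValueError(
            f"line count mismatch: {len(src)} source vs {len(tgt)} target"
        )
    return [(s, t) for s, t in zip(src, tgt) if s and t]


# Hypothetical sample data, invented for illustration only.
estonian = ["Tere hommikust!", "Kuidas läheb?", ""]
english = ["Good morning!", "How are you?", ""]

pairs = pair_sentences(estonian, english)
for source, target in pairs:
    print(f"{source}\t{target}")
```

Each resulting tuple is one input-output example of the kind a supervised model would be trained on. Raising an error on mismatched line counts is a deliberate choice in this sketch: silently truncating to the shorter side could hide a misaligned corpus.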
Beyond parallel corpora, tõlkeandmed can also include monolingual data, which is text in a single language.
The quality and quantity of tõlkeandmed significantly impact the performance of machine translation systems. Datasets that