Word alignment
Word alignment is the task of identifying word-level correspondences between sentences in a bilingual or multilingual parallel corpus. For a given pair of sentences, the goal is to pair each word in the source with its translation(s) in the target, allowing one-to-one, one-to-many, or many-to-one relations, and sometimes leaving words unaligned. Alignments are a central component of statistical machine translation and are used to induce translation models, create bilingual lexicons, and provide alignment-based features for downstream tasks.
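One common way to make these link types concrete is to store the alignment of a sentence pair as a set of (source index, target index) links. The sketch below assumes that representation, which is not the only one in use. The helpers `unaligned` and `lexicon_counts` are hypothetical names for illustration. Counting linked word pairs across a corpus is one simple way to collect candidate bilingual lexicon entries.

```python
from collections import Counter

# Assumed representation: an alignment is a set of (source_index, target_index) links.


def unaligned(n_words, links, side):
    """Return the indices on one side (0 = source, 1 = target) that have no link."""
    linked = {link[side] for link in links}
    return [i for i in range(n_words) if i not in linked]


def lexicon_counts(corpus):
    """Count linked (source_word, target_word) pairs over a corpus of
    (source_tokens, target_tokens, links) triples."""
    counts = Counter()
    for src, tgt, links in corpus:
        for i, j in links:
            counts[(src[i], tgt[j])] += 1
    return counts


# French "je ne sais pas" vs. English "I do not know":
# "ne" and "pas" both link to "not" (many-to-one), and "do" stays unaligned.
src = ["je", "ne", "sais", "pas"]
tgt = ["I", "do", "not", "know"]
links = {(0, 0), (1, 2), (3, 2), (2, 3)}

unaligned_target = unaligned(len(tgt), links, side=1)  # indices of unaligned target words
counts = lexicon_counts([(src, tgt, links)])
```

Storing links as a set rather than as a source-to-target mapping is what lets a single representation express one-to-many, many-to-one, and unaligned words at the same time.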
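Classical approaches treat word alignment as a generative modeling problem. The IBM alignment models (Models 1 through 5) are a sequence of increasingly detailed generative models of how a target sentence is produced from a source sentence. Their parameters are estimated from parallel text with the expectation-maximization (EM) algorithm.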
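IBM Model 1 is the simplest of these models. It assumes each target word is generated by exactly one source word, or by an empty "NULL" word, with lexical translation probability t(target word | source word). All alignments are treated as equally likely a priori. The following is a minimal EM training sketch for that model, not a reference implementation. The NULL token string, the uniform initialization, the fixed iteration count, and the names `train_ibm1` and `align` are illustrative assumptions.

```python
from collections import defaultdict

NULL = "<NULL>"  # hypothetical marker for the empty source word


def train_ibm1(corpus, iterations=10):
    """EM training of IBM Model 1 probabilities t[(tw, sw)] = p(tw | sw).

    corpus: list of (source_tokens, target_tokens) pairs.
    """
    target_vocab = {w for _, tgt in corpus for w in tgt}
    uniform = 1.0 / len(target_vocab)
    t = defaultdict(lambda: uniform)  # assumed uniform initialization
    for _ in range(iterations):
        count = defaultdict(float)  # expected counts c(tw, sw)
        total = defaultdict(float)  # expected counts c(sw)
        for src, tgt in corpus:
            src = [NULL] + src
            for tw in tgt:
                # E-step: posterior over which source word generated tw.
                norm = sum(t[(tw, sw)] for sw in src)
                for sw in src:
                    p = t[(tw, sw)] / norm
                    count[(tw, sw)] += p
                    total[sw] += p
        # M-step: renormalize expected counts per source word.
        t = defaultdict(float, {(tw, sw): c / total[sw] for (tw, sw), c in count.items()})
    return t


def align(src, tgt, t):
    """Link each target word to its most probable source word; NULL means unaligned."""
    ext = [NULL] + src
    links = set()
    for j, tw in enumerate(tgt):
        best = max(range(len(ext)), key=lambda i: t[(tw, ext[i])])
        if best > 0:  # index 0 is NULL, so the word is left unaligned
            links.add((best - 1, j))
    return links


corpus = [
    (["das", "Haus"], ["the", "house"]),
    (["das", "Buch"], ["the", "book"]),
    (["ein", "Buch"], ["a", "book"]),
]
t = train_ibm1(corpus)
links = align(["das", "Haus"], ["the", "house"], t)
```

Because "das" co-occurs with "the" in every sentence where it appears, EM shifts its probability mass toward "the". That leaves "house" to be explained by "Haus". Under Model 1, the independent per-word argmax in `align` is the most probable (Viterbi) alignment.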
Evaluation typically uses alignment error rate (AER) against a gold standard, or precision and recall of linked word pairs.
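A common formulation of AER splits gold links into sure links S and possible links P, with S ⊆ P. Given predicted links A, precision is |A∩P|/|A|, recall is |A∩S|/|S|, and AER = 1 − (|A∩S| + |A∩P|)/(|A| + |S|). The sketch below assumes gold data in that sure/possible form and uses the same (source index, target index) link sets as above. The function name `alignment_scores` is hypothetical.

```python
def alignment_scores(A, S, P):
    """Precision, recall, and AER for predicted links A against
    sure gold links S and possible gold links P."""
    if not A or not S:
        raise ValueError("predicted and sure link sets must be non-empty")
    P = P | S  # enforce the assumption S ⊆ P
    a_s = len(A & S)
    a_p = len(A & P)
    precision = a_p / len(A)
    recall = a_s / len(S)
    aer = 1.0 - (a_s + a_p) / (len(A) + len(S))
    return precision, recall, aer


sure = {(0, 0), (1, 1)}
possible = sure | {(2, 1)}
predicted = {(0, 0), (2, 1), (2, 2)}
precision, recall, aer = alignment_scores(predicted, sure, possible)
```

Here (0, 0) matches a sure link and (2, 1) matches a possible link, so precision is 2/3, recall is 1/2, and AER is 1 − 3/5 = 0.4. Lower AER is better.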
Common challenges include handling non-literal translations, word order differences, polysemy, and many-to-many mappings. Word alignment relies