BLEURT - Infinite Lexicon

BLEURT

BLEURT is a learned evaluation metric for natural language generation that assigns a continuous, real-valued quality score to a candidate text with respect to a reference. It is designed to better reflect human judgments of translation, summarization, and other NLG outputs than traditional word-overlap metrics.

BLEURT builds on a pre-trained transformer encoder to obtain representations of the candidate and reference text.
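The text above says only that encoder representations are obtained; it does not say how they become a score. One plausible reading, assumed here rather than taken from the text, is that the encoder yields a single pooled vector for the reference-candidate pair and a linear layer maps it to a real number. The sketch below shows that last step alone. The function name `linear_head` and all numeric values are hypothetical placeholders, and no real encoder is involved.

```python
def linear_head(pooled, weights, bias):
    """Map a pooled representation to a scalar score: w . v + b.

    Assumption: `pooled` stands in for the vector a pre-trained encoder
    would produce for a (reference, candidate) pair. No encoder runs here.
    """
    if len(pooled) != len(weights):
        raise ValueError("pooled vector and weights must have the same length")
    return sum(p * w for p, w in zip(pooled, weights)) + bias


# Hypothetical placeholder values, chosen only to make the arithmetic visible.
pooled = [0.2, -0.5, 1.0]
weights = [0.4, 0.1, 0.3]
bias = 0.05

score = linear_head(pooled, weights, bias)
print(score)
```

In a trained metric the weights and bias would be learned so that the output tracks human ratings; here they are fixed by hand.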

Empirical evaluations report that BLEURT achieves higher correlations with human judgments than BLEU, ROUGE, and METEOR.
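To make "correlation with human judgments" concrete, the sketch below computes a Pearson correlation between a metric's scores and human ratings for the same outputs. Pearson is one common choice and is assumed here; the text does not say which statistic the evaluations used. The numbers are invented placeholders, not results for BLEURT or any other metric.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Placeholder data invented for illustration; not measurements of any metric.
human_ratings = [1.0, 2.0, 3.0, 4.0, 5.0]
metric_scores = [0.10, 0.35, 0.40, 0.70, 0.90]

r = pearson(metric_scores, human_ratings)
print(f"Pearson r = {r:.3f}")
```

A metric whose scores rise and fall with the human ratings gets a value near 1. Comparing two metrics then amounts to computing this statistic for each against the same set of ratings.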

Limitations include dependence on the domain and language of the training data, computational cost relative to
