IIT Bombay, IIT Patna | Feb 19, 2026 | arXiv:2602.17425

Evaluating Extremely Low-Resource Machine Translation: A Comparative Study of ChrF++ and BLEU Metrics

Sanjeev Kumar, Preethi Jyothi, Pushpak Bhattacharyya

AI Summary

This paper compares BLEU and ChrF++ metrics for evaluating machine translation quality in extremely low-resource language (ELRL) settings using outputs from LLMs and NMT systems. The study analyzes how each metric responds to common translation artifacts like hallucinations, repetitions, and source-text copying across three ELRLs: Magahi, Bhojpuri, and Chhattisgarhi. The results indicate that while ChrF++ is often favored, BLEU provides complementary lexical-precision insights, enhancing the interpretability of MT evaluation in ELRL scenarios.

Key Contribution

Don't ditch BLEU for ChrF++ just yet: in extremely low-resource MT, BLEU's lexical precision offers crucial insights that ChrF++ misses.

Abstract

Evaluating machine translation (MT) quality in extremely low-resource language (ELRL) scenarios poses unique challenges, as widely used metrics such as BLEU, effective in high-resource settings, often misrepresent quality in data-scarce contexts. This work presents a comparative analysis of BLEU, an n-gram-based metric, and ChrF++, a character-based metric, for MT evaluation in ELRL settings. We examine how each metric responds to translation artifacts, including hallucinations, repetition, source-text copying, and diacritic (matra) variations across three ELRLs: Magahi, Bhojpuri, and Chhattisgarhi, with a focus on outputs from large language models (LLMs) and neural MT (NMT) systems. While recent work often relies solely on ChrF++, our findings show that BLEU, despite its lower absolute scores, provides complementary lexical-precision insights that improve interpretability.
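To make the contrast between word-level and character-level matching concrete, the following is a minimal Python sketch, not the paper's evaluation code. It assumes two simplified stand-ins: `word_precision` computes clipped word n-gram precision (the quantity BLEU combines over orders 1 to 4 with a brevity penalty), and `char_fscore` computes a character n-gram F-score for a single order (ChrF++ averages several character orders and also adds word n-grams). Both function names and the romanized toy strings are invented for illustration and are not drawn from the paper's data.

```python
from collections import Counter


def ngrams(seq, n):
    """Count the n-grams of a sequence (a list of words or a string of characters)."""
    return Counter(tuple(seq[i:i + n]) for i in range(len(seq) - n + 1))


def word_precision(hyp, ref, n=1):
    """Clipped word n-gram precision: a simplified, single-order piece of BLEU."""
    h, r = ngrams(hyp.split(), n), ngrams(ref.split(), n)
    total = sum(h.values())
    if total == 0:
        return 0.0
    # Each hypothesis n-gram is credited at most as often as it occurs in the reference.
    matched = sum(min(count, r[gram]) for gram, count in h.items())
    return matched / total


def char_fscore(hyp, ref, n=3, beta=2.0):
    """Character n-gram F-beta for one order n: a simplified, chrF-style score."""
    h = ngrams(hyp.replace(" ", ""), n)
    r = ngrams(ref.replace(" ", ""), n)
    matched = sum(min(count, r[gram]) for gram, count in h.items())
    if matched == 0:
        return 0.0
    precision = matched / sum(h.values())
    recall = matched / sum(r.values())
    return (1 + beta**2) * precision * recall / (beta**2 * precision + recall)


# Hypothetical toy strings: a one-word spelling variant stands in for a matra variation.
reference = "ghar dur hai"
variant = "ghar door hai"

print(word_precision(variant, reference, n=2))  # no word bigram survives the variant
print(char_fscore(variant, reference))          # character trigrams still overlap

# Repetition artifact: clipping limits the credit for a repeated word.
print(word_precision("the the the the", "the river"))  # 0.25
```

In this sketch the single spelling variant removes every matching word bigram, while most character trigrams still match, which is one way to see why the two metric families can rank the same output differently.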

Eval Frameworks & Benchmarks, Natural Language Processing

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
