Mar 16, 2026 · arXiv:2603.15227

Bidirectional Chinese and English Passive Sentences Dataset for Machine Translation

Xinyue Ma, Pol Pastells, Mireia Farrús, Mariona Taulé

AI Summary

This paper introduces a new bidirectional Chinese-English dataset of passive sentences, extracted from five parallel corpora and automatically annotated with structure labels, to evaluate machine translation systems. The dataset comprises 73,965 parallel sentence pairs and includes a manually verified test set. Evaluation of state-of-the-art open-source and commercial MT systems reveals that models are overly influenced by the voice of the source text, maintaining passive voice even when human translators would not, although they show some awareness of the negative context associated with Chinese passives.

Key Contribution

MT models struggle to appropriately handle passive voice in Chinese-English translation, often mirroring the source text's voice even when human translators would diverge.

Abstract

Machine Translation (MT) evaluation has moved beyond metrics toward more specific linguistic phenomena. In the English-Chinese language pair, passive sentences are constructed and distributed differently in the two languages and thus need special attention in MT. This paper proposes a bidirectional multi-domain dataset of passive sentences, extracted from five Chinese-English parallel corpora and annotated automatically with structure labels according to the human translation, together with a test set whose annotation has been manually verified. The dataset consists of 73,965 parallel sentence pairs (2,358,731 English words, 3,498,229 Chinese characters). We evaluate two state-of-the-art open-source MT systems on the full dataset and four commercial models on the test set. The results show that, unlike humans, models are influenced more by the voice of the source text than by the general voice usage of the source language, and therefore tend to maintain the passive voice when translating a passive in either direction. However, models demonstrate some knowledge of the low frequency and predominantly negative context of Chinese passives, leading to higher voice consistency with human translators in English-to-Chinese translation than in Chinese-to-English translation. Commercial NMT models scored higher in metric evaluations, but LLMs showed a better ability to use diverse alternative translations. The datasets and annotation script will be shared upon request.
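The "voice consistency with human translators" mentioned above can be illustrated with a minimal sketch. It assumes each translation has already been given a voice label, and that consistency is simply the share of sentences where the model's label matches the human translator's. The label values (`"passive"`, `"active"`), the function names, and the sample data are hypothetical; the paper's actual structure labels and scoring procedure are not given here and may differ.

```python
from collections import Counter


def voice_consistency(human_voices, model_voices):
    """Share of sentences where the model's voice label equals the human one."""
    if len(human_voices) != len(model_voices):
        raise ValueError("label lists must be the same length")
    if not human_voices:
        return 0.0
    matches = sum(h == m for h, m in zip(human_voices, model_voices))
    return matches / len(human_voices)


def passive_retention(model_voices):
    """Share of outputs labeled passive, given that every source is passive."""
    counts = Counter(model_voices)
    total = sum(counts.values())
    return counts["passive"] / total if total else 0.0


# Hypothetical labels for four passive source sentences (not from the dataset).
human = ["active", "passive", "active", "active"]
model = ["passive", "passive", "passive", "active"]

print("voice consistency:", voice_consistency(human, model))
print("passive retention:", passive_retention(model))
```

Under these assumptions, a system that mirrors the source voice would show high passive retention but lower consistency with human translators wherever the humans switched to the active voice.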

Data Curation & Synthetic Data, Eval Frameworks & Benchmarks, Natural Language Processing

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
