Stanford HAI | Harvard | Institute for the Study of Natural and Artificial | Apr 6, 2026 | arXiv:2604.04723

Individual and Combined Effects of English as a Second Language and Typos on LLM Performance

Serena Liu, Yutong Yang, Prisha Sheth, Weixuan Dong, Mingjiao Diao, Xinru Zhu, Nikhil Banga, Oscar Melendez, Arnav Sharma, Minda Zhao, Marina Lin, Mengyu Wang

AI Summary

This paper investigates the individual and combined effects of English as a Second Language (ESL) variations and typographical errors on LLM performance using the Trans-EnV framework for ESL variation and MulTypo for typo injection. The study reveals that the combination of ESL and typos leads to larger performance drops compared to either factor alone, particularly on closed-ended tasks. The combined effect is not simply additive, highlighting the complex interaction between these two factors in degrading LLM performance.

Key Contribution

LLMs struggle even more when facing the double whammy of non-native English and typos, revealing that real-world performance is likely overestimated by standard English benchmarks.

Abstract

Large language models (LLMs) are used globally, and because much of their training data is in English, they typically perform best on English inputs. As a result, many non-native English speakers interact with them in English as a second language (ESL), and these inputs often contain typographical errors. Prior work has largely studied the effects of ESL variation and typographical errors separately, even though they often co-occur in real-world use. In this study, we use the Trans-EnV framework to transform standard English inputs into eight ESL variants and apply MulTypo to inject typos at three levels: low, moderate, and severe. We find that combining ESL variation and typos generally leads to larger performance drops than either factor alone, though the combined effect is not simply additive. This pattern is clearest on closed-ended tasks, where performance degradation can be characterized more consistently across ESL variants and typo levels, while results on open-ended tasks are more mixed. Overall, these findings suggest that evaluations on clean standard English may overestimate real-world model performance, and that evaluating ESL variation and typographical errors in isolation does not fully capture model behavior in realistic settings.
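The claim that the combined effect is "not simply additive" can be made concrete with a small calculation. The following is a minimal sketch, not the authors' analysis code: the function name `interaction_effect` and the accuracy values in the usage example are hypothetical, chosen only to illustrate the arithmetic, and are not results from the paper.

```python
def interaction_effect(clean, esl_only, typo_only, combined):
    """Compare the combined performance drop against an additive baseline.

    All arguments are accuracies (e.g. in percentage points) for the same
    model and task under four input conditions. Returns a tuple
    (drop_esl, drop_typo, drop_combined, interaction).
    """
    drop_esl = clean - esl_only        # drop from ESL variation alone
    drop_typo = clean - typo_only      # drop from typos alone
    drop_combined = clean - combined   # drop when both are applied

    # Under a purely additive model, drop_combined == drop_esl + drop_typo.
    # A nonzero interaction term means the effect is not simply additive:
    # positive = larger than additive, negative = smaller than additive.
    interaction = drop_combined - (drop_esl + drop_typo)
    return drop_esl, drop_typo, drop_combined, interaction


# Hypothetical accuracies, for illustration only (NOT from the paper).
drops = interaction_effect(clean=80.0, esl_only=74.0, typo_only=76.0, combined=68.0)
print(drops)
```

In this made-up example the combined drop exceeds both individual drops and also differs from their sum. The sign of the interaction term here is arbitrary; the abstract does not say whether the observed interaction is larger or smaller than additive.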

Eval Frameworks & Benchmarks, Natural Language Processing

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
