Tsinghua AI | Jun 8, 2026 | arXiv:2606.08994

Language-Aware Token Boosting: LLM Language Confusion Reduction Without Tuning

Trapoom Ukarapol, Pakhapoom Sarapat, Nut Chukamphaeng

AI Summary

This paper proposes a tuning-free approach to mitigating language confusion in large language models (LLMs) when they generate non-English text. It presents two methods: Language-Aware Token Boosting (LATB), which applies targeted perturbations to tokens associated with the desired language, and Adaptive Language-Aware Token Boosting (Adaptive-LATB), which adjusts those perturbations dynamically based on the model's confidence in the intended language. According to the authors' experiments, both methods reduce language confusion while preserving summarization quality, without requiring any fine-tuning.

Key Contribution

Language confusion in LLMs can be effectively reduced without fine-tuning, enhancing multilingual performance while maintaining output quality.

Abstract

Large language models (LLMs) sometimes exhibit language confusion when generating non-English text. Existing approaches typically rely on fine-tuning to mitigate this issue. In contrast, we propose a tuning-free paradigm for reducing language confusion. Within this paradigm, we introduce two methods: Language-Aware Token Boosting (LATB), which applies targeted perturbations to tokens associated with the desired language, and Adaptive Language-Aware Token Boosting (Adaptive-LATB), which dynamically adjusts these perturbations based on the model's confidence in the intended language. Experiments demonstrate that our methods effectively improve multilingual alignment by reducing language confusion while maintaining summarization quality, without requiring any additional fine-tuning. Our code is publicly available at https://github.com/scbdatax/genai-datax-language-aware-token-boosting
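The abstract does not give the exact form of the perturbation, so the following is only a minimal sketch of the general idea under stated assumptions: the perturbation is taken to be an additive bias on the next-token logits of target-language tokens, and the adaptive variant is taken to shrink that bias as the probability mass already on target-language tokens grows. The function names, the `delta` and `max_delta` parameters, and the toy vocabulary are hypothetical and are not taken from the paper or its repository.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def boost_logits(logits, target_ids, delta):
    """Fixed boost (assumed form): add `delta` to each target-language token logit."""
    return [x + delta if i in target_ids else x for i, x in enumerate(logits)]


def adaptive_boost_logits(logits, target_ids, max_delta):
    """Adaptive boost (assumed form): scale the boost by 1 - confidence.

    Confidence is taken here to be the probability mass the model already
    places on target-language tokens; this is an illustrative choice only.
    """
    probs = softmax(logits)
    confidence = sum(probs[i] for i in target_ids)
    delta = max_delta * (1.0 - confidence)
    return boost_logits(logits, target_ids, delta), confidence


# Toy vocabulary: ids 0-1 stand in for English tokens,
# ids 2-3 for tokens of the desired language.
logits = [2.0, 1.0, 1.5, 0.5]
target_ids = {2, 3}

before = sum(softmax(logits)[i] for i in target_ids)
boosted = boost_logits(logits, target_ids, delta=2.0)
after = sum(softmax(boosted)[i] for i in target_ids)
adaptive, confidence = adaptive_boost_logits(logits, target_ids, max_delta=2.0)

print(f"target-language mass before: {before:.3f}, after fixed boost: {after:.3f}")
print(f"confidence used by adaptive variant: {confidence:.3f}")
```

In this sketch the adaptive variant applies a smaller boost than the fixed one whenever the model already favors the target language, which is one plausible way to limit the effect on output quality; the paper's actual adjustment rule may differ.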

Natural Language Processing

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
