SNU, Yonsei | Apr 7, 2026 | arXiv:2604.05302

Right at My Level: A Unified Multilingual Framework for Proficiency-Aware Text Simplification

Jinhong Jeong, Junghun Park, Youngjae Yu

AI Summary

The paper introduces Re-RIGHT, a reinforcement learning framework for multilingual text simplification tailored to specific language proficiency levels (CEFR, JLPT, TOPIK, HSK) without relying on parallel corpora. The authors show that prompting-based lexical simplification with large language models (GPT-5.2, Gemini 2.5) performs poorly at easier proficiency levels and in non-English languages. Re-RIGHT trains a 4B policy model on 43K vocabulary-level data across English, Japanese, Korean, and Chinese, using three rewards: vocabulary coverage, semantic preservation, and coherence. It achieves higher lexical coverage than the LLM baselines while maintaining meaning and fluency.

Key Contribution

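Prompted state-of-the-art LLMs (GPT-5.2, Gemini 2.5) perform poorly at lexical simplification for easier proficiency levels and non-English languages. Re-RIGHT, a reinforcement learning framework that trains a compact 4B policy model without parallel corpus supervision, achieves higher lexical coverage at target proficiency levels while maintaining meaning and fluency.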

Abstract

Text simplification supports second language (L2) learning by providing comprehensible input, consistent with the Input Hypothesis. However, constructing personalized parallel corpora is costly, while existing large language model (LLM)-based readability control methods rely on pre-labeled sentence corpora and primarily target English. We propose Re-RIGHT, a unified reinforcement learning framework for adaptive multilingual text simplification without parallel corpus supervision. We first show that prompting-based lexical simplification at target proficiency levels (CEFR, JLPT, TOPIK, and HSK) performs poorly at easier levels and for non-English languages, even with state-of-the-art LLMs such as GPT-5.2 and Gemini 2.5. To address this, we collect 43K vocabulary-level data across four languages (English, Japanese, Korean, and Chinese) and train a compact 4B policy model using Re-RIGHT, which integrates three reward modules: vocabulary coverage, semantic preservation, and coherence. Compared to the stronger LLM baselines, Re-RIGHT achieves higher lexical coverage at target proficiency levels while maintaining original meaning and fluency.
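The abstract names three reward modules but does not give their formulas. As a rough illustration only, the sketch below assumes vocabulary coverage is the fraction of output tokens whose listed level is at or below the target level, and that the three scores are combined by a weighted average. The names `LEVELS`, `vocabulary_coverage`, and `combined_reward`, the toy word list, and the weights are all hypothetical and not taken from the paper; the semantic preservation and coherence scores are passed in as plain numbers because the abstract does not say how they are computed.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical level ordering for illustration (CEFR-style, easiest first).
LEVELS = ["A1", "A2", "B1", "B2", "C1", "C2"]


def vocabulary_coverage(tokens: Iterable[str],
                        word_levels: Dict[str, str],
                        target: str) -> float:
    """Fraction of tokens whose listed level is at or below the target level.

    Tokens missing from the word list count as uncovered (an assumption).
    """
    tokens = list(tokens)
    if not tokens:
        return 0.0
    max_rank = LEVELS.index(target)
    covered = sum(
        1 for t in tokens
        if t in word_levels and LEVELS.index(word_levels[t]) <= max_rank
    )
    return covered / len(tokens)


def combined_reward(coverage: float, semantic: float, coherence: float,
                    weights: Tuple[float, float, float] = (1.0, 1.0, 1.0)) -> float:
    """Weighted average of three scores in [0, 1]; the weights are placeholders."""
    w_cov, w_sem, w_coh = weights
    total = w_cov + w_sem + w_coh
    return (w_cov * coverage + w_sem * semantic + w_coh * coherence) / total


# Toy word list: each word is mapped to the level at which it is introduced.
word_levels = {"the": "A1", "cat": "A1", "sat": "A2", "contemplated": "C1"}

cov_simple = vocabulary_coverage("the cat sat".split(), word_levels, "A2")
cov_hard = vocabulary_coverage("the cat contemplated".split(), word_levels, "A2")
```

In this toy setup, a candidate simplification that stays within the A2 word list receives a higher coverage score than one that keeps a C1 word, which is the kind of signal a policy model could be trained against without parallel corpora.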

Data Curation & Synthetic Data | Natural Language Processing | RLHF & Preference Learning

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
