Apr 2, 2026 | arXiv:2604.01779

Taming CATS: Controllable Automatic Text Simplification through Instruction Fine-Tuning with Control Tokens

Hanna Hubarava, Yingqiang Gao

AI Summary

This paper introduces CATS, a domain-agnostic framework that instruction fine-tunes open-source LLMs with discrete control tokens to steer them toward controllable automatic text simplification (ATS). The authors find that smaller models (1-3B) can be competitive, but that reliable controllability hinges on sufficient variation in the training data for the target attribute; readability control proves more consistent than compression control. They also highlight the inadequacy of standard simplification metrics for measuring control, advocating error-based measures and careful data splitting to avoid distributional mismatch.

Key Contribution

Smaller LLMs can achieve competitive controllable text simplification, but only if the training data contains sufficient variation in the desired control attribute, a data dependency often overlooked in ATS research.

Abstract

Controllable Automatic Text Simplification (CATS) produces user-tailored outputs, yet controllability is often treated as a decoding problem and evaluated with metrics that do not reflect the degree of control actually achieved. We observe that controllability in ATS is significantly constrained by data and evaluation. To this end, we introduce a domain-agnostic CATS framework based on instruction fine-tuning with discrete control tokens, steering open-source models to target readability levels and compression rates. Across three model families of different sizes (Llama, Mistral, Qwen; 1-14B) and four domains (medicine, public administration, news, encyclopedic text), we find that smaller models (1-3B) can be competitive, but reliable controllability strongly depends on whether the training data encodes sufficient variation in the target attribute. Readability control (FKGL, ARI, Dale-Chall) is learned consistently, whereas compression control underperforms due to limited signal variability in the existing corpora. We further show that standard simplification and similarity metrics are insufficient for measuring control, motivating error-based measures for target-output alignment. Finally, our sampling and stratification experiments demonstrate that naive splits can introduce distributional mismatch that undermines both training and evaluation.
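To make the idea of control tokens and error-based evaluation concrete, the following is a minimal sketch, not the authors' implementation. It computes the Automated Readability Index (ARI) with its standard formula, builds a prompt prefixed with control tokens, and reports the absolute error between the requested and the achieved attribute values. The token format (`<ARI_…>`, `<CR_…>`), the function names, the character-level definition of compression ratio, and the naive regex tokenization are all assumptions made for illustration; the abstract does not specify them.

```python
import re


def ari(text: str) -> float:
    """Automated Readability Index:
    4.71 * (chars / words) + 0.5 * (words / sentences) - 21.43
    Uses naive regex tokenization (an assumption of this sketch).
    """
    words = re.findall(r"[A-Za-z0-9]+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not words or not sentences:
        raise ValueError("text must contain at least one word and one sentence")
    chars = sum(len(w) for w in words)
    return 4.71 * chars / len(words) + 0.5 * len(words) / len(sentences) - 21.43


def compression_ratio(source: str, output: str) -> float:
    """Character-level length ratio output/source (assumed definition)."""
    if not source:
        raise ValueError("source must be non-empty")
    return len(output) / len(source)


def build_prompt(source: str, target_ari: float, target_cr: float) -> str:
    """Prefix the instruction with hypothetical discrete control tokens."""
    return f"<ARI_{round(target_ari)}> <CR_{target_cr:.1f}> Simplify: {source}"


def control_errors(source: str, output: str,
                   target_ari: float, target_cr: float) -> dict:
    """Absolute error between requested and achieved attribute values."""
    return {
        "ari_abs_error": abs(ari(output) - target_ari),
        "cr_abs_error": abs(compression_ratio(source, output) - target_cr),
    }
```

Under these assumptions, a model output would be scored by calling `control_errors(source, output, target_ari, target_cr)`; a lower error means the output sits closer to the requested readability level or compression rate, which is a property that similarity metrics alone do not capture.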

Eval Frameworks & Benchmarks, Natural Language Processing, Open-Source Models & Weights

Citation Metrics

Citations: 0

Influential citations: 0

References: 43

Year: 2026

Venue: N/A
