Sheffield | Apr 17, 2026 | arXiv:2604.16027

Where does output diversity collapse in post-training?

AI Summary

This study investigates output diversity collapse in post-trained language models and finds that where diversity is lost co-varies with the composition of the training data, not only with the post-training method. Tracing three parallel post-training lineages of Olmo 3 (Think, Instruct, and RL-Zero) across 15 tasks and four text diversity metrics, the authors find that the Think lineage loses most of its semantic diversity during supervised fine-tuning (SFT), while direct preference optimization (DPO) has a larger effect in the Instruct lineage than in Think. Because the collapse is embedded in the model weights by the training data, the authors conclude that it must be addressed during training and cannot be fixed at inference time alone.
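The stage-by-stage tracing described above (scoring each checkpoint in a lineage and asking which transition loses the most diversity) can be illustrated with a minimal sketch. It assumes distinct-n (unique n-grams divided by total n-grams) as a stand-in lexical diversity metric, since the paper's four metrics are not named on this page; the helper names `distinct_n` and `locate_collapse`, the stage names, and the toy samples are all hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def distinct_n(samples: Sequence[str], n: int = 2) -> float:
    """Distinct-n: unique n-grams divided by total n-grams, pooled over all samples.

    A simple lexical stand-in (assumption) for the paper's diversity metrics.
    """
    ngrams: List[Tuple[str, ...]] = []
    for text in samples:
        tokens = text.lower().split()  # naive whitespace tokenization
        ngrams.extend(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    if not ngrams:
        return 0.0
    return len(set(ngrams)) / len(ngrams)


def locate_collapse(stages: Dict[str, Sequence[str]], n: int = 2) -> Tuple[str, float]:
    """Return the stage transition with the largest diversity drop, and the drop size.

    `stages` maps hypothetical checkpoint names to sampled outputs, in pipeline
    order (Python dicts preserve insertion order).
    """
    names = list(stages)
    if len(names) < 2:
        raise ValueError("need at least two stages to compare")
    scores = [distinct_n(stages[name], n) for name in names]
    # Positive value = diversity lost between consecutive stages.
    drops = {
        f"{prev}->{cur}": scores[i] - scores[i + 1]
        for i, (prev, cur) in enumerate(zip(names, names[1:]))
    }
    worst = max(drops, key=drops.get)
    return worst, drops[worst]


# Toy usage with made-up samples (not data from the paper).
toy_stages = {
    "base": ["the cat sat on the mat", "a dog ran in the park", "birds sing at dawn"],
    "sft": ["the cat sat on the mat", "the cat sat on a mat", "a dog ran in the park"],
    "dpo": ["the cat sat on the mat"] * 3,
}
print(locate_collapse(toy_stages))
```

On the toy data the largest drop falls at the `sft->dpo` transition. A real analysis would draw many samples per prompt and average per-prompt scores; pooling n-grams across all samples, as here, is a simplification.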

Key Contribution

Where output diversity collapses during post-training co-varies with training data composition, not just with the post-training method, and the collapse is embedded in the model weights, which challenges the assumption that it can be fixed at inference time.

Abstract

Post-trained language models produce less varied outputs than their base counterparts. This output diversity collapse undermines inference-time scaling methods that rely on varied samples, and risks homogenizing model outputs on creative and value-laden tasks. Prior work attributes collapse to specific post-training methods, without separating the role of training data composition from the method, or the generation format from the model weights. We trace output diversity through three parallel post-training lineages of Olmo 3: Think (chain-of-thought distillation), Instruct (broad multi-source data), and RL-Zero, across 15 tasks and four text diversity metrics. We find that the location of collapse co-varies with data composition: the Think lineage loses most semantic diversity at supervised fine-tuning, and the effect of DPO is larger in Instruct than in Think. Suppressing chain-of-thought reasoning at inference in Think models drops accuracy on hard tasks, yet leaves answer-level diversity unchanged, showing that the collapse is embedded in the model weights by training data, not imposed by the generation format. Decomposing diversity loss on six verifiable tasks into a quality-control component (removal of incorrect outputs) and a residual component (genuine narrowing among correct outputs) reveals that the split is task-dependent, and Think models retain more correct-answer diversity than Instruct despite collapsing more in aggregate. Our results indicate that diversity collapse is determined during training by data composition and cannot be addressed at inference time alone.
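The abstract does not give the exact formula for splitting diversity loss into a quality-control and a residual component. One additive scheme consistent with its description, offered here only as an assumption, takes D as a diversity score over a set of samples and defines total = D(base, all) - D(post, all), residual = D(base, correct) - D(post, correct) (narrowing among correct outputs), and quality control = total - residual (the part attributed to incorrect outputs being removed). The sketch below uses mean pairwise Jaccard distance over token sets as a hypothetical choice of D; the helper names and toy samples are illustrative, not the paper's.

```python
from itertools import combinations
from typing import Dict, Sequence, Tuple

Sample = Tuple[str, bool]  # (output text, is_correct) for a verifiable task


def jaccard_distance(a: str, b: str) -> float:
    """1 - |intersection| / |union| over lowercase whitespace tokens."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    if not union:
        return 0.0
    return 1.0 - len(ta & tb) / len(union)


def pairwise_diversity(texts: Sequence[str]) -> float:
    """Mean pairwise Jaccard distance; 0.0 when there are fewer than two texts."""
    pairs = list(combinations(texts, 2))
    if not pairs:
        return 0.0
    return sum(jaccard_distance(a, b) for a, b in pairs) / len(pairs)


def decompose_diversity_loss(base: Sequence[Sample], post: Sequence[Sample]) -> Dict[str, float]:
    """Split base->post diversity loss into quality-control and residual parts.

    Hypothetical additive scheme (not confirmed as the paper's formula):
      total           = D(base, all) - D(post, all)
      residual        = D(base, correct) - D(post, correct)
      quality_control = total - residual
    """
    def d(samples: Sequence[Sample], correct_only: bool) -> float:
        return pairwise_diversity([t for t, ok in samples if ok or not correct_only])

    total = d(base, False) - d(post, False)
    residual = d(base, True) - d(post, True)
    return {"total": total, "residual": residual, "quality_control": total - residual}


# Toy usage with made-up samples (not data from the paper).
base_samples = [("x = 2 because a", True), ("x = 2 since b", True), ("x = 5 wrong c", False)]
post_samples = [("x = 2 because a", True), ("x = 2 because a", True)]
print(decompose_diversity_loss(base_samples, post_samples))
```

Because `quality_control` is defined as a remainder, the two components always sum to the total. A negative `quality_control` would mean the gap between all-sample and correct-only diversity grew, rather than shrank, after post-training.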

Data Curation & Synthetic Data | Natural Language Processing | Scaling Laws & Emergent Abilities

Citation Metrics

Citations: 0
Influential citations: 0
References: 0
Year: 2026
Venue: N/A
