Mar 9, 2026 · arXiv:2603.08398

Revealing Behavioral Plasticity in Large Language Models: A Token-Conditional Perspective

Liyuan Mao, Le Yu, Jingren Zhou, Jing Zhou, Chujie Zheng, Bowen Yu, Chang Gao, Shixuan Liu, An Yang, Weinan Zhang, Junyang Lin

AI Summary

The paper demonstrates that LLMs exhibit behavioral plasticity: they adapt their behavior at inference time when generation is conditioned on selected token prefixes. It introduces Token-Conditioned Reinforcement Learning (ToCoRL) to stabilize this plasticity, turning transient adaptations into learnable behavioral patterns. Experiments show that ToCoRL enables precise behavioral control, adapting reasoning models to excel at factual question answering without capability degradation.

Key Contribution

LLMs can switch between reasoning and factual answering on the fly, without retraining, simply by conditioning on specific token prefixes.

Abstract

In this work, we reveal that Large Language Models (LLMs) possess intrinsic behavioral plasticity, akin to chameleons adapting their coloration to environmental cues, that can be exposed through token-conditional generation and stabilized via reinforcement learning. Specifically, by conditioning generation on carefully selected token prefixes sampled from responses exhibiting desired behaviors, LLMs seamlessly adapt their behavioral modes at inference time (e.g., switching from step-by-step reasoning to direct answering) without retraining. Based on this insight, we propose Token-Conditioned Reinforcement Learning (ToCoRL), a principled framework that leverages RL to internalize this chameleon-like plasticity, transforming transient inference-time adaptations into stable and learnable behavioral patterns. ToCoRL guides exploration with token-conditional generation and keeps enhancing exploitation, enabling the emergence of appropriate behaviors. Extensive experiments show that ToCoRL enables precise behavioral control without capability degradation. Notably, we show that large reasoning models, while performing strongly on complex mathematics, can be effectively adapted to excel at factual question answering, a capability previously hindered by their step-by-step reasoning patterns.
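The token-conditional generation step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `take_prefix`, `token_conditional_generate`, and `toy_generate` are hypothetical, whitespace splitting stands in for a real tokenizer, and the toy generator stands in for an actual LLM call. It only shows the mechanics of prepending a short prefix, taken from a response that exhibits the desired behavior, before the model continues.

```python
from typing import Callable


def take_prefix(reference_response: str, num_tokens: int) -> str:
    """Return the first `num_tokens` whitespace-separated tokens.

    Assumption: whitespace splitting stands in for a real tokenizer.
    """
    tokens = reference_response.split()
    return " ".join(tokens[:num_tokens])


def token_conditional_generate(
    prompt: str,
    reference_response: str,
    num_tokens: int,
    generate: Callable[[str], str],
) -> str:
    """Condition generation on a prefix sampled from a reference response.

    `generate` is a hypothetical callable that maps input text to a
    continuation; in practice it would wrap an LLM.
    """
    prefix = take_prefix(reference_response, num_tokens)
    # With an empty prefix this reduces to ordinary generation.
    conditioned_input = f"{prompt}\n{prefix}" if prefix else prompt
    continuation = generate(conditioned_input)
    return prefix + continuation


def toy_generate(text: str) -> str:
    """Toy stand-in for an LLM: its behavior depends on how the input ends."""
    last_line = text.splitlines()[-1]
    if last_line.startswith("The answer is"):
        return " Paris."  # direct-answer mode
    return " Let me think step by step..."  # reasoning mode


question = "What is the capital of France?"
# The reference response only needs to exhibit the desired *behavior*
# (direct answering); its content belongs to a different question.
reference = "The answer is Canberra."

direct = token_conditional_generate(question, reference, 3, toy_generate)
default = token_conditional_generate(question, reference, 0, toy_generate)
```

In this toy setup, the three-token prefix steers the stand-in model into its direct-answer branch, while a zero-length prefix leaves the default reasoning branch in place. How prefixes are actually selected and how the RL stage uses them is specified in the paper, not here.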

Natural Language Processing · Reasoning & Chain-of-Thought · RLHF & Preference Learning

Citation Metrics

Citations: 0

Influential citations: 0

References: 36

Year: 2026

Venue: N/A
