Mar 17, 2026 | arXiv:2603.16856

Online Experiential Learning for Language Models

Tianzhu Ye, Li Dong, Qingxiu Dong, Xun Wu, Shaohan Huang, Furu Wei

AI Summary

This paper introduces Online Experiential Learning (OEL), a framework for continuously improving language models by learning from their deployment experience. OEL extracts transferable knowledge from user interaction trajectories and consolidates it into model parameters via on-policy context distillation. Experiments on text-based games demonstrate that OEL consistently improves task accuracy and token efficiency while preserving out-of-distribution performance, highlighting the importance of on-policy consistency between the knowledge source and the policy model.
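The two-stage loop summarized above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: `oel_loop`, `collect_trajectories`, `extract_knowledge`, and `distill` are hypothetical names. It also assumes the current model itself performs the knowledge extraction. The toy stand-ins only show the control flow: the policy is an integer and the "knowledge" is a list of strings.

```python
from typing import Callable, List, Tuple, TypeVar

Policy = TypeVar("Policy")  # whatever represents the model; an int in the toy run below


def oel_loop(
    policy: Policy,
    collect_trajectories: Callable[[Policy], List[str]],
    extract_knowledge: Callable[[Policy, List[str], List[str]], List[str]],
    distill: Callable[[Policy, List[str]], Policy],
    num_rounds: int,
) -> Tuple[Policy, List[str]]:
    """Hypothetical sketch of the OEL online loop (not the authors' code)."""
    knowledge: List[str] = []  # experiential knowledge accumulated across rounds
    for _ in range(num_rounds):
        # Deploy the current policy; trajectories are collected on the user side.
        trajectories = collect_trajectories(policy)
        # Stage 1 (assumed: the current model extracts the knowledge): turn
        # trajectories into transferable lessons, given what is already stored.
        for item in extract_knowledge(policy, trajectories, knowledge):
            if item not in knowledge:
                knowledge.append(item)
        # Stage 2: consolidate the knowledge into parameters. This step needs only
        # the model and the knowledge text, not the user-side environment.
        policy = distill(policy, knowledge)
    return policy, knowledge


# Toy stand-ins (hypothetical): the "policy" is an integer skill level.
def toy_collect(policy: int) -> List[str]:
    return [f"episode(skill={policy})"]


def toy_extract(policy: int, trajectories: List[str], knowledge: List[str]) -> List[str]:
    return [f"lesson-{policy}"]


def toy_distill(policy: int, knowledge: List[str]) -> int:
    return policy + 1  # pretend each consolidation round improves the policy


final_policy, lessons = oel_loop(0, toy_collect, toy_extract, toy_distill, num_rounds=3)
print(final_policy, lessons)  # 3 ['lesson-0', 'lesson-1', 'lesson-2']
```

Passing the accumulated `knowledge` back into `extract_knowledge` is a design choice made for this sketch. It lets each round refine the existing knowledge rather than start from scratch, which matches the description of knowledge being "accumulated" across rounds.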

Key Contribution

Language models can keep improving from the experience they gather during deployment, raising task accuracy and token efficiency without human annotations and without access to the user-side environment during training.

Abstract

The prevailing paradigm for improving large language models relies on offline training with human annotations or simulated environments, leaving the rich experience accumulated during real-world deployment entirely unexploited. We propose Online Experiential Learning (OEL), a framework that enables language models to continuously improve from their own deployment experience. OEL operates in two stages: first, transferable experiential knowledge is extracted and accumulated from interaction trajectories collected on the user side; second, this knowledge is consolidated into model parameters via on-policy context distillation, requiring no access to the user-side environment. The two stages are iterated to form an online learning loop, where the improved model collects higher-quality trajectories that yield richer experiential knowledge for subsequent rounds. We evaluate OEL on text-based game environments across multiple model scales and both thinking and non-thinking variants. OEL achieves consistent improvements over successive iterations, enhancing both task accuracy and token efficiency while preserving out-of-distribution performance. Our analysis further shows that extracted experiential knowledge is significantly more effective than raw trajectories, and that on-policy consistency between the knowledge source and the policy model is critical for effective learning.
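The abstract does not spell out the distillation objective. The sketch below assumes one common formulation of on-policy context distillation. The student (the model without the experiential knowledge in its context) samples a response. At each position of that sampled response, its next-token distribution is compared with that of a teacher, which here is the same model with the knowledge prepended to its context. The comparison uses per-token reverse KL. The choice of reverse KL, the use of the same model as teacher, and every name in the code (`toy_model`, `context_distillation_loss`, and so on) are assumptions for illustration. In a real training setup, gradients would flow only through the student's distribution, and the teacher's distribution would serve as a fixed target.

```python
import math
import random
from typing import Callable, List, Sequence

# Hypothetical interface: a model maps (context, response prefix) to a
# next-token distribution over a tiny vocabulary.
Model = Callable[[str, List[int]], List[float]]


def reverse_kl(p_student: Sequence[float], p_teacher: Sequence[float]) -> float:
    """KL(student || teacher) at one position; teacher probs assumed > 0."""
    return sum(ps * (math.log(ps) - math.log(pt))
               for ps, pt in zip(p_student, p_teacher) if ps > 0.0)


def sample_response(model: Model, prompt: str, length: int, rng: random.Random) -> List[int]:
    """On-policy step: the student (no knowledge in context) generates the tokens."""
    response: List[int] = []
    for _ in range(length):
        probs = model(prompt, response)
        response.append(rng.choices(range(len(probs)), weights=probs, k=1)[0])
    return response


def context_distillation_loss(model: Model, prompt: str, knowledge: str,
                              response: List[int]) -> float:
    """Mean per-token reverse KL between student and knowledge-conditioned
    teacher, evaluated on the student's own sampled prefixes."""
    kls = []
    for t in range(len(response)):
        prefix = response[:t]
        p_student = model(prompt, prefix)                     # without knowledge
        p_teacher = model(knowledge + "\n" + prompt, prefix)  # same model, knowledge in context
        kls.append(reverse_kl(p_student, p_teacher))
    return sum(kls) / len(kls)


# Toy model (hypothetical): a context containing "hint" shifts mass toward token 1.
def toy_model(context: str, prefix: List[int]) -> List[float]:
    return [0.2, 0.8] if "hint" in context else [0.5, 0.5]


rng = random.Random(0)
response = sample_response(toy_model, "solve the puzzle", length=4, rng=rng)
loss = context_distillation_loss(toy_model, "solve the puzzle", "hint: try the key first", response)
print(round(loss, 4))  # 0.2231, i.e. ln(1.25) at every position
```

Because the loss is evaluated on prefixes the student produced itself, the student is corrected on the states it actually visits. This is the usual motivation for on-policy rather than teacher-sampled distillation.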

Natural Language Processing, RLHF & Preference Learning, Tool Use & Agents

Citation Metrics

Citations: 0

Influential citations: 0

References: 18

Year: 2026

Venue: N/A
