DeepMind · Feb 18, 2026 · arXiv:2602.16488

Learning to Learn from Language Feedback with Social Meta-Learning

Jonathan Cook, Diego Antognini, Martin Klissarov, Claudiu Musat, Edward Grefenstette

AI Summary

This paper introduces Social Meta-Learning (SML), a finetuning methodology that trains LLMs to proactively solicit and effectively learn from language feedback within simulated pedagogical dialogues. SML converts static tasks into interactive social learning problems, enabling models to leverage conversation for problem-solving beyond single-turn capabilities. The results demonstrate that SML generalizes across domains (math and coding) and improves performance on underspecified tasks by encouraging models to request necessary information and avoid premature answers.

Key Contribution

LLMs can be taught to proactively seek and effectively use conversational feedback, generalizing across tasks and improving their ability to handle ambiguity.

Abstract

Large language models (LLMs) often struggle to learn from corrective feedback within a conversational context. They are rarely proactive in soliciting this feedback, even when faced with ambiguity, which can make their dialogues feel static, one-sided, and lacking the adaptive qualities of human conversation. To address these limitations, we draw inspiration from social meta-learning (SML) in humans: the process of learning how to learn from others. We formulate SML as a finetuning methodology, training LLMs to solicit and learn from language feedback in simulated pedagogical dialogues, where static tasks are converted into interactive social learning problems. SML effectively teaches models to use conversation to solve problems they are unable to solve in a single turn. This capability generalises across domains; SML on math problems produces models that better use feedback to solve coding problems and vice versa. Furthermore, despite being trained only on fully-specified problems, these models are better able to solve underspecified tasks where critical information is revealed over multiple turns. When faced with this ambiguity, SML-trained models make fewer premature answer attempts and are more likely to ask for the information they need. This work presents a scalable approach to developing AI systems that effectively learn from language feedback.

Natural Language Processing, RLHF & Preference Learning

Citation Metrics

Citations: 0
Influential citations: 0
References: 44
Year: 2026
Venue: N/A
