Stanford HAI, PKU | Feb 19, 2026 | arXiv:2602.17497

Retrospective In-Context Learning for Temporal Credit Assignment with Large Language Models

Wen-Tse Chen, Fahim Tajwar, Hao Zhu, Xintong Duan, Jeff Schneider

AI Summary

This paper introduces Retrospective In-Context Learning (RICL) to leverage LLMs for temporal credit assignment, transforming sparse rewards into dense advantage function estimates. RICL uses in-context learning to infer advantages from past trajectories, enabling the identification of critical states. The authors then propose RICOL, an online learning framework that iteratively refines policies based on RICL's credit assignment, demonstrating improved sample efficiency compared to traditional RL on BabyAI tasks.

Key Contribution

LLMs can turn sparse rewards into dense training signals for RL agents, reaching performance comparable to traditional online RL algorithms with significantly higher sample efficiency.

Abstract

Learning from self-sampled data and sparse environmental feedback remains a fundamental challenge in training self-evolving agents. Temporal credit assignment mitigates this issue by transforming sparse feedback into dense supervision signals. However, previous approaches typically depend on learning task-specific value functions for credit assignment, which suffer from poor sample efficiency and limited generalization. In this work, we propose to leverage pretrained knowledge from large language models (LLMs) to transform sparse rewards into dense training signals (i.e., the advantage function) through retrospective in-context learning (RICL). We further propose an online learning framework, RICOL, which iteratively refines the policy based on the credit assignment results from RICL. We empirically demonstrate that RICL can accurately estimate the advantage function with limited samples and effectively identify critical states in the environment for temporal credit assignment. Extended evaluation on four BabyAI scenarios shows that RICOL achieves convergent performance comparable to that of traditional online RL algorithms, with significantly higher sample efficiency. Our findings highlight the potential of leveraging LLMs for temporal credit assignment, paving the way for more sample-efficient and generalizable RL paradigms.
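
To make "dense training signals (i.e., the advantage function)" concrete, the following is a minimal sketch of the standard one-step advantage estimate, A_t = r_t + gamma * V(s_{t+1}) - V(s_t), applied to a trajectory whose only reward arrives at the final step. It is not RICL: the abstract states that RICL obtains advantages from an LLM through retrospective in-context learning, whereas here the value estimates are hypothetical, hand-picked numbers, and the function name `one_step_advantages` is an illustrative assumption rather than anything from the paper.

```python
from typing import List


def one_step_advantages(
    rewards: List[float], values: List[float], gamma: float = 0.99
) -> List[float]:
    """Return A_t = r_t + gamma * V(s_{t+1}) - V(s_t) for each step t.

    rewards[t] is the reward received after the action at step t.
    values[t] is an estimate of V(s_t); the value after the last step is 0.
    """
    if len(rewards) != len(values):
        raise ValueError("rewards and values must have the same length")
    advantages = []
    for t, reward in enumerate(rewards):
        # The episode terminates after the last step, so its successor value is 0.
        next_value = values[t + 1] if t + 1 < len(values) else 0.0
        advantages.append(reward + gamma * next_value - values[t])
    return advantages


# Sparse feedback: the environment rewards only the final step.
rewards = [0.0, 0.0, 0.0, 1.0]
# Hypothetical value estimates, chosen by hand for illustration only.
values = [0.2, 0.1, 0.6, 0.9]

advantages = one_step_advantages(rewards, values, gamma=1.0)
# The step with the largest advantage is the one credited most for the outcome.
critical_step = max(range(len(advantages)), key=lambda t: advantages[t])
print(advantages, critical_step)
```

Although three of the four rewards are zero, every step receives a nonzero advantage, and the step where the estimated value jumps the most stands out as the critical one. How good such a signal is depends entirely on where the value or advantage estimates come from, which is the part the paper proposes to obtain from an LLM instead of a learned task-specific value function.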

Natural Language Processing | Reasoning & Chain-of-Thought | Tool Use & Agents

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
