BUPT | China Mobile | Apr 7, 2026 | arXiv:2604.05808

Hierarchical Reinforcement Learning with Augmented Step-Level Transitions for LLM Agents

Shuai Zhen, Yanhua Yu, Ruopei Guo, Nan Cheng, Yang Deng

AI Summary

This paper introduces STEP-HRL, a hierarchical reinforcement learning framework for LLM agents that learns from augmented step-level transitions rather than full interaction histories. STEP-HRL uses a local progress module to summarize interaction history within subtasks, creating compact representations of progress. Experiments on ScienceWorld and ALFWorld show that STEP-HRL achieves better performance and generalization with reduced token usage compared to agents using full interaction histories.

Key Contribution

LLM agents can achieve superior performance and generalization in complex tasks while slashing token usage by learning from compact, step-level summaries of subtask progress.

Abstract

Large language model (LLM) agents have demonstrated strong capabilities in complex interactive decision-making tasks. However, existing LLM agents typically rely on increasingly long interaction histories, resulting in high computational cost and limited scalability. In this paper, we propose STEP-HRL, a hierarchical reinforcement learning (HRL) framework that enables step-level learning by conditioning only on single-step transitions rather than full interaction histories. STEP-HRL structures tasks hierarchically, using completed subtasks to represent the global progress of the overall task. By introducing a local progress module, it also iteratively and selectively summarizes the interaction history within each subtask to produce a compact summary of local progress. Together, these components yield augmented step-level transitions for both the high-level and low-level policies. Experimental results on the ScienceWorld and ALFWorld benchmarks consistently demonstrate that STEP-HRL substantially outperforms baselines in terms of performance and generalization while reducing token usage. Our code is available at https://github.com/TonyStark042/STEP-HRL.
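
To make the idea of an augmented step-level transition concrete, the following is a minimal Python sketch of what each policy might condition on. It is an illustration built only from the abstract, not the authors' implementation: the class and field names, the `rollout_subtask` helper, and the `policy` and `env_step` callables are all hypothetical, and the local progress module is replaced here by a simple append-and-truncate stub, whereas the paper describes a module that selectively summarizes.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class HighLevelInput:
    """Hypothetical input to the high-level policy."""
    task: str
    completed_subtasks: List[str]  # stands in for global progress
    observation: str


@dataclass
class LowLevelInput:
    """Hypothetical input to the low-level policy."""
    subtask: str
    local_progress: str  # compact summary of the current subtask so far
    observation: str


def update_local_progress(prev_summary: str, action: str, observation: str,
                          max_chars: int = 200) -> str:
    """Toy stand-in for the local progress module (assumption, not the paper's).

    Appends the latest step and keeps only the last `max_chars` characters,
    so the summary stays bounded however long the subtask runs.
    """
    entry = f"{action} -> {observation}"
    merged = f"{prev_summary}; {entry}" if prev_summary else entry
    return merged[-max_chars:]


def rollout_subtask(subtask: str,
                    first_observation: str,
                    policy: Callable[[LowLevelInput], str],
                    env_step: Callable[[str], Tuple[str, bool]],
                    max_steps: int = 5) -> List[LowLevelInput]:
    """Collect step-level inputs for one subtask without keeping full history."""
    summary, obs = "", first_observation
    inputs: List[LowLevelInput] = []
    for _ in range(max_steps):
        step_input = LowLevelInput(subtask, summary, obs)
        inputs.append(step_input)
        action = policy(step_input)          # policy sees one step, not the history
        obs, done = env_step(action)
        summary = update_local_progress(summary, action, obs)
        if done:
            break
    return inputs
```

In this sketch the prompt size per step is bounded by the subtask description, the summary cap, and one observation, which is the property the abstract attributes to conditioning on single-step transitions instead of the full history.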

Reasoning & Chain-of-Thought | RLHF & Preference Learning | Tool Use & Agents

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
