ARISE is introduced as a hierarchical reinforcement learning framework that improves mathematical reasoning in language models by leveraging reusable strategies. It uses a shared policy to manage skills at a high level (Skills Manager) and generate responses at a low level (Worker). The Skills Manager maintains a tiered skill library by summarizing successful solution traces and retrieves relevant skills to condition future rollouts, guided by a hierarchical reward design.
ARISE lets language models solve math problems more effectively by learning and reusing successful solution strategies, outperforming existing RL methods, especially on harder, out-of-distribution problems.
The dominant paradigm for improving mathematical reasoning in language models relies on reinforcement learning with verifiable rewards. Yet existing methods treat each problem instance in isolation, without leveraging the reusable strategies that emerge and accumulate during training. To address this, we introduce ARISE (Agent Reasoning via Intrinsic Skill Evolution), a hierarchical reinforcement learning framework in which a shared policy operates both to manage skills at a high level and to generate responses at a low level (denoted the Skills Manager and the Worker, respectively). The Manager maintains a tiered skill library through a dedicated skill-generation rollout that performs structured summarization of successful solution traces (after execution), while employing a policy-driven selection mechanism to retrieve relevant skills that condition future rollouts (before execution). A hierarchical reward design guides the co-evolution of reasoning ability and library quality. Experiments on two base models and seven benchmarks spanning both competition mathematics and Omni-MATH show that ARISE consistently outperforms GRPO-family algorithms and memory-augmented baselines, with particularly notable gains on out-of-distribution tasks. Ablation studies confirm that each component contributes to the observed improvements and that library quality and reasoning performance improve in tandem throughout training. Code is available at \href{https://github.com/Skylanding/ARISE}{https://github.com/Skylanding/ARISE}.
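The Manager/Worker loop the abstract describes can be sketched in simplified form. The following toy Python example is an assumption-laden illustration, not the ARISE implementation: all class and method names (`SkillLibrary`, `select`, `update`), the two-tier promotion rule, and the ranking heuristic are hypothetical stand-ins for the paper's tiered library, policy-driven selection, and hierarchical reward.

```python
class SkillLibrary:
    """Toy tiered skill library: tier 0 = provisional, tier 1 = validated.

    Skills are added by summarizing successful solution traces (after
    execution) and retrieved to condition future rollouts (before execution).
    """

    PROMOTION_THRESHOLD = 3  # hypothetical: promote after 3 successful uses

    def __init__(self):
        # name -> {"text": summary, "tier": 0/1, "uses": int, "wins": int}
        self.skills = {}

    def add(self, name, text):
        """Manager step: distill a successful trace into a provisional skill."""
        self.skills.setdefault(
            name, {"text": text, "tier": 0, "uses": 0, "wins": 0}
        )

    def select(self, k=2):
        """Stand-in for policy-driven selection: rank by tier, then win rate."""
        ranked = sorted(
            self.skills.items(),
            key=lambda kv: (
                kv[1]["tier"],
                kv[1]["wins"] / max(1, kv[1]["uses"]),
            ),
            reverse=True,
        )
        return [name for name, _ in ranked[:k]]

    def update(self, name, success):
        """Hierarchical-reward stand-in: credit a skill for rollout outcome,
        promoting it to the validated tier after repeated success."""
        s = self.skills[name]
        s["uses"] += 1
        s["wins"] += int(success)
        if s["tier"] == 0 and s["wins"] >= self.PROMOTION_THRESHOLD:
            s["tier"] = 1
```

A usage sketch: after a Worker rollout succeeds, the Manager summarizes the trace into a skill (`add`), the skill's contribution is credited (`update`), and high-tier skills are preferentially retrieved to condition the next rollout (`select`).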