Apple ML | Mar 2, 2026 | arXiv:2603.02045

Expanding LLM Agent Boundaries with Strategy-Guided Exploration

Andrew Szot, Michael Kirchhof, Omar Attia, Alexander Toshev

AI Summary

This paper introduces Strategy-Guided Exploration (SGE), a novel reinforcement learning approach for LLM agents that addresses the exploration challenge in complex environments with sparse rewards. SGE leverages the LLM's planning and reasoning capabilities to generate high-level language strategies, conditioning action generation on these strategies to explore the strategy space rather than the action space. Results across UI interaction, tool-calling, coding, and embodied agent environments demonstrate that SGE outperforms exploration-focused RL baselines, improving learning efficiency and enabling the agent to solve tasks beyond the base model's capabilities.

Key Contribution

LLM agents can learn to solve tasks previously beyond their reach by exploring high-level language strategies instead of low-level actions, leading to more efficient and effective reinforcement learning.

Abstract

Reinforcement learning (RL) has demonstrated notable success in post-training large language models (LLMs) as agents for tasks such as computer use, tool calling, and coding. However, exploration remains a central challenge in RL for LLM agents, especially as they operate in language-action spaces with complex observations and sparse outcome rewards. In this work, we address exploration for LLM agents by leveraging the ability of LLMs to plan and reason in language about the environment to shift exploration from low-level actions to higher-level language strategies. We thus propose Strategy-Guided Exploration (SGE), which first generates a concise natural-language strategy that describes what to do to make progress toward the goal, and then generates environment actions conditioned on that strategy. By exploring in the space of strategies rather than the space of actions, SGE induces structured and diverse exploration that targets different environment outcomes. To increase strategy diversity during RL, SGE introduces mixed-temperature sampling, which explores diverse strategies in parallel, along with a strategy reflection process that grounds strategy generation on the outcomes of previous strategies in the environment. Across UI interaction, tool-calling, coding, and embodied agent environments, SGE consistently outperforms exploration-focused RL baselines, improving both learning efficiency and final performance. We show that SGE enables the agent to learn to solve tasks too difficult for the base model.
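The two-stage sampling described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `generate` and `env_step` callables, the prompt wording, the `Attempt` record, and the temperature values are all hypothetical. In particular, the abstract does not specify how mixed-temperature sampling is configured; the sketch assumes one plausible reading, in which strategies are sampled at a higher temperature than actions. The RL policy update is omitted.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical LLM interface (an assumption): (prompt, temperature) -> text.
Generate = Callable[[str, float], str]


@dataclass
class Attempt:
    """One explored strategy, the action it produced, and the observed reward."""
    strategy: str
    action: str
    reward: float


def build_strategy_prompt(goal: str, observation: str, history: List[Attempt]) -> str:
    """Strategy reflection: show earlier strategies and their outcomes to the model."""
    lines = [f"Goal: {goal}", f"Observation: {observation}"]
    if history:
        lines.append("Previous strategies and outcomes:")
        for past in history:
            lines.append(f"- {past.strategy} -> reward {past.reward}")
    lines.append("Propose a concise strategy that differs from the ones above.")
    return "\n".join(lines)


def sge_step(
    generate: Generate,
    goal: str,
    observation: str,
    history: List[Attempt],
    env_step: Callable[[str], float],  # hypothetical: action -> reward
    num_strategies: int = 4,
    strategy_temp: float = 1.2,  # assumed: high temperature for diverse strategies
    action_temp: float = 0.3,    # assumed: low temperature for faithful actions
) -> List[Attempt]:
    """Sample several strategies, then one action conditioned on each strategy."""
    strategy_prompt = build_strategy_prompt(goal, observation, history)
    attempts = []
    for _ in range(num_strategies):
        # Stage 1: explore in strategy space.
        strategy = generate(strategy_prompt, strategy_temp)
        # Stage 2: generate the environment action conditioned on the strategy.
        action_prompt = (
            f"Goal: {goal}\nObservation: {observation}\n"
            f"Strategy: {strategy}\nNext action:"
        )
        action = generate(action_prompt, action_temp)
        attempts.append(Attempt(strategy, action, env_step(action)))
    return attempts
```

Feeding the returned attempts back in as `history` on the next call gives the reflection loop: each round of strategy generation sees which earlier strategies succeeded or failed in the environment.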

RLHF & Preference Learning | Tool Use & Agents | World Models & Planning

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
