This paper investigates the use of Group Relative Policy Optimization (GRPO) to improve tool-use accuracy in Small Language Models (SLMs). A reward system was designed to reinforce structured JSON output, correct tool selection, and precise parameter usage during RL training. Results demonstrate that GRPO significantly enhances SLMs' tool-use capabilities, enabling more effective function calling and JSON output generation.
SLMs can punch above their weight in tool use with the right RL training, rivaling LLMs in specific function-calling tasks.
The ability to use tools effectively has become a defining feature of Large Language Models (LLMs), allowing them to access external data and internal resources. As tool-augmented AI agents grow more sophisticated, tool-use capabilities have become indispensable. While LLMs have made significant progress in this area, Small Language Models (SLMs) still struggle to integrate tool use accurately, especially in resource-constrained settings. This study investigates how Reinforcement Learning, specifically Group Relative Policy Optimization (GRPO), can enhance the tool-use accuracy of SLMs. By designing a well-defined reward system that reinforces structured JSON output, correct tool selection, and precise parameter usage, we demonstrate that GRPO enables SLMs to achieve significant improvements in tool-use capabilities (function calling/JSON output). Our findings highlight the ability of GRPO to empower SLMs, which are traditionally constrained in tool use, and our approach provides a computationally efficient training method that supports SLMs' practical deployment in real-world AI applications.
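To make the reward design concrete, here is a minimal sketch, not the authors' code, of a reward function covering the three criteria named in the abstract (valid structured JSON, correct tool selection, precise parameter usage), together with the group-relative advantage normalization that characterizes GRPO. The weighting scheme (0.3/0.3/0.4), the partial-credit rule for parameters, and names such as `expected_call` are illustrative assumptions, not values from the paper.

```python
# Sketch of a tool-use reward plus GRPO-style group-relative advantages.
# Reward weights and helper names are assumed for illustration only.
import json


def tool_use_reward(completion: str, expected_call: dict) -> float:
    """Score a completion on: parsable JSON, correct tool name, and
    matching parameters. The 0.3/0.3/0.4 split is an assumed example."""
    try:
        call = json.loads(completion)
    except json.JSONDecodeError:
        return 0.0  # malformed output earns no reward

    reward = 0.3  # valid structured JSON
    if call.get("name") == expected_call["name"]:
        reward += 0.3  # correct tool selected
        expected_args = expected_call.get("arguments", {})
        got_args = call.get("arguments", {})
        if expected_args:
            # partial credit per exactly-matching parameter value
            matches = sum(1 for k, v in expected_args.items()
                          if got_args.get(k) == v)
            reward += 0.4 * matches / len(expected_args)
        else:
            reward += 0.4  # tool takes no arguments
    return reward


def grpo_advantages(rewards: list[float]) -> list[float]:
    """Group-relative advantages: normalize each sampled completion's
    reward by its group's mean and std, avoiding a learned value model."""
    mean = sum(rewards) / len(rewards)
    std = (sum((r - mean) ** 2 for r in rewards) / len(rewards)) ** 0.5
    return [(r - mean) / (std + 1e-8) for r in rewards]


if __name__ == "__main__":
    expected = {"name": "get_weather",
                "arguments": {"city": "Paris", "unit": "celsius"}}
    group = [  # a sampling group of candidate completions
        '{"name": "get_weather", "arguments": {"city": "Paris", "unit": "celsius"}}',
        '{"name": "get_weather", "arguments": {"city": "paris"}}',
        '{"name": "search_web", "arguments": {"query": "Paris weather"}}',
        'call get_weather(Paris)',  # not JSON
    ]
    rewards = [tool_use_reward(c, expected) for c in group]
    print(rewards)                 # [1.0, 0.6, 0.3, 0.0]
    print(grpo_advantages(rewards))
```

Under this kind of scheme, completions that are merely well-formed still earn some signal, while fully correct calls dominate the group, which is the gradient the abstract's reward design aims to produce.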