The paper introduces ReDAct, a framework for LLM agents that defers decisions from a small, cheap LLM to a larger, more reliable LLM whenever the small model's predictive uncertainty exceeds a calibrated threshold. This keeps use of the expensive model to a minimum while maintaining decision quality in sequential tasks. Experiments in ALFWorld and MiniGrid show that deferring only about 15% of decisions can match the performance of using the large model exclusively, while significantly reducing inference costs.
Deferring to a larger LLM only when a smaller LLM is uncertain can match the performance of the larger model alone, while slashing inference costs.
Recently, LLM-based agents have become increasingly popular across many applications, including complex sequential decision-making problems. However, they inherit the tendency of LLMs to hallucinate, leading to incorrect decisions. In sequential settings, even a single mistake can irreversibly degrade the trajectory, making hallucinations an even bigger problem. Although larger LLMs hallucinate less, they incur a significantly higher per-token cost. In this paper, we address this tradeoff by proposing ReDAct (Reason-Defer-Act). In ReDAct, an agent is equipped with two LLMs: a small, cheap model used by default, and a large, more reliable but expensive model. When the predictive uncertainty of the small model exceeds a calibrated threshold, the decision is deferred to the large model. We evaluate our approach in text-based embodied environments such as ALFWorld and MiniGrid and show that deferring only about 15% of decisions to the large model can match the quality of using it exclusively, while significantly reducing inference costs.
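The deferral rule described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation. The abstract does not say which uncertainty measure or calibration procedure ReDAct uses, so the sketch assumes Shannon entropy over the small model's candidate-action probabilities and a quantile-style threshold chosen to hit a target deferral rate. The names `entropy`, `calibrate_threshold`, `redact_step`, `small_policy`, and `large_policy` are hypothetical.

```python
import math
from typing import Callable, Sequence, Tuple


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a distribution over candidate actions.

    Assumption: entropy stands in for the paper's uncertainty measure.
    """
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def calibrate_threshold(scores: Sequence[float], defer_rate: float) -> float:
    """Pick a threshold so that roughly `defer_rate` of the calibration
    scores lie strictly above it (ties can lower the realized rate).

    Assumption: calibration is a simple empirical quantile.
    """
    if not scores:
        raise ValueError("need at least one calibration score")
    if not 0.0 <= defer_rate <= 1.0:
        raise ValueError("defer_rate must be in [0, 1]")
    ordered = sorted(scores)
    n_defer = round(len(ordered) * defer_rate)
    keep = len(ordered) - n_defer
    if keep == 0:
        return -math.inf  # every decision is deferred
    return ordered[keep - 1]


def redact_step(
    observation: str,
    small_policy: Callable[[str], Tuple[str, Sequence[float]]],
    large_policy: Callable[[str], str],
    threshold: float,
) -> Tuple[str, bool]:
    """Return (action, deferred). The small model acts by default; the
    large model is queried only when the small model's uncertainty
    exceeds the calibrated threshold."""
    action, probs = small_policy(observation)
    if entropy(probs) > threshold:
        return large_policy(observation), True
    return action, False
```

In this sketch the small model runs at every step and the large model runs only on deferred steps, so the saving comes from how rarely the threshold is exceeded. A target `defer_rate` of about 0.15 would correspond to the deferral rate reported in the abstract.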