Apr 8, 2026 | arXiv:2604.06753

Select-then-Solve: Paradigm Routing as Inference-Time Optimization for LLM Agents

Heng Zhou, Zelin Tan, Zhemeng Zhang, Yutao Fan, Yibing Lin, Li Kang, Xiufeng Song, Rui Li, Songtao Huang, Ao Yu, Yuchen Fan, Yanxu Chen, Kaixin Xu, Yiran Qin, Philip Torr, Chen Zhang, Zhenfei Yin

AI Summary

The paper compares six inference-time reasoning paradigms (Direct, CoT, ReAct, Plan-Execute, Reflection, and ReCode) across four frontier LLMs and ten benchmarks and finds that no single paradigm consistently outperforms the others. To exploit this complementarity, the authors propose a "select-then-solve" approach in which a learned, embedding-based router picks a paradigm for each task before the agent answers. Across four models, the router raises average accuracy from 47.6% to 53.1%, ahead of the best fixed paradigm at 50.3%, which supports adaptive, per-task paradigm selection.

Key Contribution

Stop hard-coding reasoning strategies for your LLM agent: a learned router that picks a paradigm for each task raises average accuracy by 5.5pp (47.6% to 53.1%) and beats the best fixed paradigm by 2.8pp.

Abstract

When an LLM-based agent improves on a task, is the gain from the model itself or from the reasoning paradigm wrapped around it? We study this question by comparing six inference-time paradigms, namely Direct, CoT, ReAct, Plan-Execute, Reflection, and ReCode, across four frontier LLMs and ten benchmarks, yielding roughly 18,000 runs. We find that reasoning structure helps dramatically on some tasks but hurts on others: ReAct improves over Direct by 44pp on GAIA, while CoT degrades performance by 15pp on HumanEval. No single paradigm dominates, and oracle per-task selection beats the best fixed paradigm by 17.1pp on average. Motivated by this complementarity, we propose a select-then-solve approach: before answering each task, a lightweight embedding-based router selects the most suitable paradigm. Across four models, the router improves average accuracy from 47.6% to 53.1%, outperforming the best fixed paradigm at 50.3% by 2.8pp and recovering up to 37% of the oracle gap. In contrast, zero-shot self-routing only works for GPT-5 at 67.1% and fails for weaker models, all trailing the learned router. Our results argue that reasoning paradigm selection should be a per-task decision made by a learned router, not a fixed architectural choice.
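The select-then-solve loop described in the abstract can be illustrated with a small sketch. The abstract only says the router is lightweight and embedding-based, so everything concrete below is an assumption made for illustration: the hashed bag-of-words `embed` function stands in for a real sentence encoder, the k-nearest-neighbour vote stands in for whatever classifier the authors train, and `ParadigmRouter`, `select_then_solve`, `SOLVERS`, and the four training tasks are hypothetical names and data, not the paper's.

```python
import hashlib
import math
from collections import Counter

PARADIGMS = ["Direct", "CoT", "ReAct", "Plan-Execute", "Reflection", "ReCode"]
DIM = 256  # toy embedding size (assumption, not from the paper)


def embed(text: str) -> list[float]:
    """Toy stand-in for a sentence encoder: hashed bag-of-words, L2-normalised."""
    vec = [0.0] * DIM
    for token in text.lower().split():
        idx = int(hashlib.md5(token.encode()).hexdigest(), 16) % DIM
        vec[idx] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; both inputs are already unit-norm, so a dot product suffices."""
    return sum(x * y for x, y in zip(a, b))


class ParadigmRouter:
    """k-nearest-neighbour router over (task embedding, best paradigm) pairs."""

    def __init__(self, k: int = 3):
        self.k = k
        self.memory: list[tuple[list[float], str]] = []

    def fit(self, tasks: list[str], best_paradigms: list[str]) -> "ParadigmRouter":
        # Each label is the paradigm that performed best on that training task.
        self.memory = [(embed(t), p) for t, p in zip(tasks, best_paradigms)]
        return self

    def select(self, task: str) -> str:
        if not self.memory:
            raise ValueError("router has not been fitted")
        query = embed(task)
        ranked = sorted(self.memory, key=lambda m: cosine(query, m[0]), reverse=True)
        votes = Counter(label for _, label in ranked[: self.k])
        return votes.most_common(1)[0][0]


def select_then_solve(task: str, router: ParadigmRouter, solvers: dict) -> tuple[str, str]:
    """Pick a paradigm first, then hand the task to that paradigm's solver."""
    paradigm = router.select(task)
    return paradigm, solvers[paradigm](task)


# Hypothetical training data: tasks labelled with their best-performing paradigm.
TRAIN_TASKS = [
    "write a python function that reverses a string",
    "implement a python function to sort a list",
    "search the web to find the population of paris",
    "search the web and find who won the 2018 world cup",
]
TRAIN_LABELS = ["Direct", "Direct", "ReAct", "ReAct"]

# Placeholder solvers; a real agent would run the corresponding prompting loop.
SOLVERS = {name: (lambda task, name=name: f"[{name}] {task}") for name in PARADIGMS}

demo_router = ParadigmRouter(k=3).fit(TRAIN_TASKS, TRAIN_LABELS)
```

Calling `select_then_solve("search the web for the capital of france", demo_router, SOLVERS)` routes the task to the paradigm that won on its nearest training neighbours and then runs that paradigm's solver. The design point this sketch is meant to show is that the routing decision is made once per task, before any solving, from the task text alone.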

Eval Frameworks & Benchmarks, Reasoning & Chain-of-Thought, Tool Use & Agents

Citation Metrics

Citations: 0

Influential citations: 0

References: 34

Year: 2026

Venue: N/A
