The paper introduces TrigReason, a trigger-based collaborative framework that intelligently combines Small Reasoning Models (SRMs) and Large Reasoning Models (LRMs) to accelerate inference. TrigReason addresses limitations of SRMs by selectively activating LRM intervention for strategic planning, cognitive overload, and unproductive loops. Experiments on AIME24, AIME25, and GPQA-D demonstrate that TrigReason achieves comparable accuracy to full LRMs while significantly reducing latency and API costs by offloading reasoning steps to SRMs.
Achieve full-LRM reasoning accuracy with 44% lower latency and 73% lower API costs by strategically offloading work from large to small models, invoking the large model only when needed.
Large Reasoning Models (LRMs) achieve strong performance on complex tasks through extended chains of thought but suffer from high inference latency due to autoregressive reasoning. Recent work explores using Small Reasoning Models (SRMs) to accelerate LRM inference. In this paper, we systematically characterize the capability boundaries of SRMs and identify three common types of reasoning risks: (1) path divergence, where SRMs lack the strategic ability to construct an initial plan, causing reasoning to deviate from the most probable path; (2) cognitive overload, where SRMs fail to solve particularly difficult steps; and (3) recovery inability, where SRMs lack robust self-reflection and error-correction mechanisms. To address these challenges, we propose TrigReason, a trigger-based collaborative reasoning framework that replaces continuous polling with selective intervention. TrigReason delegates most reasoning to the SRM and activates LRM intervention only when necessary: during initial strategic planning (strategic priming trigger), upon detecting extraordinary overconfidence (cognitive offload trigger), or when reasoning falls into unproductive loops (intervention request trigger). Evaluation on AIME24, AIME25, and GPQA-D shows that TrigReason matches the accuracy of full LRMs and SpecReason while offloading 1.70x-4.79x more reasoning steps to SRMs. Under edge-cloud conditions, TrigReason reduces latency by 43.9% and API cost by 73.3%. Our code is available at https://github.com/QQQ-yi/TrigReason
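The trigger-based control flow described above can be sketched as a simple dispatch loop. This is a minimal illustrative sketch, not the paper's implementation: the class name `TrigReasonSketch`, the confidence threshold, the repeated-step loop detector, and the `srm`/`lrm` callable interfaces are all assumptions chosen to make the three triggers concrete.

```python
from collections import deque

class TrigReasonSketch:
    """Hypothetical trigger-based dispatcher: the SRM drafts every reasoning
    step, and the LRM is invoked only when one of three triggers fires."""

    def __init__(self, srm, lrm, conf_threshold=0.2, loop_window=3):
        self.srm = srm                        # small model: state -> (step_text, confidence)
        self.lrm = lrm                        # large model: state -> step_text
        self.conf_threshold = conf_threshold  # assumed cutoff for the offload trigger
        self.recent = deque(maxlen=loop_window)  # short history for loop detection

    def solve(self, question, max_steps=10):
        # Trigger 1: strategic priming -- the LRM drafts the initial plan.
        state = [self.lrm([question])]
        for _ in range(max_steps):
            step, conf = self.srm(state)
            if conf < self.conf_threshold:
                # Trigger 2: cognitive offload -- the step is too hard for the SRM.
                step = self.lrm(state)
            elif step in self.recent:
                # Trigger 3: intervention request -- an unproductive loop was detected.
                step = self.lrm(state)
            self.recent.append(step)
            state.append(step)
            if step == "DONE":
                break
        return state

# Demo with stub models: the SRM emits a confident step, an unconfident
# step (forcing an offload), then a confident final step.
calls = {"lrm": 0}
def lrm(state):
    calls["lrm"] += 1
    return "LRM-step"

steps = iter([("a", 0.9), ("b", 0.1), ("DONE", 0.9)])
def srm(state):
    return next(steps)

trace = TrigReasonSketch(srm, lrm).solve("q")
```

With these stubs, the LRM is called exactly twice: once for the initial plan and once for the low-confidence step, while every other step stays on the SRM.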