The paper introduces TrigReason, a trigger-based collaborative framework that intelligently combines Small Reasoning Models (SRMs) and Large Reasoning Models (LRMs) to accelerate inference. TrigReason addresses limitations of SRMs by selectively activating LRM intervention for strategic planning, cognitive overload, and unproductive loops. Experiments on AIME24, AIME25, and GPQA-D demonstrate that TrigReason achieves comparable accuracy to full LRMs while significantly reducing latency and API costs by offloading reasoning steps to SRMs.
Achieve full-LRM reasoning accuracy with 44% lower latency and 73% lower API costs by strategically offloading work from large to small models, invoking the large model only when needed.
Large Reasoning Models (LRMs) achieve strong performance on complex tasks through extended chains of thought but suffer from high inference latency due to autoregressive reasoning. Recent work explores using Small Reasoning Models (SRMs) to accelerate LRM inference. In this paper, we systematically characterize the capability boundaries of SRMs and identify three common types of reasoning risks: (1) path divergence, where SRMs lack the strategic ability to construct an initial plan, causing reasoning to deviate from the most probable path; (2) cognitive overload, where SRMs fail to solve particularly difficult steps; and (3) recovery inability, where SRMs lack robust self-reflection and error-correction mechanisms. To address these challenges, we propose TrigReason, a trigger-based collaborative reasoning framework that replaces continuous polling with selective intervention. TrigReason delegates most reasoning to the SRM and activates LRM intervention only when necessary: during initial strategic planning (strategic priming trigger), upon detecting extraordinary overconfidence (cognitive offload trigger), or when reasoning falls into unproductive loops (intervention request trigger). Evaluation on AIME24, AIME25, and GPQA-D shows that TrigReason matches the accuracy of full LRMs and SpecReason while offloading 1.70x-4.79x more reasoning steps to SRMs. Under edge-cloud conditions, TrigReason reduces latency by 43.9% and API cost by 73.3%. Our code is available at https://github.com/QQQ-yi/TrigReason
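The trigger-based control flow described above can be sketched as a simple dispatch loop. This is a minimal illustrative sketch, not the paper's implementation: the class name `TrigReasonSketch`, the confidence threshold, the repeated-step loop detector, and the `srm`/`lrm` callable interfaces are all assumptions chosen to make the three triggers concrete.

```python
from collections import deque

class TrigReasonSketch:
    """Hypothetical trigger-based dispatcher: the SRM drafts every reasoning
    step, and the LRM is invoked only when one of three triggers fires."""

    def __init__(self, srm, lrm, conf_threshold=0.2, loop_window=3):
        self.srm = srm                        # small model: state -> (step_text, confidence)
        self.lrm = lrm                        # large model: state -> step_text
        self.conf_threshold = conf_threshold  # assumed cutoff for the offload trigger
        self.recent = deque(maxlen=loop_window)  # short history for loop detection

    def solve(self, question, max_steps=10):
        # Trigger 1: strategic priming -- the LRM drafts the initial plan.
        state = [self.lrm([question])]
        for _ in range(max_steps):
            step, conf = self.srm(state)
            if conf < self.conf_threshold:
                # Trigger 2: cognitive offload -- the step is too hard for the SRM.
                step = self.lrm(state)
            elif step in self.recent:
                # Trigger 3: intervention request -- an unproductive loop was detected.
                step = self.lrm(state)
            self.recent.append(step)
            state.append(step)
            if step == "DONE":
                break
        return state

# Demo with stub models: the SRM emits a confident step, an unconfident
# step (forcing an offload), then a confident final step.
calls = {"lrm": 0}
def lrm(state):
    calls["lrm"] += 1
    return "LRM-step"

steps = iter([("a", 0.9), ("b", 0.1), ("DONE", 0.9)])
def srm(state):
    return next(steps)

trace = TrigReasonSketch(srm, lrm).solve("q")
```

With these stubs, the LRM is called exactly twice: once for the initial plan and once for the low-confidence step, while every other step stays on the SRM.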