This paper introduces Dual-Track CoT, a novel inference-time method for small language models (SLMs) that employs a lightweight "guidance track" to monitor and refine the primary CoT reasoning process. The guidance track uses a budget-aware mechanism to control the generation of rationales, rejecting redundant steps and ensuring efficient token usage. Experiments demonstrate that Dual-Track CoT significantly improves the reasoning performance of SLMs (7-8B parameters) on multi-step tasks, achieving comparable or better results than larger models with similar token budgets.
Small language models can achieve reasoning performance rivaling larger models, even under tight token budgets, by using a lightweight "guidance track" to strategically prune and refine their chain-of-thought reasoning.
Large Language Models (LLMs) solve many reasoning tasks via chain-of-thought (CoT) prompting, but smaller models (about 7 to 8B parameters) still struggle with multi-step reasoning under tight compute and token budgets. Existing test-time reasoning methods such as self-consistency (sampling multiple rationales and voting), Tree-of-Thoughts (search over intermediate thoughts), and critique-revise loops improve performance, but often at high token cost and without fine-grained step-level control. This project aims to address that gap: can Small Language Models (SLMs) reason reliably using the same number of tokens or fewer? This question is both scientific and practical. Scientifically, it probes whether process supervision and simple test-time controls (such as token budgets and rejection of redundant steps) can substitute for model scale or large sampling counts. Practically, many deployments (on-device, low-latency, or cost-constrained settings) cannot afford huge models or dozens of sampled rationales per query. A method that improves SLM reasoning at fixed cost would therefore be directly useful.
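To make the two test-time controls named above concrete (a token budget and rejection of redundant steps), here is a minimal sketch. It is not the paper's guidance track, whose details are not given in this text. The function names `jaccard` and `filter_steps`, the whitespace token count, and the word-overlap redundancy test are all assumptions chosen for illustration.

```python
from typing import List


def jaccard(a: str, b: str) -> float:
    """Word-level Jaccard similarity between two strings (illustrative redundancy measure)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def filter_steps(steps: List[str], token_budget: int, sim_threshold: float = 0.8) -> List[str]:
    """Keep reasoning steps in order, skipping near-duplicates and stopping at the budget.

    Assumptions (hypothetical, not from the paper):
    - token cost is approximated by a whitespace split;
    - a step is redundant if its Jaccard similarity to any accepted step
      is at least `sim_threshold`.
    """
    accepted: List[str] = []
    used = 0
    for step in steps:
        cost = len(step.split())  # crude stand-in for a real tokenizer
        if used + cost > token_budget:
            break  # budget exhausted: stop at the first step that does not fit
        if any(jaccard(step, prev) >= sim_threshold for prev in accepted):
            continue  # redundant step: reject without spending budget
        accepted.append(step)
        used += cost
    return accepted


if __name__ == "__main__":
    candidate_steps = [
        "x = 2 + 3 = 5",
        "x = 2 + 3 = 5",          # exact repeat, should be rejected
        "y = x * 4 = 20",
        "so the answer is 20",
    ]
    print(filter_steps(candidate_steps, token_budget=16))
```

In this sketch the repeated step is dropped at no cost to the budget, and generation halts as soon as the next step would exceed it. A real system would likely use the model's own tokenizer and a learned or embedding-based redundancy check instead of word overlap.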