This paper introduces a confidence-aware decision framework that adaptively chooses between single-path and multi-path Chain-of-Thought (CoT) reasoning based on features extracted from a single reasoning trajectory. The framework is trained on MedQA using sentence-level numeric and linguistic features and generalizes to other datasets without fine-tuning. Results show the method achieves comparable accuracy to multi-path CoT while using up to 80% fewer tokens, demonstrating the potential for uncertainty estimation in reasoning trajectories.
LLMs can cut reasoning token usage by up to 80% without sacrificing accuracy, simply by learning to recognize when their own reasoning is shaky and needs a second opinion.
Large language models (LLMs) achieve strong reasoning performance through chain-of-thought (CoT) reasoning, yet often generate unnecessarily long reasoning paths that incur high inference cost. Recent self-consistency-based approaches further improve accuracy but require sampling and aggregating multiple reasoning trajectories, leading to substantial additional computational overhead. This paper introduces a confidence-aware decision framework that analyzes a single completed reasoning trajectory to adaptively select between single-path and multi-path reasoning. The framework is trained using sentence-level numeric and linguistic features extracted from intermediate reasoning states in the MedQA dataset and generalizes effectively to MathQA, MedMCQA, and MMLU without additional fine-tuning. Experimental results show that the proposed method maintains accuracy comparable to multi-path baselines while using up to 80% fewer tokens. These findings demonstrate that reasoning trajectories contain rich signals for uncertainty estimation, enabling a simple, transferable mechanism to balance accuracy and efficiency in LLM reasoning.
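The routing idea described above can be sketched as follows. This is a minimal, illustrative reconstruction, not the authors' implementation: the specific features (`mean_sentence_logprob`, hedge-word counts), the stand-in scoring function, and the 0.5 threshold are all assumptions chosen for the example; the paper's actual feature set and trained classifier may differ.

```python
# Hypothetical sketch: decide from ONE completed CoT trace whether to
# fall back to multi-path (self-consistency) sampling.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class TrajectoryFeatures:
    """Sentence-level signals extracted from a single reasoning trajectory."""
    mean_sentence_logprob: float  # average model confidence per sentence
    min_sentence_logprob: float   # confidence of the weakest reasoning step
    num_hedged_sentences: int     # sentences with linguistic uncertainty markers
    num_sentences: int

def extract_features(sentences: List[str],
                     sent_logprobs: List[float]) -> TrajectoryFeatures:
    # Illustrative hedge lexicon; a real system would learn or curate this.
    hedges = ("maybe", "possibly", "might", "unsure", "not sure")
    n_hedged = sum(any(h in s.lower() for h in hedges) for s in sentences)
    return TrajectoryFeatures(
        mean_sentence_logprob=sum(sent_logprobs) / len(sent_logprobs),
        min_sentence_logprob=min(sent_logprobs),
        num_hedged_sentences=n_hedged,
        num_sentences=len(sentences),
    )

def needs_multi_path(f: TrajectoryFeatures,
                     score: Callable[[TrajectoryFeatures], float],
                     threshold: float = 0.5) -> bool:
    """Trigger multi-path sampling only when the confidence score is low."""
    return score(f) < threshold

# Toy stand-in for the trained classifier: confident, hedge-free
# trajectories score high; shaky ones score low.
def toy_score(f: TrajectoryFeatures) -> float:
    if f.min_sentence_logprob > -1.0 and f.num_hedged_sentences == 0:
        return 1.0
    return 0.2

confident = extract_features(
    ["The patient has symptom X.", "Therefore the answer is B."],
    [-0.3, -0.2])
shaky = extract_features(
    ["This might be condition Y, but I'm not sure.", "Maybe the answer is C."],
    [-2.1, -1.8])

print(needs_multi_path(confident, toy_score))  # False: single path suffices
print(needs_multi_path(shaky, toy_score))      # True: sample more paths
```

The token savings come from the `False` branch: most queries terminate after one trajectory, and the expensive sampling-and-voting step runs only for the minority the classifier flags as uncertain.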