Feb 23, 2026 | arXiv:2602.19519

Ada-RS: Adaptive Rejection Sampling for Selective Thinking

Yirou Ge, Yixi Li, Alec Chiu, Shivani Shekhar, Zijie Pan, Avinash Thangali, Yun-Shiuan Chuang, Chaitanya Kulkarni, Uma Kona, Linsey Pang, Prakhar Mehrotra

AI Summary

The paper introduces Adaptive Rejection Sampling (Ada-RS), a sample filtering framework for training LLMs to decide, based on context, whether to engage in chain-of-thought reasoning. Ada-RS scores multiple sampled completions with an adaptive length-penalized reward and uses stochastic rejection sampling to retain high-reward candidates for downstream optimization with methods such as DPO or DAPO. Experiments using Qwen3-8B with LoRA on a synthetic, tool-call-oriented e-commerce benchmark show that Ada-RS improves the accuracy-efficiency frontier, reducing average output tokens by up to 80% and thinking rate by up to 95% while maintaining or improving tool call accuracy.
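
The summary describes two steps: score each sampled completion with an adaptive length-penalized reward, then stochastically reject low-reward candidates. The Python sketch below illustrates that flow under loudly stated assumptions. The `Completion` type, the rule that scales the length penalty by the group's accuracy, and the Boltzmann-style acceptance probability `exp((r - r_max) / temperature)` are all hypothetical stand-ins, since the summary does not give the paper's actual reward or acceptance formulas.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Completion:
    text: str
    num_tokens: int
    correct: bool  # hypothetical: did the tool call match the reference?


def adaptive_rewards(group, max_penalty=0.5):
    """Hypothetical adaptive length-penalized reward for one context's samples.

    The length penalty grows with the group's accuracy, so contexts that most
    samples already solve (likely "easy" ones) penalize long outputs harder.
    """
    accuracy = sum(c.correct for c in group) / len(group)
    max_len = max(c.num_tokens for c in group) or 1  # avoid division by zero
    rewards = []
    for c in group:
        base = 1.0 if c.correct else 0.0
        penalty = max_penalty * accuracy * (c.num_tokens / max_len)
        rewards.append(base - penalty)
    return rewards


def stochastic_reject(group, rewards, temperature=0.1, rng=random):
    """Hypothetical filter: keep each candidate with prob exp((r - r_max) / T).

    The top-scoring candidate is always kept (probability 1); lower-reward
    candidates survive with exponentially decaying probability.
    """
    r_max = max(rewards)
    kept = []
    for c, r in zip(group, rewards):
        if rng.random() < math.exp((r - r_max) / temperature):
            kept.append((c, r))
    return kept


# Usage: three sampled completions for one (easy) context; tool names are made up.
group = [
    Completion("<think>The user wants socks.</think> search_products('socks')", 420, True),
    Completion("search_products('socks')", 35, True),
    Completion("checkout()", 30, False),
]
rewards = adaptive_rewards(group)
kept = stochastic_reject(group, rewards, rng=random.Random(0))
```

With a low temperature, the short correct completion is always kept, while the long reasoning trace and the wrong answer are usually rejected. Retaining that kind of candidate is one plausible way to push a policy away from unnecessary thinking on simple requests.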

Key Contribution

LLMs can cut average output tokens by up to 80% and "thinking rate" by up to 95% while maintaining or improving tool call accuracy, by learning when *not* to reason.
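
Both headline numbers are relative reductions of simple per-response statistics. The Python sketch below shows how they might be computed. It assumes that a response counts as "thinking" when it contains a non-empty `<think>...</think>` block (the tag format Qwen3 uses for its reasoning) and that token counts come from the model's tokenizer. The function names and sample responses are hypothetical, and the exact metric definitions used in the paper may differ.

```python
import re

# Qwen3 wraps reasoning in <think>...</think>; DOTALL lets "." span newlines.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def uses_thinking(response: str) -> bool:
    # Assumption: only a non-empty <think> block counts as "thinking".
    match = THINK_RE.search(response)
    return bool(match and match.group(1).strip())


def efficiency_metrics(responses, token_counts):
    """Return (thinking rate, average output tokens) over a set of responses."""
    n = len(responses)
    thinking_rate = sum(uses_thinking(r) for r in responses) / n
    avg_tokens = sum(token_counts) / n
    return thinking_rate, avg_tokens


def relative_reduction(baseline, new):
    """Fractional reduction, e.g. 0.8 means an 80% drop from the baseline."""
    return (baseline - new) / baseline


# Usage with made-up responses and token counts.
base = efficiency_metrics(
    ["<think>check stock</think> get_stock()", "<think>refund?</think> refund()"],
    [200, 300],
)
tuned = efficiency_metrics(["get_stock()", "<think>\n\n</think>\nrefund()"], [40, 60])
token_drop = relative_reduction(base[1], tuned[1])
```

In this toy example, average output length falls from 250 to 50 tokens, an 80% reduction; the same `relative_reduction` arithmetic applies to thinking rate.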

Abstract

Large language models (LLMs) are increasingly being deployed in cost- and latency-sensitive settings. While chain-of-thought improves reasoning, it can waste tokens on simple requests. We study selective thinking for tool-using LLMs and introduce Adaptive Rejection Sampling (Ada-RS), an algorithm-agnostic sample filtering framework for learning selective and efficient reasoning. For each given context, Ada-RS scores multiple sampled completions with an adaptive length-penalized reward, then applies stochastic rejection sampling to retain only high-reward candidates (or preference pairs) for downstream optimization. We demonstrate how Ada-RS plugs into both preference pair optimization (e.g., DPO) and grouped policy optimization strategies (e.g., DAPO). Using Qwen3-8B with LoRA on a synthetic, tool-call-oriented e-commerce benchmark, Ada-RS improves the accuracy-efficiency frontier over standard algorithms, reducing average output tokens by up to 80% and thinking rate by up to 95% while maintaining or improving tool call accuracy. These results highlight that training-signal selection is a powerful lever for efficient reasoning in latency-sensitive deployments.
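
The abstract says the filtered samples can feed either preference pair optimization (DPO) or grouped policy optimization (DAPO). The Python sketch below shows one hypothetical way to prepare each kind of training signal from scored candidates. The margin-plus-temperature pair filter is an assumption, not the paper's procedure. The equal-reward group check loosely mirrors DAPO's dynamic sampling, which drops prompts whose samples are all correct or all wrong. The group-normalized advantage follows the GRPO-style estimate that DAPO builds on; the use of population standard deviation here is an assumption.

```python
import math
import random
from itertools import combinations
from statistics import mean, pstdev


def make_dpo_pairs(candidates, margin=0.2, temperature=0.25, rng=random):
    """Hypothetical pair filter; candidates is a list of (text, reward).

    A pair is eligible when its reward gap exceeds `margin`; eligible pairs are
    kept with probability 1 - exp(-gap / temperature), so clearer pairs
    (larger gaps) survive more often.
    """
    pairs = []
    for (a, ra), (b, rb) in combinations(candidates, 2):
        if ra < rb:  # orient so that `a` is the higher-reward (chosen) response
            (a, ra), (b, rb) = (b, rb), (a, ra)
        gap = ra - rb
        if gap > margin and rng.random() < 1 - math.exp(-gap / temperature):
            pairs.append({"chosen": a, "rejected": b, "gap": gap})
    return pairs


def group_advantages(rewards, eps=1e-6):
    """Group-normalized advantages for a DAPO/GRPO-style update.

    Returns None when all rewards in the group are equal, since such a group
    carries no learning signal and would be filtered out.
    """
    if max(rewards) - min(rewards) < eps:
        return None
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Usage with made-up scored candidates for one context.
candidates = [("short correct", 0.97), ("long correct", 0.67), ("wrong", -0.02)]
pairs = make_dpo_pairs(candidates, rng=random.Random(0))
adv = group_advantages([r for _, r in candidates])
flat = group_advantages([1.0, 1.0, 1.0])
```

Keeping the two helpers separate reflects the framework's stated algorithm-agnostic design: the same scored candidates can be routed to a pairwise objective or a grouped one.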

Inference & Quantization | Reasoning & Chain-of-Thought | Tool Use & Agents

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
