The paper investigates "reasoning theater" in large language models, where a model's chain-of-thought (CoT) continues without reflecting what the model already believes. By comparing activation probing, early forced answering, and a CoT monitor on DeepSeek-R1 and GPT-OSS, the authors find that models often settle on the final answer much earlier than their CoT suggests, especially on simpler tasks. Probe-guided early exit builds on this finding and cuts token usage by up to 80% on MMLU and 30% on GPQA-Diamond at similar accuracy, demonstrating the potential for efficient adaptive computation.
LLMs often know the answer long before their "reasoning" suggests, wasting tokens on performative chain-of-thought.
We provide evidence of performative chain-of-thought (CoT) in reasoning models, where a model becomes strongly confident in its final answer but continues generating tokens without revealing its internal belief. Our analysis compares activation probing, early forced answering, and a CoT monitor across two large models (DeepSeek-R1 671B and GPT-OSS 120B) and finds task-difficulty-specific differences: the model's final answer is decodable from activations far earlier in the CoT than a monitor can identify it, especially for easy recall-based MMLU questions. We contrast this with genuine reasoning on difficult multihop GPQA-Diamond questions. Despite this, inflection points (e.g., backtracking, 'aha' moments) occur almost exclusively in responses where probes show large belief shifts, suggesting these behaviors track genuine uncertainty rather than learned "reasoning theater." Finally, probe-guided early exit reduces tokens by up to 80% on MMLU and 30% on GPQA-Diamond with similar accuracy, positioning attention probing as an efficient tool for detecting performative reasoning and enabling adaptive computation.
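To make the idea of probe-guided early exit concrete, the following is a minimal sketch of one possible stopping rule, not the authors' implementation. It assumes a probe has already produced one confidence score per CoT step for the model's current answer. The function names (`early_exit_step`, `token_savings`), the `threshold` and `patience` parameters, and all numbers in the example are hypothetical and chosen only for illustration.

```python
from typing import Optional, Sequence


def early_exit_step(
    confidences: Sequence[float],
    threshold: float = 0.9,  # hypothetical cutoff, not taken from the paper
    patience: int = 2,       # hypothetical: consecutive confident steps required
) -> Optional[int]:
    """Return the index of the first step at which the probe confidence has
    been >= threshold for `patience` consecutive steps, or None if never."""
    streak = 0
    for step, conf in enumerate(confidences):
        # Extend the streak on a confident step, reset it otherwise.
        streak = streak + 1 if conf >= threshold else 0
        if streak >= patience:
            return step
    return None


def token_savings(tokens_per_step: Sequence[int], exit_step: Optional[int]) -> float:
    """Fraction of CoT tokens skipped if generation stops after `exit_step`."""
    total = sum(tokens_per_step)
    if exit_step is None or total == 0:
        return 0.0  # no early exit, so nothing is saved
    used = sum(tokens_per_step[: exit_step + 1])
    return 1.0 - used / total


if __name__ == "__main__":
    # Made-up probe confidences for six CoT steps (illustrative only).
    confidences = [0.40, 0.95, 0.50, 0.92, 0.97, 0.99]
    tokens_per_step = [20, 20, 20, 20, 20, 100]
    step = early_exit_step(confidences)
    print("exit after step:", step)
    print("fraction of tokens saved:", token_savings(tokens_per_step, step))
```

Requiring several consecutive confident steps is one way to avoid exiting on a transient spike in confidence, which matters if backtracking and similar inflection points coincide with real belief shifts. An actual system would still need a trained probe over model activations and a way to force the final answer at the exit point.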