This paper investigates whether large language models (LLMs) decide on actions before or after generating chain-of-thought reasoning. The authors demonstrate that tool-calling decisions can be decoded from pre-generation activations using linear probes, even before any reasoning tokens are produced. Furthermore, activation steering reveals that perturbing these early decision directions causally influences subsequent deliberation and behavior, suggesting that LLMs can pre-encode action choices.
LLMs may "decide" before they "think": tool-calling decisions are encoded in pre-generation activations, shaping subsequent chain-of-thought reasoning.
We consider the question: when a large language reasoning model makes a choice, did it think first and then decide, or decide first and then think? In this paper, we present evidence that detectable, early-encoded decisions shape chain-of-thought in reasoning models. Specifically, we show that a simple linear probe decodes tool-calling decisions from pre-generation activations with very high confidence, in some cases before a single reasoning token is produced. Activation steering supports this causally: perturbing the decision direction inflates deliberation and flips behavior in many examples (between 7% and 79%, depending on model and benchmark). We also show through behavioral analysis that, when steering changes the decision, the chain-of-thought often rationalizes the flip rather than resisting it. Together, these results suggest that reasoning models can encode action choices before they begin to deliberate in text.
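To make the two techniques named in the abstract concrete, the following is a minimal sketch of a linear probe and of steering along the probe direction. It is not the authors' code. It runs on synthetic vectors that stand in for pre-generation activations, and every name in it (`make_synthetic_activations`, `fit_linear_probe`, `steer`, the dimension, the steering strength `alpha`) is a hypothetical choice made for illustration. In a real experiment the steered activations would be written back into the model's forward pass and the resulting behavior observed; here only the probe's readout is checked.

```python
import numpy as np

def make_synthetic_activations(n=400, d=32, seed=0):
    """Hypothetical stand-in for pre-generation hidden states.

    Each row is Gaussian noise plus a shift of +/-2 along one hidden
    "decision" direction. Label 1 = tool call, 0 = no tool call.
    """
    rng = np.random.default_rng(seed)
    direction = rng.normal(size=d)
    direction /= np.linalg.norm(direction)
    labels = rng.integers(0, 2, size=n)
    signs = 2.0 * labels - 1.0
    acts = rng.normal(size=(n, d)) + 2.0 * signs[:, None] * direction
    return acts, labels

def fit_linear_probe(acts, labels, lr=0.1, steps=500):
    """Logistic-regression probe trained with plain gradient descent."""
    n, d = acts.shape
    w, b = np.zeros(d), 0.0
    for _ in range(steps):
        probs = 1.0 / (1.0 + np.exp(-(acts @ w + b)))
        grad = probs - labels              # gradient of the log-loss w.r.t. logits
        w -= lr * (acts.T @ grad) / n
        b -= lr * grad.mean()
    return w, b

def probe_predict(acts, w, b):
    """Hard decision read out from the probe: 1 if the logit is positive."""
    return (acts @ w + b > 0).astype(int)

def steer(acts, w, alpha):
    """Shift every activation by alpha along the unit probe direction."""
    return acts + alpha * w / np.linalg.norm(w)

acts, labels = make_synthetic_activations()
train, test = slice(0, 300), slice(300, 400)

# 1) Decode the decision from held-out activations.
w, b = fit_linear_probe(acts[train], labels[train])
accuracy = (probe_predict(acts[test], w, b) == labels[test]).mean()

# 2) Steer "no tool call" examples toward the "tool call" side and
#    measure how many probe readouts flip from 0 to 1.
no_tool = acts[test][labels[test] == 0]
before = probe_predict(no_tool, w, b)
after = probe_predict(steer(no_tool, w, alpha=6.0), w, b)
flipped = (before == 0) & (after == 1)
flip_rate = flipped.sum() / max((before == 0).sum(), 1)

print(f"probe accuracy: {accuracy:.2f}, flip rate after steering: {flip_rate:.2f}")
```

The synthetic data is linearly separable by construction, so the high accuracy and flip rate here say nothing about real models. The sketch only shows the shape of the measurement: fit a direction, read the decision from it, then perturb along it and count the changed decisions.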