The paper introduces Directional Reasoning Trajectory Change (DRTC), a process-causal framework for identifying critical segments in long-form reasoning. DRTC detects pivot decision points using uncertainty and distribution-shift signals, blocks information flow from selected earlier chunks at those pivots, and measures how far each intervention redirects the model's log-probability trajectory, yielding a signed attribution score for each chunk. Experiments on four reasoning models, plus a scaling study on 500 MATH problems, show that the spans DRTC identifies redirect the reasoning trajectory more strongly than matched random spans.
Uncover the hidden reasoning pivots in your LLM with DRTC, a new method that pinpoints the exact moments where context truly steers the model's train of thought.
Understanding how language models carry out long-horizon reasoning remains an open challenge. Existing interpretability methods often highlight tokens or spans correlated with an answer, but they rarely reveal where the model makes consequential reasoning turns, which earlier context causally triggers those turns, or whether the highlighted text actually steers the reasoning process. We introduce Directional Reasoning Trajectory Change (DRTC), a process-causal framework for interpreting long-form reasoning from a single on-policy rollout. DRTC detects pivot decision points using uncertainty and distribution-shift signals, then applies receiver-side interventions that preserve the realized rollout without resampling the continuation while blocking information flow from selected earlier chunks only at a pivot. It measures whether each intervention redirects the direction of the model's log-probability trajectory relative to the realized rollout direction, producing a signed per-chunk attribution score. We also compute turning-angle curvature changes on raw logits as a complementary diagnostic and introduce curvature signatures to summarize shared intervention-response geometry. Empirically, directional influence is sharply concentrated across four reasoning models (per-example |DRTC| shares yield Gini 0.50 to 0.58 and top-5 percent mass 0.23 to 0.28), and learned pivots induce stronger intervention magnitudes than matched random spans. In a scaling study on 500 MATH problems with R1-Distill-Qwen-1.5B, learned spans outperform matched random spans (median delta = 0.409, 355 of 500 positive; sign test p = 2.3e-21). Overall, DRTC provides a causally grounded, trajectory-level view of how specific context elements steer reasoning under on-policy dynamics.
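The concentration and significance figures quoted in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the function names (`gini`, `top_mass`, `sign_test_p`) are hypothetical, the Gini coefficient uses the standard sorted-rank formula, and the sign test is assumed to be two-sided with no ties, which the abstract does not specify. The computed p-value therefore need not match the reported one exactly.

```python
from fractions import Fraction
from math import comb


def gini(shares):
    """Gini coefficient of non-negative values (0 = uniform, near 1 = concentrated)."""
    xs = sorted(shares)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Sorted-rank formula: G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n, i = 1..n
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1) / n


def top_mass(shares, frac=0.05):
    """Fraction of total mass held by the largest `frac` of entries (at least one entry)."""
    xs = sorted(shares, reverse=True)
    total = sum(xs)
    if not xs or total == 0:
        return 0.0
    k = max(1, round(frac * len(xs)))
    return sum(xs[:k]) / total


def sign_test_p(n_pos, n_total):
    """Two-sided exact sign test against a fair coin (assumption: no ties)."""
    k = max(n_pos, n_total - n_pos)
    # Exact upper-tail count using integer arithmetic, then doubled for two sides.
    tail = sum(comb(n_total, j) for j in range(k, n_total + 1))
    return min(1.0, float(Fraction(2 * tail, 2 ** n_total)))
```

For one example, the inputs would be the absolute per-chunk scores, for instance `gini([abs(s) for s in scores])`, where `scores` is a hypothetical list of signed DRTC values. The Gini coefficient is scale-invariant, so normalizing the absolute scores into shares first does not change it. Calling `sign_test_p(355, 500)` applies the same test to the counts reported for the scaling study.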