Search papers, labs, and topics across Lattice.
This paper identifies logical connectives as key failure points in LLM reasoning chains, showing they act as high-entropy "forking points" where errors commonly occur. To address this, they introduce a multi-layered framework that intervenes specifically at these logic-critical junctions using gradient-based steering, localized branching, and targeted reinforcement learning to optimize token preferences. Experiments demonstrate that this targeted intervention improves reasoning accuracy with better efficiency compared to global methods like beam search.
LLMs' reasoning chains are surprisingly fragile at logical connectives, but targeted interventions at these "forking points" can dramatically improve accuracy more efficiently than brute-force methods.
While LLMs demonstrate impressive reasoning capabilities, they remain fragile in multi-step logical deduction, where a single transition error can propagate through the entire reasoning chain, leading to unstable performance. In this work, we identify logical connectives as primary points of this structural fragility. Through empirical analysis, we show that connective tokens function as high entropy forking points, at which models frequently struggle to determine the correct logical direction. Motivated by this observation, we hypothesize that intervening in logical connective selection can guide LLMs toward more correct logical direction, thereby improving the overall reasoning chain. To validate this hypothesis, we propose a multi-layered framework that intervenes specifically at these logic-critical junctions in the reasoning process. Our framework includes (1) Gradient-based Logical Steering to guide LLMs internal representations towards valid reasoning subspaces, (2) Localized Branching to resolve ambiguity via targeted look-ahead search, and (3) Targeted Transition Preference Optimization, a surgical reinforcement learning objective that selectively optimizes single-token preferences at logical pivots. Crucially, by concentrating intervention solely on logic-critical transitions, our framework achieves a favorable accuracy--efficiency trade-off compared to global inference time scaling methods like beam search and self-consistency.