Mar 30, 2026 · arXiv:2603.28248

Reasoning as Energy Minimization over Structured Latent Trajectories

AI Summary

The paper introduces Energy-Based Reasoning via Structured Latent Planning (EBRM), which frames reasoning as gradient-based optimization of a latent trajectory under a learned energy function. A key finding is that EBRM can degrade performance on CNF logic satisfaction tasks. There, latent planning lowers accuracy from about 95% to about 56%. The cause is a distribution mismatch: the decoder is trained on encoder outputs but evaluated on planner-generated latents that drift into regions it never saw during training. To mitigate this, the authors propose dual-path decoder training and latent anchoring. They also introduce a six-part ablation protocol to isolate the contribution of each component.
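The two fixes can be written down as follows. This is an illustrative sketch with these assumed symbols: $D_\phi$ is the decoder, $\ell$ a task loss, $y$ the target, $h_x$ the encoder output, $z_T$ the final planner latent, and $\lambda$ an anchor weight. The squared-distance anchor to $h_x$ and the equal weighting of the two decoder paths are also assumptions, and may differ from the paper's exact objectives.

$$
\begin{aligned}
z_{1:T}^{\star} &= \arg\min_{z_{1:T}} \; E(h_x, z_{1:T}) + \lambda \sum_{t=1}^{T} \lVert z_t - h_x \rVert_2^2, \qquad \hat{y} = D_\phi\big(z_T^{\star}\big), \\
\mathcal{L}_{\mathrm{dec}}(\phi) &= \ell\big(D_\phi(h_x),\, y\big) + \ell\big(D_\phi(z_T^{\star}),\, y\big).
\end{aligned}
$$

Setting $\lambda = 0$ and dropping the second loss term recovers the failure mode. In that case, $D_\phi$ only ever sees $h_x$ during training, yet at test time it decodes $z_T^{\star}$, which nothing keeps near $h_x$.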

Key Contribution

Latent planning for reasoning can actually *hurt* performance. The decoder is trained on encoder outputs but must decode planner outputs, and this distribution shift highlights a critical challenge in bridging neural and symbolic reasoning.

Abstract

Single-shot neural decoders commit to answers without iterative refinement, while chain-of-thought methods introduce discrete intermediate steps but lack a scalar measure of reasoning progress. We propose Energy-Based Reasoning via Structured Latent Planning (EBRM), which models reasoning as gradient-based optimization of a multi-step latent trajectory $z_{1:T}$ under a learned energy function $E(h_x, z)$. The energy decomposes into per-step compatibility, transition consistency, and trajectory smoothness terms. Training combines supervised encoder-decoder learning with contrastive energy shaping using hard negatives, while inference performs gradient descent or Langevin dynamics over $z$ and decodes from $z_T$. We identify a critical failure mode: on CNF logic satisfaction, latent planning reduces accuracy from $\approx 95\%$ to $\approx 56\%$. This degradation arises from a distribution mismatch, where the decoder is trained on encoder outputs $h_x$ but evaluated on planner outputs $z_T$ that drift into unseen latent regions. We analyze this behavior through per-step decoding, latent drift tracking, and gradient decomposition. To address it, we propose dual-path decoder training and latent anchoring. We further introduce a six-part ablation protocol covering component contributions, trajectory length, planner dynamics, initialization, decoder training distribution, and anchor weight. Experiments on three synthetic tasks show that energy decreases monotonically and induces structured latent trajectories on graph and logic tasks, while remaining flat on arithmetic ($r = 0.073$), indicating a negative result. Code is available at https://github.com/dkjo8/ebr-via-structured-latent-planning.
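Below is a minimal NumPy sketch of the inference loop described in the abstract. It rests on several assumptions, and none of the names come from the authors' repository:

- The learned energy is replaced by fixed quadratic stand-ins built from hypothetical matrices `W` and `A`, so gradients can be written by hand. The assumed energy is $E(h_x, z_{1:T}) = \frac{\alpha}{2}\sum_t \lVert W z_t - h_x\rVert^2 + \frac{\beta}{2}\sum_{t>1} \lVert z_t - A z_{t-1}\rVert^2 + \frac{\gamma}{2}\sum_{t>1}\lVert z_t - z_{t-1}\rVert^2$. Its three terms stand in for per-step compatibility, transition consistency, and trajectory smoothness.
- The optional anchor is a squared distance to $h_x$ with weight `lam`.
- Langevin dynamics is the unadjusted variant.
- `contrastive_loss` is one common hinge form of hard-negative energy shaping, not necessarily the paper's.

```python
import numpy as np


def energy_and_grad(z, h, W, A, lam=0.0, alpha=1.0, beta=0.5, gamma=0.1):
    """Hypothetical quadratic stand-in for E(h_x, z_{1:T}) plus an anchor penalty.

    z: (T, d) latent trajectory; h: (d,) encoder output h_x.
    W, A: fixed (d, d) matrices replacing the paper's learned networks.
    Returns the scalar objective and its gradient with respect to z.
    """
    r_step = z @ W.T - h           # per-step compatibility residuals, (T, d)
    r_trans = z[1:] - z[:-1] @ A.T  # transition consistency residuals, (T-1, d)
    r_smooth = z[1:] - z[:-1]       # trajectory smoothness residuals, (T-1, d)
    r_anchor = z - h                # latent anchoring residuals, (T, d)

    value = (0.5 * alpha * np.sum(r_step ** 2)
             + 0.5 * beta * np.sum(r_trans ** 2)
             + 0.5 * gamma * np.sum(r_smooth ** 2)
             + lam * np.sum(r_anchor ** 2))

    # Analytic gradient: each residual contributes to the latents it depends on.
    grad = alpha * r_step @ W + 2.0 * lam * r_anchor
    grad[1:] += beta * r_trans + gamma * r_smooth
    grad[:-1] -= beta * r_trans @ A + gamma * r_smooth
    return value, grad


def plan(h, W, A, T=4, steps=200, lr=0.05, lam=0.0, temperature=0.0, seed=0):
    """Gradient descent (temperature=0) or unadjusted Langevin dynamics over z_{1:T}.

    Every step is initialized at h_x, which is only one possible initialization.
    Returns the final trajectory and the objective value after each update.
    """
    rng = np.random.default_rng(seed)
    z = np.tile(h, (T, 1)).astype(float)
    history = []
    for _ in range(steps):
        _, g = energy_and_grad(z, h, W, A, lam)
        z = z - lr * g
        if temperature > 0.0:
            # Langevin update: gradient step plus Gaussian noise of variance 2*lr*temperature.
            z = z + np.sqrt(2.0 * lr * temperature) * rng.standard_normal(z.shape)
        history.append(energy_and_grad(z, h, W, A, lam)[0])
    return z, np.array(history)


def contrastive_loss(e_pos, e_negs, margin=1.0, k=2):
    """Hinge loss pushing the positive's energy below the k lowest-energy (hardest) negatives."""
    hard = np.sort(e_negs)[:k]
    return float(np.mean(np.maximum(0.0, margin + e_pos - hard)))


# Toy usage with random stand-in parameters.
rng = np.random.default_rng(0)
d = 8
W = rng.standard_normal((d, d)) / np.sqrt(d)
A = 0.5 * rng.standard_normal((d, d)) / np.sqrt(d)
h = rng.standard_normal(d)

z_free, hist_free = plan(h, W, A, lam=0.0)               # unanchored gradient descent
z_anch, hist_anch = plan(h, W, A, lam=1.0)               # anchored gradient descent
z_lang, hist_lang = plan(h, W, A, temperature=0.01)      # Langevin dynamics
drift_free = np.linalg.norm(z_free - h)  # trajectory distance from h_x (Frobenius norm)
drift_anch = np.linalg.norm(z_anch - h)
print(f"drift without anchor: {drift_free:.3f}, with anchor: {drift_anch:.3f}")
print("hard-negative loss:", contrastive_loss(0.0, np.array([5.0, 0.5, 3.0])))
```

The stand-in energy is a convex quadratic and the step size is small. Under those conditions, plain gradient descent decreases the objective monotonically, mirroring the abstract's observation. With `lam > 0`, the trajectory stays closer to $h_x$, which is the intuition behind latent anchoring. A learned, non-convex energy gives no such guarantees.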

Architecture Design (Transformers, SSMs, MoE) · Reasoning & Chain-of-Thought · World Models & Planning

Citation Metrics

Citations: 0
Influential citations: 0
References: 0
Year: 2026
Venue: N/A
