The paper introduces Geometric Latent Reasoning (GLR), which formulates latent reasoning as a geometric path-approximation problem in the pretrained token-embedding space of large language models. A lightweight transition head predicts iterative direction updates, so the model approximates discrete reasoning trajectories while permitting continuous deviations from exact token embeddings. On mathematical reasoning benchmarks with Qwen3 models, GLR yields substantially shorter generations without an explicit length objective, and models often reach correct answers in fewer total generation steps. The results expose a tradeoff between latent computation budget, output length, and accuracy.
Geometric Latent Reasoning shortens reasoning chains in LLMs: by replacing early explicit reasoning with continuous trajectories in embedding space, models often reach correct answers in fewer total generation steps.
Large language models solve complex problems by generating lengthy chains of explicit reasoning tokens. Although effective, this makes reasoning expensive, length-sensitive, and constrained to discrete natural language. Latent reasoning offers a continuous alternative, but determining useful structures for intermediate latent states remains an open challenge. In this paper, we formulate latent reasoning as a geometric path-approximation problem within the model's pretrained token-embedding space. We introduce Geometric Latent Reasoning (GLR), which uses a lightweight transition head to predict iterative direction updates in embedding space. Using textual chain-of-thought traces as anchors, GLR learns to approximate discrete reasoning trajectories while permitting continuous deviations from exact token embeddings. Evaluations on mathematical reasoning benchmarks using Qwen3 models reveal an emergent phenomenon: geometric latent reasoning induces substantially shorter generations without an explicit length objective. By replacing early explicit reasoning with continuous latent steps, models often reach correct answers using considerably fewer total generation steps. These findings suggest that continuous trajectories act as compact intermediate reasoning states, exposing a new tradeoff between latent computation budget, output length, and accuracy.
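The path-approximation idea can be illustrated with a toy sketch. This is not the paper's implementation: the linear `transition_head`, the unit-norm direction, the fixed `step_size`, and the squared-distance loss are all assumptions chosen for illustration, and in a real model the head would presumably read the language model's hidden state rather than the latent point itself. The sketch only shows the general shape of the idea: a head proposes a direction, the latent state moves along it, and the resulting path is compared against the token embeddings of a chain-of-thought trace used as anchors.

```python
import math


def normalize(vec):
    """Scale a vector to unit length (returns a copy if the norm is zero)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def transition_head(state, weights):
    """Hypothetical lightweight head: a single linear map, direction = W @ state."""
    return [sum(w * s for w, s in zip(row, state)) for row in weights]


def latent_step(state, weights, step_size):
    """One latent reasoning step: move along the predicted unit direction."""
    direction = normalize(transition_head(state, weights))
    return [s + step_size * d for s, d in zip(state, direction)]


def rollout(start, weights, num_steps, step_size):
    """Iterate latent steps from a starting embedding; returns the whole path."""
    path = [list(start)]
    for _ in range(num_steps):
        path.append(latent_step(path[-1], weights, step_size))
    return path


def path_approximation_loss(path, anchors):
    """Mean squared distance between latent states and anchor embeddings.

    The anchors stand in for token embeddings of a textual chain-of-thought
    trace. The latent states need not coincide with any anchor, which is how
    this toy models continuous deviation from exact token embeddings.
    """
    total = 0.0
    for state, anchor in zip(path, anchors):
        total += sum((s - a) ** 2 for s, a in zip(state, anchor))
    return total / len(anchors)


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]  # toy weights, assumed for the demo
    anchors = [[1.0, 0.0], [1.5, 0.0], [2.0, 1.0]]  # made-up anchor embeddings
    path = rollout([1.0, 0.0], identity, num_steps=2, step_size=0.5)
    print("path:", path)
    print("loss:", path_approximation_loss(path, anchors))
```

In a trained system the head's weights would be fit by minimizing a loss of this kind over many traces. The number of latent steps taken before switching back to explicit tokens corresponds to the latent computation budget that the abstract trades off against output length and accuracy.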