Melbourne | Monash | Feb 15, 2026 | arXiv:2602.14077

GTS: Inference-Time Scaling of Latent Reasoning with a Learnable Gaussian Thought Sampler

Minghan Wang, Ye Bai, Thuy-Trang Vu, Ehsan Shareghi, Gholamreza Haffari

AI Summary

This paper introduces the Gaussian Thought Sampler (GTS), a novel approach to inference-time scaling (ITS) in latent reasoning models that learns context-dependent perturbation distributions over continuous reasoning states. GTS is trained using GRPO-style policy optimization while keeping the backbone model frozen, allowing for structured exploration of latent thought trajectories. Experiments on GSM8K with two latent reasoning architectures demonstrate that GTS outperforms heuristic perturbation methods, suggesting that structured exploration is crucial for effective ITS.

Key Contribution

Forget random noise – teaching models *how* to explore their reasoning process yields more reliable inference-time scaling.

Abstract

Inference-time scaling (ITS) in latent reasoning models typically introduces stochasticity through heuristic perturbations, such as dropout or fixed Gaussian noise. While these methods increase trajectory diversity, their exploration behavior is not explicitly modeled and can be inefficient under finite sampling budgets. We observe that stronger perturbations do not necessarily translate into more effective candidate trajectories, as unguided noise may disrupt internal decision structure rather than steer it. To provide a more structured alternative, we model latent thought exploration as conditional sampling from learnable densities and instantiate this idea as a Gaussian Thought Sampler (GTS). GTS predicts context-dependent perturbation distributions over continuous reasoning states and is trained with GRPO-style policy optimization while keeping the backbone frozen. Experiments on GSM8K with two latent reasoning architectures show that GTS achieves more reliable inference-time scaling than heuristic baselines. These findings indicate that improving latent ITS requires structured and optimizable exploration mechanisms rather than simply amplifying stochasticity.
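The abstract does not spell out the sampler's equations, so the following is only a minimal sketch of the two ingredients it names: sampling a perturbation of a continuous reasoning state from a context-dependent Gaussian, and scoring a group of sampled trajectories with GRPO-style group-relative advantages. It assumes an additive perturbation with a diagonal Gaussian. The function names, the `mu` and `log_sigma` inputs (which would come from a small learned head on top of the frozen backbone), and the reward values are all hypothetical and are not taken from the paper.

```python
import math
import random
from typing import List, Sequence, Tuple


def sample_perturbed_state(
    h: Sequence[float],
    mu: Sequence[float],
    log_sigma: Sequence[float],
    rng: random.Random,
) -> Tuple[List[float], float]:
    """Draw delta ~ N(mu, diag(sigma^2)) and return (h + delta, log p(delta)).

    Assumption: the perturbation is additive and the Gaussian is diagonal.
    mu and log_sigma stand in for the output of a learned sampler head.
    """
    z: List[float] = []
    log_prob = 0.0
    for h_i, mu_i, ls_i in zip(h, mu, log_sigma):
        eps = rng.gauss(0.0, 1.0)            # standard normal noise
        delta = mu_i + math.exp(ls_i) * eps  # reparameterized sample
        z.append(h_i + delta)
        # Log-density of delta under N(mu_i, sigma_i^2), written in terms of eps.
        log_prob += -0.5 * eps * eps - ls_i - 0.5 * math.log(2.0 * math.pi)
    return z, log_prob


def group_relative_advantages(
    rewards: Sequence[float], eps: float = 1e-8
) -> List[float]:
    """Normalize rewards within one group of samples (GRPO-style)."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = math.sqrt(var)
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    rng = random.Random(0)
    h = [0.2, -0.1, 0.4]            # hypothetical latent reasoning state
    mu = [0.0, 0.05, -0.05]         # hypothetical predicted mean shift
    log_sigma = [-2.0, -2.0, -2.0]  # hypothetical predicted log std

    samples = [sample_perturbed_state(h, mu, log_sigma, rng) for _ in range(4)]
    rewards = [1.0, 0.0, 0.0, 1.0]  # e.g. answer correctness per trajectory
    advantages = group_relative_advantages(rewards)

    # Policy-gradient surrogate: only the sampler's parameters would be updated,
    # since the backbone stays frozen.
    surrogate = -sum(a * lp for a, (_, lp) in zip(advantages, samples)) / len(samples)
    print(advantages, surrogate)
```

In this sketch the log-probability is what makes the exploration optimizable: a fixed-noise baseline has no parameters to update, whereas here trajectories with above-average reward raise the likelihood of the perturbations that produced them.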

Architecture Design (Transformers, SSMs, MoE) | Inference & Quantization | Reasoning & Chain-of-Thought

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
