Shenzhen University | Apr 7, 2026 | arXiv:2604.06155

Toward Consistent World Models with Multi-Token Prediction and Latent Semantic Enhancement

Qimin Zhong, Hao Liao, Haiming Qin, Mingyang Zhou, Rui Mao, Naipeng Chao

AI Summary

This paper analyzes Multi-Token Prediction (MTP) in LLMs, showing that it encourages the formation of internal belief states by inducing representational contractivity via gradient coupling. However, standard MTP suffers from structural hallucinations, because discrete token supervision creates shortcuts in the latent space. To mitigate this, the authors introduce Latent Semantic Enhancement MTP (LSE-MTP), which anchors predictions to ground-truth hidden state trajectories, improving representation alignment and robustness.

Key Contribution

LLMs can develop more consistent world models by predicting multiple tokens *and* anchoring those predictions to ground-truth hidden state trajectories, mitigating structural hallucinations.

Abstract

Whether Large Language Models (LLMs) develop coherent internal world models remains a core debate. While conventional Next-Token Prediction (NTP) focuses on one-step-ahead supervision, Multi-Token Prediction (MTP) has shown promise in learning more structured representations. In this work, we provide a theoretical analysis of the gradient inductive bias of MTP, supported by empirical evidence, showing that MTP promotes convergence toward internal belief states by inducing representational contractivity via gradient coupling. However, we reveal that standard MTP often suffers from structural hallucinations, where discrete token supervision encourages illegal shortcuts in latent space that violate environmental constraints. To address this, we propose a novel method, Latent Semantic Enhancement MTP (LSE-MTP), which anchors predictions to ground-truth hidden state trajectories. Experiments on synthetic graphs and real-world Manhattan Taxi Ride data show that LSE-MTP effectively bridges the gap between discrete tokens and continuous state representations, enhancing representation alignment, reducing structural hallucinations, and improving robustness to perturbations.
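The abstract does not give the training objective, so the following is only a minimal sketch of how a token-level MTP loss might be combined with a latent anchoring term. Everything here is an assumption made for illustration: the function names, the use of mean squared error as the anchoring distance, the weight `lam`, and the reading of "ground-truth hidden state trajectories" as the hidden states obtained for the true future positions. It should not be read as the authors' formulation.

```python
import math


def cross_entropy(logits, target):
    """Negative log-likelihood of `target` under softmax(logits)."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def mtp_loss(head_logits, future_tokens):
    """Average cross-entropy over k heads, one head per future offset."""
    losses = [cross_entropy(l, t) for l, t in zip(head_logits, future_tokens)]
    return sum(losses) / len(losses)


def latent_anchor_loss(pred_latents, true_hiddens):
    """Mean squared error between predicted and ground-truth hidden states.

    MSE is an assumed choice of distance, not taken from the paper.
    """
    total, count = 0.0, 0
    for pred, true in zip(pred_latents, true_hiddens):
        for a, b in zip(pred, true):
            total += (a - b) ** 2
            count += 1
    return total / count


def lse_mtp_loss(head_logits, future_tokens, pred_latents, true_hiddens, lam=1.0):
    """Hypothetical combined objective: token loss plus weighted latent anchor."""
    token_term = mtp_loss(head_logits, future_tokens)
    anchor_term = latent_anchor_loss(pred_latents, true_hiddens)
    return token_term + lam * anchor_term


if __name__ == "__main__":
    # Two prediction heads (offsets t+1 and t+2) over a 3-token vocabulary.
    logits = [[2.0, 0.0, -1.0], [0.5, 0.5, 0.5]]
    tokens = [0, 2]
    # Predicted latents vs. the hidden states for the true future positions.
    pred = [[0.1, 0.2], [0.0, 0.4]]
    true = [[0.1, 0.0], [0.0, 0.4]]
    print(lse_mtp_loss(logits, tokens, pred, true, lam=0.5))
```

In this sketch, setting `lam=0` recovers plain MTP. The anchoring term penalizes predicted latents that drift from the reference trajectory even when the predicted tokens are correct, which is one way to discourage the latent-space shortcuts the abstract describes.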

Architecture Design (Transformers, SSMs, MoE) | Training Efficiency & Optimization | World Models & Planning

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
