World Models & Planning

Capabilities

Internal world models for prediction, model-based planning, simulation, and environment modeling in AI systems.

Keywords

world modelplanningmodel-basedsimulationenvironment modelpredictive modelMCTSworld simulation

Recent Papers

Mar 1, 2026

National Engineering Research Center for Robot Visual Perception and Control Technologyjust now

Efficient Robotic 3D Measurement Through Multi-DoF Reinforcement Learning for Continuous Viewpoint Planning

This paper introduces a multi-degree-of-freedom reinforcement learning framework for robotic 3D measurement, enabling continuous viewpoint planning to improve the reconstruction of complex geometries. The framework uses a voxel-based state representation with dynamic ray-traced coverage updates and a dual-objective reward function to balance overlap control and viewpoint minimization. Experimental results on industrial parts show the proposed method achieves superior overlap regulation and planning efficiency compared to existing techniques, leading to more accurate 3D reconstructions.

Introduces a novel multi-DoF reinforcement learning framework for robotic 3D measurement that optimizes viewpoint planning by dynamically balancing coverage, overlap, and robotic kinematics.

Jun Ye, Qiu Fang, Shi Wang +3

Robotics & Embodied AIRLHF & Preference LearningWorld Models & Planning

Feb 12, 2026

2d ago

Safety Beyond the Training Data: Robust Out-of-Distribution MPC via Conformalized System Level Synthesis

This paper introduces a novel control framework that combines conformal prediction (CP) and system level synthesis (SLS) to achieve robust out-of-distribution (OOD) planning and control with learned dynamics models. The method uses weighted CP with a learned covariance model to derive high-confidence model error bounds, which are then incorporated into an SLS-based robust nonlinear MPC formulation with volume-optimized reachable sets for constraint tightening. Empirical results on nonlinear systems like a 4D car and a 12D quadcopter demonstrate improved safety and robustness, particularly in OOD scenarios, compared to baselines.

Integrates conformal prediction with system level synthesis to create a robust MPC framework that provides safety guarantees for out-of-distribution planning and control using learned dynamics models.

Anutam Srinivasan, Antoine Leeman, Glen Chou2602.12047

Robotics & Embodied AIWorld Models & PlanningRed-Teaming & Adversarial Robustness

2d ago

From Path Signatures to Sequential Modeling: Incremental Signature Contributions for Offline RL

The paper introduces Incremental Signature Contribution (ISC), a method that decomposes truncated path signatures into a temporally ordered sequence of elements, preserving the algebraic structure and expressivity of signatures while exposing temporal evolution. This allows for processing signature-based representations using sequential models, addressing the limitation of standard path signatures which collapse temporal structure. The authors then introduce ISC-Transformer (ISCT), an offline RL model integrating ISC into a standard Transformer, and demonstrate its effectiveness on benchmark tasks, particularly in settings requiring temporal sensitivity.

Introduces Incremental Signature Contribution (ISC), a novel method to decompose path signatures into temporally ordered sequences for improved temporal sensitivity in sequential modeling tasks.

Ziyi Zhao, Qingchuan Li, Yuxuan Xu2602.11805

World Models & PlanningRobotics & Embodied AI

2d ago

Affordance-Graphed Task Worlds: Self-Evolving Task Generation for Scalable Embodied Learning

The paper introduces Affordance-Graphed Task Worlds (AGT-World), a framework that automatically generates interactive simulated environments and robot task policies from real-world observations by formalizing the task space as a structured graph. This graph-based approach allows for hierarchical decomposition of complex goals into atomic primitives, addressing the limitations of random proposal or static replication methods. The authors further incorporate a self-evolution mechanism with hybrid feedback, combining Vision-Language Model reasoning and geometric verification, to refine policies.

Introduces a self-evolving framework for generating simulated task environments and robot policies by structuring the task space as an affordance graph and using hybrid feedback for policy refinement.

Xiang Liu, Guocai Yao2602.12065

World Models & PlanningRobotics & Embodied AIData Curation & Synthetic Data

2d ago

Effective Task Planning with Missing Objects using Learning-Informed Object Search

This paper introduces a task planning framework that integrates Learning-Informed Object Search (LIOS) actions into high-level planning to address scenarios with missing objects. The framework models LIOS actions as deterministic, leveraging model-based calculations to estimate their cost and interleave search and execution steps. The approach demonstrates effective task planning with uncertainty, outperforming both non-learned and learned baselines in simulated ProcTHOR environments and real-world experiments involving retrieval and meal preparation tasks.

Introduces a novel planning framework that integrates learning-informed object search (LIOS) actions into task planning, enabling effective handling of missing objects by interleaving search and execution.

Raihan Islam Arnob, Max Merlin, Abhishek Paudel +32602.11468

World Models & PlanningTool Use & AgentsRobotics & Embodied AI

2d ago

VLAW: Iterative Co-Improvement of Vision-Language-Action Policy and World Model

This paper introduces VLAW, an iterative algorithm for co-improving vision-language-action (VLA) policies and action-conditioned video generation world models using real-world rollouts. VLAW leverages real-world data to refine the world model, which is then used to generate synthetic data for further policy improvement, addressing the limitations of world models trained solely on demonstration datasets. Experiments on a real robot demonstrate a 39.2% absolute improvement in success rate over the base policy, highlighting the effectiveness of the iterative co-improvement strategy.

Introduces an iterative co-improvement algorithm, VLAW, that refines both a vision-language-action policy and an action-conditioned video generation world model through interleaved real-world data collection and synthetic data generation.

Tony Lee, L. Shi, Jianyu Chen +22602.12063

Multimodal ModelsWorld Models & PlanningRobotics & Embodied AI

2d ago

Towards Sustainable Investment Policies Informed by Opponent Shaping

This paper analyzes the InvestESG multi-agent simulation to characterize conditions leading to intertemporal social dilemmas where individual incentives conflict with collective welfare. It then applies Advantage Alignment, an opponent shaping algorithm, to influence agent learning within InvestESG, demonstrating its ability to systematically favor socially beneficial equilibria. The work provides theoretical justification for why Advantage Alignment promotes cooperation and shows that shaping agent learning can improve outcomes related to sustainability goals.

Demonstrates that Advantage Alignment can effectively shape agent learning in the InvestESG environment to promote socially beneficial equilibria and overcome intertemporal social dilemmas.

J. Duque, Razvan Ciuca, Ayoub Echchahed +22602.11829

World Models & PlanningConstitutional AI & AI Ethics

2d ago

3DGSNav: Enhancing Vision-Language Model Reasoning for Object Navigation via Active 3D Gaussian Splatting

The paper introduces 3DGSNav, a zero-shot object navigation (ZSON) framework that leverages 3D Gaussian Splatting (3DGS) as persistent memory for vision-language models (VLMs) to improve spatial reasoning. 3DGSNav actively constructs a 3DGS representation of the environment and uses trajectory-guided free-viewpoint rendering to generate frontier-aware first-person views, which are then combined with structured visual prompts and Chain-of-Thought prompting to enhance VLM reasoning. Experiments on multiple benchmarks and a quadruped robot show that 3DGSNav achieves competitive performance compared to existing methods.

Introduces a novel zero-shot object navigation framework that integrates 3D Gaussian Splatting as persistent memory for vision-language models, enabling trajectory-guided free-viewpoint rendering and enhanced spatial reasoning.

Wancai Zheng, Xianlong Lu, Linlin Ou +12602.12159

Multimodal ModelsRobotics & Embodied AIWorld Models & Planning

2d ago

Intelligent AI Delegation

The paper introduces a framework for intelligent AI delegation, enabling AI agents to decompose complex tasks and delegate sub-components to other AI agents or humans. This framework addresses limitations in current task decomposition methods by incorporating elements like authority transfer, accountability, and trust-building. The authors propose an adaptive approach applicable to both AI and human agents within complex delegation networks, contributing to the development of protocols for agentic systems.

Proposes a novel adaptive framework for intelligent AI delegation that incorporates key elements of human delegation such as authority transfer, accountability, and trust.

Nenad Tomavsev, Matija Franklin, Simon Osindero2602.11865

Tool Use & AgentsWorld Models & Planning

2d ago

V-SHiNE: A Virtual Smart Home Framework for Explainability Evaluation

The paper introduces V-SHiNE, a browser-based virtual smart home environment designed to facilitate the evaluation of explainable AI (XAI) methods in the context of smart home automation. V-SHiNE enables researchers to configure realistic smart home environments, simulate user behaviors, integrate custom explanation engines, and log user interactions. A user study with 159 participants demonstrates the framework's utility for assessing the impact and quality of different explanation strategies.

Introduces V-SHiNE, a novel browser-based simulation framework, to enable scalable and reproducible evaluation of XAI methods within virtual smart home environments.

Mersedeh Sadeghi, Simon Scholz, Max Unterbusch +12602.11775

Eval Frameworks & BenchmarksWorld Models & PlanningInterpretability & Mechanistic Interp

2d ago

Multi UAVs Preflight Planning in a Shared and Dynamic Airspace

The paper introduces DTAPP-IICR, a Delivery-Time Aware Prioritized Planning method with Incremental and Iterative Conflict Resolution, for preflight planning of large UAV fleets in dynamic airspaces with temporal No-Fly Zones and heterogeneous vehicle profiles. DTAPP-IICR uses a novel 4D single-agent planner (SFIPP-ST) to generate roundtrip trajectories while enforcing temporal NFZs and modeling inter-agent conflicts as soft constraints, followed by a Large Neighborhood Search guided by a geometric conflict graph. Experiments on benchmarks with up to 1,000 UAVs demonstrate near-100% success and up to 50% runtime reduction compared to batch Enhanced Conflict-Based Search, showcasing its scalability and practicality for dense urban airspace.

Introduces DTAPP-IICR, a scalable and practical preflight planning method for large UAV fleets that integrates delivery-time awareness, prioritized planning, and iterative conflict resolution within dynamic airspaces.

Amath Sow, Mauricio Rodriguez Cesen, Fabiola Martins Campos de Oliveira +52602.12055

World Models & PlanningRobotics & Embodied AI

2d ago

JEPA-VLA: Video Predictive Embedding is Needed for VLA Models

The paper identifies limitations in current Vision-Language-Action (VLA) models stemming from inadequate visual representations learned through language-image contrastive learning or image-based self-supervised learning. It proposes JEPA-VLA, a method that integrates video predictive embeddings (specifically V-JEPA 2) into VLAs to improve environment understanding and policy priors. Experiments on benchmarks like LIBERO and real-robot tasks demonstrate that JEPA-VLA significantly improves performance by leveraging the ability of video predictive embeddings to encode task-relevant temporal dynamics.

Introduces JEPA-VLA, a novel approach that adaptively integrates video predictive embeddings into existing VLAs to enhance environment understanding and policy priors.

Shangchen Miao, Ningya Feng, Jialong Wu +32602.11832

Multimodal ModelsRobotics & Embodied AIWorld Models & PlanningComputer Vision

2d ago

LDA-1B: Scaling Latent Dynamics Action Model via Universal Embodied Data Ingestion

The paper introduces LDA-1B, a robot foundation model that scales to 1B parameters by learning dynamics, policy, and visual forecasting from a new 30k-hour embodied interaction dataset (EI-30k) comprising diverse human and robot trajectories. LDA-1B leverages a structured DINO latent space for dynamics prediction to avoid pixel-space modeling and employs a multi-modal diffusion transformer to handle asynchronous vision and action streams. Experimental results demonstrate that LDA-1B outperforms existing methods on contact-rich, dexterous, and long-horizon tasks, while also enabling data-efficient fine-tuning by effectively utilizing low-quality trajectories.

Introduces a scalable robot foundation model, LDA-1B, capable of learning from diverse embodied data by predicting in a structured latent space and employing a multi-modal diffusion transformer.

Jiangran Lyu, Xuheng Zhang, Yusen Feng +92602.12215

Robotics & Embodied AIWorld Models & PlanningData Curation & Synthetic Data

2d ago

Code2Worlds: Empowering Coding LLMs for 4D World Generation

The paper introduces Code2Worlds, a framework for generating 4D dynamic scenes by formulating the task as language-to-simulation code generation. It addresses the challenges of multi-scale context entanglement and the semantic-physical execution gap by using a dual-stream architecture for disentangled object and environment generation, combined with a physics-aware closed-loop mechanism involving a PostProcess Agent and VLM-Motion Critic. Experiments on the Code4D benchmark demonstrate that Code2Worlds significantly outperforms existing methods in scene generation score (SGS) and richness, while also generating more physically plausible dynamics.

Introduces a novel framework, Code2Worlds, that leverages coding LLMs to generate physically plausible 4D dynamic scenes through a dual-stream architecture and physics-aware closed-loop refinement.

Yi Zhang, Yunshuang Wang2602.11757

Code Generation & Program SynthesisWorld Models & PlanningMultimodal Models

2d ago

HAIC: Humanoid Agile Object Interaction Control via Dynamics-Aware World Model

The paper introduces HAIC, a framework for humanoid robots to interact with underactuated objects having independent dynamics, addressing limitations of prior HOI methods focused on rigidly coupled objects. HAIC uses a dynamics predictor to estimate high-order object states from proprioceptive history, projecting these onto geometric priors to create a dynamic occupancy map for collision avoidance and contact affordance inference. Through asymmetric fine-tuning of a world model, HAIC achieves robust performance on agile manipulation tasks like skateboarding and cart pushing, as well as long-horizon multi-object tasks.

Introduces a dynamics predictor that estimates high-order object states from proprioceptive history and projects them onto geometric priors to create a dynamic occupancy map for robust humanoid-object interaction.

Dongting Li, Hanyu Wu, Guoyao Zhang +42602.11758

Robotics & Embodied AIWorld Models & Planning

2d ago

Counterfactual Conditional Likelihood Rewards for Multiagent Exploration

This paper introduces Counterfactual Conditional Likelihood (CCL) rewards to address redundant exploration in multiagent systems by scoring each agent's unique contribution to team exploration. CCL rewards agents for observations that are informative with respect to the joint exploration of the team, rather than solely for individual novelty. Experiments in continuous multiagent domains demonstrate that CCL accelerates learning in sparse reward environments requiring tight coordination.

Introduces Counterfactual Conditional Likelihood (CCL) rewards to incentivize efficient team exploration by rewarding agents based on their unique contribution to the team's joint exploration.

Ayhan Alp Aydeniz, R. Loftin, Kagan Tumer2602.11740

Tool Use & AgentsRobotics & Embodied AIWorld Models & Planning

2d ago

GigaBrain-0.5M*: a VLA That Learns From World Model-Based Reinforcement Learning

The paper introduces GigaBrain-0.5M*, a vision-language-action (VLA) model trained using world model-based reinforcement learning to improve multi-step action prediction. They leverage the spatiotemporal reasoning capabilities of video world models pre-trained on large video datasets to enhance VLA learning. By integrating world model-based reinforcement learning via RAMP (Reinforcement leArning via world Model-conditioned Policy), GigaBrain-0.5M* achieves significant performance gains (approximately 30%) over the RECAP baseline on complex manipulation tasks and demonstrates reliable long-horizon execution in real-world deployments.

Demonstrates that integrating world model-based reinforcement learning via RAMP into a VLA model significantly improves performance and long-horizon execution on complex manipulation tasks.

GigaBrain Team Boyuan Wang, Chaojun Ni, Hao Li +102602.12099

Multimodal ModelsWorld Models & PlanningRobotics & Embodied AI

2d ago

Geometry of Uncertainty: Learning Metric Spaces for Multimodal State Estimation in RL

This paper introduces a method for learning structured latent representations in RL where distances reflect transition costs, providing a geometric interpretation of uncertainty without explicit probabilistic modeling. They achieve this with a multimodal latent transition model and inverse distance weighting for sensor fusion, enabling adaptive integration of multiple sensor modalities. Empirical validation on multimodal RL tasks demonstrates improved robustness to sensor noise, superior state estimation, and enhanced RL agent performance compared to baselines, eliminating the need for noise augmentation.

Introduces a novel metric space formulation for state estimation in RL that learns a transition-aware latent representation, enabling a geometric interpretation of uncertainty and adaptive sensor fusion.

Alfredo Reichlin, Adriano Pacciarelli, Danica Kragic +12602.12087

Robotics & Embodied AIMultimodal ModelsWorld Models & Planning

2d ago

RF-Modulated Adaptive Communication Improves Multi-Agent Robotic Exploration

This paper introduces Adaptive-RF Transmission (ART), a communication-aware planning algorithm for multi-agent robotic exploration that modulates transmission location based on signal strength and data payload size. ART aims to improve coordination and efficiency in communication-limited environments by enabling heterogeneous robot teams to share information without excessive backtracking. Simulation results across cave-inspired environments show that ART and its extension, ART-SST, outperform existing strategies, achieving significant reductions in distance traveled and exploration time.

Introduces a novel communication-aware planning algorithm, Adaptive-RF Transmission (ART), that dynamically adjusts transmission location based on signal strength and data payload size for efficient multi-agent robotic exploration.

Lorin Achey, Breanne Crockett, Christoffer Heckman +12602.12074

Robotics & Embodied AITool Use & AgentsWorld Models & Planning

2d ago

Adaptive-Horizon Conflict-Based Search for Closed-Loop Multi-Agent Path Finding

This paper introduces Adaptive-Horizon Conflict-Based Search (ACCBS), a closed-loop multi-agent path finding algorithm that addresses the limitations of open-loop planners and closed-loop heuristics in MAPF. ACCBS employs a finite-horizon CBS variant with a horizon-changing mechanism inspired by iterative deepening MPC, dynamically adjusting the planning horizon based on computational budget. The algorithm reuses a single constraint tree to enable seamless transitions between horizons, achieving anytime behavior and asymptotic optimality.

Introduces ACCBS, a novel closed-loop MAPF algorithm that combines finite-horizon planning with dynamic horizon adjustment for improved robustness and performance guarantees.

Jiarui Li, Federico Pecora, Runyu Zhang +12602.12024

Robotics & Embodied AIWorld Models & Planning

2d ago

The Observer Effect in World Models: Invasive Adaptation Corrupts Latent Physics

The paper investigates whether neural world models truly learn physical laws or rely on statistical shortcuts, particularly under out-of-distribution shifts. They introduce PhyIP, a non-invasive evaluation protocol that assesses the linear decodability of physical quantities from frozen latent representations, contrasting it with adaptation-based methods. Their results show that when self-supervised learning achieves low error, latent physical structures are linearly accessible and robust to OOD shifts, while adaptation-based evaluations can collapse this structure, suggesting that non-invasive probes are more accurate for evaluating physical world models.

Introduces PhyIP, a non-invasive evaluation protocol, to accurately assess the linear accessibility of physical quantities in frozen latent representations of world models, demonstrating its superiority over adaptation-based methods.

Christian Internò, Jumpei Yamaguchi, Loren K. Amdahl-Culleton +32602.12218

World Models & PlanningInterpretability & Mechanistic InterpEval Frameworks & Benchmarks

2d ago

Intrinsic-Energy Joint Embedding Predictive Architectures Induce Quasimetric Spaces

This paper connects Joint-Embedding Predictive Architectures (JEPAs) with Quasimetric Reinforcement Learning (QRL) by focusing on a specific class of JEPA energy functions: intrinsic (least-action) energies defined as infima of accumulated local effort. It demonstrates that under closure and additivity assumptions, intrinsic energies are quasimetrics, aligning JEPAs trained on these energies with the quasimetric value functions used in QRL for goal-reaching control. The work highlights the structural mismatch between symmetric energies and one-way reachability, advocating for asymmetric (quasimetric) energies in scenarios where directionality is important.

Establishes a formal connection between intrinsic energy functions in Joint-Embedding Predictive Architectures and quasimetrics used in Quasimetric Reinforcement Learning.

Anthony Kobanda, Waris Radji2602.12245

World Models & PlanningRobotics & Embodied AIArchitecture Design (Transformers, SSMs, MoE)

2d ago

Where Bits Matter in World Model Planning: A Paired Mixed-Bit Study for Efficient Spatial Reasoning

This paper investigates the impact of bit allocation strategies on the performance of world model-based planning, specifically using DINO-WM on the Wall planning task. The study compares uniform, mixed, asymmetric, and layerwise quantization schemes under different planner budgets to identify critical bitwidth thresholds. Results show a sensitivity to bit allocation in the 4-bit regime, with encoder precision being particularly important for maintaining performance, suggesting the need for module-aware quantization policies.

Demonstrates that, in low-bit world model planning, performance is sensitive to bit allocation, particularly in the encoder, and identifies a critical 4-bit transition regime where module-aware quantization becomes crucial.

Suraj Ranganath, Anish Patnaik, Vaishak Menon2602.11882

World Models & PlanningTraining Efficiency & OptimizationInference & Quantization

2d ago

Multi Graph Search for High-Dimensional Robot Motion Planning

The paper introduces Multi-Graph Search (MGS), a novel search-based motion planning algorithm for high-dimensional robotic systems that addresses the limitations of existing methods in terms of motion consistency and computational cost. MGS maintains and expands multiple implicit graphs, focusing exploration on promising regions and merging disconnected subgraphs as needed. The authors prove completeness and bounded suboptimality of MGS and demonstrate its effectiveness on manipulation and mobile manipulation tasks.

Introduces Multi-Graph Search (MGS), a complete and bounded-suboptimal motion planning algorithm that generalizes unidirectional and bidirectional search to a multi-graph setting for improved efficiency in high-dimensional spaces.

Itamar Mishani, Maxim Likhachev2602.12096

Robotics & Embodied AIWorld Models & Planning

2d ago

LLM-Driven 3D Scene Generation of Agricultural Simulation Environments

This paper introduces a modular multi-LLM pipeline for generating agricultural simulation environments in Unreal Engine from natural language prompts, addressing limitations of existing LLM-based 3D scene generation approaches. The pipeline incorporates 3D asset retrieval, domain knowledge injection, and code generation, enhanced by LLM optimization techniques like few-shot prompting, RAG, and finetuning. Experiments demonstrate the system's effectiveness in creating realistic and semantically accurate agricultural environments, offering significant time savings compared to manual design.

Introduces a modular, multi-LLM pipeline that integrates 3D asset retrieval, domain knowledge injection, and code generation to create realistic agricultural simulation environments from natural language prompts.

Arafa Yoncalik, W. Jansen, Nico Huebel +22602.11706

World Models & PlanningCode Generation & Program SynthesisMultimodal Models

2d ago

Any House Any Task: Scalable Long-Horizon Planning for Abstract Human Tasks

The paper introduces Any House Any Task (AHAT), a household task planner designed for long-horizon planning in large environments with ambiguous instructions. AHAT trains an LLM to map task instructions and textual scene graphs into PDDL subgoals, which are then solved using symbolic reasoning for optimal plan generation. To improve decomposition of complex intentions, they propose TGPO, a reinforcement learning algorithm integrating external correction of intermediate reasoning traces into Group Relative Policy Optimization (GRPO), leading to significant performance gains.

Introduces a novel household task planner, AHAT, that leverages LLMs and symbolic reasoning with a new reinforcement learning algorithm, TGPO, to achieve superior long-horizon planning performance in complex, ambiguous environments.

Zhihong Liu, Cewu Lu, Panpan Cai2602.12244

Tool Use & AgentsWorld Models & PlanningRobotics & Embodied AI

2d ago

ReaDy-Go: Real-to-Sim Dynamic 3D Gaussian Splatting Simulation for Environment-Specific Visual Navigation with Moving Obstacles

The paper introduces ReaDy-Go, a real-to-sim pipeline that generates photorealistic dynamic scenarios using 3D Gaussian Splatting (GS) to train visual navigation policies robust to the sim-to-real gap and moving obstacles. ReaDy-Go combines a static GS scene with dynamic human GS avatars driven by plausible motions derived from 2D trajectories, and uses a robot expert planner designed for dynamic GS representations to generate navigation datasets. Experiments demonstrate that policies trained with ReaDy-Go outperform baselines in both simulation and real-world environments, exhibiting improved navigation performance and generalization.

Introduces a real-to-sim dynamic 3D Gaussian Splatting simulation pipeline, ReaDy-Go, for training visual navigation policies robust to the sim-to-real gap and moving obstacles.

Dabin Kim, Seungwoo Jung2602.11575

World Models & PlanningRobotics & Embodied AIComputer Vision

2d ago

WorldTree: Towards 4D Dynamic Worlds from Monocular Video using Tree-Chains

The paper introduces WorldTree, a novel framework for dynamic scene reconstruction from monocular video that addresses limitations in spatiotemporal decomposition. WorldTree uses a Temporal Partition Tree (TPT) for coarse-to-fine temporal optimization and Spatial Ancestral Chains (SAC) for hierarchical spatial dynamics. Experiments demonstrate that WorldTree achieves state-of-the-art performance, improving LPIPS by 8.26% on NVIDIA-LS and mLPIPS by 9.09% on DyCheck compared to existing methods.

Introduces a unified spatiotemporal decomposition framework, WorldTree, for dynamic scene reconstruction from monocular video, using Temporal Partition Trees and Spatial Ancestral Chains.

Qisen Wang, Yifan Zhao, Jia Li2602.11845

Computer VisionWorld Models & Planning

2d ago

Budget-Constrained Agentic Large Language Models: Intention-Based Planning for Costly Tool Use

This paper introduces INTENT, a novel inference-time planning framework for budget-constrained, tool-augmented LLMs that addresses the challenge of costly tool use in sequential decision-making. INTENT uses an intention-aware hierarchical world model to anticipate future tool usage and risk-calibrated costs, enabling more effective online decision-making. Experiments on a cost-augmented StableToolBench demonstrate that INTENT achieves superior task success while strictly adhering to budget constraints, even under dynamic market conditions.

Introduces INTENT, an inference-time planning framework that leverages an intention-aware hierarchical world model for budget-constrained tool use in LLMs.

Nan An, Qi Qi2602.11541

Tool Use & AgentsWorld Models & PlanningReasoning & Chain-of-Thought

2d ago

Amortized Molecular Optimization via Group Relative Policy Optimization

The paper introduces GRXForm, a Graph Transformer model for amortized molecular optimization that sequentially adds atoms and bonds to a molecule. To improve generalization, the authors identify and address the high variance in rewards caused by heterogeneous starting structures by using Group Relative Policy Optimization (GRPO). GRXForm demonstrates strong generalization to out-of-distribution molecular scaffolds, achieving competitive performance with instance optimizers in multi-objective optimization without requiring inference-time oracle calls or refinement.

Introduces Group Relative Policy Optimization (GRPO) to normalize rewards relative to the starting structure, thereby mitigating variance and improving generalization in amortized molecular optimization.

Muhammad bin Javaid, Hasham Hussain, Ashima Khanna +52602.12162

Scientific Discovery & Drug DesignTraining Efficiency & OptimizationWorld Models & Planning

2d ago

TSR: Trajectory-Search Rollouts for Multi-Turn RL of LLM Agents

The paper introduces Trajectory-Search Rollouts (TSR), a training-time method that uses lightweight tree search to improve the quality of rollouts in multi-turn reinforcement learning for LLM agents. TSR selects high-scoring actions at each turn during rollout generation using task-specific feedback, leading to more informative training trajectories. Experiments on Sokoban, FrozenLake, and WebShop demonstrate that TSR, when combined with PPO and GRPO, achieves up to 15% performance gains and more stable learning.

Introduces a novel training-time trajectory generation method, TSR, that leverages lightweight tree search to construct higher-quality rollouts for multi-turn RL of LLM agents.

Aladin Djuhera, S. Kadhe, Holger Boche2602.11767

RLHF & Preference LearningTool Use & AgentsWorld Models & Planning

2d ago

Gaia2: Benchmarking LLM Agents on Dynamic and Asynchronous Environments

The paper introduces Gaia2, a benchmark designed to evaluate LLM agents in dynamic, asynchronous environments where the environment evolves independently of agent actions. Gaia2 features scenarios requiring agents to handle temporal constraints, adapt to noisy events, resolve ambiguity, and collaborate, coupled with write-action verifiers for fine-grained evaluation. Evaluations of state-of-the-art models reveal trade-offs between reasoning, efficiency, and robustness, with GPT-5 achieving the highest overall score (42% pass@1) but struggling with time-sensitive tasks.

Introduces Gaia2, a novel benchmark for evaluating LLM agents in realistic, asynchronous environments with action-level verification.

Romain Froger, Pierre Andrews, Matteo Bettini +212602.11964

Eval Frameworks & BenchmarksTool Use & AgentsWorld Models & Planning

Feb 11, 2026

3d ago

RISE: Self-Improving Robot Policy with Compositional World Model

The paper introduces RISE, a robotic reinforcement learning framework that leverages a Compositional World Model to improve policy learning in simulation. This world model predicts multi-view futures using a controllable dynamics model and evaluates outcomes with a progress value model, generating advantages for policy improvement. By training the policy entirely in a closed-loop, self-improving imaginary environment, RISE achieves significant performance gains over existing methods in dynamic manipulation tasks.

Introduces a compositional world model architecture for robotic reinforcement learning that separates state and value representation, enabling tailored architectures and objectives for each.

Jiazhi Yang, Jinwei Li, Wencong Zhang +72602.11075

World Models & PlanningRobotics & Embodied AIMultimodal Models

3d ago

MolmoSpaces: A Large-Scale Open Ecosystem for Robot Navigation and Manipulation

The authors introduce MolmoSpaces, a large-scale, open-source ecosystem comprising over 230k diverse indoor environments and 130k richly annotated object assets, designed to address the limitations of existing robot benchmarks in capturing the long tail of real-world scenarios. This simulator-agnostic ecosystem supports a wide range of embodied tasks, including navigation, manipulation, and long-horizon planning, and includes MolmoSpaces-Bench, a benchmark suite of 8 tasks. Experiments demonstrate strong sim-to-real correlation and highlight sensitivities to factors like prompt phrasing and camera occlusion, establishing MolmoSpaces as a valuable resource for scalable robot learning research.

Introduces a large-scale, simulator-agnostic, and open-source ecosystem for robot learning, featuring diverse indoor environments and richly annotated objects, to facilitate more robust and generalizable robot policies.

Wilbert Pumacay, Omar Rayyan, Max Argus +192602.11337

Robotics & Embodied AIEval Frameworks & BenchmarksWorld Models & Planning

Feb 9, 2026

5d ago

$\chi_{0}$: Resource-Aware Robust Manipulation via Taming Distributional Inconsistencies

This paper introduces $\chi_{0}$, a resource-efficient framework designed to enhance the robustness of long-horizon robotic manipulation by addressing distributional inconsistencies between demonstration, policy learning, and execution. The framework incorporates model arithmetic for merging diverse demonstration distributions, stage-aware advantage estimation for stable progress signals, and train-deploy alignment techniques using augmentation and DAgger. Experimental results demonstrate that $\chi_{0}$ significantly outperforms existing methods in garment manipulation tasks, achieving a 250% improvement in success rate compared to $\pi_{0.5}$ with limited data and compute resources.

This paper introduces a novel framework, $\chi_{0}$, that tames distributional inconsistencies in robotic manipulation through model arithmetic, stage advantage estimation, and train-deploy alignment, enabling robust long-horizon task execution with limited resources.

Chonghao Sima, Hai Zhang, Haoguang Mai +122602.09021

Robotics & Embodied AIWorld Models & PlanningTraining Efficiency & Optimization

5d ago

Dreaming in Code for Curriculum Learning in Open-Ended Worlds

The paper introduces Dreaming in Code (DiCode), a framework that uses foundation models to generate executable environment code variations for curriculum learning in open-ended environments. DiCode addresses the challenge of discovering learnable sequences of experiences in complex environments by "dreaming" code-level variations of the world to scaffold learning. Experiments in the Craftax environment demonstrate that DiCode enables agents to acquire long-horizon skills, achieving a 16% improvement in mean return over the strongest baseline and success on late-game combat tasks where prior methods fail.

Introduces DiCode, a novel framework leveraging foundation models to synthesize executable environment code for curriculum learning, enabling agents to acquire complex skills in open-ended environments.

Konstantinos Mitsides, Maxence Faldor, Antoine Cully2602.08194

Code Generation & Program SynthesisTool Use & AgentsWorld Models & Planning

Feb 5, 2026

1w ago

HiCrowd: Hierarchical Crowd Flow Alignment for Dense Human Environments

The paper introduces HiCrowd, a hierarchical framework combining reinforcement learning (RL) and model predictive control (MPC) to improve robot navigation in dense crowds. A high-level RL policy selects a "follow point" to align the robot with compatible crowd flows, while a low-level MPC tracks this point with short-horizon planning for safety. Experiments on real-world and synthetic datasets demonstrate that HiCrowd outperforms reactive and learning-based baselines in navigation efficiency, safety, and reducing freezing behaviors.

Introduces a hierarchical RL-MPC framework (HiCrowd) that leverages pedestrian motion as guidance for robot navigation in dense crowds, improving efficiency and safety compared to existing methods.

Yufei Zhu, Shih-Min Yang, Martin Magnusson +12602.05608

Robotics & Embodied AIWorld Models & PlanningRLHF & Preference Learning

Feb 4, 2026

1w ago

Dual Mind World Model Inspired Network Digital Twin for Access Scheduling

This paper introduces a Digital Twin-enabled access scheduling framework inspired by the Dual Mind World Model (DMWM) architecture for intelligent network control in dynamic environments. The DMWM framework combines short-horizon predictive planning with symbolic model-based rollout, allowing the scheduler to anticipate network states and optimize transmission decisions. Experimental results demonstrate that DMWM outperforms traditional heuristics and reinforcement learning baselines in bursty, interference-limited, and deadline-sensitive environments, while also improving interpretability and sample efficiency.

Introduces a novel Dual Mind World Model (DMWM) architecture for network digital twins that combines predictive planning with symbolic model-based rollout to improve access scheduling.

Hrishikesh Dutta, Roberto Minerva, Noël Crespi2602.04566

World Models & PlanningRobotics & Embodied AI

Feb 1, 2026

Automotive Engineering Research Institute1w ago

End-to-End Autonomous Driving: From Classic Paradigm to Large Model Empowerment—A Comprehensive Survey

This survey paper reviews the evolution of end-to-end autonomous driving, contrasting traditional modular systems with emerging end-to-end approaches based on imitation learning, reinforcement learning, and foundation models. It categorizes recent advances in planning, reasoning, data generation, and scene understanding within the context of large language models (LLMs) and vision-language models (VLMs). The paper also addresses challenges like multimodal fusion complexity and safety risks, and outlines future research directions including world models and modular foundation model architectures.

Systematically categorizes and synthesizes recent advancements in end-to-end autonomous driving, particularly focusing on the integration of large language models and vision-language models for planning, reasoning, data generation, and scene understanding.

Wei Dong, Sikai Lu, Xinhe Chen +6

Robotics & Embodied AIComputer VisionWorld Models & Planning

Jan 31, 2026

2w ago

NetWorld: Communication-Based Diffusion World Model for Multi-Agent Reinforcement Learning in Wireless Networks

The paper introduces NetWorld, a Communication-based Diffusion World Model, to improve few-shot generalization across heterogeneous MARL tasks in wireless networks. NetWorld pre-trains a classifier-guided conditional diffusion model on multi-task offline datasets and performs trajectory planning within the learned world model, avoiding online interaction. The model incorporates a mean-field communication mechanism to address non-stationarity and promote coordination.

Introduces a communication-based diffusion world model (NetWorld) that enables few-shot generalization across heterogeneous MARL tasks in wireless networks by learning from offline data and planning within the learned environment.

Kechen Meng, Rongpeng Li, Yansha Deng +22602.00558

World Models & PlanningTraining Efficiency & OptimizationDistributed Systems & HardwareRobotics & Embodied AI

Jan 28, 2026

2w ago

Dynamic Weighted Spherical Particle Swarm Optimization for UAV Path Planning in Complex Environments

This paper introduces a dynamic weighted spherical particle swarm optimization (DW-SPSO) algorithm to address the challenges of UAV path planning in complex environments, specifically the issues of high-dimensional search spaces, local optima, and real-world constraints. DW-SPSO employs a dual Sigmoid-based adaptive weight adjustment mechanism to balance exploration and exploitation, along with lens-based opposition learning to enhance search flexibility. Experiments on real digital elevation models demonstrate that DW-SPSO outperforms state-of-the-art PSO variants regarding path safety, smoothness, and convergence speed, validated by the Wilcoxon signed-rank test.

Introduces a dynamic weighted spherical particle swarm optimization (DW-SPSO) algorithm featuring a dual Sigmoid-based adaptive weight adjustment and lens-based opposition learning to improve UAV path planning.

Rui Yao, Yuye Wang, Fei Yu +2

World Models & PlanningRobotics & Embodied AITraining Efficiency & Optimization

Jan 23, 2026

Longmen Laboratory3w ago

A Bionic Memory Replay Model Based on Hippocampal-Prefrontal Collaboration: Design, Mechanism, and Robot Navigation Verification

This paper introduces a novel memory replay model (MPR) inspired by hippocampal-prefrontal collaboration in the human brain, aiming to improve robot navigation in complex environments. The model integrates meta-reinforcement learning with model-based policy optimization, creating a closed-loop interaction between hippocampal planning replay and prefrontal policy evaluation networks. Experimental results demonstrate that MPR achieves a 14% reduction in navigation path length and a 10% higher success rate compared to other models, along with significantly improved learning efficiency.

Proposes a bionic memory replay model (MPR) that unifies dynamic programming and strategy optimization by dynamically coupling experience playback and prospective simulation, thereby enhancing learning efficiency and adaptability in complex environments.

Xudong Lv, Dongshu Wang

World Models & PlanningRobotics & Embodied AI

Jan 21, 2026

Skolkovo Institute of Science3w ago

HumanDiffusion: A Vision-Based Diffusion Trajectory Planner with Human-Conditioned Goals for Search and Rescue UAV

The paper introduces HumanDiffusion, an image-conditioned diffusion-based trajectory planner for UAVs in search and rescue, enabling navigation towards humans detected via YOLO-V3. By predicting trajectories directly in pixel space from RGB images, the system avoids reliance on pre-existing maps or computationally expensive planning. Experiments in simulation and real-world scenarios demonstrate a mean squared error of 0.02 in pixel-space trajectory reconstruction and an 80% mission success rate, highlighting the method's effectiveness for human-aware navigation.

Introduces a novel image-conditioned diffusion model, HumanDiffusion, to generate human-aware UAV trajectories directly from RGB images for search and rescue tasks.

Faryal Batool, Iana Zhura, Valerii Serpiva +42601.14973

Robotics & Embodied AIComputer VisionWorld Models & Planning

Jan 20, 2026

3w ago

"Just in Time"World Modeling Supports Human Planning and Reasoning

The paper introduces a "Just-in-Time" (JiT) framework for simulation-based reasoning, positing that humans construct simplified environment representations online to overcome computational limitations. The JiT model interleaves simulation, visual search, and representation modification, using ongoing simulations to guide visual attention and identify relevant objects for encoding. Empirical validation in grid-world planning and physical reasoning tasks demonstrates the model's ability to make accurate predictions while encoding only a small subset of objects, supporting the idea that humans create reduced representations for efficient mental simulation.

Introduces and validates a "Just-in-Time" framework that interleaves simulation, visual search, and representation modification to construct simplified environment representations for efficient human reasoning.

Tony Chen, Samuel J. Cheyette, Kelsey Allen +22601.14514

Reasoning & Chain-of-ThoughtWorld Models & Planning

Jan 19, 2026

3w ago

Active Inference-Driven World Modeling for Adaptive UAV Swarm Trajectory Design

This paper introduces an Active Inference framework for decentralized UAV swarm trajectory design, aiming to improve adaptability and safety in dynamic environments. A hierarchical World Model is trained on expert trajectories generated by a Genetic Algorithm with Repulsion Forces (GA-RF) to represent swarm behavior at different levels of abstraction. The UAVs then use active inference to select actions that minimize the divergence between their beliefs and the World Model's predictions.

Introduces an active inference-based approach for UAV swarm trajectory design that learns from expert trajectories and enables adaptive behavior in dynamic environments.

Kaleem Arshid, Ali Krayani, L. Marcenaro +22601.12939

World Models & PlanningRobotics & Embodied AITool Use & Agents

Jan 14, 2026

DreamWaQ++: Obstacle-Aware Quadrupedal Locomotion With Resilient Multimodal Reinforcement Learning

This paper introduces DreamWaQ++, a multimodal reinforcement learning framework that fuses proprioceptive and exteroceptive information for robust quadrupedal locomotion in complex environments. The approach trains a controller capable of agile navigation across challenging terrains like rough ground, steep slopes, and high stairs, while also exhibiting resilience to out-of-distribution scenarios. Key to the success is the fusion of proprioceptive feedback with exteroceptive data to enable obstacle avoidance and adaptive gait planning.

Introduces a resilient multimodal reinforcement learning framework, DreamWaQ++, that effectively fuses proprioception and exteroception for robust quadrupedal locomotion in challenging environments.

I. M. A. Nahrendra, Byeong-Uk Yu, Mi-Suk Oh +5

Robotics & Embodied AIMultimodal ModelsWorld Models & Planning

Jan 13, 2026

SLBJan 13, 2026

Accelerating Reservoir Simulation to Shorten the Field Development Lifecycle: Innovations in AI, Multiscale Physics, and Scalable Compute

This paper introduces a multifaceted approach to accelerate reservoir simulation by combining advanced software techniques, AI/ML-based enhancements, and scalable hardware solutions. The software innovations include multiscale SFI methods for black-oil models and AI/ML for phase labeling and saturation pressure prediction in compositional models, while hardware strategies involve CPU, GPU, and cloud-based execution. Applied to real-world, high-resolution reservoirs, the combined approach achieved up to 4x runtime reduction using multiscale SFI, up to 2x speedups with full-GPU execution, and up to 4x improvements with AI/ML, significantly compressing field development planning timelines.

Demonstrates a holistic approach to reservoir simulation acceleration by integrating multiscale physics, AI/ML-driven enhancements, and scalable compute infrastructure, resulting in significant runtime reductions and faster field development planning.

S. Ramatullayev, S. Tahir, K. Mansour +3

World Models & PlanningDistributed Systems & HardwareTraining Efficiency & Optimization

Jan 12, 2026

School of Mechanical EngineeringJan 12, 2026

A Dynamic Path Planning and Tracking Control of Autonomous Vehicles: An Integrated Approach Using Improved A*, Fuzzy DWA, and Fuzzy PID

The paper introduces an integrated path planning and trajectory tracking system for autonomous vehicles, combining an improved A* algorithm for global planning, a fuzzy dynamic window approach (DWA) for local planning, and a fuzzy PID controller for trajectory tracking. The improved A* enhances search efficiency and path quality through an enhanced heuristic function, redundant node removal, and path smoothing. The fuzzy DWA enables smooth obstacle avoidance, and the fuzzy PID controller adaptively adjusts parameters for precise path following. Real-world experiments on a ROS-based vehicle demonstrate superior performance compared to conventional approaches in path planning efficiency, obstacle avoidance, and path smoothness.

Integrates an improved A*, fuzzy DWA, and fuzzy PID control into a cohesive system for autonomous vehicle navigation, demonstrating enhanced performance in real-world scenarios.

Hao Chen, Xiuyang Wang, Chongfeng Wei +2

World Models & PlanningRobotics & Embodied AI

Jan 12, 2026

LOONG: Online Time-Optimal Autonomous Flight for MAVs in Cluttered Environments

This paper introduces LOONG, a novel planning and control framework for time-optimal MAV flight in cluttered environments. The framework combines imitation learning to accelerate time allocation for polynomial trajectory generation with a time-optimal model predictive contouring control (MPCC) that incorporates safe flight corridor (SFC) constraints. Experimental results on a LiDAR-based MAV platform demonstrate superior aggressiveness and a peak speed of 18 m/s in real-world cluttered environments, showcasing the framework's robustness.

Introduces a time-optimal model predictive contouring control (MPCC) method with variable horizon steps and safe flight corridor (SFC) constraints for aggressive and safe MAV maneuvering.

Xin Guan, Fangguo Zhao, Qianyi Wang +32601.07434

World Models & PlanningRobotics & Embodied AI

Jan 6, 2026

In-Context Reinforcement Learning through Bayesian Fusion of Context and Value Prior

The paper introduces SPICE, a Bayesian in-context reinforcement learning (ICRL) method that learns a prior over Q-values using a deep ensemble and updates it at test-time via Bayesian updates using in-context information. This approach addresses limitations of existing ICRL methods by enabling improvement beyond the training distribution and robustness to suboptimal training data. SPICE achieves regret-optimal behavior in stochastic bandits and finite-horizon MDPs, even when pre-trained on suboptimal trajectories, and demonstrates superior empirical performance compared to existing ICRL and meta-RL methods on bandit and control benchmarks.

Proposes SPICE, a Bayesian ICRL method that fuses a learned Q-value prior with in-context information through Bayesian updates and employs an Upper-Confidence Bound rule for online inference to achieve regret-optimal behavior even with suboptimal pretraining data.

Anaïs Berkes, Vincent Taboga, Donna Vakalis +22601.03015

World Models & PlanningRobotics & Embodied AI

Lattice is designed for desktop

World Models & Planning

Keywords

Top Labs in This Topic

Recent Papers