Search papers, labs, and topics across Lattice.
100 papers published across 9 labs.
Far from just a childhood game, Pokemon emerges as a surprisingly effective AI benchmark, revealing critical gaps in LLMs and RL agents that existing benchmarks miss.
Forget painstakingly crafted rewards and curricula: this new RL framework learns surprisingly dexterous manipulation skills just by resetting the simulator in diverse ways.
Achieve real-time online learning for model predictive control with a novel spatio-temporal Gaussian Process approximation that maintains constant computational complexity.
By iteratively reasoning over video snippets with a Chain-of-Thought, $\text{R}^2$VLM achieves state-of-the-art long-horizon task progress estimation without needing to process entire videos at once.
Ditching rigid digital twins for adaptable world models could unlock truly intelligent edge computing in 6G networks.
By treating 3D scene editing as goal-regressive planning rather than pure generation, Edit-As-Act achieves instruction fidelity, semantic consistency, and physical plausibility that existing methods miss.
Legged robots can navigate more reliably with noisy sensors thanks to a new state estimator that avoids Gaussian noise assumptions.
Achieve stable, real-time kilometer-scale autonomous driving simulations by generating vector-graph tiles incrementally using a novel diffusion flow approach.
Seemingly accurate physics-informed surrogates can fail spectacularly when integrated into power system simulations, especially under stress, highlighting the need for rigorous in-simulator validation.
Generate consistent stereo videos directly from RGB data, bypassing depth estimation and monocular-to-stereo conversion, with StereoWorld's novel camera-aware attention mechanisms.
Representing highly nonlinear vehicle dynamics in a lifted linear space via Koopman operator theory enables state-of-the-art long-term state estimation for complex electric trucks.
Simulate earthquake ground motion 10,000x faster with a new latent operator flow matching method, opening the door to real-time risk assessment for critical infrastructure.
Forget rigid physics engines, this badminton RL environment uses real player data to simulate realistic rallies and strategic gameplay.
Heuristic maritime routes lead to extreme fuel waste in nearly 5% of voyages, but this RL approach cuts that risk by almost 10x.
LLMs in embodied environments get a massive boost from structured rules, with rule retrieval alone contributing +14.9 pp to single-trial success.
VLN agents can navigate more effectively by predicting their future states and proactively planning based on forecasted semantic map cues, rather than relying solely on historical context.
Encoding deformable object dynamics with particle positions unlocks sim-to-real transfer for manipulation tasks, achieving impressive real-world success rates.
Drones can now land safely in complex, unknown environments using only a camera, thanks to a new system that dynamically maps and reacts to surroundings in real-time.
Ditch fixed compute budgets: this new flow-matching method for robotic control adaptively allocates computation, speeding up simple tasks and focusing on complex ones.
ManiDreams lets robots handle real-world uncertainty in manipulation tasks without retraining, outperforming standard RL baselines under various perturbations.
Robot world models can be significantly improved by directly rewarding them for generating videos that lead to physically plausible robot actions, even if the videos themselves contain visual artifacts.
A complete autonomy stack enables centimeter-level localization and mapping on the moon, even without GPS.
Finally, a rigorous RL benchmark: generate environments with *provably* optimal policies, enabling controlled algorithm evaluation against ground truth.
Accurately predict urban pollutant dispersion in real-time with a novel data-driven model that's orders of magnitude faster than traditional CFD.
Demonstrator diversity unlocks the ability to learn latent actions and dynamics from offline RL data, even without explicit action labels.
LLMs can be economically aligned to real-world consumer preferences via post-training on transaction data, enabling more accurate and stable economic simulations.
By cleverly turning novel view synthesis into a self-supervised inpainting problem, VisionNVS eliminates the need for ground truth images of novel views, outperforming LiDAR-dependent baselines.
Forget finetuning: DynaEdit unlocks complex video edits like action modification and object insertion, all without training, using clever manipulation of pretrained text-to-video models.
Achieve zero-shot adaptation to new tasks in complex control environments by learning a shared low-dimensional goal embedding that unifies policy and value function representations.
NeRFs can now guide extraterrestrial rovers around unexpected obstacles, thanks to a novel planning framework that blends local observations with global terrain understanding.
Q-value policies, traditionally outperformed by state-value policies in planning, can surpass them with the right regularization, offering a faster alternative for policy evaluation.
Robots can now plan 9x faster and achieve significantly higher success rates by decoupling action prediction from video generation in World-Action Models.
A new mixed reality testbed lets you plug real human drivers into a CAV simulation, offering unprecedented realism for testing autonomous vehicle interactions.
Guaranteeing robot safety and task completion just got easier: this method enforces complex temporal logic constraints on pre-trained robotics models without any fine-tuning.
Human unpredictability is now a feature, not a bug: a mixed-reality testing framework leverages human interaction to generate high-quality corner cases for vehicle-infrastructure cooperation systems.
Autoregressive neural surrogates can now simulate dynamical systems for infinitely long horizons, thanks to a novel self-refining diffusion model that avoids error compounding.
Ditch the data augmentation and decoders: R2-Dreamer's Barlow Twins-inspired objective delivers faster, more versatile MBRL, especially when spotting the small stuff matters.
Kinema4D unlocks zero-shot transfer in embodied AI by simulating physically plausible 4D robot-world interactions, moving beyond rigid 2D constraints.
Fine-tuning Vision-Language Model planners for robotic manipulation is now significantly more efficient and safer thanks to a novel framework that leverages video world models to simulate real-world physics.
Autonomous robots can now more safely and effectively inspect cluttered, radioactive environments by combining information gain-based planning with stochastic obstacle avoidance.
Neural approximations of Hamilton-Jacobi reachability can now be formally certified for safety, enabling provably safe robot navigation in unknown environments.
PyPhonPlan offers a new open-source toolkit to simulate speech dynamics with neurally-grounded representations, enabling researchers to model interactive speech production and perception loops.
LLM-based simulations of public opinion suffer from "Diversity Collapse," but injecting explicit social identity representations into hidden states can fix it.
Rank-1 LoRA fine-tuning can safely and efficiently adapt simulated locomotion policies to real-world robots, slashing fine-tuning time by nearly half while maintaining safety.
By fusing IMU and insole pressure data within a physics simulation, GRIP achieves more physically plausible human motion capture than IMU-only methods.
Accurately simulating the snap-fit mechanics of interlocking bricks, BrickSim unlocks a new level of realism for robotic manipulation research involving complex assemblies.
By reframing robot inspection planning as a network flow problem, this work achieves a 30-50% reduction in optimality gaps and scales to instances previously intractable for state-of-the-art methods.
Humanoid robots can now nimbly navigate complex terrain with drastically reduced computational cost thanks to a novel adaptive sensing architecture.
A lightweight transformer can accurately forecast diverse aircraft trajectories in complex airspace, outperforming prior methods and enabling real-time safety applications.
Reinforcement learning can now orchestrate the complex, whole-body movements of salamander robots, enabling seamless transitions between walking and swimming.
A MuJoCo-based MPC can effectively control shipboard cranes in real-time, even with double-pendulum sway and external perturbations, outperforming traditional PID and RL methods on embedded hardware.
Smarter placement of slow chargers can significantly reduce the need for expensive en-route EV charging, leading to lower overall system costs.
Multimodal agents can now plan more coherently and solve complex tasks thanks to a new anticipatory reasoning framework that forecasts short-horizon trajectories before acting.
Generating 3D scenes with diffusion models just got a whole lot more consistent across views, thanks to a new 3D-native approach that skips the 2D latent space bottleneck.
Stochastic resetting (randomly teleporting RL agents back to the start) surprisingly speeds up learning, even when it wouldn't help a non-learning agent.
By penalizing treatment plans that lead to trajectory distributions far from observed patient data, this method provides a more robust approach to treatment optimization than standard model-based methods.
Transformers trained on a simple grid-world learn hidden representations that directly reflect the underlying predictive geometry, offering a glimpse into how neural networks internalize structural constraints.
LLMs can learn to recover from mistakes more effectively by reflecting on past failures and internalizing actionable feedback, leading to significant gains in long-horizon problem-solving.
Skip the expensive modeling step: this data-driven approach to traffic light control directly optimizes traffic flow using real-world data, slashing travel times and emissions in a massive Zürich simulation.
Kinodynamic motion planning just got a whole lot faster: AkinoPDF achieves microsecond-level planning times by exploiting differential flatness for analytical solutions.
Ditch slow, multi-step video generation: S-VAM distills the structured generative priors of multi-step denoising into a single forward pass for real-time robot action prediction.
By treating camera pose as a unifying geometric representation, WorldCam achieves significantly improved action controllability and long-horizon 3D consistency in interactive gaming world models compared to prior video diffusion transformer approaches.
By amortizing sequential design into a neural network, this method achieves real-time model-based design of experiments, unlocking new possibilities for efficient parameter estimation in complex dynamical systems.
Forget expensive real-world data collection: a massive, diverse synthetic dataset enables surprisingly effective zero-shot transfer for robotic manipulation.
Turn inconsistent video diffusion models into surprisingly coherent 3D world generators with a novel alignment and rendering approach.
Coordinating fleets of autonomous vessels to clean up multiple oil spills can be near-optimized in minutes using a hybrid optimization approach, enabling rapid, risk-aware responses to large-scale disasters.
Reinforcement learning can effectively control collective animal behavior in the real world, even when individuals frequently ignore the artificial stimulus.
Constraint propagation can significantly boost dynamic programming by pruning states and transitions, but the overhead needs further optimization.
World Action Models can ditch the slow, iterative "imagine-then-execute" loop at test time without sacrificing performance, achieving a 4x speedup.
Forget kinematic tree approximations: Kamino unlocks high-fidelity, massively parallel robot simulations with closed kinematic chains directly on GPUs.
Competitive reinforcement learning enables agile drone interception with higher catch rates and lower crash rates compared to heuristic baselines, even in real-world scenarios.
Skip the manual effort: CABTO uses large models to automatically generate complete and consistent behavior tree systems for robot manipulation.
LLM Transformers can be effectively repurposed to enhance motion forecasting in autonomous driving by capturing temporal context in continuous driving scenarios.
Automating vehicle fault diagnostics by treating error codes as a language unlocks scalable predictive maintenance and causal understanding in complex automotive systems.
Current MLLMs struggle with even basic route planning in remote sensing, highlighting a critical gap in their ability to translate perception into action in complex, real-world scenarios.
Robots can now dynamically adjust their movements for legibility versus efficiency on the fly, without retraining, by using a lightweight module that detects environmental ambiguity and modulates a diffusion policy.
End-to-end autonomous driving systems, like Tesla's FSD, are proving commercially viable by effectively handling the long tail of real-world driving scenarios, signaling a major shift from rule-based approaches.
LLMs can now plan complex, sequential robotic maneuvers through narrow spaces by learning from human demos and refining with geometric rewards, outperforming traditional methods.
Forget finetuning a new LoRA for every character: EverTale introduces a single LoRA that adapts to *all* characters in a story, enabling continuous character customization with improved fidelity and efficiency.
Achieve minute-level navigable video world models by combining the strengths of explicit 3D patch memory with implicit generative modeling.
Just five minutes of real-world teleoperation data is enough to train a copilot that significantly boosts both novice and expert performance on complex manipulation tasks.
Unlock globally optimal control policies in high-dimensional systems by unifying trajectory optimization with Hamilton-Jacobi-Bellman methods via a novel "Featurized Occupation Measure" framework.
LLMs can exhibit surprising "strategic realism" when analyzing an ongoing geopolitical conflict, but their reasoning falters in politically ambiguous situations, revealing critical domain-specific limitations.
Humanoid robots can now learn to walk with provably direction-dependent compliance, thanks to a new anisotropic Lipschitz constraint on RL policies.
LLM agents struggle to maintain coherent decision-making in realistic retail environments over long horizons, even with a novel framework for adaptive strategy evolution.
Finally, a method exists to create 3D human-scene interaction models from casual captures that are stable enough for use in physics simulations and deployment on real-world robots.
Humanoid robots can now handle heavy, unknown payloads in the real world thanks to a system that identifies mass distribution via differentiable simulation.
Overcome communication bottlenecks in multi-agent RL by selectively communicating with reachable agents and predicting interference to optimize partner choice.
UAVs can explore longer and more efficiently by explicitly optimizing for energy consumption, as demonstrated by a new frontier exploration framework that reduces energy use without sacrificing speed or map quality.
Autonomous driving planners can now explicitly self-correct unsafe actions by generating motion-token traces conditioned on a learned collision critic, leading to significant safety improvements.
Imagine a world model that doesn't just dream up environments, but flawlessly renders a real city like Seoul, complete with text-prompted scenarios and diverse camera movements.
Finally, a scalable method lets you explore billions of scientific models and their parameters, all while interactively tuning model complexity *after* seeing the data.
Choosing the right formalism for robot mission specification—Behavior Trees, State Machines, HTNs, or BPMN—can make or break your robot's ability to handle real-world complexity.
CRL struggles with hard-to-reach goals, but ViSA, a new data augmentation technique, solves this by generating synthetic states and regularizing the embedding space, leading to better value estimation.
WorldDrive achieves leading autonomous driving performance by unifying visual scene generation and motion planning, demonstrating that a shared representation space significantly improves both prediction accuracy and planning robustness.
Achieve SOTA extrapolated-view LiDAR synthesis by fusing multi-frame LiDAR data and spatially-constrained dropout regularization, enabling robust autonomous driving simulation without multi-pass data.
Achieve state-of-the-art closed-loop autonomous driving simulation with sub-second latency using a novel frame-autoregressive video generation framework.
A new policy iteration algorithm, iPI, closes the gap between existing safety verification methods by matching the best-case runtime of TarjanSafe while guaranteeing polynomial worst-case scaling.
Infant motor learning reveals a sharp phase transition in control strategy arbitration, governed by context window size and predictable via a closed-form exponential moving average.
Quadruped robots can now learn diverse skills and adapt to complex terrains without expert datasets, thanks to a novel keyframe-guided self-imitation learning framework.