Search papers, labs, and topics across Lattice.
100 papers published across 5 labs.
Deep learning can rescue VIO from textureless environments and rapid lighting changes.
A million-sequence, high-quality, open-source motion dataset finally lets text-to-motion models generalize beyond toy benchmarks.
Explicitly reconstructing 3D scenes with Gaussian Splatting unlocks state-of-the-art BEV perception, proving that geometric understanding is key to accurate spatial reasoning.
Maximizing entropy of future state-action visitations boosts feature coverage within single RL trajectories, offering a new exploration strategy.
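The idea behind this line can be sketched with a generic particle-based entropy proxy: reward each step by its distance to the k-th nearest neighbour among the state-action features already visited in the same trajectory, so revisited regions earn less than novel ones. This is a hedged illustration of within-trajectory entropy maximization in general, not the paper's algorithm; `knn_entropy_bonus` and its constants are hypothetical.

```python
import numpy as np

def knn_entropy_bonus(features: np.ndarray, k: int = 3) -> np.ndarray:
    """Per-step intrinsic bonus: distance to the k-th nearest neighbour
    among the state-action features visited earlier in this trajectory.
    Larger distance ~ lower local density ~ higher entropy contribution."""
    n = len(features)
    bonuses = np.zeros(n)
    for t in range(n):
        if t < k:
            continue  # not enough history yet to estimate density
        dists = np.linalg.norm(features[:t] - features[t], axis=1)
        kth = np.partition(dists, k - 1)[k - 1]
        bonuses[t] = np.log(1.0 + kth)  # particle-based entropy proxy
    return bonuses

# Steps that revisit the same region earn a smaller bonus than novel ones.
rng = np.random.default_rng(0)
clustered = np.vstack([np.zeros((10, 2)), rng.normal(5, 0.1, (1, 2))])
b = knn_entropy_bonus(clustered)
```

In a full agent this bonus would be added to the extrinsic reward; here the final, out-of-cluster step receives a strictly larger bonus than the repeated in-cluster steps.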
Coordinating multi-robot teams to complete manipulation tasks just got easier: GoC-MPC handles dynamic task assignments and disturbances without training data or environment models.
VLAs aren't just memorizing training data; sparse autoencoders reveal a hidden layer of generalizable motion primitives that can be steered to control robot behavior across tasks.
Forget brittle, hand-coded robot assembly routines: ATG-MoE learns complex, multi-skill manipulation directly from visual and language inputs, achieving impressive success rates in both simulation and real-world industrial tasks.
Forget hand-crafted assets and heuristics: V-Dreamer uses video generation models to automatically create diverse, physically plausible robotic simulation environments and trajectories directly from language.
A peer-like social robot can effectively augment literacy tutor support for newcomer children, offering personalized language and cultural learning in resource-constrained community settings.
Differentiable collision checking in configuration space, previously a major hurdle, is now achievable with zero-shot generalization thanks to CSSDF-Net.
Forget external sensors: embedding a simple nickel wire into a pneumatic actuator unlocks surprisingly accurate force sensing via inductance, even with hysteresis.
Information-theoretic limits on control performance are now computable even when feedback matters most, thanks to a new bound that self-consistently accounts for the controller's impact on sensor information.
Ditch the threshold: this tactile-sensing robotic hand uses contact status recognition to detect slip with 96% accuracy, even on new materials.
Humanoid robots can now traverse complex terrains with human-like gaits, thanks to a surprisingly simple and efficient framework that eschews adversarial training.
Unlock real-time 3D understanding: MonoArt achieves state-of-the-art monocular articulated object reconstruction without relying on multi-view data or external motion templates.
Achieve 9x lower trajectory error and 3x better FID in motion generation by using a diffusion-based discrete motion tokenizer that elegantly handles both semantic and kinematic constraints.
VLMs struggle with spatial reasoning, but a clever decomposition into sub-problems and probabilistic recombination unlocks significantly better metric-semantic grounding.
Autonomous driving models can be made significantly more robust and safe by explicitly de-confounding their training via causal intervention, eliminating reliance on spurious correlations.
Particle physics techniques can give your drone superhuman senses: statistical methods from CERN enable UAVs to detect subtle blade damage with calibrated uncertainty, outperforming standard anomaly detection methods.
Encoding realism as a knowledge graph of interpretable traits unlocks zero-shot sim2real image translation that outperforms state-of-the-art diffusion methods.
Achieve state-of-the-art panoramic depth estimation without any training by cleverly exploiting the 3D consistency priors embedded within existing vision foundation models.
Turns out, VLA models are mostly just looking at the scene: visual pathways dominate action generation, and language only matters when the visuals are ambiguous.
Ditch the heavyweight controllers: these lightweight MPC approaches bring real-time attitude synchronization to resource-constrained spacecraft.
Tapered backbones in 3D-printed continuum robots unlock enhanced compliance and manipulability, all while slashing costs and assembly time.
Democratizing social robotics research, M offers a low-cost, open-source platform that's easy to reproduce, modify, and deploy in real-world settings.
A single 3D-printed part can replace complex multi-link laparoscopic graspers, slashing manufacturing costs while maintaining reliable bistable actuation.
DriveTok achieves unified multi-view reconstruction and understanding by learning scene tokens that integrate semantic, geometric, and textural information, outperforming existing 2D tokenizers in autonomous driving scenarios.
DROID-SLAM achieves robust real-time RGB SLAM in dynamic environments by explicitly modeling per-pixel uncertainty, outperforming existing methods that struggle with unknown dynamic objects and cluttered scenes.
Differentiable environments and backpropagation offer a surprisingly effective alternative to reinforcement learning for AAV trajectory optimization, sidestepping credit assignment problems.
Decentralized MPC with control barrier functions lets multi-robot quadrupeds safely navigate complex environments in real-time, achieving performance on par with centralized approaches but with significantly reduced computation.
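The control-barrier-function ingredient mentioned here can be illustrated with the textbook single-constraint case, where the safety QP has a closed form: minimally deflect the desired input so the barrier condition dh/dt ≥ -αh holds. This is a minimal sketch for single-integrator dynamics, not the paper's decentralized MPC; `cbf_filter` and its arguments are hypothetical names.

```python
import numpy as np

def cbf_filter(x, u_des, obstacle, radius, alpha=1.0):
    """Minimally modify u_des so the barrier h(x) = ||x - o||^2 - r^2
    satisfies dh/dt >= -alpha * h for single-integrator dynamics x' = u.
    Closed-form solution of the one-constraint safety QP."""
    h = np.dot(x - obstacle, x - obstacle) - radius**2
    grad_h = 2.0 * (x - obstacle)          # dh/dx
    violation = -alpha * h - grad_h @ u_des
    if violation <= 0.0:                   # desired input already safe
        return u_des
    # Project u_des onto the safe half-space {u : grad_h . u >= -alpha * h}
    return u_des + (violation / (grad_h @ grad_h)) * grad_h

# A robot at (2, 0) commanding full speed toward an obstacle at the origin
# gets deflected; a command pointing away passes through unchanged.
x = np.array([2.0, 0.0])
u_safe = cbf_filter(x, np.array([-5.0, 0.0]), np.array([0.0, 0.0]), 1.0)
u_ok = cbf_filter(x, np.array([1.0, 0.0]), np.array([0.0, 0.0]), 1.0)
```

The filtered input satisfies the barrier condition with equality when active, which is what lets such filters sit between a nominal planner and the actuators.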
Digital twins can now discriminate between different types of cyberattacks on critical infrastructure, enabling targeted responses instead of costly full shutdowns.
Real-time robotic perception just got a major upgrade: OnlinePG achieves open-vocabulary panoptic mapping with 3D Gaussian Splatting, enabling robots to understand and interact with environments in a way that was previously impossible.
Quadrupedal robots can now skate circles around traditional designs, thanks to a co-design approach that unlocks dynamic maneuvers like hockey stops and self-alignment.
LLMs can navigate more efficiently in unfamiliar environments by reasoning over a tree of possible paths, not just isolated waypoints, enabling them to consider en-route information gain and prune unpromising branches.
Robots can learn faster and generalize better by encoding dynamics directly into their neural network architecture, outperforming standard transformers and GNNs.
Ditch the power-hungry actuators: this passive elastic-folding mechanism lets you stack and airdrop sensors that reliably self-deploy into 3D structures.
Reconstructing realistic hand-object interactions from video just got an order of magnitude faster, thanks to a novel Gaussian Splatting approach that ensures physical consistency.
Overcoming occlusion in hand-object pose estimation just got easier: GenHOI leverages hierarchical semantic knowledge and hand priors to achieve state-of-the-art results on challenging benchmarks.
Hybrid LiDAR-inertial-visual odometry (LIVO) robustly handles visually challenging conditions, outperforming sparse-direct methods by combining direct photometric methods with learning-based feature descriptors.
Humanoid robots can now generate more empathetic and instruction-aware gestures thanks to a new diffusion framework conditioned on affective estimation and pedagogical reasoning.
Forget painstakingly designing simulation environments: generative 3D world models let you RL-fine-tune robot VLAs with massive scene diversity, boosting real-world transfer by 3x.
Unlock real-time control for massive multi-agent swarms: this method slashes computation from cubic to linear with horizon length, making long-horizon density-driven control practical.
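For context on the cubic-to-linear claim: horizon-linear cost is classically what recursive, stage-by-stage solves deliver, versus one dense QP over the stacked trajectory. The sketch below is the standard LQR Riccati backward pass as a generic illustration of that structure, assuming linear dynamics and quadratic cost; it is not the paper's density-driven swarm method.

```python
import numpy as np

def lqr_backward_pass(A, B, Q, R, QT, T):
    """Riccati recursion: one sweep over the horizon yields all feedback
    gains, so cost grows linearly in T rather than cubically as for a
    single dense QP over the stacked trajectory."""
    P = QT
    gains = []
    for _ in range(T):
        S = R + B.T @ P @ B
        K = np.linalg.solve(S, B.T @ P @ A)   # stage feedback gain
        P = Q + A.T @ P @ (A - B @ K)         # cost-to-go update
        gains.append(K)
    return gains[::-1]  # gains ordered t = 0 .. T-1

# Double-integrator example: position/velocity state, force input.
A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.array([[0.0], [1.0]])
gains = lqr_backward_pass(A, B, Q=np.eye(2), R=np.array([[1.0]]),
                          QT=np.eye(2), T=50)
```

Each stage touches only fixed-size matrices, so doubling the horizon doubles the work; the resulting time-zero gain stabilizes the closed loop.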
Decomposing GUI agent trajectories into verifiable milestones and auditing the evidence chain yields a 10% boost in RL training performance, outperforming single-judge reward systems.
Flow-based VLAs can react to environmental changes ten times faster by adaptively prioritizing near-term actions during sampling, unlocking unprecedented real-time responsiveness.
Seemingly efficient VLA models can be surprisingly inefficient when deployed on robots, highlighting the need to move beyond standard metrics like FLOPs and parameters.
Neural solvers can now effectively handle the complexities of multi-agent coordination and multi-objective trade-offs in routing problems, outperforming traditional heuristics.
Embodied navigation agents, already struggling, fall apart when faced with the kinds of messy, real-world sensor and instruction corruptions that NavTrust now exposes.
Optimal multi-agent path planning with asynchronous actions is now provably complete, sidestepping the theoretical incompleteness of prior continuous-time approaches.
Forget blind exploration: injecting LLM-derived semantic understanding into DRL dramatically boosts UAV-aided network connectivity and slashes energy consumption.
Unlock geometry-precise 3D generation by directly conditioning diffusion models on readily available point cloud priors, outperforming existing image- or text-conditioned methods.
Guaranteeing safety in spacecraft autonomy is now more tractable: a CBF-CLF informed imitation learning approach achieves NMPC-level performance with real-time feasibility on commodity hardware.
Agents can now "hallucinate" optimal viewpoints for reasoning by storing and re-rendering scenes with 3D Gaussian Splatting, enabling recovery from initial observation failures.
Hierarchical memory, inspired by human cognition, beats standard approaches in robotic manipulation tasks requiring both precise tracking and long-term retention.
Robots can now manipulate objects with greater dexterity and adaptability thanks to a new world model that leverages both vision and high-frequency tactile feedback to predict and react to contact dynamics.
Tactile sensing closes the sim2real gap for deformable object tracing, enabling a single imitation learning model to achieve impressive generalization across diverse objects.
Standard DRL collapses in volatile environments because it mistakes irreducible noise for a lack of data, but RE-SAC fixes this by explicitly separating these uncertainties.
Robots can now train in realistic, thermally-accurate simulated fires, paving the way for safer and more reliable real-world firefighting deployments.
LLMs can control robots for complex disassembly tasks, but only if you give them structured APIs – otherwise, expect a 43% failure rate.
Even the most advanced VLMs like GPT-4o, GPT-5, and Gemini 2.5 Flash are outperformed in multi-actor human-robot interaction grounding by a system that selectively invokes VLMs based on a lightweight perception pipeline.
Achieve real-time online learning for model predictive control with a novel spatio-temporal Gaussian Process approximation that maintains constant computational complexity.
By explicitly reasoning in 3D, VolumeDP leaps ahead of 2D-based imitation learning methods, achieving a remarkable 14.8% improvement on the LIBERO benchmark and robust real-world generalization.
By iteratively reasoning over video snippets with a Chain-of-Thought, $\text{R}^2$VLM achieves state-of-the-art long-horizon task progress estimation without needing to process entire videos at once.
Ditching rigid digital twins for adaptable world models could unlock truly intelligent edge computing in 6G networks.
LLMs can be prompted to generate part-aware instructions that substantially improve open-vocabulary 3D affordance grounding by linking semantically similar affordances and refining geometric differentiation.
Forget complex communication protocols – this trust-based algorithm lets agents learn to cooperate in competitive environments with minimal overhead.
By treating 3D scene editing as goal-regressive planning rather than pure generation, Edit-As-Act achieves instruction fidelity, semantic consistency, and physical plausibility that existing methods miss.
Legged robots can navigate more reliably with noisy sensors thanks to a new state estimator that avoids Gaussian noise assumptions.
Achieve stable, real-time kilometer-scale autonomous driving simulations by generating vector-graph tiles incrementally using a novel diffusion flow approach.
Forget verbose instructions: this new VLN paradigm uses floor plans to guide navigation with concise commands, boosting success rates by 60%.
Robots can now navigate based on your spoken preferences and visual context, thanks to a clever fusion of VLMs, LLMs, and multi-objective RL.
Locomotion policies, often considered black boxes, can autonomously learn interpretable phase structures and branching logic, revealing a hidden order in their decision-making.
Network coding, often overlooked in robotics, can drastically improve the reliability and timeliness of multi-robot communication, outperforming traditional retransmission methods in safety-critical scenarios.
Ergodic control lets swarms of robots cooperatively manufacture micro-patterned surfaces, unlocking scalable production of materials with enhanced physical properties.
A wearable hand exoskeleton that prioritizes comfort and adaptability unlocks scalable robot learning by enabling direct policy training from raw visual data, bypassing complex post-processing.
Robots often ignore your commands mid-task, but ReSteer offers a way to fix this by pinpointing and patching the "blind spots" in their training data.
Ditch costly PIDE integration: RHYME-XT learns the flow map directly, offering a continuous-time, discretization-invariant representation that beats state-of-the-art neural operators.
Robots can now nimbly navigate complex, multi-floor environments without prior training, thanks to a new strategy that dynamically switches between exploration, recovery, and memory recall.
Legged robots can now perform robust parkour with a 1-meter visual blind zone, thanks to a novel architecture that tightly couples vision, proprioception, and physics-based state estimation.
Synthetic data and virtual environments are rapidly becoming indispensable for autonomous driving, but realizing their full potential requires tackling challenges like Sim2Real transfer and scalable safety validation.
Achieve state-of-the-art semantic 3D reconstruction from sparse views by intelligently pruning redundant Gaussians and blending 2D and 3D semantic cues.
Synthesizing realistic 6-DOF object manipulation trajectories in complex 3D environments just got a whole lot better with GMT, a multimodal transformer that substantially outperforms existing methods.
Cycle consistency training unlocks stable and accurate inverse kinematics for wearable soft robots, even with their inherent nonlinearities and hysteresis.
Representing highly nonlinear vehicle dynamics in a lifted linear space via Koopman operator theory enables state-of-the-art long-term state estimation for complex electric trucks.
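The Koopman lifting named in this line can be sketched with extended dynamic mode decomposition (EDMD): choose nonlinear features of the state, then fit a linear operator between lifted snapshots by least squares. A minimal sketch on a toy system, assuming a hand-picked dictionary; `edmd_fit` and the feature choice are illustrative, not the paper's truck model.

```python
import numpy as np

def edmd_fit(X, Y, lift):
    """EDMD: fit a linear operator K with lift(Y) ~= lift(X) @ K by least
    squares, so nonlinear dynamics act linearly in the lifted space."""
    PhiX = np.array([lift(x) for x in X])
    PhiY = np.array([lift(y) for y in Y])
    K, *_ = np.linalg.lstsq(PhiX, PhiY, rcond=None)
    return K

# Toy map x' = x^2: nonlinear in x, but the lifted coordinates
# [x, x^2, x^4] evolve (partly) linearly: x -> x^2 -> x^4.
lift = lambda s: np.array([s[0], s[0]**2, s[0]**4])
X = np.linspace(0.1, 0.9, 20).reshape(-1, 1)
Y = X**2
K = edmd_fit(X, Y, lift)
pred = lift(np.array([0.5])) @ K   # first component predicts next state
```

Because x' = x² lies exactly in the span of the dictionary, the first lifted coordinate reproduces the true next state; in practice the dictionary is learned or much richer, and state estimation runs linear filters on the lifted system.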
LLMs can act as effective action-level supervisors in reinforcement learning, dramatically boosting the sample efficiency of SAC without sacrificing convergence guarantees.
Forget rigid physics engines, this badminton RL environment uses real player data to simulate realistic rallies and strategic gameplay.
Heuristic maritime routes lead to extreme fuel waste in nearly 5% of voyages, but this RL approach cuts that risk by almost 10x.
LLMs in embodied environments get a massive boost from structured rules, with rule retrieval alone contributing +14.9 pp to single-trial success.
LLMs struggle with spatial reasoning in embodied settings and 3D structure identification even when exposed to visual modalities, but fine-tuning smaller models offers a surprisingly effective alternative to brute-force scaling.
Animate 3D characters using bananas and plush toys – DancingBox turns everyday objects into motion capture proxies, making animation accessible to novices.
VLN agents can navigate more effectively by predicting their future states and proactively planning based on forecasted semantic map cues, rather than relying solely on historical context.
Forget training wheels: GoalVLM lets multi-agent robots navigate to any object you describe, no pre-programmed categories needed.
Encoding deformable object dynamics with particle positions unlocks sim-to-real transfer for manipulation tasks, achieving impressive real-world success rates.
Drones can now land safely in complex, unknown environments using only a camera, thanks to a new system that dynamically maps and reacts to surroundings in real-time.
Ditch fixed compute budgets: this new flow-matching method for robotic control adaptively allocates computation, speeding up simple tasks and focusing on complex ones.
Scene graphs plus LLMs let robots ask clarifying questions, boosting multi-agent task success by 15%.
ManiDreams lets robots handle real-world uncertainty in manipulation tasks without retraining, outperforming standard RL baselines under various perturbations.
Forget rigid circuits – this new method seamlessly weaves stretchable sensors directly into clothing using a clever combo of 3D printing and embroidery.
Unlock accurate monocular 3D object tracking with minimal annotation: Sparse3DTrack achieves state-of-the-art performance using only a handful of labels per track.
Robot world models can be significantly improved by directly rewarding them for generating videos that lead to physically plausible robot actions, even if the videos themselves contain visual artifacts.
A national center focused on AI and robotics in medicine could be the key to unlocking the transformative potential of these technologies in healthcare.