100 papers published across 7 labs.
A 10kg quadrupedal robot, LIMBERO, can now climb steep, rocky surfaces thanks to a novel gripper design that achieves exceptional grasping performance with minimal weight.
Achieve real-time online learning for model predictive control with a novel spatio-temporal Gaussian Process approximation that maintains constant computational complexity.
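For a feel of what constant-cost online GP learning looks like, here is a minimal sketch: a sliding-window GP that caps its training set at W points, so every update and prediction costs the same no matter how long the data stream runs. The class name, RBF kernel, and eviction policy are illustrative stand-ins, not the paper's spatio-temporal approximation.

```python
import numpy as np

class SlidingWindowGP:
    """Constant-cost online GP regression: keep only the W most recent
    samples, so each update and prediction is O(W^3) regardless of how
    long the stream runs. Illustrative stand-in, not the paper's method."""

    def __init__(self, window=50, lengthscale=1.0, noise=1e-2):
        self.W, self.ls, self.noise = window, lengthscale, noise
        self.X, self.y = [], []

    def _k(self, A, B):
        # RBF kernel: k(a, b) = exp(-||a - b||^2 / (2 ls^2))
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-0.5 * d2 / self.ls ** 2)

    def update(self, x, y):
        self.X.append(np.asarray(x, float)); self.y.append(float(y))
        if len(self.X) > self.W:                 # evict the oldest sample
            self.X.pop(0); self.y.pop(0)

    def predict(self, x_star):
        X, y = np.stack(self.X), np.array(self.y)
        K = self._k(X, X) + self.noise * np.eye(len(X))
        k_s = self._k(np.atleast_2d(np.asarray(x_star, float)), X)[0]
        mean = k_s @ np.linalg.solve(K, y)
        var = 1.0 - k_s @ np.linalg.solve(K, k_s) + self.noise
        return mean, max(var, 0.0)

gp = SlidingWindowGP(window=50)
for t in range(200):                             # streaming model-error samples
    gp.update([np.sin(0.1 * t), np.cos(0.1 * t)], np.sin(0.1 * t + 0.3))
mu, var = gp.predict([0.0, 1.0])
```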
By explicitly reasoning in 3D, VolumeDP leaps ahead of 2D-based imitation learning methods, achieving a remarkable 14.8% improvement on the LIBERO benchmark and robust real-world generalization.
By iteratively reasoning over video snippets with a Chain-of-Thought, $\text{R}^2$VLM achieves state-of-the-art long-horizon task progress estimation without needing to process entire videos at once.
Ditching rigid digital twins for adaptable world models could unlock truly intelligent edge computing in 6G networks.
LLMs can be prompted to generate part-aware instructions that substantially improve open-vocabulary 3D affordance grounding by linking semantically similar affordances and refining geometric differentiation.
Forget complex communication protocols – this trust-based algorithm lets agents learn to cooperate in competitive environments with minimal overhead.
By treating 3D scene editing as goal-regressive planning rather than pure generation, Edit-As-Act achieves instruction fidelity, semantic consistency, and physical plausibility that existing methods miss.
Legged robots can navigate more reliably with noisy sensors thanks to a new state estimator that avoids Gaussian noise assumptions.
Achieve stable, real-time kilometer-scale autonomous driving simulations by generating vector-graph tiles incrementally using a novel diffusion flow approach.
Forget verbose instructions: this new VLN paradigm uses floor plans to guide navigation with concise commands, boosting success rates by 60%.
Robots can now navigate based on your spoken preferences and visual context, thanks to a clever fusion of VLMs, LLMs, and multi-objective RL.
Locomotion policies, often considered black boxes, can autonomously learn interpretable phase structures and branching logic, revealing a hidden order in their decision-making.
Network coding, often overlooked in robotics, can drastically improve the reliability and timeliness of multi-robot communication, outperforming traditional retransmission methods in safety-critical scenarios.
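A flavor of why coding beats retransmission comes from the classic two-flow relay example, sketched below (packet contents are made up): one coded broadcast lets both robots recover their missing packet, replacing two separate unicasts.

```python
# Robots A and B each send the relay a packet meant for the other; instead
# of forwarding both, the relay broadcasts a single XOR-coded packet.
pkt_a = bytes([0x13, 0x37])                          # A's packet, destined for B
pkt_b = bytes([0xBE, 0xEF])                          # B's packet, destined for A

coded = bytes(x ^ y for x, y in zip(pkt_a, pkt_b))   # one broadcast from the relay

# each robot XORs out the packet it already knows to recover the other's
recovered_b = bytes(x ^ y for x, y in zip(coded, pkt_a))  # at robot A
recovered_a = bytes(x ^ y for x, y in zip(coded, pkt_b))  # at robot B
assert recovered_a == pkt_a and recovered_b == pkt_b
```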
Ergodic control lets swarms of robots cooperatively manufacture micro-patterned surfaces, unlocking scalable production of materials with enhanced physical properties.
A wearable hand exoskeleton that prioritizes comfort and adaptability unlocks scalable robot learning by enabling direct policy training from raw visual data, bypassing complex post-processing.
Robots often ignore your commands mid-task, but ReSteer offers a way to fix this by pinpointing and patching the "blind spots" in their training data.
Ditch costly PIDE integration: RHYME-XT learns the flow map directly, offering a continuous-time, discretization-invariant representation that beats state-of-the-art neural operators.
Robots can now nimbly navigate complex, multi-floor environments without prior training, thanks to a new strategy that dynamically switches between exploration, recovery, and memory recall.
Legged robots can now perform robust parkour with a 1-meter visual blind zone, thanks to a novel architecture that tightly couples vision, proprioception, and physics-based state estimation.
Synthetic data and virtual environments are rapidly becoming indispensable for autonomous driving, but realizing their full potential requires tackling challenges like Sim2Real transfer and scalable safety validation.
Achieve state-of-the-art semantic 3D reconstruction from sparse views by intelligently pruning redundant Gaussians and blending 2D and 3D semantic cues.
Synthesizing realistic 6-DOF object manipulation trajectories in complex 3D environments just got a whole lot better with GMT, a multimodal transformer that substantially outperforms existing methods.
Cycle consistency training unlocks stable and accurate inverse kinematics for wearable soft robots, even with their inherent nonlinearities and hysteresis.
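The core trick is easy to sketch. Below is a minimal, hypothetical PyTorch version (network sizes, dimensions, and the mlp helper are illustrative, not the paper's architecture): a frozen forward model fk maps actuation to pose, and the inverse model ik is trained so the cycle pose → actuation → pose closes.

```python
import torch
import torch.nn as nn

def mlp(d_in, d_out, hidden=64):
    return nn.Sequential(nn.Linear(d_in, hidden), nn.Tanh(),
                         nn.Linear(hidden, hidden), nn.Tanh(),
                         nn.Linear(hidden, d_out))

d_act, d_pose = 4, 3
fk = mlp(d_act, d_pose)              # forward model, assumed already fit to robot data
ik = mlp(d_pose, d_act)              # inverse model to be trained
for p in fk.parameters():
    p.requires_grad_(False)          # the forward model stays fixed

opt = torch.optim.Adam(ik.parameters(), lr=1e-3)
for step in range(2000):
    pose = torch.rand(128, d_pose)                    # sample target poses
    cycle_loss = (fk(ik(pose)) - pose).pow(2).mean()  # pose -> actuation -> pose
    opt.zero_grad(); cycle_loss.backward(); opt.step()
```

Because the loss is measured after the round trip, the inverse model absorbs whatever nonlinearity and hysteresis the forward model encodes.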
Representing highly nonlinear vehicle dynamics in a lifted linear space via Koopman operator theory enables state-of-the-art long-term state estimation for complex electric trucks.
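A minimal sketch of the underlying Koopman recipe, via extended dynamic mode decomposition with a toy polynomial dictionary (the paper's lifting and truck dynamics are far richer; all names and data here are illustrative):

```python
import numpy as np

def lift(x):
    # toy dictionary: the state itself plus simple polynomial observables
    return np.concatenate([x, x ** 2, [x[0] * x[1], 1.0]])

def fit_koopman(X, X_next):
    """X, X_next: (N, d) arrays of consecutive states. Least-squares fit
    of K such that lift(x_{t+1}) ~= K @ lift(x_t)."""
    Phi  = np.stack([lift(x) for x in X])
    Phi2 = np.stack([lift(x) for x in X_next])
    W, *_ = np.linalg.lstsq(Phi, Phi2, rcond=None)
    return W.T

def rollout(K, x0, steps, d):
    # long-horizon prediction: iterate the *linear* lifted dynamics
    phi, traj = lift(x0), [np.asarray(x0, float)]
    for _ in range(steps):
        phi = K @ phi
        traj.append(phi[:d])       # the first d lifted coordinates are the state
    return np.stack(traj)

X = np.random.randn(500, 2)        # stand-in two-state trajectory data
X_next = X + 0.05 * np.column_stack([X[:, 1], -np.sin(X[:, 0])])
K = fit_koopman(X, X_next)
traj = rollout(K, X[0], steps=100, d=2)
```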
LLMs can act as effective action-level supervisors in reinforcement learning, dramatically boosting the sample efficiency of SAC without sacrificing convergence guarantees.
Forget rigid physics engines: this badminton RL environment uses real player data to simulate realistic rallies and strategic gameplay.
Heuristic maritime routes lead to extreme fuel waste in nearly 5% of voyages, but this RL approach cuts that risk by almost 10x.
LLMs in embodied environments get a massive boost from structured rules, with rule retrieval alone contributing +14.9 pp to single-trial success.
LLMs struggle with spatial reasoning in embodied settings and 3D structure identification even when exposed to visual modalities, but fine-tuning smaller models offers a surprisingly effective alternative to brute-force scaling.
Animate 3D characters using bananas and plush toys – DancingBox turns everyday objects into motion capture proxies, making animation accessible to novices.
VLN agents can navigate more effectively by predicting their future states and proactively planning based on forecasted semantic map cues, rather than relying solely on historical context.
Forget training wheels: GoalVLM lets multi-agent robots navigate to any object you describe, no pre-programmed categories needed.
Encoding deformable object dynamics with particle positions unlocks sim-to-real transfer for manipulation tasks, achieving impressive real-world success rates.
Drones can now land safely in complex, unknown environments using only a camera, thanks to a new system that dynamically maps and reacts to surroundings in real-time.
Ditch fixed compute budgets: this new flow-matching method for robotic control adaptively allocates computation, speeding up simple tasks and focusing on complex ones.
Scene graphs plus LLMs let robots ask clarifying questions, boosting multi-agent task success by 15%.
ManiDreams lets robots handle real-world uncertainty in manipulation tasks without retraining, outperforming standard RL baselines under various perturbations.
Forget rigid circuits: this new method seamlessly weaves stretchable sensors directly into clothing using a clever combo of 3D printing and embroidery.
Unlock accurate monocular 3D object tracking with minimal annotation: Sparse3DTrack achieves state-of-the-art performance using only a handful of labels per track.
Robot world models can be significantly improved by directly rewarding them for generating videos that lead to physically plausible robot actions, even if the videos themselves contain visual artifacts.
A national center focused on AI and robotics in medicine could be the key to unlocking the transformative potential of these technologies in healthcare.
Continuous, high-resolution shape sensing in steerable drilling robots is now possible without directly embedding sensors on the instrument surface, thanks to a clever OFDR-based assembly.
Forget centralized control: this algorithm lets swarms of robots build complex shapes with only local communication and no global positioning.
A complete autonomy stack enables centimeter-level localization and mapping on the moon, even without GPS.
Running robotic manipulation workloads entirely onboard kills robot batteries, but offloading to the cloud tanks accuracy due to network latency, revealing a critical compute placement trade-off.
SpiderCam shatters power consumption barriers for FPGA-based 3D cameras, achieving sub-Watt operation while maintaining real-time performance.
Exploiting geometric symmetries in tensegrity structures slashes computational cost and boosts accuracy in physics-informed neural networks.
Ditch slow diffusion policies: FMER achieves 7x faster training and superior performance in sparse reward RL by using flow matching and a tractable entropy regularization term.
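For context, the generic conditional flow-matching objective is only a few lines; the sketch below shows it for an action vector, with FMER's entropy regularizer and RL coupling deliberately omitted (the 8-dim action space and network shape are made up):

```python
import torch
import torch.nn as nn

# velocity network v(x_t, t)
v_net = nn.Sequential(nn.Linear(8 + 1, 128), nn.SiLU(), nn.Linear(128, 8))

def flow_matching_loss(x1):
    x0 = torch.randn_like(x1)            # noise endpoint of the probability path
    t = torch.rand(x1.shape[0], 1)       # random time in [0, 1]
    xt = (1 - t) * x0 + t * x1           # straight-line interpolation
    target_v = x1 - x0                   # its constant velocity
    pred_v = v_net(torch.cat([xt, t], dim=-1))
    return (pred_v - target_v).pow(2).mean()

actions = torch.randn(256, 8)            # stand-in batch of expert actions
loss = flow_matching_loss(actions)
loss.backward()
```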
Finally, a rigorous RL benchmark: generate environments with *provably* optimal policies, enabling controlled algorithm evaluation against ground truth.
Demonstrator diversity unlocks the ability to learn latent actions and dynamics from offline RL data, even without explicit action labels.
Unlock scalable aerial scene understanding with SegFly, a massive RGB-T dataset generated via a novel 2D-3D-2D label propagation technique that requires minimal manual annotation.
Ditch the diffusion vs. autoregressive debate: this VLA framework uses diffusion to *draft* actions and an autoregressive model to *verify* them, boosting real-world success by nearly 20%.
Achieve SE(3) equivariance and memory scalability in point cloud analysis with coordinate-based kernels, outperforming state-of-the-art equivariant methods on diverse tasks.
By cleverly turning novel view synthesis into a self-supervised inpainting problem, VisionNVS eliminates the need for ground truth images of novel views, outperforming LiDAR-dependent baselines.
Unlock the power of MLLMs for structured data like human skeletons with a differentiable rendering approach that allows end-to-end training.
By fusing IMU-derived egomotion with visual data, Motion-MLLM lets MLLMs achieve SOTA 3D scene understanding with 40% less compute.
Achieve 100x radar data compression with only a 1% performance drop by adaptively pruning DCT coefficients based on detection confidence gradients.
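The compression side is easy to prototype. The sketch below keeps the largest DCT coefficients by magnitude, a simple stand-in for the paper's confidence-gradient ranking (array sizes and the keep ratio are illustrative):

```python
import numpy as np
from scipy.fft import dctn, idctn

def compress(frame, keep_ratio=0.01):
    C = dctn(frame, norm="ortho")
    k = max(1, int(keep_ratio * C.size))
    thresh = np.partition(np.abs(C).ravel(), -k)[-k]
    mask = np.abs(C) >= thresh              # keep the k largest coefficients
    return C[mask], mask                    # values + positions to transmit

def decompress(values, mask):
    C = np.zeros(mask.shape)
    C[mask] = values
    return idctn(C, norm="ortho")

frame = np.random.rand(128, 128)            # stand-in for a radar heatmap
vals, mask = compress(frame)                # keep_ratio=0.01 ~ 100x compression
recon = decompress(vals, mask)
```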
Forget waiting minutes for iterative optimization – Omni-3DEdit performs diverse 3D editing tasks in a single forward pass.
Skip the costly training and go straight to open-vocabulary 3D reasoning with ReLaGS, which builds a 3D semantic scene graph from language-distilled Gaussians.
A new RGB-T dataset and frequency-aware network expose the surprising limitations of existing UAV detectors when faced with real-world camouflage and complex backgrounds.
Achieve zero-shot adaptation to new tasks in complex control environments by learning a shared low-dimensional goal embedding that unifies policy and value function representations.
Forget retargeting: RoboForge's physics-optimized pipeline lets humanoids nail text-guided locomotion with better accuracy and stability.
NeRFs can now guide extraterrestrial rovers around unexpected obstacles, thanks to a novel planning framework that blends local observations with global terrain understanding.
Robot control gets a whole lot faster: ProbeFlow slashes action decoding latency by 14.8x in Vision-Language-Action models, all without retraining.
Q-value policies, traditionally outperformed by state-value policies in planning, can surpass them with the right regularization, offering a faster alternative for policy evaluation.
Panoramic 3D reconstruction gets a boost with PanoVGGT, a Transformer that handles spherical distortions and global-frame ambiguity to deliver state-of-the-art accuracy in a single pass.
Gesture-aware pretraining unlocks significant improvements in 3D hand pose estimation, proving that semantic gesture information acts as a powerful inductive bias.
By co-training flow and retrieval networks, WINFlowNets eliminates the need for pre-training, unlocking CFlowNets for dynamic robotic environments where data is scarce.
Lunar rovers can now navigate more accurately across vast distances thanks to a new SLAM system that uses readily available Digital Elevation Models to correct visual drift.
Robots can now plan 9x faster and achieve significantly higher success rates by decoupling action prediction from video generation in World-Action Models.
Achieve more precise robot control by explicitly disentangling high-level goals from low-level kinematic instructions.
Generalizing RL to continuous state and action spaces just got easier: this paper introduces an operator-theoretic framework and PPO-type algorithms that ditch finite-state assumptions.
A new mixed reality testbed lets you plug real human drivers into a CAV simulation, offering unprecedented realism for testing autonomous vehicle interactions.
Guaranteeing robot safety and task completion just got easier: this method enforces complex temporal logic constraints on pre-trained robotics models without any fine-tuning.
LLMs can navigate complex 3D environments more effectively and with far fewer tokens by using a hierarchical scene graph representation derived from omnidirectional sensor data.
Autonomous vehicles can now leverage the rich semantic understanding of VLMs for safer driving without the computational overhead, thanks to a clever training strategy that distills VLM knowledge into a real-time RL policy.
Policies trained on DexViTac's multimodal dataset achieve over 85% success in real-world dexterous manipulation, proving that high-fidelity tactile data unlocks a new level of robotic dexterity.
Human unpredictability is now a feature, not a bug: a mixed-reality testing framework leverages human interaction to generate high-quality corner cases for vehicle-infrastructure cooperation systems.
VLMs can now drive embodied agents to navigate complex environments with unprecedented efficiency, thanks to a novel framework that bridges the gap between 2D semantic understanding and 3D spatial reasoning.
ROS 2's real-time performance gets a major boost with ReDAG-RT, a user-space scheduler that cuts deadline misses by up to 30% without touching the core ROS 2 API.
Don't let your robot's brief moment of panic get lost in the noise – this new uncertainty method spotlights those critical spikes to predict failures before they happen.
Robots can think (and act) twice as fast: HeiSD's hybrid speculative decoding turbocharges embodied agents by intelligently switching between draft and retrieval strategies.
Human-robot teams can get a boost: eye-tracking data alone can predict, with nearly 90% recall, when a human teammate is struggling to understand the robot's situation.
Ditch LiDAR: 3D Gaussian Splatting, combined with semantic segmentation and stereo depth, enables real-time lunar mapping with centimeter-level accuracy.
Stop wasting compute: this RL-trained orchestration policy adaptively decides when your embodied agent should reason with an LLM, slashing latency and boosting task success compared to fixed strategies.
A $50 DIY syringe pump enables precise bidirectional control of soft robots, unlocking new possibilities for complex shape-shifting behaviors.
Kinema4D unlocks zero-shot transfer in embodied AI by simulating physically plausible 4D robot-world interactions, moving beyond rigid 2D constraints.
User-facing guardrails for LLM-enabled robots can balance flexibility and safety by offering constrained choices and clear recourse, rather than open-ended value settings.
Fine-tuning Vision-Language Model planners for robotic manipulation is now significantly more efficient and safer thanks to a novel framework that leverages video world models to simulate real-world physics.
Autonomous robots can now more safely and effectively inspect cluttered, radioactive environments by combining information gain-based planning with stochastic obstacle avoidance.
Finally, a unified software framework promises to tame the wild west of quantum dot device tuning, enabling researchers to share and adapt characterization routines across labs.
A quadrupedal robot can now provide on-demand assistance to wheelchair users, offering a more agile and less intrusive alternative to fixed robotic arms.
Neural approximations of Hamilton-Jacobi reachability can now be formally certified for safety, enabling provably safe robot navigation in unknown environments.
By blending geometry with classification, this new Finsler metric lets you trace trajectories more accurately through complex systems, like cell development, where you have both spatial data and lineage trees.
PyPhonPlan offers a new open-source toolkit to simulate speech dynamics with neurally-grounded representations, enabling researchers to model interactive speech production and perception loops.
You can provably find Nash equilibria even when one player only knows the *reaction* of the other, not their full objective.
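One way to picture the setting (an illustrative first-order sketch, not necessarily the paper's exact scheme): player 1 minimizes its own objective $f_1$ but never sees $f_2$; it only observes the opponent's reaction $y_k$ and descends against it:

$$y_k = \arg\min_{y} f_2(x_k, y), \qquad x_{k+1} = x_k - \eta\, \nabla_x f_1(x_k, y_k).$$

At a fixed point $(x^*, y^*)$ both players are simultaneously best-responding (under suitable convexity), which is exactly the Nash condition, even though player 1 never evaluated $f_2$.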
Visual SLAM loop closure just got a whole lot faster: FastLoop achieves up to 3x speedups by unleashing the power of GPU parallelism.
Rank-1 LoRA fine-tuning can safely and efficiently adapt simulated locomotion policies to real-world robots, slashing fine-tuning time by nearly half while maintaining safety.
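A rank-1 LoRA layer is small enough to show in full. This is the generic construction, not the paper's policy architecture: the frozen base weight is adapted by an outer product of two learned vectors, adding only in_features + out_features trainable parameters per layer.

```python
import torch
import torch.nn as nn

class Rank1LoRALinear(nn.Module):
    """Generic rank-1 LoRA wrapper (illustrative): the frozen base weight W
    is adapted as W + alpha * outer(b, a)."""

    def __init__(self, base: nn.Linear, alpha: float = 1.0):
        super().__init__()
        self.base, self.alpha = base, alpha
        for p in self.base.parameters():
            p.requires_grad_(False)          # sim-trained weights stay frozen
        self.a = nn.Parameter(torch.zeros(base.in_features))   # update starts at zero
        self.b = nn.Parameter(torch.randn(base.out_features) * 0.01)

    def forward(self, x):
        # (x @ a) is a scalar per sample; scaling b gives the rank-1 update
        return self.base(x) + self.alpha * (x @ self.a).unsqueeze(-1) * self.b

layer = Rank1LoRALinear(nn.Linear(64, 32))
out = layer(torch.randn(8, 64))              # -> (8, 32)
```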
Achieve intention-driven start-stop control of a rehabilitation exoskeleton from non-invasive EEG by fixing a common bias in task-based recentering.