Robotics & Embodied AI - Weekly Roundup

Libra-VLA: Achieving Learning Equilibrium via Asynchronous Coarse-to-Fine Dual-System

Yifei Wei +7 · Apr 27, 2026 · also Quantstamp

Decomposing robotic manipulation into coarse and fine-grained actions isn't just conceptually cleaner—it actually unlocks a sweet spot where learning difficulty is balanced, boosting performance.

Yifei Wei, Linqing Zhong, Yi Liu +5

Computer Vision Multimodal Models Robotics & Embodied AI

Apr 27, 2026 · also Tsinghua AI, The Key Laboratory of Road and Traffic Engineering, UCF, USTC

Towards Lawful Autonomous Driving: Deriving Scenario-Aware Driving Requirements from Traffic Laws and Regulations

LLMs can now generate driving rules from traffic laws with significantly improved accuracy by grounding their reasoning in structured traffic scenarios.

Bowen Jian, Rongjie Yu, Hong Wang +2

Constitutional AI & AI Ethics Natural Language Processing Robotics & Embodied AI

Hikmat Karimov +1 · Apr 27, 2026

The Kerimov-Alekberli Model: An Information-Geometric Framework for Real-Time System Stability

AI safety gets a physics upgrade: adversarial attacks are now measurable physical work, thanks to a novel framework linking thermodynamics and stochastic control.

Hikmat Karimov, Rahid Z. Alekberli

Constitutional AI & AI Ethics Robotics & Embodied AI Scalable Oversight & Alignment Theory

All Papers (100)

Apr 27, 2026

Eva Krueger +2 · Apr 27, 2026

An analysis of sensor selection for fruit picking with suction-based grippers

Knowing *when* to listen to *which* sensor lets robotic fruit pickers predict failures before they happen, boosting accuracy to 90% even with minimal sensor sets.

Eva Krueger, Marcus Rosette, Joseph R. Davidson

Haosong Xiao +4 · Apr 27, 2026

Infrastructure-Guided Connectivity-Enhanced Road Crack Detection and Estimation

Road crack detection gets a boost by having the infrastructure tell the car where to look.

Haosong Xiao, Yamini Ramesh, R. Shukla +2

VISION-SLS: Safe Perception-Based Control from Learned Visual Representations via System Level Synthesis

Antoine P. Leeman +3 · Apr 27, 2026

Safe visuomotor control from high-resolution images is now practical at scale, thanks to a learned visual abstraction coupled with an efficient SLS solver.

Antoine P. Leeman, Shuyu Zhan, M. Zeilinger +1

WildLIFT: Lifting monocular drone video to 3D for species-agnostic wildlife monitoring

Vandita Shukla +3 · Apr 27, 2026

Unlock species-agnostic 3D tracking from standard drone footage with WildLIFT, turning 2D video into structured, viewpoint-aware representations for richer wildlife analysis.

Vandita Shukla, Fabio Remondino, Blair R. Costelloe +1

IntentVLM: Open-Vocabulary Intention Recognition through Forward-Inverse Modeling with Video-Language Models

Hamed Rahimi +4 · Apr 27, 2026

Robots can now understand human intentions with near-human accuracy thanks to a new video-language model that reasons about goals like a human.

Hamed Rahimi, Clémence Grislain, Adrien Jacquet Cretides +2

Computer Vision Multimodal Models Robotics & Embodied AI

Shaunak Kolhe +13 · Apr 27, 2026

Pushing Radar Odometry Beyond the Pavement: Current Capabilities and Challenges

Radar odometry, typically confined to urban settings, can be pushed off-road with simple adaptations like IMU preintegration, but still faces significant challenges in unstructured environments.

Shaunak Kolhe, Peng Jiang +11

ARETE: Attention-based Rasterized Encoding for Topology Estimation using HSV-transformed Crowdsourced Vehicle Fleet Data

Daniel Fritz +4 · Apr 27, 2026

Encoding vehicle trajectory directionality via HSV rasterization unlocks accurate lane-level HD map generation from crowdsourced data using a DETR architecture.

Daniel Fritz, Dimitrios Lagamtzis, M. Mink +2

TEACar: An Open-Source Autonomous Driving Platform

Zhongzheng Zhang +7 · Apr 27, 2026

An open-source autonomous driving platform offers researchers a modular, scalable, and cost-effective alternative to complex and restrictive hardware validation setups.

Zhongzheng Zhang, Maxwell Ruyle, A. Kappes +5

Computer Vision Open-Source Models & Weights Robotics & Embodied AI

University of Guilan · Apr 27, 2026

Passage-Aware Structural Mapping for RGB-D Visual SLAM

Robots can now "see" and understand doorways, enabling more robust navigation in complex indoor environments.

Ali Tourani, Miguel Fernández-Cortizas, Saad Ejaz +4

Real-time windrow detection from onboard tractor sensors for automated following

Lorenz Gunreben +4 · Apr 27, 2026

Low-cost stereo vision can rival LiDAR for real-time windrow detection, paving the way for more accessible autonomous farming solutions.

Lorenz Gunreben, Nico Heider, Sebastian Zürner +2

Computer Vision Data Curation & Synthetic Data Robotics & Embodied AI

Seyyid Osman Sevgili +4 · Apr 27, 2026

An Automatic Ground Collision Avoidance System with Reinforcement Learning

AI can now pilot jet trainers to avoid ground collisions, even with limited visibility.

Seyyid Osman Sevgili, Atahan Cilan, Mahir Demir +2

Learning Human-Intention Priors from Large-Scale Human Demonstrations for Robotic Manipulation

Yifan Xie +5 · Apr 27, 2026

Robots can now leverage human intuition for manipulation tasks, learning from a massive video dataset to improve motion plausibility and robustness, even when conditions change.

Yifan Xie, Yuan Wang, Guangyu Chen +3

Data Curation & Synthetic Data Multimodal Models Robotics & Embodied AI

Apr 27, 2026

Computational Design and Co-Robotic Fabrication for Material Reuse in Architecture

Imagine buildings that adapt to the materials available, not the other way around: this framework uses robots to make it a reality.

Arash Adel, Daniel Ruan, Ruxin Xie

Architecture Design (Transformers, SSMs, MoE) Robotics & Embodied AI

W. Z. E. Amri +1 · Apr 27, 2026 · also Leibniz Universität Hannover

SPLIT: Separating Physical-Contact via Latent Arithmetic in Image-Based Tactile Sensors

Simulate once, deploy anywhere: SPLIT lets you train tactile perception models on synthetic data and transfer them across different sensors without retraining.

W. Z. E. Amri, Nicolás Navarro-Guerrero

Computer Vision Data Curation & Synthetic Data Robotics & Embodied AI

Pengcheng Wang +9 · Apr 27, 2026

DiscreteRTC: Discrete Diffusion Policies are Natural Asynchronous Executors

Discrete diffusion policies, typically used for image generation, turn out to be surprisingly effective and efficient asynchronous executors for robots acting in dynamic environments, outperforming traditional continuous control methods.

Pengcheng Wang, Kaiwen Hong +7

asRoBallet: Closing the Sim2Real Gap via Friction-Aware Reinforcement Learning for Underactuated Spherical Dynamics

Fang Wan +7 · Apr 27, 2026 · also Shanghai AI Lab

Zero-shot Sim2Real transfer for a humanoid ballbot is now possible thanks to a friction-aware RL framework and high-fidelity simulation that models omni-wheel mechanics.

Fang Wan, Guangyi Huang, Tianyu Wu +5

Robotics & Embodied AI Training Efficiency & Optimization World Models & Planning

NVIDIA · Apr 27, 2026

MotionBricks: Scalable Real-Time Motions with Modular Latent Generative Model and Smart Primitives

Forget clunky animation pipelines – MotionBricks lets you assemble real-time, high-quality character motions like LEGOs, even controlling robots.

Tingwu Wang, Olivier Dionne, Mick Ruyter +13

Architecture Design (Transformers, SSMs, MoE) Computer Vision Robotics & Embodied AI

Xincheng Cao +8 · Apr 27, 2026

Hybrid A*-Based Reverse Path-Planning of a Vehicle with Trailer System

Successfully backing up a trailer without jackknifing or hitting anything just got easier thanks to a new path-planning algorithm that respects the physics of articulated vehicles.

Xincheng Cao, Haochong Chen, Bilin Aksun-Guvenc +6

Fondazione Bruno Kessler · Apr 27, 2026 · also IISc

Logic of Fuzzy Paths

Separating geometry from logic with fuzzy path constraints yields motion planning specifications that are both more intuitive for humans and more amenable to learning from demonstrations.

K. Grover, Pratham Gupta, Jan Křetínský

Reasoning & Chain-of-Thought Robotics & Embodied AI World Models & Planning

Zhendong Wang +1 · Apr 27, 2026

Generalizable Friction Coefficient Estimation via Material Embedding and Proxy Interaction Modeling

Unlock accurate friction estimation for any material pairing with just a handful of proxy material measurements, slashing experimental costs.

Zhendong Wang, Huamin Wang

Robotics & Embodied AI Scientific Discovery & Drug Design

Kai Yang +8 · Apr 27, 2026

AsyncShield: A Plug-and-Play Edge Adapter for Asynchronous Cloud-based VLA Navigation

Network jitter in cloud-based robot control can be overcome by converting temporal lag into spatial pose offsets, restoring the VLA's original geometric intent without fine-tuning.

Kai Yang, Zedong Chu, Yingnan Guo +6

Multimodal Models Robotics & Embodied AI Tool Use & Agents

Sheng Zhong +9 · Apr 27, 2026 · also NUDT

Event-based SLAM Benchmark for High-Speed Maneuvers

Current event-based SLAM algorithms falter when faced with the full complexity of high-speed, 6-DoF maneuvers, highlighting a gap between current capabilities and the promise of event cameras.

Sheng Zhong, Junkai Niu, Guillermo Gallego +7

Computer Vision Eval Frameworks & Benchmarks Robotics & Embodied AI

Shubham Sawarkar +7 · Apr 27, 2026

Sliding Mode Control for Safe Trajectory Tracking with Moving Obstacles Avoidance: Experimental Validation on Planar Robots

Achieve robust trajectory tracking and moving obstacle avoidance for diverse mobile robots, including Ackermann-steered vehicles, by combining sliding mode control with a novel collision cone control barrier function.

Shubham Sawarkar, P Sangeerth +5

FreqCache: Accelerating Embodied VLN Models with Adaptive Frequency-Guided Token Caching

Zihao Zheng +9 · Apr 27, 2026

Frequency domain analysis unlocks 1.59x speedups in Vision-Language-Navigation by enabling optimal token caching, a feat previously limited by visual domain approaches.

Zihao Zheng, Xingyu Zhou, Z. Mao +7

Inference & Quantization Multimodal Models Robotics & Embodied AI

Driss Choukri +3 · Apr 27, 2026

Internet of Everything in the 6G Era: Paradigms, Enablers, Potentials and Future Directions

6G-enabled Internet of Everything promises a unified intelligent ecosystem, but faces critical scalability, security, and privacy challenges that demand innovative research.

Driss Choukri, Essaid Sabir, Elmahdi Driouh +1

Recommendation & Information Retrieval Robotics & Embodied AI Tool Use & Agents

Zirui Chen +2 · Apr 27, 2026

Guiding Vector Field Generation via Score-based Diffusion Model

Score-based diffusion models can now generate robust guiding vector fields for robotic path following, even when traditional methods stumble on unordered, branching, or probabilistically-generated paths.

Zirui Chen, Shiliang Guo, Shiyu Zhao

Pedestrians play chicken with an autonomous vehicle

Rakshit Soni +3 · Apr 27, 2026

Autonomous vehicles can learn to navigate pedestrian interactions more efficiently by subtly threatening collisions, as humans do, without compromising safety.

Rakshit Soni, Charles W. Fox +1

Constitutional AI & AI Ethics Robotics & Embodied AI

Xiaohua Zhao +3 · Apr 27, 2026

Projected Attainable Speed Space: A Driving Efficiency Metric Connecting Instantaneous Evaluation to Travel Time

Autonomous vehicles can drive more efficiently by using a new metric that links real-time acceleration decisions to overall travel time.

Xiaohua Zhao, Zhaowei Huang, Chen Chen +1

Robotics & Embodied AI Training Efficiency & Optimization World Models & Planning

Kaijun Zhou +5 · Apr 27, 2026

Characterizing Vision-Language-Action Models across XPUs: Constraints and Acceleration for On-Robot Deployment

Edge NPUs can outperform flagship GPUs in cost and energy efficiency for on-robot VLA model deployment, but only with hardware-aware optimizations that tackle the models' distinct compute and memory-bound phases.

Kaijun Zhou, Qiwei Chen, Dajiang Peng +3

Inference & Quantization Multimodal Models Robotics & Embodied AI

Rakshit Soni +9 · Apr 27, 2026

OpenPodcar2: a robust, ROS2 vehicle for self-driving research

Democratizing self-driving research, OpenPodcar2 offers a robust, low-cost (≈$7k new, $2k used), open-source autonomous vehicle platform ready for ROS2 integration and real-world deployment.

Rakshit Soni, Chris Waltham +7

Distributed Systems & Hardware Open-Source Models & Weights Robotics & Embodied AI

James D. Motes +2 · Apr 27, 2026

Multi-Robot Motions in Milliseconds: Vector-Accelerated Primitives for Sampling-Based Planning

Multi-robot motion planning can be accelerated by over 850X, enabling solutions in milliseconds, by exploiting SIMD parallelism with vector-accelerated primitives.

James D. Motes, Marco Morales, Nancy M. Amato

$M^2$-VLA: Boosting Vision-Language Models for Generalizable Manipulation via Layer Mixture and Meta-Skills

Siyao Xiao +11 · Apr 27, 2026 · also Pinterest

Forget end-to-end fine-tuning: $M^2$-VLA unlocks the power of generalized VLMs for robotic manipulation by intelligently mixing layers and incorporating meta-skills.

Siyao Xiao, Yuhong Zhang, Zhifang Liu +9

Computer Vision Multimodal Models Robotics & Embodied AI

Zaid Mahboob +2 · Apr 27, 2026

Betting for Sim-to-Real Performance Evaluation

Ditch expensive robot trials: a novel "betting" framework lets you accurately predict real-world robot performance using only cheap simulations.

Zaid Mahboob, Yujia Chen, Bowen Weng

Eval Frameworks & Benchmarks Robotics & Embodied AI World Models & Planning

Apr 27, 2026 · also Koç University, SFU

Designing Robots to Support Parent-Child Connections: Opportunities Through Robot-Mediated Communication

Robots can strengthen family bonds, but only if designers carefully consider the robot's initiative and communication timing, as families experience tensions around privacy and control.

Michael F. Xu, Bengisu Cagiltay, Yaxin Hu +2

Natural Language Processing Robotics & Embodied AI

Apr 27, 2026

Supporting Family-School Partnerships with Robot-Facilitated Home-Based Activities

A social robot can successfully integrate into family life to support family-school partnerships, but parental facilitation styles significantly impact its effectiveness.

Michael F. Xu, Qiyao Yang, Heather Kirkorian +1

Natural Language Processing Robotics & Embodied AI

Zhengru Fang +7 · Apr 27, 2026

Agent-Centric Visual Reinforcement Learning under Dynamic Perturbations

Visual RL agents can recover near-perfect performance even under severe, dynamically changing visual corruptions by learning to disentangle task-relevant foreground from perturbation artifacts.

Zhengru Fang, Yu Guo, Fei Liu +5

Computer Vision Red-Teaming & Adversarial Robustness Robotics & Embodied AI

Tobias A. Farger +2 · Apr 27, 2026

Exploiting Differential Flatness for Efficient Learning-based Model Predictive Control of Constrained Multi-Input Control Affine Systems

Achieve real-time learning-based control of complex robotic systems by exploiting differential flatness for dramatic speedups in MPC computation.

Tobias A. Farger, Adam W. Hall, Angela P. Schoellig

Robotics & Embodied AI Training Efficiency & Optimization World Models & Planning

Apr 27, 2026 · also CAS, SUSTech, United Nova Technology

CF-VLA: Efficient Coarse-to-Fine Action Generation for Vision-Language-Action Policies

Forget slow, multi-step action generation: CF-VLA's coarse-to-fine approach slashes latency by 75% while boosting real-robot success rates to a new high of 83%.

Fan Du, Feng Yan, Jianxiong Wu +6

Multimodal Models Robotics & Embodied AI Training Efficiency & Optimization

Andreas Kouloumpris +3 · Apr 27, 2026 · also KIOS Research and Innovation Center of Excellence

Exact, Efficient, and Reliable Multiobjective and Multiconstrained IoT Workflow Scheduling in Edge–Hub–Cloud Cyber–Physical Systems

Ditch the heuristics: MILP delivers up to 30% better latency, energy, and reliability for IoT workflow scheduling in edge-hub-cloud systems.

Andreas Kouloumpris, Georgios L. Stavrinides, Maria K. Michael +1

Distributed Systems & Hardware Robotics & Embodied AI

Apr 26, 2026

Qi Li +7 · Apr 26, 2026

Vision-Language-Action Safety: Threats, Challenges, Evaluations, and Mechanisms

VLA models introduce a fundamentally new risk landscape compared to LLMs or robotics alone, demanding a unified safety perspective that considers irreversible physical consequences and multimodal attack surfaces.

Qi Li, Bo Yin, Weiqi Huang +5

Multimodal Models Red-Teaming & Adversarial Robustness Robotics & Embodied AI

Simone Mosco +2 · Apr 26, 2026

Learning to Identify Out-of-Distribution Objects for 3D LiDAR Anomaly Segmentation

Current 3D anomaly detection struggles with real-world complexity, but this new approach directly models inlier feature distributions, achieving state-of-the-art results and offering a more robust solution.

Simone Mosco, Daniel Fusaro, Alberto Pretto

Apr 25, 2026

IndustryAssetEQA: A Neurosymbolic Operational Intelligence System for Embodied Question Answering in Industrial Asset Maintenance

Chathurangi Shyalika +2 · Apr 25, 2026

Neurosymbolic grounding of LLMs in telemetry and knowledge graphs slashes expert-rated overclaims in industrial maintenance explanations by 93%, making AI assistants far more trustworthy in safety-critical settings.

Chathurangi Shyalika, Dhaval Patel, Amit P. Sheth

Interpretability & Mechanistic Interp Natural Language Processing Robotics & Embodied AI +1

Apr 24, 2026

Yaxuan Li +4 · Apr 24, 2026

dWorldEval: Scalable Robotic Policy Evaluation via Discrete Diffusion World Model

Forget slow, expensive real-world trials: dWorldEval's discrete diffusion world model lets you evaluate robot policies across thousands of environments and tasks with unprecedented speed and accuracy.

Yaxuan Li, Zhongyi Zhou, Yefei Chen +2

Eval Frameworks & Benchmarks Robotics & Embodied AI World Models & Planning

Apr 24, 2026

Adviser–Actor–Critic: A Precision-Oriented Reinforcement Learning Framework for Space Robotics Control

Achieve 30x improvement in attitude control precision by fusing classical control with RL, enabling reliable space robotics.

Donghe Chen, Jiaxuan Yue, Yubin Peng +2

Robotics & Embodied AI Training Efficiency & Optimization

Apr 23, 2026

Noah Jaffe +1 · Apr 23, 2026

PHOTON: Non-Invasive Optical Tracking of Key-Lever Motion in Historical Keyboard Instruments

Unlock the secrets of historical keyboard performance with PHOTON, a non-invasive optical tracking system that reveals the subtle interplay between performer input and instrument mechanics.

Noah Jaffe, J. Burgoyne

Encoder-Free Human Motion Understanding via Structured Motion Descriptions

Yao Zhang +3 · Apr 23, 2026

Transforming human motion into structured language allows LLMs to achieve unprecedented accuracy in motion understanding without the constraints of traditional encoding methods.

Yao Zhang, Zhu Liu, T. Ploetz +1

Multimodal Models Natural Language Processing Robotics & Embodied AI

Yilang Liu +4 · Apr 23, 2026

Task-specific Subnetwork Discovery in Reinforcement Learning for Autonomous Underwater Navigation

Multi-task RL agents solving related navigation tasks underwater rely on a surprisingly small fraction of their weights (1.5%) to differentiate between tasks.

Yilang Liu, Melvin Laux, M. D. L. Álvarez +2

Interpretability & Mechanistic Interp RLHF & Preference Learning Robotics & Embodied AI

Heng Yang · Apr 23, 2026

Tempered Sequential Monte Carlo for Trajectory and Policy Optimization with Differentiable Dynamics

Controller design can be effectively framed as inference, enabling efficient trajectory and policy optimization via tempered sampling.

Heng Yang

Robotics & Embodied AI Training Efficiency & Optimization World Models & Planning

Sukesh Subaharan · Apr 23, 2026

Dynamical Priors as a Training Objective in Reinforcement Learning

RL policies don't have to be temporally incoherent messes: shaping action probabilities with dynamical priors unlocks structured, interpretable decision-making.

Sukesh Subaharan

Robotics & Embodied AI Training Efficiency & Optimization World Models & Planning

Manuscript received April 19 · Apr 23, 2026

Channel-Free Human Activity Recognition via Inductive-Bias-Aware Fusion Design for Heterogeneous IoT Sensor Environments

Channel-free HAR is now possible: a single model can perform activity recognition across diverse IoT sensor setups without needing fixed channel arrangements, thanks to metadata-conditioned fusion.

Tatsuhito Hasegawa

Architecture Design (Transformers, SSMs, MoE) Data Curation & Synthetic Data Robotics & Embodied AI

Tsinghua AI · Apr 23, 2026

MISTY: High-Throughput Motion Planning via Mixer-based Single-step Drifting

Autonomous vehicles can now plan trajectories 10x faster without sacrificing performance, thanks to a novel architecture that learns complex driving behaviors in latent space during training.

Yining Xing, Zehong Ke, Yiqian Tu +3

Architecture Design (Transformers, SSMs, MoE) Robotics & Embodied AI World Models & Planning

Hao-Yu Hsu +4 · Apr 23, 2026 · also UIUC

Seeing Without Eyes: 4D Human-Scene Understanding from Wearable IMUs

Imagine reconstructing detailed human motion and scene layouts using just your smartwatch and earbuds – no cameras needed.

Hao-Yu Hsu, Tianhang Cheng, Jing Wen +2

Computer Vision Multimodal Models Robotics & Embodied AI

Vrije Universiteit Amsterdam · Apr 23, 2026

Can Large Language Models Assist the Comprehension of ROS2 Software Architectures?

Despite the complexity of ROS2 robotics software architectures, LLMs can achieve near-perfect accuracy in answering questions about them, hinting at a powerful new tool for roboticists.

Laura Duits, Bouazza El Moutaouakil, I. Malavolta

Code Generation & Program Synthesis Eval Frameworks & Benchmarks Robotics & Embodied AI

Tsinghua AI · Apr 23, 2026

Do MLLMs Understand Pointing? Benchmarking and Enhancing Referential Reasoning in Egocentric Vision

MLLMs often *hallucinate* the referent of a pointing gesture, latching onto nearby or salient objects instead of truly understanding spatial semantics.

Chentao Li, Zirui Gao, Mingze Gao +3

Eval Frameworks & Benchmarks Multimodal Models Robotics & Embodied AI

Apr 23, 2026 · also Tsinghua AI, Westlake

OmniFit: Multi-modal 3D Body Fitting via Scale-agnostic Dense Landmark Prediction

Achieve millimeter-level accuracy in 3D human body fitting from multi-modal inputs, even with scale distortion common in AI-generated assets.

Zeyu Cai, Yuliang Xiu, Renke Wang +8

Computer Vision Multimodal Models Robotics & Embodied AI

Yanjiao Liu +3 · Apr 23, 2026

Frozen LLMs as Map-Aware Spatio-Temporal Reasoners for Vehicle Trajectory Prediction

Frozen LLMs, when fused with spatial scene encodings, can effectively reason about vehicle trajectories, opening new avenues for integrating language-based reasoning into autonomous driving systems.

Yanjiao Liu, Jiawei Liu, Xun Gong +1

Reasoning & Chain-of-Thought Robotics & Embodied AI Tool Use & Agents

Minjoon Park +1 · Apr 23, 2026

A Compact Peristaltic Pump Based on Magneto-Elastic Hysteresis with Single Pneumatic Control

A single pneumatic input and clever use of magneto-elastic hysteresis can drive a surprisingly simple and effective peristaltic pump.

Minjoon Park, Metin Sitti

Robotics & Embodied AI Scientific Discovery & Drug Design

Hao Sun +7 · Apr 23, 2026

Instance-level Visual Active Tracking with Occlusion-Aware Planning

DINOv3 representations and diffusion-based planning enable a visual tracker that's both robust to occlusions and discriminative enough to avoid visually similar distractors.

Hao Sun, Kai Zhou, Hao Gao +5

Computer Vision Robotics & Embodied AI World Models & Planning

Nannan Qin +6 · Apr 23, 2026

SparseGF: A Height-Aware Sparse Segmentation Framework with Context Compression for Robust Ground Filtering Across Urban to Natural Scenes

Compressing expansive contexts like a convex mirror allows deep learning models to achieve robust ground filtering across diverse landscapes, even in complex urban scenes.

Nannan Qin, Pengjie Tao, H. Guan +4

Computer Vision Data Curation & Synthetic Data Robotics & Embodied AI

William Hunt +4 · Apr 23, 2026

Effects of Swarm Size Variability on Operator Workload

Small decreases in swarm size leave human operators with elevated workload, even when performance is unaffected, suggesting a "workload residue" effect that designers must address.

William Hunt, A. Landowska, H. Maior +2

A Case Study in Recovery of Drones using Discrete-Event Systems

Liam Burns +6 · Apr 23, 2026

Guaranteeing swarm drone recovery from faults is now possible with a hybrid discrete-event system that merges high-level supervision with low-level control.

Liam Burns, Dayse M. Cavalcanti, Felipe G. Cabral +4

Distributed Systems & Hardware Robotics & Embodied AI

Apr 23, 2026 · also Tsinghua AI, Sheffield

Reinforcing 3D Understanding in Point-VLMs via Geometric Reward Credit Assignment

Point-VLMs can learn to see the world as it really is: targeted reward assignment and cross-modal verification nearly close the reality gap in 3D reasoning.

Jingkun Chen, Ru Xu, Mingqi Gao +2

Computer Vision Multimodal Models Robotics & Embodied AI

Maximilian Stralz +3 · Apr 23, 2026

Task-Driven Co-Design of Heterogeneous Multi-Robot Systems

Optimality guarantees are now possible when jointly optimizing robot design, fleet composition, and task planning for heterogeneous multi-robot systems.

Maximilian Stralz, Meshal Alharbi, Yujun Huang +1

Robotics & Embodied AI Tool Use & Agents World Models & Planning

I. Liu +9 · Apr 23, 2026

Long-Horizon Manipulation via Trace-Conditioned VLA Planning

Forget brittle visual-history buffers: LoHo-Manip uses a VLM task manager with visual trace prompts to achieve robust long-horizon robotic manipulation through implicit closed-loop replanning.

I. Liu, An-Chieh Cheng, Rui Yan +7

Multimodal Models Robotics & Embodied AI Tool Use & Agents

Songen Gu +7 · Apr 23, 2026 · also Fudan

VistaBot: View-Robust Robot Manipulation via Spatiotemporal-Aware View Synthesis

Achieve robust robot manipulation across diverse viewpoints without camera calibration by synthesizing novel views with a geometry-aware video diffusion model.

Songen Gu, Yuhang Zheng, Weize Li +5

Computer Vision Robotics & Embodied AI World Models & Planning

Yaxuan Li +7 · Apr 23, 2026

Hi-WM: Human-in-the-World-Model for Scalable Robot Post-Training

Forget expensive real-world robot training: Hi-WM lets humans directly edit a robot's simulated reality, turning world models into powerful, reusable playgrounds for failure recovery.

Yaxuan Li, Zhongyi Zhou, Yefei Chen +5

RLHF & Preference Learning Robotics & Embodied AI World Models & Planning

Yan Ning +8 · Apr 23, 2026

X2-N: A Transformable Wheel-legged Humanoid Robot with Dual-mode Locomotion and Manipulation

A robot that can skate, climb stairs, and deliver packages shows how hybrid locomotion can unlock new levels of versatility.

Yan Ning, Xingzhou Chen, De-Heng Li +6

A Replicable Robotics Awareness Method Using LLM-Enabled Robotics Interaction: Evidence from a Corporate Challenge

S. A. Prieto +5 · Apr 23, 2026

Forget dry training manuals: a challenge-based, LLM-powered humanoid robot can spark real employee excitement and understanding of robotics in the workplace.

S. A. Prieto, M. A. Gopee, Y. Arab +3

Natural Language Processing Robotics & Embodied AI Tool Use & Agents

Adrian Baruck +3 · Apr 23, 2026

PREVENT-JACK: Context Steering for Swarms of Long Heavy Articulated Vehicles

Swarms of long, articulated vehicles face surprising deadlock challenges, with up to 31% of vehicles immobilized in dense scenarios despite collision avoidance guarantees.

Adrian Baruck, Michael Dubé, C. Steup +1

SLAM as a Stochastic Control Problem with Partial Information: Optimal Solutions and Rigorous Approximations

Ilir Gusija +2 · Apr 23, 2026

Optimal robot exploration can be achieved by framing SLAM as a POMDP with a geometry-aware exploration cost, enabling near-optimal policy learning.

Ilir Gusija, F. Alajaji, S. Yuksel

A Bayesian Reasoning Framework for Robotic Systems in Autonomous Casualty Triage

Szymon Rusiecki +5 · Apr 23, 2026

Expert knowledge, encoded in a Bayesian network, can dramatically improve the accuracy of autonomous robotic triage systems operating in chaotic, data-scarce environments.

Szymon Rusiecki, C. Morales, Pia Story +3

Computer Vision Reasoning & Chain-of-Thought Robotics & Embodied AI

CMU ML · Apr 23, 2026 · also NTU, UB

A Deployable Embodied Vision-Language Navigation System with Hierarchical Cognition and Context-Aware Exploration

Real-world robots can now navigate complex environments with human-level instructions, thanks to a new system that combines efficient perception with high-level reasoning, all while running in real-time on limited hardware.

Kuan Xu, Ruimeng Liu, Yizhuo Yang +5

Computer Vision Multimodal Models Robotics & Embodied AI

Apr 23, 2026

FingerViP: Learning Real-World Dexterous Manipulation with Fingertip Visual Perception

Robot hands get a serious upgrade: embedding cameras in fingertips unlocks robust manipulation in cluttered environments where traditional wrist-mounted cameras fail.

Zhen Zhang, Weinan Wang, Hejiang Sun +4

Ufil: A Unified Framework for Infrastructure-based Localization

Simon Schafer +4 · Apr 23, 2026

Stop reimplementing localization pipelines: Ufil offers a unified, open-source framework for infrastructure-based localization that lets you swap in new components without rewriting everything.

Simon Schafer, Lucas Hegerath, Marius Molz +2

Learn Weightlessness: Imitate Non-Self-Stabilizing Motions on Humanoid Robot

Yucheng Xin +8 · Apr 23, 2026

Humanoid robots can now adapt to diverse environments without task-specific tuning by selectively "relaxing" joints, mimicking how humans exploit weightlessness for stability.

Yucheng Xin, Jiacheng Bao, Haoran Yang +6

How VLAs (Really) Work In Open-World Environments

Amir Rasouli +6 · Apr 23, 2026

Current VLA benchmarks may be overstating real-world readiness, as models succeeding by standard metrics often exhibit unsafe behaviors and poor robustness.

Amir Rasouli, Yangzheng Wu, Zhiyuan Li +4

Eval Frameworks & Benchmarks Multimodal Models Robotics & Embodied AI

Yucheng Xin +7 · Apr 23, 2026

RPG: Robust Policy Gating for Smooth Multi-Skill Transitions in Humanoid Fighting

Humanoid robots can now seamlessly transition between fighting skills thanks to a novel policy gating approach that ensures stability and smoothness.

Yucheng Xin, Jiacheng Bao, Yubo Dong +5

Full-Body Dynamic Safety for Robot Manipulators: 3D Poisson Safety Functions for CBF-Based Safety Filters

Meg Wilkinson +5 · Apr 23, 2026

Guaranteeing full-body collision avoidance for robot manipulators in dynamic environments is now computationally tractable thanks to a novel application of 3D Poisson Safety Functions.

Meg Wilkinson, Gilbert Bahati, Ryan M. Bena +3

Reasoning About Traversability: Language-Guided Off-Road 3D Trajectory Planning

Byounggun Park +1 · Apr 23, 2026

Fine-tuning VLMs with action-aligned language supervision and terrain-aware preference optimization unlocks more robust off-road autonomous driving, outperforming prior approaches on key traversability metrics.

Byounggun Park, Soonmin Hwang

Multimodal Models Robotics & Embodied AI World Models & Planning

Dachong Li +3 · Apr 23, 2026·also Shenzhen University

CorridorVLA: Explicit Spatial Constraints for Generative Action Heads via Sparse Anchors

Explicitly constraining action generation with predicted spatial "corridors" boosts VLA model performance by up to 12.4% on challenging robotic manipulation tasks.

Dachong Li, Zhuangzhuang Chen, Jin Zhang +1

Architecture Design (Transformers, SSMs, MoE) Multimodal Models Robotics & Embodied AI

Shahriar Rahman Khan +1 · Apr 23, 2026

Cross-Modal Phantom: Coordinated Camera-LiDAR Spoofing Against Multi-Sensor Fusion in Autonomous Vehicles

Autonomous vehicles can be fooled by coordinated camera and LiDAR attacks that create "phantom" objects, even when using multi-sensor fusion designed for redundancy.

Shahriar Rahman Khan, Raiful Hasan

Computer Vision Red-Teaming & Adversarial Robustness Robotics & Embodied AI

Yongying Liu +8 · Apr 23, 2026

PLAS-Net: Pixel-Level Area Segmentation for UAV-Based Beach Litter Monitoring

Ditch the bounding boxes: PLAS-Net's pixel-perfect segmentation of beach litter reveals that fishing gear, though numerically scarce, dominates the total pollution area.

Yongying Liu, Jiaqi Wang, Jian Song +6

From Noise to Intent: Anchoring Generative VLA Policies with Residual Bridges

Yiming Zhong +7 · Apr 23, 2026

By spectrally decoupling robot control into intent and dynamics, ResVLA offers a more efficient and robust approach to generative VLA policies.

Yiming Zhong, Yaoyu He, Zemin Yang +5

Architecture Design (Transformers, SSMs, MoE) Multimodal Models Robotics & Embodied AI

Apr 23, 2026·also Eyeline Labs

Vista4D: Video Reshooting with 4D Point Clouds

Reshooting video from arbitrary viewpoints just got a whole lot better thanks to a 4D point cloud representation that maintains temporal consistency and precise camera control.

Kuan Heng Lin, Zhizheng Liu, Pablo Salamanca +9

Computer Vision Multimodal Models Robotics & Embodied AI

Apr 22, 2026

Adriana Aida +28 · Apr 22, 2026

Cortex 2.0: Grounding World Models in Real-World Industrial Deployment

World-model-based planning enables reliable robotic manipulation in complex industrial settings where reactive policies crumble.

Adriana Aida, Walida Amer, Katarina Bankovic +26

Robotics & Embodied AI Tool Use & Agents World Models & Planning

Humanoid Robot (Shanghai) Co. · Apr 22, 2026·also HIT, Tongji, UMich

VTouch++: A Multimodal Dataset with Vision-Based Tactile Enhancement for Bimanual Manipulation

Vision-based tactile signals in the VTouch++ dataset significantly enhance bimanual manipulation capabilities, paving the way for more effective robotic interactions.

Qianxi Hua, Xinyue Li, Zheng Yan +3

Computer Vision Multimodal Models Robotics & Embodied AI

Apr 22, 2026·also Aristotle University of Thessaloniki, Max Planck

LEXIS: LatEnt ProXimal Interaction Signatures for 3D HOI from an Image

Ditch sparse contact cues: LEXIS-Flow uses a learned manifold of interaction signatures to capture dense, continuous proximity between humans and objects, leading to more realistic 3D HOI reconstructions.

Dimitrije Antić, Alvaro Budria, George Paschalidis +2

Computer Vision Multimodal Models Robotics & Embodied AI

Harrisburg University of Science and Technology · Apr 22, 2026

Personalized electric vehicle energy consumption estimation framework that integrates driver behavior with map data

Stop guessing and start knowing: this framework accurately predicts *your* EV's energy consumption by learning your driving style and integrating it with detailed map data.

Sreechakra Vasudeva Raju Rachavelpula, Sangwhan Cha

Lifecycle-Aware Federated Continual Learning in Mobile Autonomous Systems

Hangzhou Dianzi University · Apr 22, 2026

Layer-selective rehearsal and rapid recovery strategies can boost model performance in federated learning by over 30% in real-world applications.

Beining Wu

Distributed Systems & Hardware Robotics & Embodied AI Training Efficiency & Optimization

Rowan University · Apr 22, 2026·also SCU, USTB

A Hierarchical MARL-Based Approach for Coordinated Retail P2P Trading and Wholesale Market Participation of DERs

Individual prosumers can now effectively coordinate in electricity markets, boosting overall market performance through a novel hierarchical MARL framework.

Patrick Wilk, Ethan Cantor, Yikui Liu +1

Robotics & Embodied AI Tool Use & Agents

Apr 22, 2026·also Technion

Temporal Difference Calibration in Sequential Tasks: Application to Vision-Language-Action Models

Reinforcement learning's Temporal Difference value estimation offers a surprisingly effective and theoretically grounded approach to calibrating uncertainty in vision-language-action models for robotics.

Shelly Francis-Meretzki, Mirco Mutti, Yaniv Romano +1

Computer Vision Multimodal Models Robotics & Embodied AI

Markus Knauer +12 · Apr 22, 2026

MOMO: A framework for seamless physical, verbal, and graphical robot skill learning and adaptation

Voice-commanded surface finishing is now a reality, thanks to a new framework that lets non-experts adapt robot skills through touch, language, and a drag-and-drop interface.

Markus Knauer, Edoardo Fiorini, Maximilian Mühlbauer +10

Multimodal Models Natural Language Processing Robotics & Embodied AI

Luffy.AI · Apr 22, 2026

Distributional Value Estimation Without Target Networks for Robust Quality-Diversity

Achieving robust Quality-Diversity in RL without the computational burden of target networks could revolutionize how we approach skill discovery in complex environments.

Behrad Koohy, J. Bayne

Robotics & Embodied AI Training Efficiency & Optimization

Department of Computer Science · Apr 22, 2026

Vibrotactile Preference Learning: Uncertainty-Aware Preference Learning for Personalized Vibration Feedback

Stop guessing what feels good: this system learns personalized vibration preferences from just 40 pairwise comparisons.

Rongtao Zhang, Xin Zhu, Masoume Pourebadi Khotbehsara +3

RLHF & Preference Learning Robotics & Embodied AI

Anhalt University of Applied Sciences · Apr 22, 2026·also CNRS, Université Grenoble Alpes

Lever: Inference-Time Policy Reuse under Support Constraints

Forget retraining: LEVER lets you snap together pre-trained RL policies at inference time, matching or beating from-scratch performance in some cases.

Ihor Vitenki, Noha Ibrahim, S. Amer-Yahia

Recommendation & Information Retrieval Robotics & Embodied AI

Apr 22, 2026

Toward Safe Autonomous Robotic Endovascular Interventions using World Models

World models can navigate blood vessels autonomously with higher success rates than standard RL, paving the way for safer robotic stroke treatments.

Harry Robertshaw, N. Fischer, Han-Ru Wu +6