100 papers published across 2 labs.
Bicycle robots can now do front-flips, thanks to a reinforcement learning method that bootstraps from dynamically infeasible reference motions.
Robots can now "see" hidden objects and understand articulation by learning from human egocentric video, even if they can't physically explore those areas themselves.
Train drone operators in realistic battlefield environments without ever leaving the simulator, thanks to Unreal Engine's built-in AI.
Forget hand-crafted rewards: MotionVL uses VLMs and LLMs to automatically generate task-aligned reward functions for humanoid robot RL, leading to more human-like and robust motion.
Robots get a 33% speed boost and become significantly more adaptable when you let LLMs handle the reasoning and RL handle the movements.
Pythonistas rejoice: aggregate programming, a powerful paradigm for distributed systems, finally gets a first-class, easy-to-use library in your favorite language.
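The teaser doesn't name the library, so here is a minimal, purely hypothetical sketch of the aggregate-programming pattern itself (each device repeatedly combines its own state with neighbors' exported values), not the library's actual API:

```python
# Hypothetical sketch of the aggregate-programming pattern: every device
# repeatedly combines its own state with its neighbors' exported values.
# This is NOT the library's API, just the paradigm's core loop.
def gradient_round(device_id, neighbors, is_source, exported):
    """One round of the classic 'gradient' (hop-distance-to-source) field."""
    if is_source:
        return 0.0
    nbr_values = [exported.get(n, float("inf")) for n in neighbors]
    return min(nbr_values, default=float("inf")) + 1.0

# Toy network: a line of 5 devices, device 0 is the source.
topology = {i: [j for j in (i - 1, i + 1) if 0 <= j < 5] for i in range(5)}
field = {i: float("inf") for i in topology}
for _ in range(5):  # iterate rounds until the field stabilises
    field = {i: gradient_round(i, topology[i], i == 0, field) for i in topology}
print(field)  # {0: 0.0, 1: 1.0, 2: 2.0, 3: 3.0, 4: 4.0}
```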
Autonomous vehicles can drive more safely and reliably by grounding LLM reasoning in a "Commonsense World" that quantifies and leverages the trustworthiness of LLM outputs.
Achieve superhuman robot dexterity with 10x fewer demonstrations by decoupling intent and action through latent world modeling.
Automating scientific discovery is now more accessible: Owl-AuraID navigates proprietary GUIs to control diverse precision instruments, freeing researchers from tedious manual operation.
Achieve real-time, privacy-aware action detection on edge devices by intelligently fusing fast skeleton tracking with vision-language models, outperforming either approach alone.
Robots can now generalize to unseen objects and categories for manipulation tasks with only a few training examples, thanks to a novel retrieval-augmented affordance prediction framework.
Emulating human movement with 700 muscles reveals that many different control strategies can produce the same observed motion, challenging the assumption that kinematics uniquely define muscle activation.
Smart industrial systems promise increased efficiency, but they introduce unforeseen interoperability side-effects and heightened vulnerability to cyber threats across heterogeneous IIoT deployments.
Robots can now learn to reproduce oil paintings with impressive accuracy through self-play and learned dynamics, even without human demonstrations or high-fidelity simulators.
Physical AI systems struggle not with visual recognition, but with understanding space, physics, and action – and PRISM, a new retail video dataset, dramatically closes this gap.
Assistive robots aren't just vulnerable to data breaches; they can be hacked to physically harm the very people they're supposed to protect.
Open-source SurgNavAR slashes the barrier to entry for AR surgical navigation research, offering a ready-to-use framework adaptable to diverse surgical applications.
Synthetic data, when carefully aligned with real-world characteristics, can boost hand-object interaction detection by over 11% even when real labeled data is scarce.
Vision-language models falter at the fine-grained temporal recognition crucial for surgical video understanding, while SurgRec excels.
Even state-of-the-art VLMs exhibit systematic failures in reasoning about the physical feasibility of actions in 3D environments, despite high semantic confidence.
Forget expensive labels: CoRe-DA leverages contrastive learning and self-training to achieve state-of-the-art surgical skill assessment across diverse surgical environments without requiring target domain annotations.
Surgeons can now pinpoint tumor margins with millimeter precision using augmented reality, potentially reducing positive margins in head and neck cancer resections.
Ditching depth map projections for camera-LiDAR calibration unlocks significant gains in accuracy and robustness, especially when starting from poor initial extrinsic estimates.
Quantifying and integrating map uncertainty—both positional and semantic—into trajectory prediction pipelines significantly boosts forecast accuracy, even when using existing baseline models.
LLMs can generate more accurate motion trajectories by clustering them into geometrically consistent families, even without retraining.
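The teaser doesn't spell out the clustering step; as a hedged illustration of the general idea (sample many candidate trajectories, group them by geometric similarity, keep the dominant family), here is a sketch where the distance metric, clusterer, and toy data are all assumptions, not the paper's:

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Toy stand-ins for LLM-proposed trajectories: N candidates, 20 waypoints, 2-D.
rng = np.random.default_rng(0)
line = np.linspace([0, 0], [1, 1], 20)
arc = np.stack([np.linspace(0, 1, 20), np.sin(np.linspace(0, np.pi, 20))], axis=1)
candidates = np.array([line + rng.normal(0, 0.02, line.shape) for _ in range(6)]
                      + [arc + rng.normal(0, 0.02, arc.shape) for _ in range(4)])

# Geometric similarity: Euclidean distance between flattened waypoint sequences.
flat = candidates.reshape(len(candidates), -1)
labels = AgglomerativeClustering(n_clusters=2, linkage="average").fit_predict(flat)

# Keep the largest (most self-consistent) family and average it into a consensus.
dominant = labels == np.bincount(labels).argmax()
consensus = candidates[dominant].mean(axis=0)
print(f"kept {dominant.sum()} of {len(candidates)} candidates")
```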
Achieve a 60% reduction in trajectory error for monocular SLAM by tightly integrating multi-task dense prediction with a compact perception-to-mapping interface.
Reconstructing dynamic 3D scenes from video just got a whole lot better: MotionScale achieves state-of-the-art fidelity and temporal stability by scaling Gaussian splatting to long, complex sequences.
Forget tedious optimization: LightHarmony3D generates realistic lighting and shadows for inserted 3D objects in a single pass, making scene augmentation feel truly real.
Turn 2D orthographic views into 3D models automatically using corner detection and geometric reconstruction.
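As a minimal illustration of the geometric-reconstruction half of that pipeline (the corner detector itself is omitted), here is a toy sketch that merges corners from a top view (x, y) and a front view (x, z) into 3-D vertices by matching shared x coordinates; the matching tolerance and box example are assumptions:

```python
import numpy as np

# Assume corners were already detected in two orthographic views:
# the top view gives (x, y), the front view gives (x, z). Matching
# shared x coordinates recovers full (x, y, z) vertices for a toy box.
top_corners = np.array([[0.0, 0.0], [0.0, 2.0], [1.0, 0.0], [1.0, 2.0]])    # (x, y)
front_corners = np.array([[0.0, 0.0], [0.0, 3.0], [1.0, 0.0], [1.0, 3.0]])  # (x, z)

TOL = 1e-3  # assumed tolerance for matching x coordinates across views
vertices = [
    (x_t, y, z)
    for x_t, y in top_corners
    for x_f, z in front_corners
    if abs(x_t - x_f) < TOL
]
print(len(vertices), "candidate vertices:", vertices[:4])
# Real systems then prune candidates against the third (side) view.
```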
Unlock adaptable human augmentation in everyday environments with reconfigurable robotic limbs, guided by quantitative analysis of workspace extension and human-robot collaboration.
A rotating haptic compass on your wrist dramatically improves robotic teleoperation by providing intuitive directional cues, outperforming traditional vibration-based feedback and even improving imitation learning.
You can halve the polygon count of dynamic 3D meshes in VR without users noticing, but existing quality metrics won't tell you that.
Passive iFIR filters learned from just three minutes of robot data can dramatically outperform optimized PID controllers in velocity tracking tasks, offering a fast and stable alternative for robot control.
Get provably safe and dynamically robust robot motions in human environments without the computational bottleneck of online optimization.
Unlock rapid UAV design iteration with MetaMorpher's modular, nonlinear flight dynamics model that accurately simulates diverse wing configurations and flight modes.
Semantic scene understanding can keep your robot from crashing when running LLMs on edge devices.
A long-reach robot arm can gently clean lunar solar panels, even with limited force feedback, opening the door to autonomous maintenance on the moon.
Guaranteeing safety in multi-agent systems with dynamic networks doesn't have to sacrifice performance: this plug-and-play protocol ensures recoverable safety even when agents join or leave and network topologies shift.
Offline RL can now tackle complex, unseen temporal logic tasks without retraining, by stitching together learned short-horizon behaviors into long-horizon plans.
UUVs can navigate communication blackouts 91% more accurately by distilling patterns from their past trajectories.
Optimizing PID gains with MPPI matches the performance of conventional MPPI with significantly fewer samples, offering a more sample-efficient route to learning-based control.
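A hedged sketch of the core idea as stated: sample perturbations of the PID gains, roll each out on a model, and take a softmin-weighted average, i.e., MPPI's update applied in gain space rather than control space. The toy plant, cost, and temperature below are assumptions, not the paper's setup:

```python
import numpy as np

def rollout_cost(gains, steps=200, dt=0.01):
    """Toy double-integrator setpoint task; returns tracking cost for PID gains."""
    kp, ki, kd = gains
    x = v = integ = 0.0
    prev_err = 1.0
    cost = 0.0
    for _ in range(steps):
        err = 1.0 - x
        integ += err * dt
        u = kp * err + ki * integ + kd * (err - prev_err) / dt
        prev_err = err
        v += u * dt
        x += v * dt
        cost += err ** 2
    return cost

rng = np.random.default_rng(0)
gains = np.array([1.0, 0.1, 0.1])            # initial PID gains
lam, sigma, K = 1.0, np.array([2.0, 0.2, 0.2]), 64
for _ in range(20):                          # MPPI-style update in gain space
    eps = rng.normal(0.0, sigma, size=(K, 3))
    costs = np.array([rollout_cost(gains + e) for e in eps])
    w = np.exp(-(costs - costs.min()) / lam)  # softmin weighting of samples
    gains = gains + (w[:, None] * eps).sum(0) / w.sum()
print("tuned gains:", gains.round(2), "cost:", round(rollout_cost(gains), 3))
```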
Humanoids can now nimbly navigate real-world clutter and complex terrain using only raw depth data, ditching hand-engineered geometric representations.
Achieve state-of-the-art robotic manipulation with a model orders of magnitude smaller than VLAs by explicitly aligning kinematic and semantic transitions.
Forget brute-force coverage: this method learns from simulated expert guidance to prioritize semantically relevant areas, dramatically speeding up target search in unseen environments.
Legged robots can now navigate more accurately using only internal sensors, even with imperfect foot contact, thanks to a new probabilistic method that dynamically adapts to different contact scenarios.
Automating disassembly of complex, degraded appliances in recycling plants is now feasible, achieving high accuracy without pre-programmed coordinates.
SuperGrasp achieves robust single-view grasping by cleverly combining superquadric-based similarity matching with an end-to-end refinement network, outperforming existing methods in stability and generalization.
Real-time, uncertainty-aware signed distance functions are now possible without sacrificing accuracy, thanks to a novel kernel regression and GP regression hybrid.
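The hybrid itself isn't described here; as a minimal illustration of the uncertainty-aware half, a GP regression over sparse signed-distance samples yields both a distance estimate and a confidence band (the scikit-learn tooling and 1-D toy data are assumptions, and this naive version is far from real-time):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Sparse 1-D signed-distance samples around a "surface" at x = 0.5.
X = np.array([[0.0], [0.2], [0.4], [0.6], [0.8], [1.0]])
y = X[:, 0] - 0.5  # true SDF of the surface point at 0.5

gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.3), alpha=1e-4).fit(X, y)
xq = np.linspace(0, 1, 5).reshape(-1, 1)
mean, std = gp.predict(xq, return_std=True)  # distance + uncertainty per query
for q, m, s in zip(xq[:, 0], mean, std):
    print(f"x={q:.2f}  sdf={m:+.3f}  std={s:.3f}")
```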
Get kilohertz-level dexterous hand teleoperation *with* formal safety guarantees, thanks to a new convex optimization approach.
Policies trained with GenSplat maintain robust performance under severe spatial perturbations where baseline methods completely fail, thanks to its novel 3D Gaussian Splatting-based augmentation.
VLN agents can now "dream ahead" by learning action-conditioned visual dynamics in a latent space, leading to SOTA results and improved real-world navigation.
Ignoring control packet loss in drone communication can lead to trajectory divergence, but this integrated sensing-communication-control scheme achieves decimeter-level accuracy.
Current vibration-based alert systems often misestimate alert durations because of poor damping estimates; this new information-theoretic method captures them accurately.
Achieve targeted motion adaptation in physics-based characters by learning a mask-invariant prior, enabling robust control even with missing observations or text-driven partial goals.
Actor-critic methods can achieve state-of-the-art sample complexity in linear MDPs *without* relying on computationally expensive implicit policies or strong assumptions about exploration.
Physics-informed neural networks can now accurately identify impact events on aerospace composites, even with noisy or incomplete data, opening the door to real-time structural health monitoring.
Overcome the curse of dimensionality in offline MARL by learning which agents' actions to replace, achieving state-of-the-art performance with dramatically reduced computation.
VLA models are brittle: even simple synonym substitutions in instructions cause a 22-52% performance drop in robotic manipulation tasks.
A simple DBSCAN model running on real-time bridge sensor data can outperform other ML models in detecting anomalies, suggesting a practical path to preventing catastrophic failures.
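A minimal sketch of the approach as described (features, eps, and window size below are assumptions): DBSCAN labels points that fall in no dense cluster with -1, and those are flagged as anomalous sensor readings.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Toy bridge-sensor windows: [mean strain, vibration RMS] per time window.
rng = np.random.default_rng(1)
normal = rng.normal([1.0, 0.5], 0.05, size=(200, 2))
anomalies = np.array([[1.6, 1.2], [0.3, 0.1]])      # injected faults
windows = np.vstack([normal, anomalies])

labels = DBSCAN(eps=0.15, min_samples=5).fit_predict(windows)
flagged = np.flatnonzero(labels == -1)               # -1 = in no dense cluster
print(f"{len(flagged)} windows flagged as anomalous:", flagged)
```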
SONAR can "see" road damage and material even when cameras and LiDAR are blinded by rain or fog.
RL agents can learn to control complex fluid dynamics 40% faster by pretraining on Koopman-based surrogate models and iteratively refining them with policy-aware data.
Generating realistic, safety-critical maritime scenarios at scale is now possible by combining generative trajectory modeling with automated encounter pairing, moving beyond limited historical data or handcrafted templates.
Get 80% of your oracle feedback for free: ROVED leverages vision-language embeddings to drastically reduce the need for human preferences in reinforcement learning.
Agentic RL agents can learn faster and perform better by dynamically maintaining a skill bank that combines high-level task guidance with low-level step-by-step decision support.
Learning thermomechanical material properties just got easier: this new framework guarantees thermodynamic consistency without needing entropy data or enforcing complex convexity constraints.
MLLMs can now guide visual generative models to imagine what's hidden behind objects, significantly boosting amodal completion performance.
XR's potential for AI-driven assistance risks eroding human autonomy, but Self++ offers a design blueprint to ensure AI augments, rather than replaces, human judgment.
A new swarm-based optimization algorithm, inspired by dogfighting but built on kinematic equations, achieves state-of-the-art performance across diverse benchmark and real-world engineering problems.
Training data no longer needs to choose between realism and accuracy: SHOW3D delivers both for hand-object interaction.
A 40-point mIoU gap between supervised methods and zero-shot segmentation on Industrial3D reveals that foundation models are nowhere near ready for real-world industrial Scan-to-BIM workflows.
Current robot manipulation benchmarks fail to capture the messy reality of real-world deployment, so this work introduces a new benchmark, ManipArena, to close the sim2real gap.
Real-world 3D scene completion is now possible without synthetic training data, thanks to visibility-guided flow matching that handles incomplete scans.
Event cameras can now accurately measure high-speed 3D deformations of structures under extreme lighting, opening up new possibilities for monitoring the safety of critical infrastructure.
Skipping frames without objects boosts nano-drone object detection throughput by 24% with negligible accuracy loss.
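The gating signal isn't specified in the teaser; here is a hedged sketch of one plausible scheme: a cheap inter-frame change score gates the expensive detector, so static, object-free frames are skipped. The threshold and detector below are placeholders, not the paper's:

```python
import numpy as np

def change_score(prev, frame):
    """Cheap gate: mean absolute pixel difference between consecutive frames."""
    return float(np.abs(frame.astype(np.int16) - prev.astype(np.int16)).mean())

def run_detector(frame):
    """Placeholder for the expensive object detector (assumed, not the paper's)."""
    return []

THRESH = 4.0  # assumed tuning knob: higher = more frames skipped
rng = np.random.default_rng(0)
frames = [rng.integers(0, 255, (96, 96), dtype=np.uint8) for _ in range(10)]
frames[3] = frames[2].copy()  # a static frame the gate should skip

prev, ran = frames[0], 0
for frame in frames[1:]:
    if change_score(prev, frame) > THRESH:  # only detect when the scene changed
        run_detector(frame)
        ran += 1
    prev = frame
print(f"detector ran on {ran}/9 frames")
```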
Ghost points, often ignored in LiDAR processing, can be effectively identified and removed using full-waveform LiDAR data, leading to substantial improvements in downstream tasks like SLAM and object detection.
View transformation may be sabotaging your NeRF pre-training: directly learning continuous 3D representations with NeRP3D avoids conflicting priors and boosts performance on nuScenes.
Achieve 49% and 19% lower Chamfer distance than state-of-the-art dynamic surface reconstruction methods on Hi4D and CMU Panoptic, respectively, by enforcing temporal consistency in Gaussian Splatting.
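For reference, the metric being improved here is the symmetric Chamfer distance between point clouds; a minimal sketch with SciPy KD-trees, not the paper's code (conventions vary, e.g., some papers average squared distances):

```python
import numpy as np
from scipy.spatial import cKDTree

def chamfer(a, b):
    """Symmetric Chamfer distance between point clouds a (N,3) and b (M,3):
    mean nearest-neighbour distance in each direction, summed."""
    d_ab, _ = cKDTree(b).query(a)  # each point in a -> closest point in b
    d_ba, _ = cKDTree(a).query(b)  # each point in b -> closest point in a
    return d_ab.mean() + d_ba.mean()

rng = np.random.default_rng(0)
gt = rng.normal(size=(1000, 3))
pred = gt + rng.normal(scale=0.01, size=gt.shape)  # a near-perfect reconstruction
print(f"chamfer = {chamfer(pred, gt):.4f}")
```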
Event cameras unlock 6D pose tracking of novel objects at 120+ FPS, even with rapid motion, by fusing sparse event streams with depth in a way that generalizes zero-shot from synthetic training.
Event cameras, fused with traditional frames using an energy-aware approach, can significantly boost the accuracy of autonomous vehicle steering prediction.
Achieve dramatically wider field of view for UAVs without adding sensors or complexity by simply spinning the entire drone.
A foldable soft robot achieves unprecedented agility with nine distinct locomotion modes, opening new possibilities for navigating the human body.
What looks like polite robot navigation from above can feel downright rude when you're the pedestrian dodging it.
Forget painstakingly tuning MPC controllers by hand: this method learns optimal humanoid locomotion policies by aligning MPC cost functions with high-fidelity RL data.
Updating motion planning roadmaps in dynamic environments just got an order of magnitude faster with a GPU-accelerated edge validation scheme.
Feel what the robot feels: a new glove lets human operators experience high-resolution tactile feedback during dexterous teleoperation, dramatically improving performance in contact-rich tasks.
Agricultural robots can now more accurately follow paths and avoid crop damage thanks to a new controller that explicitly models the implements they tow.
Tilting your drone's propellers isn't just for agility – it can be a game-changer for maintaining comms under jamming attacks, boosting link reliability by orders of magnitude.
More sensors, more problems: a simple active stereo camera setup beats out complex multi-sensor rigs for humanoid robot imitation learning when data is scarce.
Neurosurgeons gain a compact, sterilizable RCM joint with near-isotropic stiffness, minimizing unwanted motion during delicate procedures.
Finally, a single, open-source platform lets you train and test coordinated air and ground robots in photorealistic urban environments with synchronized physics and sensors.
Robots can now learn complex manipulation tasks from scratch using only video and language, bypassing the need for hand-engineered reward functions, demonstrations, or even task-specific tuning.
Synergy's architecture lets agents evolve through experience by proactively recalling rewarded trajectories, hinting at a new way to build agents that learn and adapt in open, collaborative environments.
Adversarial attacks can cripple robotic perception systems, demanding specialized defenses beyond standard image classification techniques.
Robot color choices are subtly shaped by racial and occupational stereotypes, even when users offer seemingly rational justifications.
Bipedal robots can now walk more stably on slippery surfaces thanks to a new control method that explicitly models and compensates for foot slippage.
Real-time 3D occupancy mapping for edge devices is now possible under a 6mW power budget thanks to Gleanmer, a novel SoC.
Freeing robots from pre-assigned tasks slashes completion times in multi-agent settings, with a new algorithm improving performance in almost 90% of tested scenarios.
Unlock real-time control of off-road vehicles on challenging terrain by representing complex terramechanics with linear Koopman operators learned from simulation data.
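A hedged sketch of the standard EDMD recipe behind that idea: lift states through a dictionary of nonlinear features, then fit a linear operator K by least squares so that lifted dynamics advance linearly. The feature dictionary and toy dynamics below are assumptions standing in for simulator rollouts:

```python
import numpy as np

def lift(x):
    """Assumed feature dictionary: state plus simple nonlinear terms."""
    return np.array([x[0], x[1], x[0] ** 2, x[0] * x[1], 1.0])

def step(x):
    """Toy nonlinear dynamics standing in for terramechanics simulation."""
    return np.array([0.9 * x[0] + 0.1 * x[1], -0.2 * x[0] ** 2 + 0.95 * x[1]])

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(500, 2))
Phi = np.stack([lift(x) for x in X])             # lifted states      (500, 5)
Phi_next = np.stack([lift(step(x)) for x in X])  # lifted next states (500, 5)

# EDMD: least-squares fit of the linear Koopman operator K: Phi_next = Phi @ K
K, *_ = np.linalg.lstsq(Phi, Phi_next, rcond=None)

x0 = np.array([0.5, -0.3])
pred = lift(x0) @ K  # one-step prediction, entirely linear in lifted space
print("true next   :", step(x0).round(4))
print("Koopman pred:", pred[:2].round(4))  # first two features recover the state
```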
Unlabeled LiDAR data can now drive state-of-the-art traffic simulation, unlocking scalable realism without costly annotations.
Context-aware robots that "see something, say something" boost user trust by 82% simply by communicating more intelligently about hazards.