Robotics & Embodied AI - Weekly Roundup

A Bayesian Approach for Task-Specific Next-Best-View Selection with Uncertain Geometry

2w ago·also Columbia

Task-aware 3D reconstruction slashes the number of views needed by focusing on the data that actually matters for downstream applications.

Jingsen Zhu, Silvia Sellán, Alexander Terenin

All Papers (94)

May 6, 2026

PhysForge: Generating Physics-Grounded 3D Assets for Interactive Virtual World

2w ago

Interactive 3D asset generation can now be driven by functional logic and hierarchical physics, thanks to a new framework that synthesizes simulation-ready assets.

Yunhan Yang, Chunshi Wang, Junliang Ye +7

Data Curation & Synthetic Data Robotics & Embodied AI World Models & Planning

Alper Kamil Bozkurt +4·2w ago

Adaptive Policy Selection and Fine-Tuning under Interaction Budgets for Offline-to-Online Reinforcement Learning

Stop committing to a single policy in offline-to-online RL: adaptively select and fine-tune policies based on predicted performance to maximize returns under interaction budgets.

Alper Kamil Bozkurt, Xiaoan Xu, Shangtong Zhang +2

RLHF & Preference Learning Robotics & Embodied AI

Department of Statistical Sciences·2w ago·also Department of Mathematics, UofT, UW-Madison

Provable imitation learning for control of instability in partially-observed Vlasov–Poisson equations

Stabilizing nuclear fusion plasma with imitation learning is possible even with limited macroscopic observations, offering a path to practical control strategies.

Xiaofan Xia, Qin Li, Wenlong Mou

Robotics & Embodied AI Scientific Discovery & Drug Design

Eni Solomon Laughter·2w ago

Kinematic Discriminants of Deceleration Behavior Modes in Car-Following: Evidence from NGSIM Trajectory Data

Drivers dynamically switch their perceptual priorities from gap-closing rate to visual looming as braking intensity decreases, overturning long-held assumptions about car-following behavior.

Eni Solomon Laughter

Order-based Rehearsal Learning

National Key Laboratory for Novel·2w ago

You don't need a full causal graph to avoid undesired outcomes; learning a simple order structure can be enough, and even outperform methods that try to learn the whole graph.

Yu-Xuan Tao, Tian-Zuo Wang, Zhi-Hua Zhou

Koopman Identification of Nonlinear Systems via Reservoir Liftings

Weibin Gu +2·2w ago

Reservoir Computing offers a surprisingly effective way to build Koopman dictionaries for nonlinear system identification, sidestepping the usual dictionary selection and ill-conditioning problems.

Weibin Gu, Chen Yang, Lu Shi

Robotics & Embodied AI Scientific Discovery & Drug Design World Models & Planning

Ilan University·2w ago

A Harmonic Mean Formulation of Average Reward Reinforcement Learning in SMDPs

Average reward RL can finally handle the messy reality of non-stationary rewards and durations in SMDPs, thanks to a clever harmonic mean trick.

Erel Shtossel, Alicia Vidler, Uri Shaham +1

Bilinear Mamba-Koopman Neural MPC for Varying Dynamics

Matan Pagi +1·2w ago

Control-dependent latent dynamics, achieved with a surprisingly small parameter increase, unlock robust MPC performance in time-varying environments where standard Koopman methods falter.

Matan Pagi, Zohar Sorek

Architecture Design (Transformers, SSMs, MoE) Robotics & Embodied AI World Models & Planning

Berk Sezer +3·2w ago

Gaze4HRI: Zero-shot Benchmarking Gaze Estimation Neural-Networks for Human-Robot Interaction

Turns out, all gaze estimation models stumble when robots look down, and complex architectures aren't the answer – data diversity is the real secret to robust human-robot interaction.

Berk Sezer, Ali Gorkem Küçük, Erol Şahin +1

Computer Vision Eval Frameworks & Benchmarks Robotics & Embodied AI

Lirui Luo +4·2w ago

SPHERE: Mitigating the Loss of Spectral Plasticity in Mixture-of-Experts for Deep Reinforcement Learning

MoEs, despite their scaling advantages, suffer from a surprising "spectral plasticity loss" in continual RL, but a simple Parseval penalty can recover performance.

Lirui Luo, Guoxi Zhang, Hongming Xu +2

Architecture Design (Transformers, SSMs, MoE) Robotics & Embodied AI Training Efficiency & Optimization

Yurui Du +3·2w ago

ELVIS: Ensemble-Calibrated Latent Imagination for Long-Horizon Visual MPC

Achieve robust long-horizon visual control by adaptively balancing model-based lookahead with bootstrapping, enabling zero-shot transfer to real-world tasks with severe occlusions.

Yurui Du, Pinhao Song, Yutong Hu +1

Computer Vision Robotics & Embodied AI World Models & Planning

Ilan University·2w ago

Modular Reinforcement Learning For Cooperative Swarms

Decomposing robot swarm state representations unlocks effective cooperation even with computationally-limited agents.

Erel Shtossel, Gal A. Kaminka

Distributed Systems & Hardware RLHF & Preference Learning Robotics & Embodied AI

2w ago·also Brown, Northeastern

When Life Gives You BC, Make Q-functions: Extracting Q-values from Behavior Cloning for On-Robot Reinforcement Learning

Forget brittle imitation learning: Q2RL unlocks robust on-robot reinforcement learning by distilling a Q-function from Behavior Cloning and intelligently gating between imitation and RL based on Q-value estimates.

Lakshita Dodeja, Ondrej Biza, Shivam Vats +5

Robotics & Embodied AI Training Efficiency & Optimization

Seungeun Rho +5·2w ago

LineRides: Line-Guided Reinforcement Learning for Bicycle Robot Stunts

Forget hand-crafted reward functions: this RL framework lets a bicycle robot learn complex stunts from just a spatial guideline and a few key poses.

Seungeun Rho, Shamel Fahmi, Jeonghwan Kim +3

Robotics & Embodied AI Training Efficiency & Optimization World Models & Planning

2w ago·also Hubei University, Osaka

Driver-WM: A Driver-Centric Traffic-Conditioned Latent World Model for In-Cabin Dynamics Rollout

Predicting driver behavior in response to traffic conditions is now possible with a new world model that causally links external context to internal driver states.

Haozhuang Chi, Daosheng Qiu, Hao Su +4

Look Once, Beam Twice: Camera-Primed Real-Time Double-Directional mmWave Beam Management for Vehicular Connectivity

University of Nebraska-Lincoln·2w ago·also Ohio State

End-to-end ML models get smoked in real-world mmWave vehicular connectivity: a hybrid vision-primed approach slashes outage rates by leveraging model-based reasoning and RF feedback.

Avhishek Biswas, Apala Pramanik, Eylem Ekici +1

Computer Vision Multimodal Models Robotics & Embodied AI

Xiamen University·2w ago·also Key Laboratory of Multimedia Trusted, Ministry of Education of China, School of Computing and Information

Position: Embodied AI Requires a Privacy-Utility Trade-off

Fragmented privacy patches are insufficient for Embodied AI: a unified, lifecycle-level approach is needed to prevent systemic privacy leakage in real-world deployments.

Xiaoliang Fan, Jiarui Chen, Zhuodong Liu +6

Constitutional AI & AI Ethics Robotics & Embodied AI

2w ago·also CUHK, Hubei University, Shenzhen MSU-BIT University

SensingAgents: A Multi-Agent Collaborative Framework for Robust IMU Activity Recognition

LLM-powered multi-agent collaboration can boost zero-shot IMU activity recognition accuracy by 29% compared to existing agent models, even surpassing deep learning baselines.

Naiyu Zheng, Tianlong Yu, Haochen Yin +3

Robotics & Embodied AI Tool Use & Agents

J. Spieler +1·2w ago

Dream-MPC: Gradient-Based Model Predictive Control with Latent Imagination

Gradient-based MPC can finally beat gradient-free methods in continuous control, thanks to Dream-MPC's clever combination of learned policies, world models, uncertainty regularization, and optimization amortization.

J. Spieler, Sven Behnke

Robotics & Embodied AI Tool Use & Agents World Models & Planning

2w ago

DAO-enabled decentralized physical AI: A new paradigm for human-machine collaboration

DAOs could unlock a new era of human-machine collaboration by democratizing the operation and governance of physical-digital systems.

M. Ballandies, Florian Spychiger, Uwe Serdult +1

Constitutional AI & AI Ethics Robotics & Embodied AI Tool Use & Agents

Binh Long Nguyen +4·2w ago

Ilov3Splat: Instance-Level Open-Vocabulary 3D Scene Understanding in Gaussian Splatting

Unlock zero-shot 3D scene understanding: Ilov3Splat lets you identify and segment arbitrary objects in 3D scenes using only natural language, no category supervision needed.

Binh Long Nguyen, Kien Nguyen, S. Sridharan +2

Computer Vision Multimodal Models Robotics & Embodied AI

Tobias Denzinger +2·2w ago

Shedding Light onto Safety Integrity Level and Basic Software Constraints in a Real-World Automotive Application: Case Study with Driverator Framework

Mixing tasks with different safety levels in automotive ECUs can compromise critical functions, highlighting the need for careful task allocation strategies.

Tobias Denzinger, Matthias Becker, Peter Ulbrich

Code Generation & Program Synthesis Robotics & Embodied AI

Serra Z. Dane +3·2w ago·also UMich, ZJU

Towards Formal Verification of Hybrid Synchronous Programs with Refinement Types

Guaranteeing safety in autonomous systems gets a boost: this work enables formal verification of hybrid system code that directly controls physical processes.

Serra Z. Dane, Jiawei Chen, Marc Pouzet +1

Code Generation & Program Synthesis Robotics & Embodied AI

Yuhu Guo +6·2w ago

Reduced-order Neural Modeling with Differentiable Simulation for High-Detail Tactile Perception

Get high-fidelity tactile simulations with 65% speedup and 40% less memory by combining coarse physics with neural implicit reconstruction.

Yuhu Guo, Zhikai Shen, Jiasheng Qu +4

Robotics & Embodied AI Training Efficiency & Optimization World Models & Planning

2w ago

Height-Guided Projection Reparameterization for Camera-LiDAR Occupancy

By intelligently incorporating LiDAR-derived height information, HiPR overcomes limitations of fixed projection spaces, achieving state-of-the-art camera-LiDAR occupancy prediction with real-time performance.

Yuan Wu, Zhiqiang Yan, Jiawei Lian +2

Computer Vision Multimodal Models Robotics & Embodied AI

CARIAD SE·2w ago·also TU Berlin, Vision & Robotics GmbH

CARD: A Multi-Modal Automotive Dataset for Dense 3D Reconstruction in Challenging Road Topography

Finally, a driving dataset that doesn't just assume perfectly paved roads, offering 6.5x more depth data than KITTI for realistic autonomous driving scenarios.

Gasser Elazab, Frank Neuhaus, Tilman Koß +5

Computer Vision Multimodal Models Robotics & Embodied AI

Laura Bravo-Sánchez +5·2w ago

Anny-Fit: All-Age Human Mesh Recovery

Adult-trained human mesh recovery models can now handle kids, too, thanks to a new framework that enforces spatial consistency and leverages VLM-derived age and gender cues.

Laura Bravo-Sánchez, M. Armando, Romain Brégier +3

Computer Vision Multimodal Models Robotics & Embodied AI

2w ago

Contact Matrix: Enhancing Dance Motion Synthesis with Precise Interaction Modeling

Synthesizing realistic duet dance motions gets a boost from explicitly modeling inter-dancer contact, leading to significantly improved interaction fidelity and rhythmic synchronization.

Xuhai Chen, Zhi Cen, Huaijin Pi +3

VL-UniTrack: A Unified Framework with Visual-Language Prompts for UAV-Ground Visual Tracking

Boyue Xu +3·2w ago

Bridging the gap between aerial and ground-level tracking, VL-UniTrack uses visual-language prompts to achieve robust object tracking even with significant viewpoint differences.

Boyue Xu, Ruichao Hou, Tongwei Ren +1

Computer Vision Multimodal Models Robotics & Embodied AI

C. Gentil +3·2w ago

Dr-PoGO: Direct Radar Pose-Graph Optimization

Radar SLAM can now achieve state-of-the-art performance via direct scan registration, eliminating the need for hand-engineered feature extraction and enabling robust localization in adverse weather.

C. Gentil, Weican Li, L. Brizi +1

CRAFT: Counterfactual-to-Interactive Reinforcement Fine-Tuning for Driving Policies

Microsoft Research·2w ago·also Drive.

Autonomous driving gets a boost: CRAFT cleverly combines the best of both worlds – dense counterfactual supervision and grounded closed-loop feedback – to significantly improve driving policies.

Keyu Chen, Nanfei Ye, Yida Wang +4

Robotics & Embodied AI Training Efficiency & Optimization World Models & Planning

Linfeng Li +2·2w ago

Active Contact Sensing for Robust Robot-to-Human Object Handover

Robots can reliably hand over objects to humans by actively probing grasps, achieving a 30% improvement over passive methods.

Linfeng Li, Lin Shao, David Hsu

ReflectDrive-2: Reinforcement-Learning-Aligned Self-Editing for Discrete Diffusion Driving

Huimin Wang +9·2w ago

RL fine-tuning unlocks a 6x performance gain for in-place trajectory editing in autonomous driving, demonstrating the power of aligning diffusion planners with reinforcement learning.

Huimin Wang, Yue Wang, Bihao Cui +7

Architecture Design (Transformers, SSMs, MoE) Robotics & Embodied AI World Models & Planning

Jian Wu +3·2w ago

Practical validation of synthetic pre-crash scenarios

Stop relying on significance tests that only find differences: this Bayesian framework tells you if your synthetic data is *practically equivalent* to real-world data for your specific safety assessment task.

Jian Wu, Ulrich Sander, Carol A. C. Flannagan +1

Data Curation & Synthetic Data Robotics & Embodied AI World Models & Planning

Xinpan Meng +8·2w ago

From Reach to Insert: Tactile-Augmented Precision Assembly under Sub-Millimeter Tolerances

Tactile feedback, when strategically sampled and evaluated, unlocks robust and safe robotic insertion policies even under sub-millimeter tolerances.

Xinpan Meng, Siyao Huang, JingPu Yang +6

Robotics & Embodied AI Tool Use & Agents

Franek Stark +4·2w ago

Right Model, Right Time: Real-Time Cascaded-Fidelity MPC for Bipedal Walking

Achieve real-time bipedal walking control by cleverly swapping high-fidelity for low-fidelity models in MPC, slashing computation without sacrificing stability.

Franek Stark, Felix Wiebe, Shubham Vyas +2

Robotics & Embodied AI Training Efficiency & Optimization World Models & Planning

Nandiraju Gireesh +5·2w ago

HDFlow: Hierarchical Diffusion-Flow Planning for Long-horizon Tasks

Diffusion models can now plan effectively for long-horizon tasks by strategically generating subgoals that are then efficiently realized by rectified flow models.

Nandiraju Gireesh, Yuanliang Ju, Chaoyi Xu +3

Conditional Flow-VAE for Safety-Critical Traffic Scenario Generation

Zimu Gong +4·2w ago

Generate more realistic and diverse safety-critical autonomous vehicle scenarios by using conditional latent flow matching to bridge the gap between real-world and simulated data.

Zimu Gong, Brian Zhaoning Zhang, Chris Zhang +2

Data Curation & Synthetic Data Robotics & Embodied AI World Models & Planning

2w ago

Tightly-Coupled Estimation and Guidance for Robust Low-Thrust Rendezvous via Adaptive Homotopy

Dynamically adjusting trajectory optimization based on real-time navigation confidence enables robust low-thrust rendezvous, slashing miss distances by two orders of magnitude when faced with degraded sensor data.

Batu Candan, Simone Servadio

Autonomous Laparoscope Control through Unified Mechanics-Based Representation of Multimodal Intraoperative Information

Xiaojian Li +8·2w ago

Achieve autonomous laparoscope control by translating multimodal surgical data into a single "wrench" that guides the robot's movements.

Xiaojian Li, Jin Fang, Yudong Shi +6

Computer Vision Multimodal Models Robotics & Embodied AI

Antoine Baron +5·2w ago

On Electropolymerized Fingerprints and their Potential for Identification and Encryption

Forget digital watermarks – now you can physically fingerprint solutions with electrochemically-generated polymer patterns, opening doors to low-cost, physically-encrypted personal information.

Antoine Baron, L. Brulin, Corentin Scholaert +3

Robotics & Embodied AI Scientific Discovery & Drug Design

Yanjia Chen +6·2w ago

Optimal Uncertainty-Aware Calibration for the AX=YB Problem

Hand-eye calibration gets a 67% accuracy boost in high-uncertainty scenarios thanks to a new optimization framework that cleverly avoids explicit uncertainty modeling.

Yanjia Chen, Xiangfei Li, Huan Zhao +4

Computer Vision Robotics & Embodied AI Training Efficiency & Optimization

Guy Damari +6·2w ago

AI-Aided Advancements in Autonomous Underwater Vehicle Navigation

AI is enabling a new generation of AUV navigation systems that overcome the limitations of traditional model-based approaches in complex underwater environments.

Guy Damari, Zeev Yampolsky, Nadav Cohen +4

Robotics & Embodied AI Tool Use & Agents

Gaolin Ge +5·2w ago

3D Printing of Passively Actuated Self-Folding Robots with Integrated Functional Modules

Forget complex assembly: this 3D printing technique lets you pop out functional, self-folding robots with integrated sensors and actuators directly from a flat sheet.

Gaolin Ge, Qifeng Yang, Haoran Lu +3

Architecture Design (Transformers, SSMs, MoE) Open-Source Models & Weights Robotics & Embodied AI

2w ago

ConsisVLA-4D: Advancing Spatiotemporal Consistency in Efficient 3D-Perception and 4D-Reasoning for Robotic Manipulation

Robotic manipulation gets a serious upgrade: ConsisVLA-4D boosts performance by up to 41.5% and speeds up inference by 2.4x, all while ensuring your robot understands the scene in 3D *and* how it changes over time.

Wei Li, Jizhihui Liu, Li Yixing +3

Computer Vision Multimodal Models Robotics & Embodied AI

Himanshu Paudel +5·2w ago

A Closed-Form Dual-Barrier CBF Safety Filter for Holonomic Robots on Incrementally Built Occupancy Grid Maps

Guaranteeing robot safety in unknown environments doesn't require complex planning – this closed-form CBF filter does it with minimal computation.

Himanshu Paudel, Basanta Joshi, Dhirendra Raj Madai +3

Optimize-at-Capture: Highly-adaptive Exposure Controlling for In-Vehicle Non-contact Heart-rate Monitoring

Jieying Wang +3·2w ago

Standard camera auto-exposure is blind to the needs of remote heart-rate monitoring, but a new method closes the gap to enable robust in-vehicle driver monitoring.

Jieying Wang, Xinqi Cai, Caifeng Shan +1

Ground4D: Spatially-Grounded Feedforward 4D Reconstruction for Unstructured Off-Road Scenes

2w ago·also BIT, XJTU

By grounding temporal Gaussian aggregation in spatial voxels, Ground4D achieves state-of-the-art 4D reconstruction in challenging off-road environments where existing methods falter.

Shuo Wang, Jilin Mei, Fuyang Liu +6

Architecture Design (Transformers, SSMs, MoE) Computer Vision Robotics & Embodied AI

Shuo Liu +5·2w ago

Information Coordination as a Bridge: A Neuro-Symbolic Architecture for Reliable Autonomous Driving Scene Understanding

Stop feeding LLMs redundant and conflicting sensor data in autonomous driving: a new architecture slashes hallucinated entities by coordinating multi-sensor inputs *before* reasoning.

Shuo Liu, Lei Shi, Haowen Liu +3

Computer Vision Multimodal Models Robotics & Embodied AI

Muyao Peng +4·2w ago

Angle-I2P: Angle-Consistent-Aware Hierarchical Attention for Cross-Modality Outlier Rejection

Even with noisy initial matches, Angle-I2P leverages angular consistency and hierarchical attention to achieve state-of-the-art image-to-point cloud registration.

Muyao Peng, Shun Zou, Pei An +2

Computer Vision Multimodal Models Robotics & Embodied AI

Kaili Zheng +4·2w ago

InterMesh: Explicit Interaction-Aware End-to-End Multi-Person Human Mesh Recovery

Explicitly modeling human-object interactions boosts multi-person human mesh recovery accuracy by up to 9.9%, showing that interaction context is key to understanding human pose and shape in complex scenes.

Kaili Zheng, Kaiwen Wang, Xun Zhu +2

Architecture Design (Transformers, SSMs, MoE) Computer Vision Robotics & Embodied AI

Yihan Lin +6·2w ago

From Pixels to Tokens: A Systematic Study of Latent Action Supervision for Vision-Language-Action Models

Image-based latent actions are your secret weapon for long-horizon reasoning in VLAs, while action-based latent actions unlock complex motor coordination.

Yihan Lin, Haoyang Li, Yang Li +4

Computer Vision Multimodal Models Robotics & Embodied AI

2w ago·also The Hunan Engineering Research Center of

ULF-Loc: Unbiased Landmark Feature for Robust Visual Localization with 3D Gaussian Splatting

Alpha-blending, a core optimization in 3D Gaussian Splatting, subtly hobbles feature learning, but a geometry-weighted fusion approach can unlock more accurate and efficient visual localization.

Yingdong Gu, Shaocheng Yan, Zhenjun Zhao +4

ICPR 2026 Competition on Privacy-Preserving Person Re-Identification from Top-View RGB-Depth Camera (TVRID)

IMT Nord Europe·2w ago·also Explain, University of Lille

Top-view RGB-D person re-identification is surprisingly feasible, even across modalities, despite the inherent challenges of viewpoint and modality variations.

Raphaël Delécluse, Hazem Wannous, Laurent Guimas

Computer Vision Data Curation & Synthetic Data Robotics & Embodied AI

May 5, 2026

Dongyoung Kim +67·2w ago

RLDX-1 Technical Report

RLDX-1 achieves double the success rate of existing VLAs on complex humanoid tasks, suggesting a leap in robots' ability to handle contact-rich, dynamic manipulation.

Dongyoung Kim, Huiwon Jang, Myungkyu Koo +65

Multimodal Models Robotics & Embodied AI Tool Use & Agents

Jianjie Fang +10·2w ago

iWorld-Bench: A Benchmark for Interactive World Models with a Unified Action Generation Framework

Current world models struggle with basic physical interaction tasks like distance perception and trajectory following, highlighting a critical gap in their ability to simulate realistic environments.

Jianjie Fang, Yingshan Lei, Qinglin Wan +8

Eval Frameworks & Benchmarks Robotics & Embodied AI World Models & Planning

2w ago·also Google Research, Harvard, Northeastern, Notre Dame +2

Deco: Extending Personal Physical Objects into Pervasive AI Companion through a Dual-Embodiment Framework

Instead of creating new AI companions from scratch, Deco shows how to breathe new life into cherished physical objects by giving them a digital voice and personality powered by LLMs.

Zhihan Jiang, Meng Wu, Ruishi Zou +14

Natural Language Processing Robotics & Embodied AI Tool Use & Agents

Koichi Toida·2w ago

Bodyless Presence: Reconsidering the Minimal Self in Immersive Video

Immersive video reveals that "being there" hinges more on feeling spatially located than having a virtual body, challenging conventional notions of embodiment in XR.

Koichi Toida

Computer Vision Natural Language Processing Robotics & Embodied AI

Mustafa Sakhai +3·2w ago

InterFuserDVS: Event-Enhanced Sensor Fusion for Safe RL-Based Decision Making

Event cameras can significantly boost the reliability of autonomous driving in high-dynamic-range and high-speed scenarios, achieving perfect route completion in CARLA benchmarks.

Mustafa Sakhai, Kaung Sithu, Min Khant Soe Oke +1

Computer Vision Multimodal Models Robotics & Embodied AI

M. Azad +1·2w ago

Beyond Fixed Thresholds and Domain-Specific Benchmarks for Explainable Multi-Task Classification in Autonomous Vehicles

Fixed confidence thresholds are holding back explainable autonomous driving systems, but this new adaptive approach and dataset could unlock better performance and cross-cultural understanding.

M. Azad, S. B. Shokouhi

Computer Vision Interpretability & Mechanistic Interp Robotics & Embodied AI

2w ago·also Huawei

Stage Light is Sequence²: Multi-Light Control via Imitation Learning

Automating stage lighting control across diverse venues is now possible without expert demonstrations, thanks to a novel imitation learning approach that decomposes global color distributions into individual light controls.

Zijian Zhao, Dian Jin, Zijing Zhou +1

Robotics & Embodied AI Speech & Audio

Sinan Bank +1·2w ago

OPENJ: A Conceptual Framework for Open-Source Digital Human Modeling and Ergonomic Assessment in a CAD Environment

An open-source alternative to expensive, proprietary digital human modeling software could democratize ergonomic analysis and workplace design.

Sinan Bank, Casey E. Eaton

Computer Vision Open-Source Models & Weights Robotics & Embodied AI

Kristy Sakano +2·2w ago

From Language to Logic: A Theoretical Architecture for VLM-Grounded Safe Navigation

Guaranteeing safe robot navigation in unstructured environments just got easier: translate human language rules into formal logic, ground them with VLMs, and let the robot navigate.

Kristy Sakano, Kalonji Harrington, Mumu Xu

Multimodal Models Reasoning & Chain-of-Thought Robotics & Embodied AI

Takahiro Ishikawa-Aso +4·2w ago

ipc_shared_ptr: A Publish/Subscribe-Aware Smart Pointer for Cross-Process Object Lifetime Management

Achieve a 2.9x reduction in end-to-end latency in ROS 2 communication by trading off scalability for simplicity in cross-process object lifetime management.

Takahiro Ishikawa-Aso, Atsushi Yano, K. Imai +2

Distributed Systems & Hardware Robotics & Embodied AI

Yazan Youssef +2·2w ago

ARMATA: Auto-Regressive Multi-Agent Task Assignment

End-to-end learning can beat even the best industrial solvers at multi-agent task assignment, improving solution quality by 20% while slashing computation time from hours to seconds.

Yazan Youssef, A. Noureldin, S. Givigi

Robotics & Embodied AI Tool Use & Agents World Models & Planning

Tsinghua AI·2w ago

SigLoMa: Learning Open-World Quadrupedal Loco-Manipulation from Ego-Centric Vision

Quadrupedal robots can now perform dynamic loco-manipulation in the real world, matching human teleoperation, using only onboard ego-centric vision and a low-frequency (5Hz) open-vocabulary detector.

Shiyi Chen, Haiyi Liu, Ming Yang +2

Computer Vision Robotics & Embodied AI World Models & Planning

Ana Maria Nascimento +3·2w ago

Sensorless State Estimation and Control for Agile Cable-Suspended Payload Transport by Quadrotors

Ditching load sensors and directly embedding cable constraints into the quadrotor's control loop unlocks more precise and robust aerial manipulation.

Ana Maria Nascimento, Augusto Sales, A. Lima +1

Safety by Invariance, Liveness through Refinement: Heterogeneous Contract Framework for Co-Design of Layered Control

Yoshinari Takayama +4·2w ago

Guaranteeing safety and liveness in complex control systems doesn't require monolithic design; this work shows how to decompose the problem across layers with formal contracts.

Yoshinari Takayama, A. Iovine, Bart Besselink +2

Task-Aware Scanning Parameter Configuration for Robotic Inspection Using Vision Language Embeddings and Hyperdimensional Computing

Zhiling Chen +5·2w ago

Forget tedious manual tuning: ScanHD lets robots autonomously configure laser profilers based on natural language instructions and visual context, achieving >92% accuracy in real-world inspection tasks.

Zhiling Chen, David J. Gorsich, Matthew P. Castanier +3

Computer Vision Robotics & Embodied AI Tool Use & Agents

Andrea Iannoli +4·2w ago

Say the Mission, Execute the Swarm: Agent-Enhanced LLM Reasoning in the Web-of-Drones

LLMs alone can't reliably fly drone swarms from natural language commands; task-specific tools and runtime guardrails are essential for real-world cyber-physical system control.

Andrea Iannoli, Lorenzo Gigli, L. Sciullo +2

Reasoning & Chain-of-Thought Robotics & Embodied AI Tool Use & Agents

Ho Jae Lee +5·2w ago

Learning Reactive Dexterous Grasping via Hierarchical Task-Space RL Planning and Joint-Space QP Control

Reactive dexterous grasping can be achieved with zero-shot transfer to real-world objects by decoupling high-level RL planning from low-level QP control, enabling dynamic adjustments to safety margins without retraining.

Ho Jae Lee, Yonghyeon Lee, Alexander Alexiev +3

Robotics & Embodied AI Tool Use & Agents World Models & Planning

Shinas Shaji +3·2w ago

Evaluating Generative Models as Interactive Emergent Representations of Human-Like Collaborative Behavior

LLMs spontaneously exhibit collaborative behaviors like perspective-taking and theory of mind in embodied settings, suggesting a surprising capacity for modeling human collaborators without explicit training.

Shinas Shaji, Teena Hassan, Sebastian Houben +1

Eval Frameworks & Benchmarks Robotics & Embodied AI Tool Use & Agents

Yibang Tang +4·2w ago

SOAR: Real-Time Joint Optimization of Order Allocation and Robot Scheduling in Robotic Mobile Fulfillment Systems

Achieve 15% faster order completion in warehouse robotics with a new deep reinforcement learning approach that jointly optimizes robot scheduling and order allocation in real-time.

Yibang Tang, Yifan Yang, Jingyuan Wang +2

Robotics & Embodied AI Tool Use & Agents World Models & Planning

Hao Wu +12·2w ago

RoboAlign-R1: Distilled Multimodal Reward Alignment for Robot Video World Models

Robot video world models can be significantly improved by distilling a multimodal reward function and stabilizing long-horizon inference, leading to better instruction following and manipulation accuracy.

Hao Wu, Yuqi Li, Yuan Gao +10

Multimodal Models Robotics & Embodied AI World Models & Planning

Prasoon Kumar +2·2w ago

Robust Visual SLAM for UAV Navigation in GPS-Denied and Degraded Environments: A Multi-Paradigm Evaluation and Deployment Study

Classical SLAM algorithms crumble under visual degradation, but deep learning approaches like MASt3R and DUSt3R maintain impressive localization accuracy, suggesting a path to robust UAV autonomy in challenging environments.

Prasoon Kumar, Akshay Deepak, Sandeep Kumar

Jiao: Bridging Isolation and Customization in Mixed Criticality Robotics

James Yen +7·2w ago

Achieve near order-of-magnitude reduction in tail timing error in mixed-criticality robotics by decoupling safety-critical control from user applications.

James Yen, Zhibai Huang, Zhixiang Wei +5

Architecture Design (Transformers, SSMs, MoE) Distributed Systems & Hardware Robotics & Embodied AI

Zhiyuan Li +6·2w ago

Bridging the Embodiment Gap: Disentangled Cross-Embodiment Video Editing

Robots can now learn manipulation skills from human videos with greater morphological accuracy and temporal consistency, thanks to a new method that disentangles task and embodiment.

Zhiyuan Li, Wenyan Yang, Wenshuai Zhao +4

Computer Vision Multimodal Models Robotics & Embodied AI

Timon Homberger +4·2w ago

FUS3DMaps: Scalable and Accurate Open-Vocabulary Semantic Mapping by 3D Fusion of Voxel- and Instance-Level Layers

Achieve scalable open-vocabulary semantic maps of entire buildings by fusing both dense and instance-level semantic information in a novel dual-layer voxel representation.

Timon Homberger, F. Busch, Jesús Gerardo Ortega Peimbert +2

Computer Vision Multimodal Models Robotics & Embodied AI

Panagiotis Rousseas +1·2w ago

Feasibility-aware Hybrid Control for Motion Planning under Signal Temporal Logics

Escape deadlocks and choreograph robots through complex tasks with this new hybrid control architecture that merges planning and control.

Panagiotis Rousseas, Dimos V. Dimarogonas

BifrostUMI: Bridging Robot-Free Demonstrations and Humanoid Whole-Body Manipulation

Chenhao Yu +5·2w ago

Unlock agile humanoid robots by ditching teleoperation and training directly from human VR demos.

Chenhao Yu, Hongwu Wang, Youhao Hu +3

Data Curation & Synthetic Data Multimodal Models Robotics & Embodied AI

University of Surrey·2w ago

TACO: Trajectory Aligning Cross-view Optimisation

Ditch the GPS: This CVGL pipeline achieves a 5.9x improvement in localization accuracy over IMU-only by intelligently fusing satellite imagery with inertial measurements, needing only a single initial GPS fix.

Tavis Shore, Oscar Mendez, Simon Hadfield

Computer Vision Multimodal Models Robotics & Embodied AI

Sergio A. Esteban +3·2w ago

On Surprising Effects of Risk-Aware Domain Randomization for Contact-Rich Sampling-based Predictive Control

Domain randomization doesn't just make your robot policies more robust; it fundamentally warps the optimization landscape, potentially guiding your search towards better contact-rich behaviors.

Sergio A. Esteban, Junheng Li, Vince Kurtz +1

Neural Control: Adjoint Learning Through Equilibrium Constraints

Dezhong Tong +5·2w ago

Differentiating through physical simulations just got a whole lot easier: Neural Control avoids unrolling iterative solvers by using an adjoint formulation, enabling memory-efficient gradient-based control.

Dezhong Tong, Jiawen Wang, Hengyi Zhou +3

Robotics & Embodied AI Training Efficiency & Optimization World Models & Planning

Shugen Song +2·2w ago

Robust Path Tracking for Vehicles via Continuous-Time Residual Learning: An ICODE-MPPI Approach

Autonomous vehicles can now stick to the plan even with disturbances, thanks to a new control method that learns and compensates for unmodeled dynamics.

Shugen Song, Wenjie Mei, Chen Zhao