Search papers, labs, and topics across Lattice.
100 papers published across 6 labs.
Achieve high-precision multi-robot SLAM with minimal data transmission by selectively compressing and transmitting keyframes and non-keyframes in a cloud-edge-robot architecture.
Unlock zero-shot sim-to-real transfer for complex legged robots by offloading gait selection to a learned policy that guides a lower-level MPC.
Tactile robotic perception gets a boost with a new pretraining method that explicitly encodes force, geometry, and orientation, leading to a 52% reduction in regression error.
AssistMimic enables humanoid robots to learn complex, force-exchanging assistive motions by reformulating imitation learning as a multi-agent RL problem.
Ditch the pre-trained models: TacLoc achieves accurate robotic pose estimation from tactile sensing alone by framing it as a one-shot point cloud registration problem.
Achieve up to 1.28x faster VLA model inference for robotic manipulation without retraining, simply by merging visual tokens based on depth.
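The exact merging rule isn't given in this blurb, but a minimal sketch of the general idea, assuming per-patch depth estimates are available alongside the ViT tokens (all names below are illustrative), is to pool tokens that fall into the same depth bin, shrinking the sequence the attention layers must process without touching the weights:

```python
import torch

def merge_tokens_by_depth(tokens: torch.Tensor, depth: torch.Tensor,
                          num_bins: int = 8) -> torch.Tensor:
    """Illustrative sketch: merge visual tokens that fall into the same
    depth bin. tokens: (N, D) patch embeddings; depth: (N,) per-patch depth.
    Returns at most `num_bins` merged tokens."""
    # Quantize depth into equal-width bins over the observed range.
    lo, hi = depth.min(), depth.max()
    bins = torch.clamp(((depth - lo) / (hi - lo + 1e-6) * num_bins).long(),
                       max=num_bins - 1)
    merged = []
    for b in range(num_bins):
        mask = bins == b
        if mask.any():
            merged.append(tokens[mask].mean(dim=0))  # average tokens in this bin
    return torch.stack(merged)

# Usage: 196 ViT patch tokens collapse to <= 8 depth-grouped tokens.
toks = torch.randn(196, 768)
d = torch.rand(196) * 5.0
print(merge_tokens_by_depth(toks, d).shape)
```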
Multi-robot coverage can now handle multiple sensory demands simultaneously, with provable guarantees on performance even when those demands are initially unknown.
Self-supervised learning can dramatically improve online HD map construction, outperforming supervised methods even with limited labeled data by leveraging geospatial consistency in BEV feature representations.
Achieve real-time safety-critical robot control in partially observable environments by decoupling goal reaching, information gathering, and safety into modular, certificate-based components operating directly in belief space.
VLA-controlled robots can now detect anomalies in under 100ms using a plug-and-play module, enabling real-time recovery from unexpected situations.
Stop wrestling with unstable action spaces: ResWM reframes visual RL by predicting incremental action adjustments, leading to smoother control and better performance.
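ResWM's training setup isn't spelled out here; as a hedged sketch, the core reframing (predict a bounded increment on the previous action rather than an absolute command each step) can be captured in a small wrapper:

```python
import numpy as np

class ResidualActionWrapper:
    """Illustrative sketch: the policy outputs a bounded delta that is added
    to the previous action, yielding smoother control than predicting
    absolute actions from scratch at every step."""

    def __init__(self, policy, act_dim: int, max_delta: float = 0.1):
        self.policy = policy          # maps observation -> raw delta
        self.prev_action = np.zeros(act_dim)
        self.max_delta = max_delta

    def act(self, obs: np.ndarray) -> np.ndarray:
        delta = np.clip(self.policy(obs), -self.max_delta, self.max_delta)
        self.prev_action = np.clip(self.prev_action + delta, -1.0, 1.0)
        return self.prev_action

# Usage with a dummy "policy":
wrapper = ResidualActionWrapper(lambda obs: np.tanh(obs[:4]), act_dim=4)
print(wrapper.act(np.random.randn(8)))
```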
Unlock bimanual-level cloth manipulation with a single robotic arm using a novel tactile gripper and vision-based perception framework.
Robots can boost their perceived competence by 83% simply by tweaking navigation behaviors suggested by a causal Bayesian network.
Autonomous driving's next leap hinges on reasoning, not just perception, but current LLM-based approaches are too slow for real-time control.
Ditch the clunky controllers: this hand-shadowing pipeline lets you teleoperate a robot arm with just an RGB-D camera and some clever inverse kinematics.
Hyper-redundant robots get a 75% accuracy boost thanks to a neural network that adaptively blends learned behavior with kinematic priors.
Humanoid robots can now reliably transport objects on a tray in the real world, thanks to a hierarchical RL approach that isolates and cancels gait-induced disturbances.
Forget hand-tuning rollout budgets: $V_{0.5}$ dynamically allocates compute to sparse RL rollouts based on a real-time statistical test of a generalist value model's prior, slashing variance and boosting performance.
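The blurb names the ingredients, a generalist value prior and a real-time statistical test, without giving the exact statistic; one hedged way to realize the idea is a z-test of early rollout returns against the prior, spending extra rollouts only where they disagree (all names below are illustrative):

```python
import numpy as np

def allocate_rollouts(returns_so_far, prior_mean, prior_std,
                      z_thresh: float = 1.96, extra: int = 8) -> int:
    """Illustrative sketch: if early Monte-Carlo returns are statistically
    consistent with the value model's prior, stop early; otherwise spend
    `extra` rollouts to resolve the disagreement."""
    n = len(returns_so_far)
    if n == 0:
        return extra
    sem = prior_std / np.sqrt(n)                 # standard error under the prior
    z = abs(np.mean(returns_so_far) - prior_mean) / (sem + 1e-8)
    return extra if z > z_thresh else 0          # only disagreement buys compute

# Usage: prior says ~5.0; observed returns agree, so no extra rollouts.
print(allocate_rollouts([4.8, 5.1, 5.0], prior_mean=5.0, prior_std=1.0))
```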
LVLMs can now provide depth-aware pedestrian navigation guidance by grounding language reasoning and segmentation, without needing user-provided cues or anchor points.
Achieve the seemingly impossible: ASTER uses RL to enable cable-suspended quadrotors to perform autonomous inverted flight.
Robots lost in the vineyard? Not anymore: encoding row-level semantics into a particle filter enables robust localization in repetitive agricultural environments where LiDAR and vision alone fail.
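The paper's measurement model isn't reproduced here; a minimal sketch of the idea, reweighting particles by whether the vineyard row they lie in matches a detected row label (the `row_of` helper is hypothetical), could look like:

```python
import numpy as np

def reweight_particles(particles, weights, detected_row, row_of,
                       match_lik=0.9, mismatch_lik=0.1):
    """Illustrative sketch: fold row-level semantics into a particle filter
    by boosting particles whose map row matches the detected row label."""
    rows = np.array([row_of(p) for p in particles])   # map each pose to a row ID
    lik = np.where(rows == detected_row, match_lik, mismatch_lik)
    w = weights * lik
    return w / w.sum()                                 # renormalize

# Usage: 4 particles in rows [0, 1, 1, 2]; the detector says row 1.
parts = [np.array([0.0, 0.0]), np.array([0.0, 2.0]),
         np.array([1.0, 2.0]), np.array([0.0, 4.0])]
print(reweight_particles(parts, np.full(4, 0.25), 1,
                         row_of=lambda p: int(p[1] // 2)))
```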
Guarantee runtime safety in complex cyber-physical systems with unbounded data domains using a refinement type system for parameterized streams, even though the general verification problem is undecidable.
Forget training on massive datasets: this new diffusion policy learns human-like 3D scanning strategies that generalize to unseen objects while being robust to noise.
Training embodied intelligence models just got 40x faster thanks to a thousand-GPU cloud platform and a suite of optimizations spanning data pipelines, model architecture, and infrastructure.
Forget catastrophic forgetting: this imitation learning framework remembers up to 65% more while improving AUC by 10-17 points on the LIBERO benchmark.
Finally, a multi-robot path planning benchmark that lets you directly compare grid-based, roadmap, and continuous planners on the same tasks.
Even in feature-rich environments, LiDAR SLAM systems are vulnerable to a new spoofing attack (D-SLAMSpoof) that injects dynamically coordinated spurious point clouds, but can be defended against using inertial dead reckoning.
Steer your robot's diffusion policy away from failure modes at inference time with a lightweight performance predictor trained via self-supervised attention.
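Neither the predictor nor the steering rule is detailed in this blurb; a hedged sketch in the style of classifier guidance, assuming a differentiable success predictor `score_fn` that rates partially denoised actions, nudges each denoising step along the predictor's gradient:

```python
import torch

def guided_denoise_step(x, denoise_step, score_fn, guidance_scale=0.5):
    """Illustrative sketch: after the policy's usual reverse-diffusion update,
    take a small gradient step uphill on a learned success predictor,
    steering the sample away from predicted failure modes at inference."""
    x = denoise_step(x)                        # the policy's own reverse step
    x = x.detach().requires_grad_(True)
    score = score_fn(x).sum()                  # predicted success of this action
    grad = torch.autograd.grad(score, x)[0]
    return (x + guidance_scale * grad).detach()

# Usage with stand-in components:
step = lambda x: 0.9 * x                       # dummy reverse-diffusion step
score = lambda x: -(x ** 2).sum(dim=-1)        # dummy predictor: prefer small actions
x = torch.randn(1, 7)
print(guided_denoise_step(x, step, score))
```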
Forget hand-crafted rewards: this new method learns dexterous manipulation by encouraging the robot hand to explore diverse contact patterns on objects, leading to impressive real-world transfer.
Robust co-design optimization can significantly improve the performance of agile UAVs in real-world environments by directly incorporating uncertainty and disturbances into the design process.
Robots can now adaptively decide whether to clear clutter or directly grasp, leading to significantly improved success rates in densely cluttered environments.
Forget signal injection: a strategically placed, actuated mirror can now hijack even the most secure LiDAR SLAM systems, inducing localization errors exceeding 6 meters.
Achieve robust humanoid task execution in complex environments by turning high-level language instructions into verifiable, geometrically-grounded task programs that can recover from failures.
By forecasting compact world dynamics before taking action, DynVLA leapfrogs traditional CoT methods to achieve more informed and physically grounded autonomous driving decisions.
By adaptively weighting neighbor information based on uncertainty, distributed multi-object tracking can achieve significantly better performance in mobile robot networks with heterogeneous localization quality.
Multi-robot systems can slash battery consumption by 15% and boost GPU utilization by 50% for large DNN inference by using a hybrid offline-online reinforcement learning strategy to dynamically schedule and distribute DNN module execution.
Robots can now learn to manipulate novel objects in dynamic environments by using LLMs to bridge the gap between symbolic planning and reinforcement learning.
Unlock superior trajectories in complex environments with a new ADMM-based solver that jointly optimizes spatial and temporal domains, eliminating the need for complex warm starting.
Forget sequential robot moves: coordinated "amoebot" swarms can morph into target shapes almost instantly.
Incomplete trajectory data got you down? This plug-and-play framework progressively aligns features from incomplete observations with complete ones, boosting prediction accuracy in autonomous driving scenarios.
Achieve 2.5x higher success in UAV navigation by decoupling target generation from progress monitoring, enabling safer and more efficient zero-shot flight.
Trajectory optimization just got a whole lot faster and more energy-efficient: a GPU-native solver achieves 4x speedup and halves energy consumption compared to optimized CPU baselines.
Achieve 2x better coverage of autonomous driving safety requirements with 6x fewer simulations by automatically generating test scenarios from formal LTLf specifications.
Injecting muscle synergy priors into reinforcement learning drastically improves the realism of simulated human locomotion, even with limited real-world data.
Reaction wheels can dramatically stabilize bipedal hopping robots in low-gravity environments, enabling more consistent upright landings on irregular extraterrestrial terrains.
Achieve significantly higher accuracy and lower mental demand in bimanual teleoperation by intelligently coupling intention estimation with scene-graph task planning and context-aware motion assistance.
A quadruped robot can now autonomously navigate rough terrain and pick up trash, potentially revolutionizing environmental cleanup in areas inaccessible to traditional robots.
Monocular depth estimation can now run at 161 FPS on edge devices without sacrificing too much accuracy, thanks to a clever asynchronous architecture that reuses features from a foundation model.
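The blurb doesn't detail the asynchronous design; a hedged sketch of the caching pattern it suggests, with stand-in `slow_encoder` and `fast_head` components, refreshes the foundation-model features only every few frames while a light head runs at camera rate:

```python
import torch

class AsyncDepthHead:
    """Illustrative sketch: recompute heavy foundation-model features only
    every `refresh_every` frames; a lightweight per-frame head fuses the
    cached features with the current frame at full sensor rate."""

    def __init__(self, slow_encoder, fast_head, refresh_every: int = 8):
        self.slow_encoder = slow_encoder
        self.fast_head = fast_head
        self.refresh_every = refresh_every
        self.cached_feats = None
        self.frame_idx = 0

    @torch.no_grad()
    def __call__(self, frame: torch.Tensor) -> torch.Tensor:
        if self.frame_idx % self.refresh_every == 0 or self.cached_feats is None:
            self.cached_feats = self.slow_encoder(frame)   # expensive, infrequent
        self.frame_idx += 1
        return self.fast_head(frame, self.cached_feats)    # cheap, every frame

# Usage with dummy components:
enc = lambda x: x.mean(dim=(-1, -2))              # stand-in "foundation" features
head = lambda x, f: x * f[..., None, None]        # stand-in fusion head
depth = AsyncDepthHead(enc, head)
print(depth(torch.randn(1, 3, 64, 64)).shape)
```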
A training-free visual distillation method boosts VLA model performance in cluttered environments by over 34%, proving that targeted noise reduction is more effective than brute-force scaling.
Robots can now loosen screws with human-level dexterity thanks to a new framework that combines haptic estimation, online planning, and adaptive stiffness control using a parameterized Equilibrium Manifold.
A simple, low-cost smart waste bin design achieves touch-free operation using commodity STM32 microcontrollers and ultrasonic sensors.
By fusing language model reasoning with diffusion-based trajectory generation, KnowDiffuser leapfrogs existing autonomous driving planners on the nuPlan benchmark.
A new gripper design automates the tedious and injury-prone task of opening sterile medical pouches, freeing up nurses from a physically demanding, repetitive procedure.
A single meta-RL policy can now handle 66% mass variations and 70% rotor thrust losses in quadrotors, achieving zero-shot sim-to-real transfer for agile maneuvers.
Gaussian trajectory predictors often lie about their confidence, but a new loss function leveraging Kernel Density Estimation can make them more honest, leading to safer autonomous navigation.
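The loss itself isn't reproduced here; one hedged reading is to score the ground-truth future under a kernel density estimate built from the predictor's own samples, so an over-confident (too-narrow) Gaussian that misses the truth pays a large penalty:

```python
import torch

def kde_nll(pred_samples: torch.Tensor, gt: torch.Tensor,
            bandwidth: float = 0.5) -> torch.Tensor:
    """Illustrative sketch: negative log-likelihood of the ground truth under
    a Gaussian KDE over samples from the predicted distribution.

    pred_samples: (K, D) samples from the predicted Gaussian; gt: (D,).
    """
    d2 = ((pred_samples - gt) ** 2).sum(dim=-1)        # squared distances to gt
    log_kernels = -d2 / (2 * bandwidth ** 2)           # unnormalized log kernels
    # Log-mean-exp over samples (constants dropped; fine for a training loss).
    return -(torch.logsumexp(log_kernels, dim=0)
             - torch.log(torch.tensor(float(pred_samples.shape[0]))))

# Usage: an over-confident prediction whose samples miss the ground truth.
mu, sigma = torch.zeros(2), 0.05
samples = mu + sigma * torch.randn(64, 2)
print(kde_nll(samples, gt=torch.tensor([1.0, 1.0])))   # large loss
print(kde_nll(samples, gt=torch.tensor([0.0, 0.0])))   # small loss
```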
By decoupling visual and motor information during pretraining, FutureVLA unlocks more effective visuomotor prediction for vision-language-action models, boosting performance without modifying downstream architectures.
Guaranteeing safety in diffusion-based trajectory planning is now possible by embedding a certifiable barrier function directly into the denoising loop, ensuring forward invariance and preserving the learned path geometry.
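The certification details aren't in this blurb; a minimal sketch, assuming a known barrier function h with h(x) >= 0 defining the safe set, interleaves each learned denoising step with a gradient-based correction that restores forward invariance:

```python
import numpy as np

def safe_denoising_loop(x, reverse_step, h, grad_h, steps=10, eta=0.1):
    """Illustrative sketch: after each learned reverse-diffusion step, push
    any iterate with h(x) < 0 back toward the safe set h(x) >= 0 along the
    barrier gradient before continuing."""
    for _ in range(steps):
        x = reverse_step(x)                 # learned denoising update
        while h(x) < 0:                     # barrier-function correction
            x = x + eta * grad_h(x)         # ascend h until safe again
    return x

# Usage: keep the sample outside a unit-radius obstacle at the origin.
h = lambda x: np.dot(x, x) - 1.0            # h >= 0 <=> outside the obstacle
grad_h = lambda x: 2.0 * x
step = lambda x: 0.8 * x                    # dummy step that pulls toward origin
print(safe_denoising_loop(np.array([2.0, 0.0]), step, h, grad_h))
```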
By jointly modeling video dynamics and actions, DiT4DiT achieves a 10x gain in sample efficiency and 7x faster convergence in robot policy learning, showing that video generation can be a powerful scaling proxy.
Forget fine-tuning: this method adapts robots to changing environments by learning a low-dimensional "Trend ID" embedding, keeping the core model fixed.
Robots can now scrape vials like a human chemist, thanks to a reinforcement learning policy that adapts force in real-time based on visual feedback.
Overcoming the data scarcity bottleneck in robotic arm-hand coordination, FAR-Dex achieves over 80% real-world success in fine-grained dexterous manipulation tasks.
Achieve efficient task execution in shared workspaces by interleaving scheduling and motion planning, using symbolic feedback to guide the scheduler towards motion-feasible solutions.
A nose-mounted microphone and vibration sensor combo unlocks robust, low-audibility speech interfaces for always-on AI interaction, even in noisy environments.
Bypass the need for extensive on-site data collection when deploying pre-trained robot models by visually prompting them to adapt to new scenes.
Autonomous vehicles can now better "see" the world even when cameras fail, thanks to a new method that fills in the blanks by leveraging spatial overlaps and learned semantic priors.
By converting point clouds into a format VLMs can understand, VLM-Loc significantly boosts text-to-point-cloud localization accuracy, outperforming prior methods that rely on shallower text-point cloud correspondences.
Fine-grained foot motion capture, a notoriously hard problem, gets a 30% accuracy boost by cleverly lifting 2D keypoints to 3D using motion capture data and contextual information, bypassing the need for direct image-3D annotation pairs.
Forget manual labeling: STONE offers a massive, automatically-labeled dataset for off-road robot navigation, unlocking scalable training for robust 3D traversability prediction.
Drones can now proactively navigate turbulent environments thanks to a fast wind-prediction framework that integrates geometric perception and local weather data.
Ditch the map: a diffusion model learns to plan UAV swarm trajectories directly from RGB images, enabling reactive and adaptive navigation in cluttered environments.
Human-in-the-loop learning can now boost dexterous manipulation VLA models by 25%, thanks to a new framework that smartly samples corrective actions and enables real-time intervention.
Humanoid robots can now walk robustly in the real world using only onboard sensors, thanks to a new diffusion policy that implicitly learns state estimation.
Forget hand-crafted heuristics: this new dynamics-aware policy learns to exploit contact forces in cluttered environments, outperforming traditional methods by 25% in simulation and showing impressive sim-to-real transfer.
Physics-based dynamics models can make or break sim-to-real reinforcement learning, boosting real-world success by 50% in industrial control tasks where simplified models fail.
Unlock real-time semantic SLAM and multimodal interaction with 3D Gaussian Splatting using X-GS, a unified and extensible open framework.
Forget hand-engineered reward functions: this method uses language models to learn factorized world states that generalize to new goals and environments, outperforming LLM-as-a-Judge in zero-shot reward prediction.
Offline RL can be made more robust to distribution shift by directly optimizing against worst-case transition dynamics within an uncertainty set, leading to policies that avoid unreliable out-of-distribution actions.
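The paper's uncertainty set and solver aren't specified here; a hedged sketch of the core pessimistic backup, taking the worst next state within an eps-ball around the nominal model prediction, might be:

```python
import numpy as np

def robust_bellman_target(r, next_state, value_fn, eps=0.1, gamma=0.99,
                          n_candidates=32, rng=np.random):
    """Illustrative sketch: a pessimistic Bellman target that evaluates the
    value at the worst-case next state inside an eps-ball around the nominal
    model prediction, discouraging actions with uncertain outcomes."""
    # Sample candidate perturbations of the predicted next state.
    noise = rng.uniform(-eps, eps, size=(n_candidates, next_state.shape[0]))
    candidates = next_state + noise
    worst_value = min(value_fn(s) for s in candidates)   # adversarial choice
    return r + gamma * worst_value

# Usage with a dummy value function that prefers states near the origin.
v = lambda s: -np.linalg.norm(s)
print(robust_bellman_target(1.0, np.array([0.5, 0.5]), v))
```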
Ditch the latency tax of traditional scheduling: this new approach delivers data "just-in-time" for safety-critical systems, boosting performance without sacrificing reliability.
Imagine robots that can literally grow new sensors on demand, adapting to their environment in real-time through internal chemical reactions.
A caterpillar-inspired robot can now squeeze into tight spaces and "feel" its way around using artificial bristles, offering a cost-effective upgrade for existing robotic arms.
A 4B-parameter model outperforms Gemini-3-Pro in autonomous driving by incorporating physics-informed constraints and style-aware training, suggesting specialized models can surpass larger, general-purpose models in domain-specific tasks.
A complete, GPU-accelerated bimanual mobile manipulation platform can be built for under $1300, opening up robotics research and education to a wider audience.
Achieve robust locomotion for multi-legged robots on rough terrain with a surprisingly simple, decentralized control architecture that blends event-driven and CPG-based approaches.
By incorporating language guidance into federated learning, SurgFed tackles the long-standing problem of tissue and task heterogeneity in surgical video understanding, leading to improved segmentation and depth estimation across diverse surgical settings.
Finally, a GelSight-style sensor that doesn't force you to choose between pre-contact vision and high-fidelity tactile sensing.
Ditch the flat scene graphs: TopoOR models surgical environments as higher-order topological structures, unlocking superior performance in safety-critical tasks by preserving complex relationships and multimodal data.
Zero-shot robotic manipulation is now within reach: TiPToP matches a 350-hour fine-tuned model without *any* robot data.
Task demands in remote AR collaboration dictate how much network delay users can tolerate before perceived fluency breaks down, paving the way for adaptive systems.
Collect high-quality robot manipulation data anywhere with TRIP-Bag, a teleoperation system that fits in a suitcase and sets up in under 5 minutes.
Forget waiting hours: this MORL framework achieves 270x speedups on robotics tasks thanks to GPU-native parallelization.
Unlock the power of web videos for embodied AI: implicit geometry representations let agents learn to navigate from real-world room tours without relying on fragile 3D reconstruction.
Ignoring CSI phase information in robotic activity recognition is a mistake: fusing it with amplitude data in a novel gated BiLSTM architecture significantly boosts accuracy and robustness.
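The architecture isn't reproduced in this blurb; a hedged sketch of gated fusion between amplitude and phase BiLSTM branches (dimensions illustrative) could be:

```python
import torch
import torch.nn as nn

class GatedBiLSTMFusion(nn.Module):
    """Illustrative sketch: separate BiLSTM encoders for CSI amplitude and
    phase, fused by a learned sigmoid gate rather than naive concatenation."""

    def __init__(self, in_dim: int, hidden: int, n_classes: int):
        super().__init__()
        self.amp_lstm = nn.LSTM(in_dim, hidden, bidirectional=True,
                                batch_first=True)
        self.pha_lstm = nn.LSTM(in_dim, hidden, bidirectional=True,
                                batch_first=True)
        self.gate = nn.Linear(4 * hidden, 2 * hidden)   # gate from both branches
        self.head = nn.Linear(2 * hidden, n_classes)

    def forward(self, amp, pha):
        a, _ = self.amp_lstm(amp)                        # (B, T, 2H)
        p, _ = self.pha_lstm(pha)
        g = torch.sigmoid(self.gate(torch.cat([a, p], dim=-1)))
        fused = g * a + (1 - g) * p                      # per-feature gating
        return self.head(fused.mean(dim=1))              # pool over time

# Usage: a batch of 2 CSI windows, 50 timesteps, 30 subcarriers.
model = GatedBiLSTMFusion(in_dim=30, hidden=64, n_classes=6)
print(model(torch.randn(2, 50, 30), torch.randn(2, 50, 30)).shape)
```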
By representing visual inputs as 3D Gaussian primitives, GST-VLA unlocks a new level of geometric understanding for vision-language-action models, leading to substantial performance gains in robotic manipulation tasks.
Stop letting simulator errors in critical regions derail your policies: Sim2Act aligns surrogate fidelity with downstream decision impact, leading to more stable and robust decision-making.
GPS trajectory matching in dense urban areas, long hampered by low-frequency data, becomes more reliable thanks to enhanced spatio-temporal matching strategies.
See how tweaking your 3D print orientation and parameters *before* printing can slash surface roughness, thanks to this interactive roughness prediction tool.
A robot can now achieve 90% success in peg-in-hole tasks, even with only 0.1mm clearance, by intelligently fusing vision and tactile feedback when visual occlusion occurs.
Combining pre-trained and custom neural networks with data augmentation and transfer learning yields a robust autonomous driving system capable of accurately perceiving and reacting to its environment.
Reconstructing and simulating wind-driven dynamics from video is now possible with a new differentiable framework that enforces fluid dynamics laws.
Robots can now recover from failures during manipulation tasks by explicitly tracking progress against spatial subgoals, without needing extra training data or models.