CMU Machine Learning

23 papers from CMU Machine Learning on Robotics & Embodied AI

Apr 30, 2026

CMU ML·3w ago·also NEC Labs America

PhyCo: Learning Controllable Physical Priors for Generative Motion

Control over physical properties like friction and restitution in generated videos is now possible, paving the way for more realistic and controllable video synthesis.

Sriram Narayanan, Ziyu Jiang +3

Computer Vision Data Curation & Synthetic Data Robotics & Embodied AI +1

Apr 28, 2026

CMU ML·3w ago·also NVIDIA, Georgia Tech, Princeton

KinDER: A Physical Reasoning Benchmark for Robot Learning and Planning

Existing robotic methods falter in tackling fundamental physical reasoning challenges, as evidenced by KinDER's rigorous benchmark evaluation.

Yixuan Huang, Bowen Li, Vaibhav Saxena +12

Eval Frameworks & Benchmarks Robotics & Embodied AI World Models & Planning

Apr 22, 2026

CMU ML·Apr 22, 2026

SkillLearnBench: Benchmarking Continual Learning Methods for Agent Skill Generation on Real-World Tasks

Continual learning for LLM agents hits a wall: scaling models doesn't reliably improve skill generation, and self-feedback can lead to recursive drift.

Shan Zhong, Shanshan Zhong, Yiming Lu +17

Eval Frameworks & Benchmarks Robotics & Embodied AI Tool Use & Agents

Apr 21, 2026

CMU ML·Apr 21, 2026

EmbodiedMidtrain: Bridging the Gap between Vision-Language Models and Vision-Language-Action Models via Mid-training

VLMs can be significantly boosted on embodied tasks by mid-training on a carefully curated subset of VLM data that is highly aligned with the VLA domain, rivaling the performance of much larger models.

Yiyang Du, Zhanqiu Guo, Xin Ye +2

Multimodal Models Robotics & Embodied AI Training Efficiency & Optimization

Apr 16, 2026

CMU ML·Apr 16, 2026

CoGrid & the Multi-User Gymnasium: A Framework for Multi-Agent Experimentation

Democratizing human-AI interaction research, CoGrid and MUG offer accessible tooling for deploying web-based multi-agent experiments.

Chase McDonald, Cleotilde Gonzalez

Open-Source Models & Weights Robotics & Embodied AI Tool Use & Agents

Apr 14, 2026

CMU ML·Apr 14, 2026

Pi-HOC: Pairwise 3D Human-Object Contact Estimation

Unlock 20x faster and more accurate 3D human-object contact estimation in complex, multi-person scenes with Pi-HOC, a framework that doesn't require object meshes.

Sravan Chittupalli, Ayush Jain, Dong Huang

Computer Vision Multimodal Models Robotics & Embodied AI

CMU ML·Apr 14, 2026·also UT Arlington

Learning Versatile Humanoid Manipulation with Touch Dreaming

Humanoid robots can now perform complex, contact-rich manipulation tasks with significantly improved dexterity and success by "dreaming" about the feel of their actions.

Yaru Niu, Zhenlong Fang, Binghong Chen +12

Robotics & Embodied AI World Models & Planning

Apr 13, 2026

CMU ML·Apr 13, 2026

Disentangled Point Diffusion for Precise Object Placement

Disentangling object geometry from placement frame diffusion yields surprisingly high accuracy in robotic manipulation, even surpassing SE(3)-diffusion methods.

Lyuxing He, Eric Cai +7

Computer Vision Robotics & Embodied AI

Apr 12, 2026

Apr 12, 2026·also CMU ML, PKU

AnySlot: Goal-Conditioned Vision-Language-Action Policies for Zero-Shot Slot-Level Placement

Achieve sub-centimeter robotic placement accuracy from compositional language instructions by decomposing the task into visual goal representation and goal-conditioned execution.

Zhaofeng Hu, Sifan Zhou, Qinbo Zhang +3

Computer Vision Multimodal Models Robotics & Embodied AI

Apr 9, 2026

CMU ML·Apr 9, 2026·also Northeastern, Tongji

Visually-grounded Humanoid Agents

Imagine populating any 3D environment with digital humans that spontaneously navigate and interact, driven only by visual input and goals.

Hang Ye, Xiaoxuan Ma +7

Multimodal Models Robotics & Embodied AI Tool Use & Agents

Apr 7, 2026

Apr 7, 2026·also CMU ML, BIT, Yale

SonoSelect: Efficient Ultrasound Perception via Active Probe Exploration

Get more from less: SonoSelect intelligently guides ultrasound probes to achieve comparable diagnostic accuracy with far fewer views, slashing scanning time and processing costs.

Yixin Zhang, Yunzhong Hou, Longqi Li +2

Computer Vision Robotics & Embodied AI Training Efficiency & Optimization

Mar 17, 2026

CMU ML·Mar 17, 2026·also NVIDIA, Keio

Ground Reaction Inertial Poser: Physics-based Human Motion Capture from Sparse IMUs and Insole Pressure Sensors

By fusing IMU and insole pressure data within a physics simulation, GRIP achieves more physically plausible human motion capture than IMU-only methods.

Ryosuke Hori, Jyun-Ting Song, Zhengyi Luo +4

Robotics & Embodied AI World Models & Planning

Mar 11, 2026

CMU ML·Mar 11, 2026·also Keio, Preferred Networks

Learning to Assist: Physics-Grounded Human-Human Control via Multi-Agent Reinforcement Learning

AssistMimic enables humanoid robots to learn complex, force-exchanging assistive motions by reformulating imitation learning as a multi-agent RL problem.

Yuto Shibata, Kashu Yamazaki, Lalit Jayanti +3

RLHF & Preference Learning Robotics & Embodied AI

CMU ML·Mar 11, 2026·also NC State, SNU, U. Hill

Muscle Synergy Priors Enhance Biomechanical Fidelity in Predictive Musculoskeletal Locomotion Simulation

Injecting muscle synergy priors into reinforcement learning drastically improves the realism of simulated human locomotion, even with limited real-world data.

I. Park, Eunsik Choi, Jangwhan Ahn +4

Robotics & Embodied AI World Models & Planning

Mar 4, 2026

CMU ML·Mar 4, 2026·also BAIR, Meta AI, Brown, UT Austin

Latent Particle World Models: Self-supervised Object-centric Stochastic Dynamics Modeling

Unsupervised discovery of object keypoints and dynamics directly from video unlocks state-of-the-art world models applicable to decision-making.

Tal Daniel, Carl Qi, Dan Haramati +5

Computer Vision Robotics & Embodied AI World Models & Planning

CMU ML·Mar 4, 2026·also BAIR, MIT CSAIL, NVIDIA, Tsinghua AI +11

ManipulationNet: An Infrastructure for Benchmarking Real-World Robot Manipulation with Physical Skill Challenges and Embodied Multimodal Reasoning

Forget simulated manipulation—ManipulationNet offers a global infrastructure for benchmarking robots in the real world, complete with standardized hardware and software, to finally measure progress toward general manipulation.

Kenny Kimble, Kenneth Kimble, Edward H. Adelson +23

Eval Frameworks & Benchmarks Multimodal Models Robotics & Embodied AI

Mar 1, 2026

CMU ML·Mar 1, 2026

riMESA: Consensus ADMM for Real-World Collaborative SLAM

Achieve 7x accuracy gains in real-world collaborative SLAM by using a robust, distributed optimization algorithm resilient to communication limits and noisy data.

Daniel McGann

Computer Vision Distributed Systems & Hardware Robotics & Embodied AI

Feb 25, 2026

Feb 25, 2026·also CMU ML, UNC

LiLo-VLA: Compositional Long-Horizon Manipulation via Linked Object-Centric Policies

By decomposing long-horizon manipulation into transport and object-centric interaction, LiLo-VLA achieves state-of-the-art zero-shot generalization and robustness, outperforming end-to-end VLA models by a large margin.

Shuo Cheng, Daniel Szafir

Multimodal Models Robotics & Embodied AI Tool Use & Agents

Feb 23, 2026

CMU ML·Feb 23, 2026

Flow3r: Factored Flow Prediction for Scalable Visual Geometry Learning

Unlabeled monocular videos can now be used to train state-of-the-art 3D/4D reconstruction systems, thanks to a factored flow prediction approach that disentangles geometry and pose learning.

Zhongxiao Cong, Qitao Zhao, Shubham Tulsiani

Architecture Design (Transformers, SSMs, MoE) Computer Vision Robotics & Embodied AI

Feb 23, 2026·also CMU ML, Northwestern

Positioning Modular Co-Design in Future HRI Design Research

Modularity in HRI isn't just about interchangeable parts; it's a powerful design medium for fostering long-term, evolving relationships between humans and robots.

Lingyun Chen, Qing Xiao, Zitao Zhang +2

Natural Language Processing Robotics & Embodied AI Tool Use & Agents

Feb 16, 2026

CMU ML·Feb 16, 2026·also DeepMind

BPP: Long-Context Robot Imitation Learning by Focusing on Key History Frames

Robots can now learn long-horizon tasks far more effectively by distilling complex histories into a few key visual moments, outperforming standard imitation learning by 70% on real-world tasks.

Max Sobol Mark, Jacky Liang, Maria Attarian +2

Robotics & Embodied AI Training Efficiency & Optimization World Models & Planning

Feb 14, 2026

BAIR·Feb 14, 2026·also CMU ML, Google Research, Department of Computational and Data

TactAlign: Human-to-Robot Policy Transfer via Tactile Alignment

Youngsun Wi, Jessica Yin +7

Multimodal Models Robotics & Embodied AI

Feb 13, 2026

Tsinghua AI·Feb 13, 2026·also CMU ML, HIT, Lumos Robotics, Peking University +1

RLinf-Co: Reinforcement Learning-Based Sim-Real Co-Training for VLA Models

Forget static datasets – RL-based co-training unlocks +20% real-world VLA performance by interactively leveraging simulation while preserving real-world capabilities.

Yinuo Chen, Kang Chen, Tonghe Zhang +2

Multimodal Models RLHF & Preference Learning Robotics & Embodied AI
