23 papers from CMU Machine Learning on Robotics & Embodied AI
Control over physical properties like friction and restitution in generated videos is now possible, paving the way for more realistic and controllable video synthesis.
Existing robotic methods falter in tackling fundamental physical reasoning challenges, as evidenced by KinDER's rigorous benchmark evaluation.
Continual learning for LLM agents hits a wall: scaling models doesn't reliably improve skill generation, and self-feedback can lead to recursive drift.
Mid-training VLMs on a carefully curated subset of VLM data that is highly aligned with the VLA domain significantly boosts their performance on embodied tasks, letting them rival much larger models.
Democratizing human-AI interaction research, CoGrid and MUG offer accessible tooling for deploying web-based multi-agent experiments.
Unlock 20x faster and more accurate 3D human-object contact estimation in complex, multi-person scenes with Pi-HOC, a framework that doesn't require object meshes.
Humanoid robots can now perform complex, contact-rich manipulation tasks with significantly improved dexterity and success by "dreaming" about the feel of their actions.
Disentangling object geometry from placement frame diffusion yields surprisingly high accuracy in robotic manipulation, even surpassing SE(3)-diffusion methods.
Achieve sub-centimeter robotic placement accuracy from compositional language instructions by decomposing the task into visual goal representation and goal-conditioned execution.
Imagine populating any 3D environment with digital humans that spontaneously navigate and interact, driven only by visual input and goals.
Get more from less: SonoSelect intelligently guides ultrasound probes to achieve comparable diagnostic accuracy with far fewer views, slashing scanning time and processing costs.
By fusing IMU and insole pressure data within a physics simulation, GRIP achieves more physically plausible human motion capture than IMU-only methods.
AssistMimic enables humanoid robots to learn complex, force-exchanging assistive motions by reformulating imitation learning as a multi-agent RL problem.
Injecting muscle synergy priors into reinforcement learning drastically improves the realism of simulated human locomotion, even with limited real-world data.
Unsupervised discovery of object keypoints and dynamics directly from video unlocks state-of-the-art world models applicable to decision-making.
Forget simulated manipulation: ManipulationNet offers a global infrastructure for benchmarking robots in the real world, complete with standardized hardware and software, to finally measure progress toward general manipulation.
Achieve 7x accuracy gains in real-world collaborative SLAM by using a robust, distributed optimization algorithm resilient to communication limits and noisy data.
By decomposing long-horizon manipulation into transport and object-centric interaction, LiLo-VLA achieves state-of-the-art zero-shot generalization and robustness, outperforming end-to-end VLA models by a large margin.
Unlabeled monocular videos can now be used to train state-of-the-art 3D/4D reconstruction systems, thanks to a factored flow prediction approach that disentangles geometry and pose learning.
Modularity in HRI isn't just about interchangeable parts; it's a powerful design medium for fostering long-term, evolving relationships between humans and robots.
Robots can now learn long-horizon tasks far more effectively by distilling complex histories into a few key visual moments, outperforming standard imitation learning by 70% on real-world tasks.
Forget static datasets: RL-based co-training unlocks +20% real-world VLA performance by interactively leveraging simulation while preserving real-world capabilities.