Apple ML, LMU. Apr 13, 2026. arXiv:2604.11737

Learning Long-term Motion Embeddings for Efficient Kinematics Generation

Nick Stracke, Kolja Bauer, Stefan Andreas Baumann, Miguel Angel Bautista, Joshua Susskind, Björn Ommer

AI Summary

The paper introduces a method for efficient motion generation by learning a compressed, long-term motion embedding from tracker data with a 64x temporal compression factor. A conditional flow-matching model is trained in this latent space to generate motion conditioned on text prompts or spatial pokes. The resulting motion distributions outperform state-of-the-art video models and task-specific approaches in terms of efficiency and quality.

Key Contribution

Forget generating entire videos: this method distills motion into a highly compressed latent space, letting you steer scene dynamics with text prompts or spatial pokes far more efficiently than full video synthesis.

Abstract

Understanding and predicting motion is a fundamental component of visual intelligence. Although modern video models exhibit strong comprehension of scene dynamics, exploring multiple possible futures through full video synthesis remains prohibitively inefficient. We model scene dynamics orders of magnitude more efficiently by directly operating on a long-term motion embedding that is learned from large-scale trajectories obtained from tracker models. This enables efficient generation of long, realistic motions that fulfill goals specified via text prompts or spatial pokes. To achieve this, we first learn a highly compressed motion embedding with a temporal compression factor of 64x. In this space, we train a conditional flow-matching model to generate motion latents conditioned on task descriptions. The resulting motion distributions outperform those of both state-of-the-art video models and specialized task-specific approaches.
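The two stages described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not taken from the paper: the ceil-based handling of track lengths that are not multiples of 64, the linear interpolation path between noise and data, the Euler sampler, and the hypothetical `velocity_fn(z, t, cond)` standing in for the conditional model. The abstract only states that a conditional flow-matching model is trained on motion latents with 64x temporal compression.

```python
import numpy as np


def latent_length(num_frames: int, compression: int = 64) -> int:
    """Latent time steps for a track of num_frames frames.

    Assumes ceil division (i.e. padding the last chunk); the paper may differ.
    """
    return -(-num_frames // compression)


def flow_matching_pair(z1: np.ndarray, rng: np.random.Generator):
    """Draw (t, z_t, target velocity) for the linear path z_t = (1 - t) z0 + t z1."""
    z0 = rng.standard_normal(z1.shape)  # noise endpoint of the path
    # One time value per batch element, broadcast over the remaining axes.
    t = rng.uniform(size=(z1.shape[0],) + (1,) * (z1.ndim - 1))
    zt = (1.0 - t) * z0 + t * z1
    v_target = z1 - z0  # constant velocity along the linear path
    return t, zt, v_target


def flow_matching_loss(velocity_fn, z1, cond, rng) -> float:
    """Mean squared error between predicted and target velocity."""
    t, zt, v_target = flow_matching_pair(z1, rng)
    v_pred = velocity_fn(zt, t, cond)
    return float(np.mean((v_pred - v_target) ** 2))


def euler_sample(velocity_fn, cond, shape, rng, steps: int = 16) -> np.ndarray:
    """Integrate dz/dt = velocity_fn(z, t, cond) from t = 0 to t = 1 with Euler steps."""
    z = rng.standard_normal(shape)
    dt = 1.0 / steps
    for i in range(steps):
        t = np.full((shape[0],) + (1,) * (len(shape) - 1), i * dt)
        z = z + dt * velocity_fn(z, t, cond)
    return z
```

In this sketch `cond` would be an embedding of the text prompt or spatial poke, and `z1` a batch of motion latents produced by the learned encoder. Because sampling runs on a sequence 64 times shorter than the original track, each Euler step touches far fewer time steps than a frame-level model would.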

Computer Vision, Robotics & Embodied AI, World Models & Planning

Citation Metrics

Citations: 0

Influential citations: 0

References: 54

Year: 2026

Venue: N/A
