This paper introduces a method to distill Conditional Flow Matching (CFM) policies for robotic manipulation into a fast, single-step policy using Implicit Maximum Likelihood Estimation (IMLE). A bi-directional Chamfer distance loss is used to preserve the multi-modal action distribution of the CFM teacher, avoiding distributional collapse. The resulting policy achieves high-frequency control suitable for real-time receding-horizon replanning and demonstrates improved robustness to disturbances.
Ditch slow, iterative ODE solvers for robot control: this method distills flow-based policies into a single-step model that's fast enough for real-time replanning without sacrificing multi-modal action diversity.
Generative policies based on diffusion and flow matching achieve strong performance in robotic manipulation by modeling multi-modal human demonstrations. However, their reliance on iterative Ordinary Differential Equation (ODE) integration introduces substantial latency, limiting high-frequency closed-loop control. Recent single-step acceleration methods alleviate this overhead but often exhibit distributional collapse, producing averaged trajectories that fail to execute coherent manipulation strategies. We propose a framework that distills a Conditional Flow Matching (CFM) expert into a fast single-step student via Implicit Maximum Likelihood Estimation (IMLE). A bi-directional Chamfer distance provides a set-level objective that promotes both mode coverage and fidelity, enabling the student to preserve the teacher's multi-modal action distribution in a single forward pass. A unified perception encoder further integrates multi-view RGB, depth, point clouds, and proprioception into a geometry-aware representation. The resulting high-frequency control supports real-time receding-horizon replanning and improved robustness under dynamic disturbances.
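To make the set-level objective concrete, the following is a minimal sketch of a bi-directional Chamfer distance between a set of teacher action samples and a set of student action samples. It is an illustration under assumptions, not the paper's implementation: the squared Euclidean metric, the equal weighting of the two terms, and the names `sq_dist`, `chamfer_bidirectional`, `teacher`, `collapsed`, `one_mode`, and `diverse` are all hypothetical choices made for this example.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length action vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def chamfer_bidirectional(teacher, student):
    """Bi-directional Chamfer distance between two sets of action samples.

    Assumption: both directions are averaged and weighted equally.
    """
    # Coverage term: every teacher sample should have a nearby student sample.
    # This penalizes a student that drops one of the teacher's modes.
    coverage = sum(min(sq_dist(t, s) for s in student) for t in teacher) / len(teacher)
    # Fidelity term: every student sample should lie near some teacher sample.
    # This penalizes averaged actions that fall between modes.
    fidelity = sum(min(sq_dist(s, t) for t in teacher) for s in student) / len(student)
    return coverage + fidelity


# Toy example: the teacher has two modes (e.g. pass an obstacle left or right).
teacher = [[-1.0, 0.0], [1.0, 0.0]]
collapsed = [[0.0, 0.0], [0.0, 0.0]]   # averaged action between the modes
one_mode = [[1.0, 0.0], [1.0, 0.0]]    # keeps only one mode
diverse = [[-1.0, 0.0], [1.0, 0.0]]    # matches both modes

print(chamfer_bidirectional(teacher, collapsed))  # 2.0
print(chamfer_bidirectional(teacher, one_mode))   # 2.0
print(chamfer_bidirectional(teacher, diverse))    # 0.0
```

In this toy case the averaged student is penalized by both terms, the single-mode student is penalized only by the coverage term, and only the student that reproduces both modes reaches zero, which is the behavior the abstract attributes to combining coverage and fidelity.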