Mar 12, 2026 · arXiv:2603.11984

Ada3Drift: Adaptive Training-Time Drifting for One-Step 3D Visuomotor Robotic Manipulation

Chongyang Xu, Yi Zou, Yixian Zou, Ziliang Feng, Zili Feng, Fanman Meng, Shuaicheng Liu

AI Summary

The paper introduces Ada3Drift, a method that shifts iterative refinement from inference time to training time in visuomotor policies, recovering multimodal fidelity in single-step action generation for robotic manipulation. Ada3Drift learns a training-time drifting field that attracts predicted actions toward expert demonstration modes and repels them from other generated samples. To handle few-shot robotic regimes, it also uses a sigmoid-scheduled loss transition and multi-scale field aggregation. The paper reports state-of-the-art performance with 10× fewer function evaluations than diffusion-based methods.

Key Contribution

Achieves single-step (1 NFE) visuomotor control with 10× fewer function evaluations than diffusion-based policies by shifting iterative refinement from inference to training, while preserving multimodal action fidelity in robotic manipulation.

Abstract

Diffusion-based visuomotor policies effectively capture multimodal action distributions through iterative denoising, but their high inference latency limits real-time robotic control. Recent flow matching and consistency-based methods achieve single-step generation, yet sacrifice the ability to preserve distinct action modes, collapsing multimodal behaviors into averaged, often physically infeasible trajectories. We observe that the compute budget asymmetry in robotics (offline training vs. real-time inference) naturally motivates recovering this multimodal fidelity by shifting iterative refinement from inference time to training time. Building on this insight, we propose Ada3Drift, which learns a training-time drifting field that attracts predicted actions toward expert demonstration modes while repelling them from other generated samples, enabling high-fidelity single-step generation (1 NFE) from 3D point cloud observations. To handle the few-shot robotic regime, Ada3Drift further introduces a sigmoid-scheduled loss transition from coarse distribution learning to mode-sharpening refinement, and multi-scale field aggregation that captures action modes at varying spatial granularities. Experiments on three simulation benchmarks (Adroit, Meta-World, and RoboTwin) and real-world robotic manipulation tasks demonstrate that Ada3Drift achieves state-of-the-art performance while requiring 10× fewer function evaluations than diffusion-based alternatives.
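
The abstract names three ingredients: an attract/repel drifting field, multi-scale aggregation, and a sigmoid-scheduled loss transition. The sketch below is one possible reading of those ideas in NumPy, not the paper's implementation. The kernel form (softmax-weighted mean shift), the temperature values in `taus`, the `sharpness` parameter, and all function names are assumptions made for illustration; the abstract does not specify any of them.

```python
import numpy as np

def _softmax(logits, axis):
    # Numerically stable softmax along one axis.
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def _mean_shift(src, dst, tau, exclude_self=False):
    # Kernel-weighted mean displacement from each src point toward dst points.
    diff = dst[None, :, :] - src[:, None, :]   # (N, M, D): diff[i, j] = dst[j] - src[i]
    logits = -(diff ** 2).sum(-1) / tau        # closer points get larger weight
    if exclude_self:
        np.fill_diagonal(logits, -np.inf)      # a sample does not repel itself
    w = _softmax(logits, axis=1)
    return (w[..., None] * diff).sum(axis=1)   # (N, D)

def drift_field(pred, expert, taus=(0.05, 0.2, 1.0)):
    """Hypothetical multi-scale drift: attract to experts, repel from other predictions.

    pred:   (N, D) one-step generated actions
    expert: (M, D) expert demonstration actions
    taus:   assumed kernel temperatures, one per spatial scale
    """
    total = np.zeros_like(pred, dtype=float)
    for tau in taus:
        attract = _mean_shift(pred, expert, tau)
        if pred.shape[0] > 1:
            toward_others = _mean_shift(pred, pred, tau, exclude_self=True)
        else:
            toward_others = 0.0                # nothing to repel from
        total += attract - toward_others       # repulsion = move away from other samples
    return total / len(taus)                   # average over scales

def sigmoid_weight(step, total_steps, sharpness=10.0):
    # Assumed schedule: near 0 early (coarse learning), near 1 late (mode sharpening).
    t = step / total_steps
    return 1.0 / (1.0 + np.exp(-sharpness * (t - 0.5)))
```

Under these assumptions, a training loop could regress the one-step policy output toward `pred + drift_field(pred, expert)` and blend a coarse loss with a sharpening loss as `(1 - w) * coarse + w * sharp`, where `w = sigmoid_weight(step, total_steps)`. Both loss terms are placeholders here, since the abstract does not define them.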

Computer Vision · Robotics & Embodied AI · Training Efficiency & Optimization

Citation Metrics

Citations: 0

Influential citations: 0

References: 42

Year: 2026

Venue: N/A
