Mar 3, 2026 · arXiv:2603.02613

Real-Time Generative Policy via Langevin-Guided Flow Matching for Autonomous Driving

Tianze Zhu, Yinuo Wang, Wenjun Zou, Tianyi Zhang, Likun Wang, Letian Tao, Feihong Zhang, Yao Lyu, Shengbo Eben Li

AI Summary

The paper introduces DACER-F, a novel online reinforcement learning algorithm for autonomous driving that uses flow matching to generate actions in a single inference step. DACER-F leverages Langevin dynamics and Q-function gradients to dynamically optimize actions towards a target distribution balancing high Q-value and exploration. Experiments in complex driving simulations and the DeepMind Control Suite demonstrate that DACER-F outperforms existing methods like DACER and DSAC while maintaining ultra-low inference latency, making it suitable for real-time applications.
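As a rough illustration of what "a single inference step" can mean for a flow policy, the sketch below shows a generic conditional flow-matching loss on a straight-line path and a one-step Euler sampler. It is not the authors' implementation: the names `flow_matching_loss`, `one_step_action`, and `toy_velocity` are hypothetical, the straight-line path is an assumption, and the toy target is a single fixed action so that the ideal velocity field is known in closed form.

```python
import numpy as np

def flow_matching_loss(velocity_fn, x0, x1, t):
    # Straight-line path from the prior sample x0 (t=0) to the target action x1 (t=1).
    x_t = (1.0 - t) * x0 + t * x1
    # Along that path the velocity is constant and equal to x1 - x0.
    target_velocity = x1 - x0
    return float(np.mean((velocity_fn(x_t, t) - target_velocity) ** 2))

def one_step_action(velocity_fn, x0):
    # One Euler step across the whole interval [0, 1]: a = x0 + v(x0, t=0).
    t0 = np.zeros((x0.shape[0], 1))
    return x0 + velocity_fn(x0, t0)

# Toy setup (assumption): every target action equals the same fixed point.
TARGET = np.array([0.3, -0.1])

def toy_velocity(x, t):
    # Closed-form velocity field that transports any point to TARGET by t=1.
    return (TARGET - x) / (1.0 - t)

rng = np.random.default_rng(0)
x0 = rng.standard_normal((8, 2))          # samples from a simple Gaussian prior
x1 = np.tile(TARGET, (8, 1))              # target actions
t = rng.uniform(0.0, 0.9, size=(8, 1))    # stay away from t=1 to avoid dividing by zero

loss = flow_matching_loss(toy_velocity, x0, x1, t)
actions = one_step_action(toy_velocity, x0)
```

In a learned policy, `velocity_fn` would be a neural network conditioned on the state, and a single step would be exact only to the extent that the learned flow is close to straight.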

Key Contribution

A new flow-matching RL algorithm, DACER-F, that generates autonomous driving actions in a single inference step, cutting inference latency without sacrificing performance.

Abstract

Reinforcement learning (RL) is a fundamental methodology in autonomous driving systems, where generative policies show considerable potential because their ability to model complex distributions enhances exploration. However, their inherently high inference latency severely impedes deployment in real-time decision-making and control. To address this issue, we propose diffusion actor-critic with entropy regulator via flow matching (DACER-F), which introduces flow matching into online RL and enables the generation of competitive actions in a single inference step. By leveraging Langevin dynamics and gradients of the Q-function, DACER-F dynamically optimizes actions from experience replay toward a target distribution that balances high Q-value information with exploratory behavior. The flow policy is then trained to efficiently learn a mapping from a simple prior distribution to this dynamic target. In complex multi-lane and intersection simulations, DACER-F outperforms the baselines, diffusion actor-critic with entropy regulator (DACER) and distributional soft actor-critic (DSAC), while maintaining an ultra-low inference latency. DACER-F further demonstrates its scalability on the standard RL benchmark DeepMind Control Suite (DMC), achieving a score of 775.8 on the humanoid-stand task and surpassing prior methods. Collectively, these results establish DACER-F as a high-performance and computationally efficient RL algorithm.
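The Langevin-guided refinement of replayed actions described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's algorithm: the critic is a toy quadratic `Q(a) = -||a - best_action||^2`, the target distribution is assumed to be proportional to `exp(Q / temperature)`, and the names `q_grad`, `langevin_refine`, `step_size`, and `temperature` are hypothetical.

```python
import numpy as np

def q_grad(action, best_action):
    # Gradient w.r.t. the action of the toy critic Q(a) = -||a - best_action||^2.
    return -2.0 * (action - best_action)

def langevin_refine(actions, best_action, step_size=0.05, temperature=0.5,
                    n_steps=20, rng=None):
    # Unadjusted Langevin updates targeting a density assumed proportional to
    # exp(Q(a) / temperature): drift up the Q-gradient plus Gaussian noise.
    rng = np.random.default_rng(0) if rng is None else rng
    a = np.array(actions, dtype=float)
    for _ in range(n_steps):
        noise = rng.standard_normal(a.shape)
        drift = step_size * q_grad(a, best_action) / temperature
        a = a + drift + np.sqrt(2.0 * step_size) * noise
    return a

# Actions "from the replay buffer" start far from the high-Q region at the origin.
start = np.full((512, 2), 3.0)
refined = langevin_refine(start, np.zeros(2), rng=np.random.default_rng(1))
```

The drift term pulls actions toward higher Q-values while the noise term keeps them spread out, which is one way to read the abstract's balance between high Q-value information and exploratory behavior. In this reading, the refined actions would serve as regression targets for the flow policy.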

Robotics & Embodied AI · Training Efficiency & Optimization · World Models & Planning

Citation Metrics

Citations: 0

Influential citations: 0

References: 31

Year: 2026

Venue: N/A
