Mar 16, 2026 | arXiv:2603.14908

PerlAD: Towards Enhanced Closed-loop End-to-end Autonomous Driving with Pseudo-simulation-based Reinforcement Learning

Yinfeng Gao, Qichao Zhang, Deqing Liu, Zhongpu Xia, Guang Li, Kun Ma, Guang Chen, Hangjun Ye, Long Chen, Da-Wei Ding, Dongbin Zhao

AI Summary

PerlAD is a pseudo-simulation-based RL method for closed-loop end-to-end (E2E) autonomous driving. It builds a pseudo-simulation from offline datasets that operates in vector space, enabling efficient, rendering-free trial-and-error training. A prediction world model generates reactive agent trajectories conditioned on the ego vehicle's plan, bridging the gap between static datasets and dynamic closed-loop environments. A hierarchical decoupled planner combines IL for lateral path generation with RL for longitudinal speed optimization. On the Bench2Drive benchmark, PerlAD achieves state-of-the-art performance, surpassing the previous E2E RL method by 10.29% in Driving Score without requiring expensive online interactions. Evaluations on the DOS benchmark further indicate reliability in safety-critical occlusion scenarios.

Key Contribution

PerlAD replaces expensive, rendering-based RL training for autonomous driving with a fast, vector-space pseudo-simulation built from offline data, and surpasses the previous E2E RL method by 10.29% in Driving Score on Bench2Drive.

Abstract

End-to-end autonomous driving policies based on Imitation Learning (IL) often struggle in closed-loop execution due to the misalignment between inadequate open-loop training objectives and real driving requirements. While Reinforcement Learning (RL) offers a solution by directly optimizing driving goals via reward signals, the rendering-based training environments introduce the rendering gap and are inefficient due to high computational costs. To overcome these challenges, we present a novel Pseudo-simulation-based RL method for closed-loop end-to-end autonomous driving, PerlAD. Based on offline datasets, PerlAD constructs a pseudo-simulation that operates in vector space, enabling efficient, rendering-free trial-and-error training. To bridge the gap between static datasets and dynamic closed-loop environments, PerlAD introduces a prediction world model that generates reactive agent trajectories conditioned on the ego vehicle's plan. Furthermore, to facilitate efficient planning, PerlAD utilizes a hierarchical decoupled planner that combines IL for lateral path generation and RL for longitudinal speed optimization. Comprehensive experimental results demonstrate that PerlAD achieves state-of-the-art performance on the Bench2Drive benchmark, surpassing the previous E2E RL method by 10.29% in Driving Score without requiring expensive online interactions. Additional evaluations on the DOS benchmark further confirm its reliability in handling safety-critical occlusion scenarios.
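The decoupled planner described above separates where the ego vehicle drives (a lateral path) from how fast it drives (a longitudinal speed profile). As a rough illustration of how two such outputs could be merged into one timed trajectory, the sketch below walks along a polyline path by the distance each speed step covers. This is not the authors' implementation: the function names (`cumulative_arc_length`, `point_at`, `compose_trajectory`), the polyline path representation, the per-step speed list, and the fixed time step `dt` are all assumptions made for illustration.

```python
from bisect import bisect_right
from math import hypot


def cumulative_arc_length(path):
    """Return cumulative distance along a polyline of (x, y) points."""
    s_cum = [0.0]
    for (x0, y0), (x1, y1) in zip(path, path[1:]):
        s_cum.append(s_cum[-1] + hypot(x1 - x0, y1 - y0))
    return s_cum


def point_at(path, s_cum, s):
    """Interpolate the point at arc length s, clamped to the path ends."""
    s = min(max(s, 0.0), s_cum[-1])
    # Index of the segment containing s (last segment if s is at the end).
    i = min(bisect_right(s_cum, s) - 1, len(path) - 2)
    seg = s_cum[i + 1] - s_cum[i]
    t = 0.0 if seg == 0.0 else (s - s_cum[i]) / seg
    (x0, y0), (x1, y1) = path[i], path[i + 1]
    return (x0 + t * (x1 - x0), y0 + t * (y1 - y0))


def compose_trajectory(path, speeds, dt):
    """Merge a lateral path and a speed profile into timed waypoints.

    path   : list of at least two (x, y) points (hypothetical IL output)
    speeds : speed in m/s for each future step (hypothetical RL output)
    dt     : step duration in seconds (assumed constant)
    """
    s_cum = cumulative_arc_length(path)
    travelled = 0.0
    trajectory = []
    for v in speeds:
        travelled += max(v, 0.0) * dt  # negative speeds are treated as a stop
        trajectory.append(point_at(path, s_cum, travelled))
    return trajectory


if __name__ == "__main__":
    # An L-shaped path driven at a constant 4 m/s for four 1-second steps.
    path = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0)]
    for waypoint in compose_trajectory(path, [4.0, 4.0, 4.0, 4.0], dt=1.0):
        print(waypoint)
```

Under this decomposition, changing only the speed profile moves the waypoints along the same path, which is one plausible reason to restrict trial-and-error optimization to the longitudinal dimension while the path itself comes from imitation.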

Robotics & Embodied AI | Training Efficiency & Optimization | World Models & Planning

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
