This paper introduces IRR-Drive, an adaptive multimodal reflection framework for end-to-end autonomous driving that couples high-level reasoning with physical constraints through a dual-modality (Text + BEV) approach. The model first generates a preliminary textual intention and then predicts future semantic bird's-eye view (BEV) representations. This lets it check and correct its intent before producing the final trajectory in complex environments. IRR-Drive achieves state-of-the-art PDMS and EPDMS on the NAVSIM benchmark. An adaptive reflection mechanism balances planning performance against computational cost by choosing the reasoning mode according to scene complexity.
IRR-Drive's dual-modality reflection lets a driving model anticipate the consequences of its intended maneuver and correct its trajectory before committing to it, which improves reliability in dynamic environments.
Recent Vision-Language-Action (VLA) models have advanced end-to-end autonomous driving by incorporating reasoning for better interpretability and planning quality. However, most existing approaches directly generate the final trajectory without explicitly examining its future consequences, which limits their reliability in complex and dynamic environments. To address this limitation, we propose IRR-Drive (Intend, Reflect, Refine), an adaptive multimodal reflection framework for autonomous driving. Specifically, to tightly couple high-level reasoning with physical constraints, IRR-Drive first generates a preliminary textual intention and anticipates potential interactions by predicting future semantic bird's-eye view (BEV) representations. This dual-modality (Text + BEV) reflection space explicitly models the anticipated scene evolution, enabling the model to rigorously self-correct and refine its initial intent before generating the final trajectory. Furthermore, to balance planning performance and computational efficiency, we construct reflection-oriented training data and design an adaptive reflection reward, enabling the model to select its reasoning mode according to scene complexity. Rather than using reasoning primarily as an auxiliary explanation, IRR-Drive integrates an adaptive reflection mechanism directly into the planning framework, enabling grounded, decision-aware trajectory correction driven by scene complexity. Our method achieves state-of-the-art performance on the NAVSIM benchmark in both PDMS and EPDMS. Extensive experiments demonstrate the effectiveness of our multimodal reflection framework and of the proposed adaptive reflection strategy.
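The abstract describes the Intend, Reflect, Refine pipeline only at a high level. The Python sketch below illustrates that control flow under explicit assumptions. All function names (`irr_plan`, `intend`, `predict_bev`, `reflect`, `decode`, `complexity`) are hypothetical, as are the toy grid-cell "BEV" and the stubs. The fixed complexity threshold is also a stand-in: in IRR-Drive the model itself learns when to reflect, through reflection-oriented training data and an adaptive reflection reward, rather than through a hand-set gate. This is not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple

Waypoint = Tuple[float, float]  # (x, y) in the ego frame (assumed convention)


@dataclass
class PlanResult:
    intention: str               # final textual intention
    reflected: bool              # whether the reflect/refine stage ran
    trajectory: List[Waypoint]


def irr_plan(
    scene: Any,
    intend: Callable[[Any], str],
    predict_bev: Callable[[Any, str], Any],
    reflect: Callable[[str, Any], str],
    decode: Callable[[Any, str], List[Waypoint]],
    complexity: Callable[[Any], float],
    threshold: float = 0.5,
) -> PlanResult:
    """Hypothetical Intend -> Reflect -> Refine control flow."""
    # Intend: draft a preliminary textual intention for the scene.
    intention = intend(scene)
    # Adaptive mode (stand-in for the learned choice): skip reflection in simple scenes.
    if complexity(scene) < threshold:
        return PlanResult(intention, False, decode(scene, intention))
    # Reflect: anticipate scene evolution under the intention as a future "BEV".
    future_bev = predict_bev(scene, intention)
    # Refine: revise the intention using the Text + BEV reflection space.
    refined = reflect(intention, future_bev)
    return PlanResult(refined, True, decode(scene, refined))


# --- Toy stubs (hypothetical) so the sketch runs end to end ---
def toy_intend(scene):
    return "keep lane"


def toy_complexity(scene):
    # More surrounding agents -> more complex scene, capped at 1.0.
    return min(1.0, len(scene["agents"]) / 4)


def toy_predict_bev(scene, intention):
    # Grid cells the ego would occupy vs. cells agents are predicted to occupy.
    ego = {(0, i) for i in range(1, 4)} if intention == "keep lane" else set()
    return {"ego": ego, "agents": set(scene["agents"])}


def toy_reflect(intention, bev):
    # Switch to yielding if the anticipated ego footprint overlaps any agent.
    return "yield" if bev["ego"] & bev["agents"] else intention


def toy_decode(scene, intention):
    speed = 0.0 if intention == "yield" else 2.0
    return [(0.0, speed * t) for t in range(1, 4)]


stubs = (toy_intend, toy_predict_bev, toy_reflect, toy_decode, toy_complexity)
easy = irr_plan({"agents": []}, *stubs)                        # no reflection
busy = irr_plan({"agents": [(0, 2), (3, 1), (5, 5)]}, *stubs)  # reflects, yields
print(easy)
print(busy)
```

In the empty scene the gate skips reflection and the initial "keep lane" intention is decoded directly. In the busy scene the predicted footprint conflicts with an agent at cell `(0, 2)`, so the refined intention becomes "yield". This mirrors, in miniature, the trade-off the abstract describes: reflection costs an extra prediction step, so it is reserved for scenes where it is likely to change the decision.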