Freiburg · Mar 2, 2026 · arXiv:2603.02035

LAD-Drive: Bridging Language and Trajectory with Action-Aware Diffusion Transformers

Fabian Schmidt, Karol Fedurko, Markus Enzweiler, Abhinav Valada

AI Summary

The paper introduces LAD-Drive, a generative framework for autonomous driving that disentangles high-level intention from low-level spatial planning by inferring a probabilistic meta-action distribution. This distribution, combined with vehicle kinematic state, conditions an action-aware diffusion decoder using a truncated denoising process to refine motion anchors into safe trajectories. Experiments on the LangAuto benchmark show LAD-Drive achieves state-of-the-art performance, improving Driving Score by up to 59% compared to baselines while reducing route deviations and collisions.
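The sketch below illustrates why conditioning on a distribution keeps information that a one-hot action discards. It is a minimal example under stated assumptions, not the authors' code. It assumes a hypothetical six-way meta-action vocabulary (`META_ACTIONS`) and a three-value kinematic state (speed, acceleration, yaw rate), and it fuses the two by plain concatenation. The paper's actual action set and fusion mechanism are not given here.

```python
import math
from typing import List

# Hypothetical meta-action vocabulary; the paper's actual action set may differ.
META_ACTIONS = ["follow_lane", "turn_left", "turn_right",
                "change_lane_left", "change_lane_right", "stop"]


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax turning action-decoder logits into a belief."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def one_hot(probs: List[float]) -> List[float]:
    """Collapse a belief to its argmax, as one-hot conditioning would."""
    k = max(range(len(probs)), key=probs.__getitem__)
    return [1.0 if i == k else 0.0 for i in range(len(probs))]


def entropy(probs: List[float]) -> float:
    """Shannon entropy in nats; 0.0 means no uncertainty is retained."""
    return sum(-p * math.log(p) for p in probs if p > 0.0)


def conditioning_vector(action_probs: List[float], speed: float,
                        accel: float, yaw_rate: float) -> List[float]:
    """Fuse the belief with a (hypothetical) kinematic state by concatenation."""
    return list(action_probs) + [speed, accel, yaw_rate]


if __name__ == "__main__":
    # Near-tie between following the lane and turning left (an ambiguous junction).
    probs = softmax([2.0, 1.8, -1.0, -0.5, -2.0, -3.0])
    print("belief entropy :", round(entropy(probs), 3))
    print("one-hot entropy:", entropy(one_hot(probs)))
    print("cond vector    :", [round(v, 3) for v in conditioning_vector(probs, 8.0, 0.1, 0.02)])
```

The belief keeps the near-tie between `follow_lane` and `turn_left`, so its entropy is well above zero. After the one-hot collapse, the entropy is exactly zero, and a downstream decoder can no longer tell that the second option was almost as likely.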

Key Contribution

LAD-Drive uses probabilistic meta-actions and diffusion to generate safer, more nuanced trajectories for autonomous driving. On the LangAuto benchmark, it outperforms competitive baselines by up to 59% in Driving Score.

Abstract

While multimodal large language models (MLLMs) provide advanced reasoning for autonomous driving, translating their discrete semantic knowledge into continuous trajectories remains a fundamental challenge. Existing methods often rely on unimodal planning heads that inherently limit their ability to represent multimodal driving behavior. Furthermore, most generative approaches condition on one-hot encoded actions, discarding the nuanced navigational uncertainty critical for complex scenarios. To resolve these limitations, we introduce LAD-Drive, a generative framework that structurally disentangles high-level intention from low-level spatial planning. LAD-Drive employs an action decoder to infer a probabilistic meta-action distribution, establishing an explicit belief state that preserves the nuanced intent typically lost by one-hot encodings. This distribution, fused with the vehicle's kinematic state, conditions an action-aware diffusion decoder that uses a truncated denoising process to refine learned motion anchors into safe, kinematically feasible trajectories. Extensive evaluations on the LangAuto benchmark demonstrate that LAD-Drive achieves state-of-the-art results, outperforming competitive baselines by up to 59% in Driving Score while significantly reducing route deviations and collisions. We will publicly release the code and models at https://github.com/iis-esslingen/lad-drive.
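Truncated denoising from motion anchors can be sketched as a generic, deterministic DDIM-style reverse pass. Each anchor is only partially noised, then refined in a few steps. This sketch makes several assumptions that the abstract does not state:

- a linear beta schedule over 1000 steps
- illustrative truncation indices `[200, 100]`
- a hypothetical `denoiser(x_t, t, cond)` that predicts the clean trajectory

The conditioning vector stands in for the fused meta-action belief and kinematic state. It is not LAD-Drive's implementation.

```python
import math
import random
from typing import Callable, List, Sequence

Traj = List[float]  # flattened (x, y) waypoints
Denoiser = Callable[[Traj, int, Sequence[float]], Traj]  # hypothetical signature


def alpha_bars(num_steps: int = 1000, beta_start: float = 1e-4,
               beta_end: float = 0.02) -> List[float]:
    """Cumulative products of (1 - beta_t) for a linear beta schedule."""
    out, prod = [], 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / (num_steps - 1)
        prod *= 1.0 - beta
        out.append(prod)
    return out


def truncated_refine(anchor: Traj, denoiser: Denoiser, cond: Sequence[float],
                     steps: Sequence[int], ab: List[float],
                     rng: random.Random) -> Traj:
    """Refine one motion anchor with a short deterministic DDIM-style reverse pass.

    `steps` holds descending diffusion indices (e.g. [200, 100]). Starting at
    steps[0] instead of the final index is what makes the process truncated:
    the anchor is only partially noised, so a few steps suffice.
    """
    a_start = ab[steps[0]]
    # Forward-diffuse the anchor to the truncated noise level, not to pure noise.
    x = [math.sqrt(a_start) * v + math.sqrt(1.0 - a_start) * rng.gauss(0.0, 1.0)
         for v in anchor]
    for i, t in enumerate(steps):
        x0_hat = denoiser(x, t, cond)  # model predicts the clean trajectory
        a_t = ab[t]
        a_prev = ab[steps[i + 1]] if i + 1 < len(steps) else 1.0
        # Noise implied by the current sample and the clean-trajectory estimate.
        eps_hat = [(xv - math.sqrt(a_t) * x0v) / math.sqrt(1.0 - a_t)
                   for xv, x0v in zip(x, x0_hat)]
        # Deterministic update; at the last step a_prev = 1, so x becomes x0_hat.
        x = [math.sqrt(a_prev) * x0v + math.sqrt(1.0 - a_prev) * ev
             for x0v, ev in zip(x0_hat, eps_hat)]
    return x


if __name__ == "__main__":
    ab = alpha_bars()
    rng = random.Random(0)
    cond = [0.51, 0.41, 0.03, 0.04, 0.01, 0.0, 8.0, 0.1, 0.02]  # illustrative
    anchors = [[0.0, 0.0, 1.0, 0.0, 2.0, 0.0],   # keep straight
               [0.0, 0.0, 1.0, 0.2, 1.9, 0.6]]   # gentle left

    # Placeholder for the learned network: it ignores `cond` and rescales x_t,
    # so the output is just the anchor plus scaled noise (control flow only).
    def naive_denoiser(x: Traj, t: int, c: Sequence[float]) -> Traj:
        return [v / math.sqrt(ab[t]) for v in x]

    for a in anchors:
        refined = truncated_refine(a, naive_denoiser, cond, [200, 100], ab, rng)
        print([round(v, 2) for v in refined])
```

Starting from learned anchors rather than from pure Gaussian noise is what allows a truncated schedule to use only a handful of denoising steps. The abstract does not specify how LAD-Drive scores or selects among the refined anchors.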

Multimodal Models · Robotics & Embodied AI · World Models & Planning

Citation Metrics

Citations: 0

Influential citations: 0

References: 36

Year: 2026

Venue: N/A
