KnowDiffuser combines language models for high-level reasoning with diffusion models for trajectory generation in autonomous driving. It uses an LM to infer context-aware meta-actions from scene representations, mapping these to prior trajectories that guide a truncated denoising process. Experiments on nuPlan show KnowDiffuser significantly outperforms existing planners in both open-loop and closed-loop evaluations, improving semantic alignment and physical feasibility.
By fusing language model reasoning with diffusion-based trajectory generation, KnowDiffuser outperforms existing autonomous driving planners on the nuPlan benchmark in both open-loop and closed-loop evaluations.
Recent advancements in Language Models (LMs) have demonstrated strong semantic reasoning capabilities, enabling their application in high-level decision-making for autonomous driving (AD). However, LMs operate over discrete token spaces and lack the ability to generate continuous, physically feasible trajectories required for motion planning. Meanwhile, diffusion models have proven effective at generating reliable and dynamically consistent trajectories, but often lack semantic interpretability and alignment with scene-level understanding. To address these limitations, we propose KnowDiffuser, a knowledge-guided motion planning framework that tightly integrates the semantic understanding of language models with the generative power of diffusion models. The framework employs a language model to infer context-aware meta-actions from structured scene representations, which are then mapped to prior trajectories that anchor the subsequent denoising process. A two-stage truncated denoising mechanism refines these trajectories efficiently, preserving both semantic alignment and physical feasibility. Experiments on the nuPlan benchmark demonstrate that KnowDiffuser significantly outperforms existing planners in both open-loop and closed-loop evaluations, establishing a robust and interpretable framework that effectively bridges the semantic-to-physical gap in AD systems.
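The pipeline described in the abstract (meta-action, then prior trajectory, then truncated denoising) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the meta-action table, the shape of the prior trajectories, the linear noise schedule, and the names `prior_trajectory`, `truncated_denoise`, and `make_oracle_denoiser` are all hypothetical. The sketch uses a single truncated stage built on the standard DDPM reverse update and does not attempt to reproduce the paper's two-stage mechanism. A trained network would normally predict the noise; here an "oracle" that already knows the clean trajectory stands in for it so the example runs on its own.

```python
import numpy as np

# Hypothetical meta-action table: (forward speed in m/s, lateral offset in m
# reached at the end of the horizon). These values are NOT from the paper.
META_ACTIONS = {
    "keep_lane": (5.0, 0.0),
    "turn_left": (4.0, 3.0),
    "stop": (0.0, 0.0),
}


def prior_trajectory(meta_action, horizon=16, dt=0.5):
    """Map a meta-action to a coarse (horizon, 2) array of x/y waypoints."""
    speed, lateral = META_ACTIONS[meta_action]
    t = np.arange(1, horizon + 1) * dt
    x = speed * t
    y = lateral * (t / t[-1]) ** 2  # smooth lateral drift toward the offset
    return np.stack([x, y], axis=1)


def truncated_denoise(prior, denoiser, t_start=10, num_steps=100, seed=0):
    """Noise the prior up to step t_start, then run only t_start reverse steps.

    Starting from the prior instead of pure Gaussian noise is what makes the
    process "truncated": most of the num_steps reverse chain is skipped.
    """
    rng = np.random.default_rng(seed)
    betas = np.linspace(1e-4, 0.02, num_steps)  # assumed linear schedule
    alphas = 1.0 - betas
    alpha_bar = np.cumprod(alphas)

    # Forward-diffuse the prior to the intermediate step t_start.
    ab = alpha_bar[t_start - 1]
    x = np.sqrt(ab) * prior + np.sqrt(1.0 - ab) * rng.standard_normal(prior.shape)

    # Standard DDPM ancestral sampling from t_start down to 1.
    for t in range(t_start, 0, -1):
        beta, alpha, ab = betas[t - 1], alphas[t - 1], alpha_bar[t - 1]
        eps_hat = denoiser(x, t, alpha_bar)
        mean = (x - beta / np.sqrt(1.0 - ab) * eps_hat) / np.sqrt(alpha)
        noise = rng.standard_normal(x.shape) if t > 1 else 0.0
        x = mean + np.sqrt(beta) * noise
    return x


def make_oracle_denoiser(clean):
    """Stand-in for a trained network: returns the exact noise given `clean`."""
    def denoiser(x, t, alpha_bar):
        ab = alpha_bar[t - 1]
        return (x - np.sqrt(ab) * clean) / np.sqrt(1.0 - ab)
    return denoiser


# In the real system a language model would choose the meta-action from the
# scene; here it is hard-coded.
prior = prior_trajectory("turn_left")
target = prior + np.array([0.0, 0.2])  # pretend the refined plan shifts 0.2 m
refined = truncated_denoise(prior, make_oracle_denoiser(target))
print(refined.shape)
```

With the oracle denoiser, the final reverse step recovers `target` up to floating-point error, which is a convenient sanity check on the update rule. With a learned denoiser, the result would instead depend on how well the network models feasible trajectories, and the choice of `t_start` would trade off fidelity to the prior against freedom to refine it.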