This paper introduces a decision-making framework for autonomous driving that distills the reasoning capabilities of large language models into a lightweight, confidence-aware language model. A multi-agent system generates high-quality, confidence-annotated decision demonstrations through chain-of-thought reasoning. These demonstrations are then used to fine-tune a dual-head language model with a confidence-aware strategy combined with retrieval-augmented generation (RAG). Closed-loop experiments on the nuPlan benchmark show that the approach achieves state-of-the-art success rates with low inference latency, including in long-tail scenarios.
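The summary does not say how the voting and confidence agents turn several opinions into one annotated demonstration. The sketch below illustrates the idea under loud assumptions: each voting agent returns one discrete action label, confidence is taken as the share of votes won by the majority action (a simple majority-vote heuristic swapped in for illustration, not the paper's confidence-assessment agent), and low-confidence demonstrations are filtered out. `Demonstration`, `vote_and_annotate`, `filter_demos`, and `min_confidence` are hypothetical names.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Demonstration:
    """One confidence-annotated decision demonstration (hypothetical schema)."""
    scenario_id: str
    action: str
    confidence: float
    rationale: str


def vote_and_annotate(scenario_id: str, agent_votes: list[str], rationale: str) -> Demonstration:
    """Aggregate discrete action votes from several agents into one demonstration.

    Assumption: confidence = fraction of agents that chose the winning action.
    Ties go to the action seen first, because Counter.most_common orders
    equal counts by first occurrence.
    """
    if not agent_votes:
        raise ValueError("need at least one vote")
    counts = Counter(agent_votes)
    action, n_votes = counts.most_common(1)[0]
    return Demonstration(scenario_id, action, n_votes / len(agent_votes), rationale)


def filter_demos(demos: list[Demonstration], min_confidence: float = 0.6) -> list[Demonstration]:
    """Keep only demonstrations whose vote-share confidence meets a threshold (hypothetical)."""
    return [d for d in demos if d.confidence >= min_confidence]


if __name__ == "__main__":
    demo = vote_and_annotate(
        "scene_001", ["yield", "yield", "proceed", "yield"], "Pedestrian near the crosswalk."
    )
    print(demo.action, demo.confidence)  # yield 0.75
```

Vote share is only one crude proxy for confidence; an LLM-based assessment agent, as the paper describes, could weigh the quality of each agent's reasoning rather than just counting votes.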
You can get SOTA autonomous driving performance with a distilled, lightweight language model that also tells you how confident it is.
Large Language Models (LLMs) and Multimodal LLMs (MLLMs) have demonstrated immense potential in autonomous driving (AD) by offering human-like reasoning and open-world generalization. However, the excessive computational overhead and high inference latency of these massive models severely hinder their deployment in resource-constrained AD systems. To address this challenge, we propose a novel decision-making framework built on a lightweight, confidence-aware language model that bridges the gap between complex multimodal intention reasoning and efficient inference. Specifically, we design a multi-agent collaborative workflow, comprising action-voting, confidence-assessment, and summarization agents, to generate high-quality, confidence-annotated decision demonstrations via explicit Chain-of-Thought (CoT) reasoning. These demonstrations are then distilled into a lightweight language model with a dual-head architecture that jointly predicts decision probabilities and generates textual rationales. The distillation is realized via a confidence-aware fine-tuning strategy coupled with Retrieval-Augmented Generation (RAG) to enhance the model's adaptability and data efficiency. Comprehensive closed-loop experiments on the nuPlan benchmark demonstrate that our approach achieves state-of-the-art (SOTA) success rates in both regular and long-tail scenarios while maintaining low inference latency.
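The abstract names a dual-head model trained with a confidence-aware strategy but does not give the objective. A minimal sketch of one plausible form, assuming (1) the decision head emits logits over a fixed set of discrete actions, (2) the confidence annotation becomes a soft target that places `confidence` mass on the annotated action and spreads the rest uniformly (a label-smoothing-style choice swapped in for illustration), and (3) the text head is trained with the usual mean token negative log-likelihood on the reference rationale. The weight `lambda_text` and all function names are hypothetical, and RAG retrieval is omitted.

```python
import math


def softmax(logits: list[float]) -> list[float]:
    """Numerically stable softmax over the decision head's action logits."""
    m = max(logits)  # subtract the max so exp() cannot overflow
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soft_target(num_actions: int, action_idx: int, confidence: float) -> list[float]:
    """Turn a confidence annotation into a soft label (assumed formulation).

    `confidence` goes to the annotated action; the remaining mass is split
    evenly across the other actions, so the target always sums to 1.
    """
    if num_actions == 1:
        return [1.0]
    rest = (1.0 - confidence) / (num_actions - 1)
    return [confidence if i == action_idx else rest for i in range(num_actions)]


def decision_loss(logits: list[float], target: list[float]) -> float:
    """Cross-entropy between the decision head's softmax and the soft target."""
    probs = softmax(logits)
    return -sum(t * math.log(p) for t, p in zip(target, probs) if t > 0)


def rationale_loss(token_log_probs: list[float]) -> float:
    """Mean negative log-likelihood of the reference rationale tokens (text head)."""
    return -sum(token_log_probs) / len(token_log_probs)


def dual_head_loss(
    logits: list[float],
    action_idx: int,
    confidence: float,
    token_log_probs: list[float],
    lambda_text: float = 1.0,
) -> float:
    """Joint objective: decision-head loss plus a weighted text-head loss."""
    target = soft_target(len(logits), action_idx, confidence)
    return decision_loss(logits, target) + lambda_text * rationale_loss(token_log_probs)


if __name__ == "__main__":
    # Three hypothetical actions; the annotated action is index 0 with confidence 0.8.
    loss = dual_head_loss([2.0, 0.0, 0.0], 0, 0.8, [-0.5, -1.5], lambda_text=0.5)
    print(round(loss, 4))
```

With a soft target, a demonstration annotated at low confidence pushes the decision head less sharply toward its action than a high-confidence one, which is one way the annotations could make the model's output probabilities track its reliability.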