This paper introduces an open-source modular benchmark for evaluating diffusion-based motion planners within closed-loop autonomous driving systems. The authors decompose a diffusion planner into three independently executable modules using ONNX GraphSurgeon and reimplement the DPM-Solver++ denoising loop in C++ to enable runtime configurability and observability within the Autoware ROS 2 stack. Experiments in AWSIM show that encoder caching reduces latency by 3.2x and that second-order solving reduces final displacement error (FDE) by 41% at N=3 compared to first-order solving.
Diffusion-based motion planners can now be evaluated and optimized within a production-grade autonomous driving stack, thanks to a new open-source modular benchmark that breaks the black box of monolithic deployments.
Diffusion-based motion planners have achieved state-of-the-art results on benchmarks such as nuPlan, yet their evaluation within closed-loop production autonomous driving stacks remains largely unexplored. Existing evaluations abstract away ROS 2 communication latency and real-time scheduling constraints, while monolithic ONNX deployment freezes all solver parameters at export time. We present an open-source modular benchmark that addresses both gaps: using ONNX GraphSurgeon, we decompose a monolithic 18,398-node diffusion planner into three independently executable modules and reimplement the DPM-Solver++ denoising loop in native C++. Integrated as a ROS 2 node within Autoware, the open-source AD stack deployed on real vehicles worldwide, the system enables runtime-configurable solver parameters without model recompilation and per-step observability of the denoising process, breaking the black box of monolithic deployment. Unlike evaluations in standalone simulators such as CARLA, our benchmark operates within a production-grade stack and is validated through AWSIM closed-loop simulation. Through systematic comparison of DPM-Solver++ (first- and second-order) and DDIM across six step-count configurations (N in {3, 5, 7, 10, 15, 20}), we show that encoder caching yields a 3.2x latency reduction, and that second-order solving reduces final displacement error (FDE) by 41% at N=3 compared to first-order solving. The complete codebase will be released as open source, providing a direct path from simulation benchmarks to real-vehicle deployment.
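To make the first-order versus second-order comparison concrete, the following is a minimal sketch of a multistep DPM-Solver++ loop in data-prediction form, written in Python rather than the C++ used by the benchmark. It is not the authors' implementation. The names `alpha_sigma`, `log_snr_grid`, and `dpm_solver_pp` are hypothetical, and three simplifications are assumed: a variance-preserving noise schedule, steps spaced uniformly in log-SNR, and a scalar state in place of a trajectory tensor. The `order` argument is an ordinary runtime parameter here, which is the kind of configurability a solver loop outside the exported graph allows.

```python
import math
from typing import Callable, List, Tuple


def alpha_sigma(lam: float) -> Tuple[float, float]:
    """Assumed variance-preserving schedule, parameterised by log-SNR.

    lam = log(alpha / sigma) with alpha^2 + sigma^2 = 1.
    """
    alpha = 1.0 / math.sqrt(1.0 + math.exp(-2.0 * lam))
    sigma = 1.0 / math.sqrt(1.0 + math.exp(2.0 * lam))
    return alpha, sigma


def log_snr_grid(lam_start: float, lam_end: float, n_steps: int) -> List[float]:
    """n_steps + 1 log-SNR values, uniformly spaced from noisy to clean."""
    span = lam_end - lam_start
    return [lam_start + span * i / n_steps for i in range(n_steps + 1)]


def dpm_solver_pp(
    x: float,
    denoiser: Callable[[float, float], float],
    lambdas: List[float],
    order: int = 2,
) -> float:
    """Multistep DPM-Solver++ with a data-prediction model.

    denoiser(x, lam) returns an estimate of the clean sample x0.
    order=1 is the first-order update; order=2 reuses the previous
    x0 estimate, so each step still costs one model evaluation.
    """
    if order not in (1, 2):
        raise ValueError("order must be 1 or 2")
    prev_x0 = None  # x0 estimate from the previous step
    prev_h = None   # log-SNR step size of the previous step
    for i in range(1, len(lambdas)):
        lam_prev, lam_cur = lambdas[i - 1], lambdas[i]
        h = lam_cur - lam_prev
        alpha_cur, sigma_cur = alpha_sigma(lam_cur)
        _, sigma_prev = alpha_sigma(lam_prev)

        x0 = denoiser(x, lam_prev)
        if order == 2 and prev_x0 is not None:
            # Linear extrapolation of the x0 estimate across the step.
            r = prev_h / h
            d = (1.0 + 0.5 / r) * x0 - (0.5 / r) * prev_x0
        else:
            # First step, or first-order solving: use the estimate as is.
            d = x0

        # expm1(-h) = exp(-h) - 1, evaluated stably for small h.
        x = (sigma_cur / sigma_prev) * x - alpha_cur * math.expm1(-h) * d
        prev_x0, prev_h = x0, h
    return x
```

Calling `dpm_solver_pp(x_init, model, log_snr_grid(-3.0, 3.0, 3), order=1)` and then again with `order=2` runs the same N=3 schedule with both solvers, and both cost three model evaluations. The second-order variant differs only in the extrapolated estimate `d`, so any accuracy gain at low step counts comes without extra network calls.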