This paper introduces LEO, a spatio-temporal Graph Attention Network (GAT) that fuses multi-modal sensor tracks to estimate the shape and trajectory of dynamic objects in autonomous driving scenarios. LEO learns adaptive fusion weights and represents multi-scale shapes, enabling it to model complex geometries and generalize across different sensor types and object classes. Experiments on the Mercedes-Benz DRIVE PILOT dataset demonstrate real-time performance, and validation on the View of Delft dataset shows cross-dataset generalization.
By learning to fuse multi-modal sensor data with a GAT, LEO achieves robust extended object tracking capable of handling complex geometries and generalizing across diverse datasets, addressing a key challenge in autonomous driving.
Accurate shape and trajectory estimation of dynamic objects is essential for reliable automated driving. Classical Bayesian extended-object models offer theoretical robustness and efficiency but depend on the completeness of their a priori and update-likelihood functions, while deep learning methods bring adaptability at the cost of dense annotations and high compute. We combine these strengths in LEO (Learned Extension of Objects), a spatio-temporal Graph Attention Network that fuses multi-modal production-grade sensor tracks to learn adaptive fusion weights, ensure temporal consistency, and represent multi-scale shapes. Using a task-specific parallelogram ground-truth formulation, LEO models complex geometries (e.g., articulated trucks and trailers) and generalizes across sensor types, configurations, object classes, and regions, remaining robust for challenging and long-range targets. Evaluations on the Mercedes-Benz DRIVE PILOT SAE L3 dataset demonstrate real-time computational efficiency suitable for production systems; additional validation on public datasets such as View of Delft (VoD) further confirms cross-dataset generalization.
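To make the idea of learned, adaptive fusion weights concrete, the following is a minimal sketch of a single generic graph-attention layer applied to sensor-track nodes. It is not LEO's architecture, which the abstract does not specify in detail. The function name `gat_fuse`, the feature dimensions, the three-sensor example, and the random (untrained) parameters are all assumptions made for illustration.

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def gat_fuse(h, adj, w, a):
    """Single-head graph attention over track nodes (illustrative only).

    h:   (N, F) per-track feature vectors (hypothetical features)
    adj: (N, N) boolean adjacency; must include self-loops so that
         every row has at least one neighbour
    w:   (F, D) shared linear projection
    a:   (2*D,) attention vector, split into source and target halves
    Returns the fused features (N, D) and the attention weights (N, N).
    """
    z = h @ w                                   # project every node
    d = z.shape[1]
    # score[i, j] = LeakyReLU(a_src . z_i + a_dst . z_j)
    scores = leaky_relu((z @ a[:d])[:, None] + (z @ a[d:])[None, :])
    scores = np.where(adj, scores, -np.inf)     # mask non-neighbours
    scores = scores - scores.max(axis=1, keepdims=True)  # stable softmax
    weights = np.exp(scores)
    alpha = weights / weights.sum(axis=1, keepdims=True)
    return alpha @ z, alpha                     # weighted neighbour sum

# Hypothetical example: three tracks of one object from different
# sensors, fully connected. Parameters are random here; in a trained
# model they would be learned.
rng = np.random.default_rng(0)
tracks = rng.normal(size=(3, 6))
adjacency = np.ones((3, 3), dtype=bool)
fused, alpha = gat_fuse(tracks, adjacency,
                        rng.normal(size=(6, 4)), rng.normal(size=8))
print(alpha.round(3))  # each row is one node's fusion weights
```

Each row of `alpha` sums to one, so every fused feature is a convex combination of the projected neighbour features. In this generic formulation, that is the sense in which the weights are adaptive: they depend on the input tracks rather than being fixed per sensor.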