Freiburg | Mercedes-Benz AG | Apr 2, 2026 | arXiv:2604.02206

LEO: Graph Attention Network based Hybrid Multi Sensor Extended Object Fusion and Tracking for Autonomous Driving Applications

Mayank Mayank, Bharanidhar Duraisamy, Florian Geiss

AI Summary

This paper introduces LEO, a spatio-temporal Graph Attention Network (GAT) for fusing multi-modal sensor tracks to estimate the shape and trajectory of dynamic objects in autonomous driving scenarios. LEO learns adaptive fusion weights and represents multi-scale shapes, enabling it to model complex geometries and generalize across different sensor types and object classes. Experiments on the Mercedes-Benz DRIVE PILOT dataset demonstrate real-time performance and cross-dataset generalization to the View of Delft dataset.

Key Contribution

By learning to fuse multi-modal sensor data with a GAT, LEO achieves robust extended object tracking capable of handling complex geometries and generalizing across diverse datasets, addressing a key challenge in autonomous driving.

Abstract

Accurate shape and trajectory estimation of dynamic objects is essential for reliable automated driving. Classical Bayesian extended-object models offer theoretical robustness and efficiency but depend on the completeness of a priori and update-likelihood functions, while deep learning methods bring adaptability at the cost of dense annotations and high compute. We bridge these strengths with LEO (Learned Extension of Objects), a spatio-temporal Graph Attention Network that fuses multi-modal production-grade sensor tracks to learn adaptive fusion weights, ensure temporal consistency, and represent multi-scale shapes. Using a task-specific parallelogram ground-truth formulation, LEO models complex geometries (e.g., articulated trucks and trailers) and generalizes across sensor types, configurations, object classes, and regions, remaining robust for challenging and long-range targets. Evaluations on the Mercedes-Benz DRIVE PILOT SAE L3 dataset demonstrate real-time computational efficiency suitable for production systems; additional validation on public datasets such as View of Delft (VoD) further confirms cross-dataset generalization.
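To make the idea of learned, adaptive fusion weights more concrete, the following is a minimal sketch of a generic single-head graph attention layer applied to a handful of sensor-track nodes. It is not LEO's architecture, which the abstract does not specify: the node layout (one row per sensor track), the feature sizes, the random parameters, and the function name `gat_layer` are all illustrative assumptions.

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def gat_layer(h, adj, W, a):
    """Generic single-head graph attention layer (illustrative, not LEO itself).

    h:   (N, F) node features, assumed here to be one row per sensor track
    adj: (N, N) boolean mask, adj[i, j] is True if node j is a neighbour of i;
         every row needs at least one True entry (e.g. a self-loop)
    W:   (F, F_out) shared linear projection
    a:   (2 * F_out,) attention vector
    Returns the fused features (N, F_out) and attention weights (N, N).
    """
    z = h @ W                                    # project all nodes
    f_out = z.shape[1]
    # a^T [z_i || z_j] splits into a source term and a destination term
    src = z @ a[:f_out]
    dst = z @ a[f_out:]
    e = leaky_relu(src[:, None] + dst[None, :])  # raw attention scores
    e = np.where(adj, e, -np.inf)                # ignore non-neighbours
    e = e - e.max(axis=1, keepdims=True)         # numerically stable softmax
    w = np.exp(e)
    alpha = w / w.sum(axis=1, keepdims=True)     # rows sum to 1
    return alpha @ z, alpha

# Hypothetical example: 3 tracks of the same object from different sensors,
# 4 features each, fully connected with self-loops.
rng = np.random.default_rng(0)
h = rng.normal(size=(3, 4))
adj = np.ones((3, 3), dtype=bool)
W = rng.normal(size=(4, 2))
a = rng.normal(size=(4,))
fused, alpha = gat_layer(h, adj, W, a)
```

Each row of `alpha` holds the normalized weights one node assigns to its neighbours, so the fused feature of a node is a data-dependent weighted average of the projected neighbour features. In a trained model `W` and `a` would be learned rather than sampled at random.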

Computer Vision, Multimodal Models, Robotics & Embodied AI

Citation Metrics

Citations: 0

Influential citations: 0

References: 46

Year: 2026

Venue: N/A
