This paper introduces DCAM-DETR, a multimodal object detection framework that fuses RGB and thermal infrared data for improved UAV detection, particularly in challenging conditions where single-modality vision systems falter. The architecture enhances RT-DETR with a MobileMamba backbone for efficient long-range dependency modeling and incorporates cross-attention mechanisms (CDA, CPA) and adaptive feature fusion (AFFM) to effectively integrate the two modalities. Experiments on Anti-UAV300 demonstrate state-of-the-art performance, achieving 94.7% mAP@0.5 and 78.3% mAP@0.5:0.95 at 42 FPS, with further validation on FLIR-ADAS and KAIST datasets.
Achieve state-of-the-art UAV detection by replacing the detector's backbone with an efficient Mamba state space model, yielding a faster and more accurate RGB-thermal multimodal detector.
The proliferation of unmanned aerial vehicles (UAVs) poses escalating security threats across critical infrastructures, necessitating robust real-time detection systems. Existing vision-based methods predominantly rely on single-modality data and exhibit significant performance degradation under challenging scenarios. To address these limitations, we propose DCAM-DETR, a novel multimodal detection framework that fuses RGB and thermal infrared modalities through an enhanced RT-DETR architecture integrated with state space models. Our approach introduces four innovations: (1) a MobileMamba backbone leveraging selective state space models for efficient long-range dependency modeling with linear complexity O(n); (2) Cross-Dimensional Attention (CDA) and Cross-Path Attention (CPA) modules capturing intermodal correlations across spatial and channel dimensions; (3) an Adaptive Feature Fusion Module (AFFM) dynamically calibrating multimodal feature contributions; and (4) a Dual-Attention Decoupling Module (DADM) enhancing detection head discrimination for small targets. Experiments on Anti-UAV300 demonstrate state-of-the-art performance with 94.7% mAP@0.5 and 78.3% mAP@0.5:0.95 at 42 FPS. Extended evaluations on FLIR-ADAS and KAIST datasets validate the generalization capacity across diverse scenarios.
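The Adaptive Feature Fusion Module described above dynamically weights the RGB and thermal contributions rather than concatenating or averaging them. The paper does not give the AFFM equations here, so the following is only a minimal sketch of that kind of gated fusion: a per-channel gate is predicted from globally pooled features of both modalities, then used to take a convex combination of the two feature maps. The parameters `w` and `b` stand in for learned weights and are purely illustrative.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def adaptive_fusion(rgb_feat, ir_feat, w, b):
    """Illustrative AFFM-style gate (not the paper's exact formulation).

    rgb_feat, ir_feat: feature maps of shape (C, H, W) from each modality.
    w: hypothetical learned projection of shape (C, 2C); b: bias of shape (C,).
    """
    # Global average pooling per modality, concatenated to a (2C,) descriptor.
    pooled = np.concatenate([rgb_feat.mean(axis=(1, 2)),
                             ir_feat.mean(axis=(1, 2))])
    # Per-channel gate in (0, 1): how much each channel trusts the RGB branch.
    gate = sigmoid(w @ pooled + b)                      # shape (C,)
    # Convex combination of the two modalities, broadcast over H and W.
    return gate[:, None, None] * rgb_feat + (1.0 - gate)[:, None, None] * ir_feat
```

Because the gate is a convex weight, each fused value lies between the corresponding RGB and thermal activations, so a degraded modality (e.g. RGB at night) can be smoothly down-weighted channel by channel.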