This paper introduces DCAM-DETR, a multimodal object detection framework that fuses RGB and thermal infrared data for improved UAV detection, particularly in challenging conditions where single-modality vision systems falter. The architecture enhances RT-DETR with a MobileMamba backbone for efficient long-range dependency modeling and incorporates cross-attention mechanisms (CDA, CPA) and adaptive feature fusion (AFFM) to effectively integrate the two modalities. Experiments on Anti-UAV300 demonstrate state-of-the-art performance, achieving 94.7% mAP@0.5 and 78.3% mAP@0.5:0.95 at 42 FPS, with further validation on FLIR-ADAS and KAIST datasets.
Achieve state-of-the-art UAV detection by replacing the detector's backbone with an efficient Mamba state space model, yielding a faster and more accurate RGB-thermal multimodal detector.
The proliferation of unmanned aerial vehicles (UAVs) poses escalating security threats across critical infrastructures, necessitating robust real-time detection systems. Existing vision-based methods predominantly rely on single-modality data and exhibit significant performance degradation under challenging scenarios. To address these limitations, we propose DCAM-DETR, a novel multimodal detection framework that fuses RGB and thermal infrared modalities through an enhanced RT-DETR architecture integrated with state space models. Our approach introduces four innovations: (1) a MobileMamba backbone leveraging selective state space models for efficient long-range dependency modeling with linear complexity O(n); (2) Cross-Dimensional Attention (CDA) and Cross-Path Attention (CPA) modules capturing intermodal correlations across spatial and channel dimensions; (3) an Adaptive Feature Fusion Module (AFFM) dynamically calibrating multimodal feature contributions; and (4) a Dual-Attention Decoupling Module (DADM) enhancing detection head discrimination for small targets. Experiments on Anti-UAV300 demonstrate state-of-the-art performance with 94.7% mAP@0.5 and 78.3% mAP@0.5:0.95 at 42 FPS. Extended evaluations on FLIR-ADAS and KAIST datasets validate the generalization capacity across diverse scenarios.
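The Adaptive Feature Fusion Module described above dynamically weights the RGB and thermal contributions rather than concatenating or averaging them. The paper does not give the AFFM equations here, so the following is only a minimal sketch of that kind of gated fusion: a per-channel gate is predicted from globally pooled features of both modalities, then used to take a convex combination of the two feature maps. The parameters `w` and `b` stand in for learned weights and are purely illustrative.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def adaptive_fusion(rgb_feat, ir_feat, w, b):
    """Illustrative AFFM-style gate (not the paper's exact formulation).

    rgb_feat, ir_feat: feature maps of shape (C, H, W) from each modality.
    w: hypothetical learned projection of shape (C, 2C); b: bias of shape (C,).
    """
    # Global average pooling per modality, concatenated to a (2C,) descriptor.
    pooled = np.concatenate([rgb_feat.mean(axis=(1, 2)),
                             ir_feat.mean(axis=(1, 2))])
    # Per-channel gate in (0, 1): how much each channel trusts the RGB branch.
    gate = sigmoid(w @ pooled + b)                      # shape (C,)
    # Convex combination of the two modalities, broadcast over H and W.
    return gate[:, None, None] * rgb_feat + (1.0 - gate)[:, None, None] * ir_feat
```

Because the gate is a convex weight, each fused value lies between the corresponding RGB and thermal activations, so a degraded modality (e.g. RGB at night) can be smoothly down-weighted channel by channel.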