This paper introduces JMUCOD, a joint maritime-underwater cross-domain object detection framework built upon DETR, to unify object detection in SAR and SAS imagery for improved marine situational awareness. The framework incorporates a Mamba-based state-space model (SSM) backbone for long-range dependencies, a multiscale token adaptation network (MSTAN) to enhance representational capacity, and a fine-grained localization (FGL) loss function to improve spatial precision. Experiments on the newly created MSUSD dataset demonstrate that JMUCOD achieves a mean average precision of 94.0%, outperforming DETR and YOLOv8l with a smaller model size.
Forget training separate models for maritime and underwater object detection: JMUCOD unifies both in a single, efficient architecture that beats DETR by 20.7% mAP while weighing in at just 52% of YOLOv8l's size.
Maritime object detection (MOD) in synthetic aperture radar (SAR) imagery and underwater object detection (UOD) in synthetic aperture sonar (SAS) imagery are pivotal components in a wide array of marine-related applications, including environmental surveillance, resource exploration, and maritime disaster mitigation. Traditionally, these two tasks have been addressed in isolation, limiting the potential for comprehensive marine situational awareness. To bridge this gap, we propose joint maritime-underwater cross-domain object detection (JMUCOD), a novel framework that unifies MOD and UOD within a single detection paradigm. JMUCOD is built upon the detection transformer (DETR) architecture to effectively handle the heterogeneous, multiscale, and multiscene characteristics of marine environments. The framework integrates a state-space model (SSM) derived from the Mamba architecture as its backbone, which excels at modeling long-range dependencies. To refine spatial precision, a fine-grained localization (FGL) loss function is introduced, significantly enhancing detection accuracy. Given the inherent limitations of SSMs in capturing fine-grained local details and multiscale feature representations, we also design a lightweight and effective multiscale token adaptation network (MSTAN) to augment the backbone's representational capacity and adaptability to token-level variations across domains. Furthermore, we establish a comprehensive cross-domain dataset, named the maritime SAR and underwater SAS dataset (MSUSD). MSUSD integrates representative samples from both maritime and underwater environments to enable robust joint cross-domain learning. Specifically, the maritime subdataset incorporates multiple publicly available SAR datasets, including the SAR ship detection dataset, the high-resolution ship image dataset, and SAR-AIRCRAFT, while the underwater portion is constructed from the SCTD dataset.
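The abstract does not detail the SSM backbone's internals, but the core mechanism behind state-space models like Mamba is a linear recurrence over the token sequence, which is what lets information propagate across long ranges at linear cost. The following minimal sketch (illustrative only; the names and scalar parameters are assumptions, and the real Mamba backbone uses learned, input-dependent, multi-channel parameters and a hardware-efficient parallel scan) shows the basic scan:

```python
def ssm_scan(x, a=0.9, b=1.0, c=1.0):
    """Minimal 1-D diagonal state-space scan over a token sequence:
        h_t = a * h_{t-1} + b * x_t
        y_t = c * h_t
    With |a| < 1, the hidden state h carries a decaying memory of all
    earlier tokens, illustrating how an SSM models long-range
    dependencies in a single linear-time pass.
    """
    h, ys = 0.0, []
    for xt in x:
        h = a * h + b * xt  # state update mixes new input with past state
        ys.append(c * h)    # readout at each position
    return ys
```

For example, a single unit impulse at the first position is still visible (attenuated by a^t) many tokens later, unlike a purely local convolution with a small kernel.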
The proposed MSUSD ensures broad coverage of target types, environmental conditions, and imaging modalities, thereby facilitating the development and evaluation of detection models with improved generalization and robustness. To further support efficient edge deployment, a lightweight variant of JMUCOD is also derived through overall model pruning. Experimental results on the MSUSD dataset show that JMUCOD achieves a mean average precision of 94.0%, which is 20.7% higher than that of DETR. Compared with YOLOv8l, it also achieves better performance with only 52% of the model size. JMUCOD likewise outperforms its counterparts on the SMCDD dataset.
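The abstract does not specify how the "overall model pruning" is performed; a common baseline for this kind of size reduction is global magnitude pruning, sketched below purely as an illustration (the function name, flat weight list, and sparsity convention are assumptions, not the paper's method):

```python
def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction of weights.

    weights  -- flat list of weight values
    sparsity -- fraction in [0, 1] of weights to remove

    Illustrative sketch of magnitude-based pruning; the paper's
    actual pruning procedure is not described in the abstract.
    """
    k = int(len(weights) * sparsity)
    # rank indices by absolute weight value, smallest first
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    keep = set(order[k:])  # survivors: the largest-magnitude weights
    return [w if i in keep else 0.0 for i, w in enumerate(weights)]
```

In practice such pruning is applied per layer or globally across the network and is usually followed by fine-tuning to recover any lost accuracy.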