GXU; Liuzhou Railway Vocational and Technical College; North China Electric Power University; Jan 31, 2025

MSD-MixFormer: high-precision detection network for positive and negative road obstacles in complex environments

Zixu Li, Dan Huang, Keying Liu, Xinshuang Liu, Wenguang Luo

AI Summary

The paper introduces MSD-MixFormer, a transformer-based network for detecting positive and negative road obstacles, addressing challenges of low accuracy and weak generalization in complex road scenarios. The network uses multiscale fusion, diffusion mix-attention mechanisms, and a window adjustment mechanism to capture multi-scale obstacle features and reduce foreground/background confusion. Experiments on an extended VRE-NPO dataset demonstrate that MSD-MixFormer achieves higher detection accuracy and faster convergence compared to other models.

Key Contribution

MSD-MixFormer tackles the challenge of detecting road obstacles in complex environments with a transformer architecture that combines multi-scale feature fusion with attention mechanisms. It achieves higher detection accuracy and faster convergence than comparable models in the same series.

Abstract

In complex road scenarios, two problems are prevalent: positive and negative obstacles on the road surface are detected with low accuracy, and detection models generalize poorly. We propose a high-precision detection network for positive and negative road obstacles, named multiscale diffusion and mix-attention transformer (MSD-MixFormer), which is based on multiscale fusion and diffusion mix-attention mechanisms. The network is built on the transformer architecture and uses a window adjustment mechanism to capture multi-scale features of obstacles, which addresses confusion between obstacle foreground and background. Dilation self-attention is introduced in the shallow stages of the encoder and diffusion self-attention in the deep stages, to enhance the feature fusion capability of the network. In the decoder, mix-attention operations are incorporated to improve segmentation accuracy for detailed features, addressing the loss of detail features. To improve the generalization ability of the model, we collect and extend the VRE-NPO dataset with images from real, complex road scenarios. Experiments show that MSD-MixFormer converges faster and more stably during training than other, higher-level models in the same series. In testing, MSD-MixFormer achieves higher detection accuracy for positive and negative obstacles, as well as for objects of varying sizes, across a range of complex road scenarios.
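The abstract does not give formulas for dilation self-attention, so the following is only a hedged illustration of one common generic form of dilated local attention, not the MSD-MixFormer implementation. It assumes a 1-D token sequence (a simplification of the 2-D feature maps an encoder stage would process). Each query attends only to keys spaced `dilation` positions apart within `radius` steps. This widens the receptive field without increasing the number of attended keys. The names `dilated_neighbors` and `dilated_self_attention`, as well as the `radius` and `dilation` parameters, are hypothetical.

```python
import math


def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def dilated_neighbors(i, n, radius, dilation):
    # Hypothetical helper: key positions i + dilation*j for j in [-radius, radius]
    # that fall inside a sequence of length n.
    return [i + dilation * j for j in range(-radius, radius + 1)
            if 0 <= i + dilation * j < n]


def dilated_self_attention(q, k, v, radius=1, dilation=2):
    """Generic dilated local self-attention on a 1-D sequence (illustrative
    sketch under stated assumptions, not the paper's exact mechanism).

    q, k, v: lists of equal length n, each item a list of floats.
    """
    n = len(q)
    scale = 1.0 / math.sqrt(len(q[0]))  # standard scaled dot-product factor
    out = []
    for i in range(n):
        # Sparse, dilated neighbourhood: larger span, same number of keys.
        idx = dilated_neighbors(i, n, radius, dilation)
        scores = [scale * sum(a * b for a, b in zip(q[i], k[p])) for p in idx]
        weights = softmax(scores)
        # Weighted sum of the selected value vectors, channel by channel.
        out.append([sum(w * v[p][c] for w, p in zip(weights, idx))
                    for c in range(len(v[0]))])
    return out


if __name__ == "__main__":
    # With all-zero queries/keys the weights are uniform, so each output is
    # the mean of the values at its dilated neighbour positions.
    q = k = [[0.0] for _ in range(5)]
    v = [[float(t)] for t in range(5)]
    print(dilated_self_attention(q, k, v, radius=1, dilation=2))
```

With `radius=1` and `dilation=2`, position 2 of a length-5 sequence attends to positions 0, 2, and 4. A plain local window of the same size would cover only 1, 2, and 3. Larger dilations in the same pattern trade spatial density for reach.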

Architecture Design (Transformers, SSMs, MoE); Computer Vision; Robotics & Embodied AI

Citation Metrics

Citations: 0

Influential citations: 0

References: 25

Year: 2025

Venue: J. Electronic Imaging
