The paper introduces MSD-MixFormer, a transformer-based network for detecting positive and negative road obstacles. It targets two problems in complex road scenarios: low detection accuracy and weak generalization. The network combines multiscale fusion, diffusion mix-attention mechanisms, and a window adjustment mechanism to capture multiscale obstacle features and reduce confusion between obstacle foreground and background. In experiments on an extended VRE-NPO dataset, MSD-MixFormer reaches higher detection accuracy than the models it is compared against, and it converges faster and more stably during training.
MSD-MixFormer detects positive and negative road obstacles in complex environments with a transformer architecture that combines multiscale feature fusion and attention mechanisms, reporting higher detection accuracy than the compared models.
Abstract. In complex road scenarios, low detection accuracy for positive and negative obstacles on the road surface and weak generalization ability of detection models are prevalent issues. We propose a high-precision detection network for positive and negative road obstacles, named multiscale diffusion and mix-attention transformer (MSD-MixFormer), which is based on multiscale fusion and diffusion mix-attention mechanisms. The network builds on the transformer architecture and adds a window adjustment mechanism to capture multiscale features of obstacles, thereby addressing the confusion between obstacle foreground and background. In the shallow and deep stages of the encoder module, dilation self-attention and diffusion self-attention are introduced, respectively, to enhance the feature fusion capability of the network. In the decoder module, mix-attention operations are incorporated to improve the segmentation accuracy of detailed features, addressing the loss of fine detail. To improve the generalization ability of the model, we collect images from actual complex road scenarios and use them to extend the VRE-NPO dataset. Experiments show that the MSD-MixFormer model converges faster and more stably during training compared with other higher-level models in the same series. In testing, MSD-MixFormer achieves higher detection accuracy for positive and negative obstacles, as well as for objects of varying sizes, across various complex road scenarios.
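The abstract does not define its attention operators, so the following is only a minimal sketch of the general idea behind dilated self-attention, not the authors' method. It assumes a 1D token sequence, identity query/key/value projections, and a fixed number of taps spaced `dilation` positions apart. The names `dilated_indices` and `dilated_self_attention` are hypothetical and do not come from the paper.

```python
import math


def dilated_indices(i, n, window, dilation):
    """Positions query i attends to: up to `window` taps, `dilation` apart, centred on i."""
    half = window // 2
    taps = []
    for k in range(-half, half + 1):
        j = i + k * dilation
        if 0 <= j < n:  # drop taps that fall outside the sequence
            taps.append(j)
    return taps


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dilated_self_attention(tokens, window=3, dilation=2):
    """Scaled dot-product attention restricted to dilated neighbours.

    tokens: list of equal-length float vectors.
    Assumption: Q = K = V = tokens (no learned projections), for illustration only.
    """
    n, d = len(tokens), len(tokens[0])
    out = []
    for i, query in enumerate(tokens):
        idx = dilated_indices(i, n, window, dilation)
        scores = [
            sum(a * b for a, b in zip(query, tokens[j])) / math.sqrt(d)
            for j in idx
        ]
        weights = softmax(scores)
        # Weighted sum of the attended value vectors, one channel at a time.
        out.append([
            sum(w * tokens[j][c] for w, j in zip(weights, idx))
            for c in range(d)
        ])
    return out


if __name__ == "__main__":
    print(dilated_indices(4, 10, window=3, dilation=2))  # [2, 4, 6]
    feats = [[float(i), 1.0] for i in range(6)]
    print(dilated_self_attention(feats, window=3, dilation=2))
```

With a larger `dilation`, the same number of taps covers a wider span of the sequence, which is one common way to enlarge the receptive field in early (shallow) stages without increasing the attention cost per query.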