This paper introduces a YOLO optimization method incorporating Transformer structures to enhance target detection in complex environments. The approach uses a Swin Transformer in the Backbone for improved global modeling, and integrates SE/CBAM attention mechanisms with a lightweight Transformer module in the Neck to enhance feature fusion and small-target recognition. Experiments on the FishEye8K dataset demonstrate an 11% mAP improvement and a 9% recall improvement over YOLOv5, while maintaining an 85 FPS inference speed, achieved through multi-scale training, an improved loss function, model pruning, and integer quantization.
Achieve 11% better accuracy and 9% better recall than YOLOv5 by strategically injecting Transformer modules into YOLO's backbone and neck, showing that hybrid architectures can significantly boost detection performance in complex environments.
To improve the target detection performance of YOLO-series models in complex environments, this paper proposes a YOLO optimization method that integrates Transformer structures. The method introduces a Swin Transformer into the Backbone to enhance global modeling capability, and adds SE/CBAM attention mechanisms to the Neck, supplemented by a lightweight Transformer module, to improve feature fusion and small-target recognition. Experiments are conducted on the FishEye8K traffic-scene dataset, adopting multi-scale training and an improved loss-function strategy, combined with model pruning and integer quantization to optimize inference efficiency. The results show that the proposed model achieves a good balance among mAP, recall, and model size: accuracy improves by 11% over YOLOv5, recall improves by more than 9%, and inference speed is maintained at 85 FPS. Visual detection and false-detection analysis verify the method's effectiveness in multi-target, occluded, and distorted environments. This study provides a technical path for fusing Transformers with YOLO and has strong practical deployment value.
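The abstract cites SE-style channel attention in the Neck. The paper's exact module is not given here, so the following is only a minimal NumPy sketch of the standard Squeeze-and-Excitation mechanism (global average pooling, a bottleneck MLP, and a sigmoid gate that re-weights channels); the function name, weight shapes, and reduction ratio are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def se_block(feature_map, w1, b1, w2, b2):
    """Squeeze-and-Excitation channel attention (illustrative sketch).

    feature_map: (C, H, W) activation tensor.
    w1: (C // r, C) squeeze weights; w2: (C, C // r) excitation weights,
    where r is the channel-reduction ratio.
    """
    # Squeeze: global average pool over spatial dims -> (C,)
    z = feature_map.mean(axis=(1, 2))
    # Excitation: bottleneck MLP, ReLU then sigmoid gate in (0, 1)
    s = np.maximum(w1 @ z + b1, 0.0)
    gate = 1.0 / (1.0 + np.exp(-(w2 @ s + b2)))
    # Re-weight each channel of the feature map by its learned gate
    return feature_map * gate[:, None, None]

# Toy usage with random weights (C=8 channels, reduction ratio r=2)
rng = np.random.default_rng(0)
C, r = 8, 2
fmap = rng.random((C, 4, 4))
w1, b1 = rng.random((C // r, C)), np.zeros(C // r)
w2, b2 = rng.random((C, C // r)), np.zeros(C)
out = se_block(fmap, w1, b1, w2, b2)
print(out.shape)  # (8, 4, 4): same shape, channels rescaled
```

CBAM extends this idea with an additional spatial-attention branch; the channel branch above is the shared core of both mechanisms.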