Mar 1, 2026 | arXiv:2603.01036

SMR-Net: Robot Snap Detection Based on Multi-Scale Features and Self-Attention Network

AI Summary

The paper introduces SMR-Net, a self-attention-based multi-scale object detection algorithm for snap detection and localization in robotic automated assembly. It targets the poor robustness of traditional visual methods in complex scenarios such as transparent or low-contrast snaps. SMR-Net uses an attention-enhanced multi-scale feature fusion architecture that combines an attention-embedded feature extractor, parallel standard and dilated convolutions, and an adaptive reweighting network. On two snap datasets (Type A and Type B), SMR-Net outperforms Faster R-CNN, improving IoU by 6.52% and 5.8% and mAP by 2.8% and 1.5%, respectively.
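
The summary reports gains in IoU. As a reference point, the sketch below computes the standard Intersection over Union between two axis-aligned bounding boxes. The (x1, y1, x2, y2) corner convention and the function name `box_iou` are assumptions for illustration; the paper's exact evaluation protocol (how predictions are matched to ground truth and how IoU is averaged) is not described here.

```python
def box_iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2).

    Assumed convention: x2 > x1 and y2 > y1 (top-left / bottom-right corners).
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the intersection rectangle (0 if the boxes do not overlap).
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Hypothetical predicted and ground-truth snap boxes.
pred = (10, 10, 50, 50)
gt = (20, 20, 60, 60)
print(round(box_iou(pred, gt), 4))  # prints 0.3913 (900 / 2300)
```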

Key Contribution

Achieves higher-precision snap detection for robotic assembly than Faster R-CNN by fusing multi-scale features with self-attention.
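
Self-attention lets every feature position weigh every other position, which is one way features from different scales can exchange information. The following is a minimal single-head scaled dot-product self-attention sketch in plain Python. It assumes toy 2-dimensional feature tokens drawn from three hypothetical scales and uses identity matrices in place of learned projections; it is not SMR-Net's actual attention module, whose placement and dimensions are not given in this summary.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(w, v):
    """Multiply matrix w (list of rows) by vector v."""
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]


def self_attention(tokens, wq, wk, wv):
    """Single-head scaled dot-product self-attention.

    tokens: list of d-dimensional feature vectors.
    wq, wk, wv: projection matrices given as lists of rows.
    Returns (outputs, attention_weights).
    """
    q = [matvec(wq, t) for t in tokens]
    k = [matvec(wk, t) for t in tokens]
    v = [matvec(wv, t) for t in tokens]
    scale = math.sqrt(len(q[0]))
    outputs, weights = [], []
    for qi in q:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(a * b for a, b in zip(qi, kj)) / scale for kj in k]
        w = softmax(scores)
        weights.append(w)
        # Output is the attention-weighted average of the value vectors.
        outputs.append([sum(w[j] * v[j][c] for j in range(len(v)))
                        for c in range(len(v[0]))])
    return outputs, weights


# Hypothetical toy tokens: 2-dim feature vectors pooled from three scales.
fine = [[1.0, 0.0], [0.9, 0.1]]
mid = [[0.5, 0.5]]
coarse = [[0.0, 1.0]]
tokens = fine + mid + coarse
identity = [[1.0, 0.0], [0.0, 1.0]]  # stand-in for learned projections
out, attn = self_attention(tokens, identity, identity, identity)
for row in attn:
    print([round(x, 3) for x in row])  # each row sums to 1
```

Because all scales are concatenated into one token set, a fine-scale token can draw on coarse-scale context in a single attention step; a real network would learn `wq`, `wk`, and `wv` rather than using identities.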

Abstract

In robot automated assembly, snap assembly precision and efficiency directly determine overall production quality. As a core prerequisite, snap detection and localization critically affect the success of subsequent assembly steps. Traditional visual methods suffer from poor robustness and large localization errors in complex scenarios (e.g., transparent or low-contrast snaps) and fail to meet high-precision assembly demands. To address this, this paper designs a dedicated sensor and proposes SMR-Net, a self-attention-based multi-scale object detection algorithm that jointly improves detection and localization performance. SMR-Net adopts an attention-enhanced multi-scale feature fusion architecture: raw sensor data is encoded by an attention-embedded feature extractor that strengthens key snap features and suppresses noise; three multi-scale feature maps are processed in parallel with standard and dilated convolutions to unify their dimensions while preserving resolution; and an adaptive reweighting network dynamically assigns weights to the fused features, producing fine-grained representations that integrate local details with global semantics. Experimental results on the Type A and Type B snap datasets show that SMR-Net significantly outperforms the traditional Faster R-CNN: Intersection over Union (IoU) improves by 6.52% and 5.8%, and mean Average Precision (mAP) increases by 2.8% and 1.5%, respectively. These results demonstrate the method's superiority in complex snap detection and localization tasks.
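
The parallel standard/dilated convolution step and the adaptive reweighting step can be illustrated with a toy, single-channel sketch. With an odd kernel size k and dilation d, zero padding of d(k-1)/2 keeps the output at the input's height and width, while the receptive field grows to d(k-1)+1 with no extra parameters. The fusion weights here are a softmax over hypothetical fixed logits. SMR-Net's actual kernel sizes, dilation rates, channel counts, and reweighting network are not specified in the abstract, so every function name and value below is an assumption.

```python
import math


def receptive_field(k, dilation):
    """Effective kernel extent of a dilated k x k convolution."""
    return dilation * (k - 1) + 1


def conv2d_same(x, kernel, dilation=1):
    """Single-channel 2D cross-correlation with zero padding chosen so the
    output keeps the input's height and width (odd square kernel assumed)."""
    h, w = len(x), len(x[0])
    k = len(kernel)
    pad = dilation * (k - 1) // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for u in range(k):
                for v in range(k):
                    r, c = i - pad + u * dilation, j - pad + v * dilation
                    if 0 <= r < h and 0 <= c < w:  # zero padding outside the map
                        acc += kernel[u][v] * x[r][c]
            out[i][j] = acc
    return out


def softmax(xs):
    m = max(xs)
    e = [math.exp(v - m) for v in xs]
    s = sum(e)
    return [v / s for v in e]


def reweight_fuse(branches, logits):
    """Fuse same-shape maps with weights softmax(logits) (hypothetical scheme)."""
    weights = softmax(logits)
    h, w = len(branches[0]), len(branches[0][0])
    fused = [[sum(wt * b[i][j] for wt, b in zip(weights, branches))
              for j in range(w)] for i in range(h)]
    return fused, weights


# Toy 6x6 feature map and a 3x3 averaging kernel.
x = [[float(r * 6 + c) for c in range(6)] for r in range(6)]
avg3 = [[1.0 / 9.0] * 3 for _ in range(3)]
standard = conv2d_same(x, avg3, dilation=1)  # 3x3 receptive field
dilated = conv2d_same(x, avg3, dilation=2)   # 5x5 receptive field, same weights
logits = [0.3, -0.1]  # hypothetical; a real network would learn or predict these
fused, weights = reweight_fuse([standard, dilated], logits)
print(len(fused), len(fused[0]), [round(wt, 3) for wt in weights])
```

Softmax keeps the fusion weights positive and summing to one, so the fused map stays on the same scale as its inputs. Predicting the logits from the features themselves, rather than fixing them, would be closer to the dynamic weighting the abstract describes.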

Architecture Design (Transformers, SSMs, MoE); Computer Vision; Robotics & Embodied AI

Citation Metrics

Citations: 0
Influential citations: 0
References: 0
Year: 2026
Venue: N/A
