This paper introduces a pseudo-simulated annealing-based data augmentation technique to improve dense object detection in underwater images. The method synthesizes realistic, crowded fish scenarios by iteratively copying and pasting objects while controlling the acceptance of new states based on an annealing schedule. Experiments on the DeepFish dataset demonstrate that the augmented training data significantly improves the performance of a YOLOv10 detector, especially in challenging real-world test sets.
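The accept-or-reject step in such a scheme can be illustrated with a minimal sketch. This is not the paper's algorithm: the overlap-based cost, the geometric cooling schedule, and every name here (`anneal_paste_positions`, `overlap_cost`, `t0`, `cooling`) are assumptions chosen for illustration. The sketch only chooses where pasted objects would go; the pixel copy itself is omitted.

```python
import math
import random


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def overlap_cost(box, placed):
    """Assumed energy: total IoU between a candidate box and boxes already in the scene."""
    return sum(iou(box, other) for other in placed)


def anneal_paste_positions(img_w, img_h, existing, obj_sizes,
                           steps=200, t0=1.0, cooling=0.97, seed=0):
    """Pick a paste location for each (w, h) object with an annealing-style search.

    Hypothetical sketch: a worse candidate (higher overlap) is still accepted
    with probability exp(-delta / T), and T shrinks every step, so early steps
    explore freely and later steps mostly keep improvements.
    Each object is assumed to fit inside the image.
    """
    rng = random.Random(seed)
    placed = list(existing)
    new_boxes = []
    for w, h in obj_sizes:
        def propose():
            x = rng.uniform(0, img_w - w)
            y = rng.uniform(0, img_h - h)
            return (x, y, x + w, y + h)

        current = propose()
        cost = overlap_cost(current, placed)
        t = t0
        for _ in range(steps):
            cand = propose()
            cand_cost = overlap_cost(cand, placed)
            delta = cand_cost - cost
            # Metropolis-style acceptance rule
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                current, cost = cand, cand_cost
            t = max(t * cooling, 1e-9)  # geometric cooling, kept above zero
        placed.append(current)
        new_boxes.append(current)
    return new_boxes


if __name__ == "__main__":
    fish_already_in_scene = [(0, 0, 100, 100)]
    print(anneal_paste_positions(640, 480, fish_already_in_scene, [(50, 40), (60, 30)]))
```

A different cost, for example one that rewards a target level of crowding rather than penalizing all overlap, would slot into `overlap_cost` without changing the acceptance loop.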
Annealing-based data augmentation lets you train a YOLOv10 detector to spot more fish in murky underwater images.
Object detection models typically perform well on images captured in controlled environments with stable lighting, water clarity, and viewpoint, but their performance degrades substantially in real-world underwater settings characterized by high variability and frequent occlusions. In this work, we address these challenges by introducing a novel data augmentation framework designed to improve robustness in dense and unconstrained underwater scenes. Using the DeepFish dataset, which contains images of fish in natural environments, we first generate bounding box annotations from provided segmentation masks to construct a custom detection dataset. We then propose a pseudo-simulated annealing-based augmentation algorithm, inspired by the copy-paste strategy of Deng et al. [1], to synthesize realistic crowded fish scenarios. Our approach improves spatial diversity and object density during training, enabling better generalization to complex scenes. Experimental results show that our method significantly outperforms a baseline YOLOv10 model, particularly on a challenging test set of manually annotated images collected from live-stream footage in the Florida Keys. These results demonstrate the effectiveness of our augmentation strategy for improving detection performance in dense, real-world underwater environments.
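Deriving bounding boxes from segmentation masks, as described above, might look like the following sketch. It assumes each binary mask holds a single fish; a mask with several fish would first need to be split into connected components. The function names and the YOLO-style normalized label format are illustrative choices, not details taken from the paper.

```python
def mask_to_bbox(mask):
    """Return the tight (x1, y1, x2, y2) box around nonzero pixels, or None if empty.

    `mask` is a list of rows of 0/1 values. x2 and y2 are exclusive, so
    x2 - x1 is the box width in pixels. Assumes one object per mask.
    """
    rows = [y for y, row in enumerate(mask) if any(row)]
    if not rows:
        return None
    cols = [x for x in range(len(mask[0])) if any(row[x] for row in mask)]
    return (cols[0], rows[0], cols[-1] + 1, rows[-1] + 1)


def bbox_to_yolo(box, img_w, img_h):
    """Convert a pixel box to normalized (center_x, center_y, width, height)."""
    x1, y1, x2, y2 = box
    return (
        (x1 + x2) / 2 / img_w,
        (y1 + y2) / 2 / img_h,
        (x2 - x1) / img_w,
        (y2 - y1) / img_h,
    )


if __name__ == "__main__":
    toy_mask = [
        [0, 0, 0, 0, 0, 0],
        [0, 0, 1, 1, 0, 0],
        [0, 1, 1, 1, 0, 0],
        [0, 0, 0, 0, 0, 0],
    ]
    box = mask_to_bbox(toy_mask)
    print(box, bbox_to_yolo(box, img_w=6, img_h=4))
```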