BiometricsAIHCTLab Research Group

Mar 9, 2026

arXiv:2603.08235

Exploring Deep Learning and Ultra-Widefield Imaging for Diabetic Retinopathy and Macular Edema

Pablo Jimenez-Lizcano, Sergio Romero-Tapiador, Ruben Tolosana, Aythami Morales, Guillermo González de Rivera, Ruben Vera-Rodriguez, Julian Fierrez

AI Summary

This paper applies deep learning to ultra-widefield (UWF) retinal images to detect diabetic retinopathy (DR) and diabetic macular edema (DME). It benchmarks CNNs, vision transformers (ViTs), and foundation models on the UWF4DR Challenge dataset across three tasks: image quality assessment, identification of referable diabetic retinopathy (RDR), and identification of DME, using both spatial (RGB) and frequency-domain representations. Feature-level fusion is explored to increase robustness, and Grad-CAM is used to improve explainability. The results point to the competitiveness of ViTs and foundation models and the promise of frequency-domain approaches for UWF image analysis.

Key Contribution

A benchmark indicating that vision transformers, foundation models, and frequency-domain representations are competitive for automated analysis of ultra-widefield retinal images in diabetic retinopathy and diabetic macular edema.

Abstract

Diabetic retinopathy (DR) and diabetic macular edema (DME) are leading causes of preventable blindness among working-age adults. Traditional approaches in the literature focus on standard color fundus photography (CFP) for the detection of these conditions. However, ultra-widefield (UWF) imaging offers a significantly wider field of view than CFP. Motivated by this, the present study explores state-of-the-art deep learning (DL) methods and UWF imaging on three clinically relevant tasks: i) image quality assessment for UWF, ii) identification of referable diabetic retinopathy (RDR), and iii) identification of DME. Using the publicly available UWF4DR Challenge dataset, released as part of the MICCAI 2024 conference, we benchmark DL models in the spatial (RGB) and frequency domains, including popular convolutional neural networks (CNNs) as well as recent vision transformers (ViTs) and foundation models. In addition, we explore feature-level fusion to increase robustness. Finally, we analyze the decisions of the DL models using Grad-CAM to improve explainability. Our proposal achieves consistently strong performance across all architectures, underscoring the competitiveness of emerging ViTs and foundation models and the promise of feature-level fusion and frequency-domain representations for UWF analysis.
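As a rough illustration of two ideas named in the abstract, the sketch below builds a frequency-domain view of an image and then fuses spatial and frequency features. The abstract does not say which transform or fusion operator the authors use, so everything here is an assumption: the log-magnitude of a 2D discrete Fourier transform stands in for the frequency representation, concatenation of L2-normalized vectors stands in for feature-level fusion, and the function names (`log_magnitude_spectrum`, `l2_normalize`, `fuse_features`) are hypothetical. Flattened pixels replace the CNN or ViT embeddings a real pipeline would extract.

```python
import cmath
import math


def log_magnitude_spectrum(image):
    """Naive 2D DFT of a small grayscale image (list of rows).

    Returns log(1 + |F(u, v)|) for every frequency bin. This is an assumed
    frequency-domain representation, not necessarily the one used in the paper.
    """
    rows, cols = len(image), len(image[0])
    spectrum = []
    for u in range(rows):
        out_row = []
        for v in range(cols):
            total = 0j
            for y in range(rows):
                for x in range(cols):
                    angle = -2.0 * math.pi * (u * y / rows + v * x / cols)
                    total += image[y][x] * cmath.exp(1j * angle)
            out_row.append(math.log1p(abs(total)))
        spectrum.append(out_row)
    return spectrum


def l2_normalize(vec):
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fuse_features(spatial_feat, freq_feat):
    """Assumed fusion operator: concatenate the two L2-normalized feature vectors."""
    return l2_normalize(spatial_feat) + l2_normalize(freq_feat)


# Toy 4x4 constant "image": all of its spectral energy sits in the DC bin (0, 0).
toy = [[1.0] * 4 for _ in range(4)]
spec = log_magnitude_spectrum(toy)

# Flattened pixels stand in for the embeddings of two hypothetical backbones.
spatial_feat = [p for row in toy for p in row]
freq_feat = [p for row in spec for p in row]
fused = fuse_features(spatial_feat, freq_feat)
```

The nested-loop DFT is only practical for tiny inputs; on real UWF images a fast transform such as `numpy.fft.fft2` would be the usual choice. Normalizing each vector before concatenation keeps one branch from dominating the fused vector purely because of its scale, which is one common motivation for this design, though the paper may fuse features differently.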

Computer Vision, Multimodal Models, Scientific Discovery & Drug Design

