Samsung R&D Institute UK (SRUK) | UT Austin | Apr 15, 2026 | arXiv:2604.13795

Artificial intelligence application in lymphoma diagnosis with Vision Transformer using weakly supervised training

Nghia, Nguyen, Amer Wahed, Andy Quesada, Yasir Ali, Hanadi El Achi, Y. Helen Zhang, Jocelyn Ursua, Alex Banerjee, Sahib Kalra, L. Jeffrey Medeiros, Jie Xu

AI Summary

This paper explores the use of Vision Transformers (ViTs) for classifying lymphoma subtypes, specifically anaplastic large cell lymphoma (ALCL) versus classic Hodgkin lymphoma (cHL). To address the impracticality of fully supervised training due to resource constraints, the authors implemented a weakly supervised training approach using slide-level labels for image patches. The resulting ViT model, trained on 100,000 image patches, achieved a diagnostic accuracy of 91.85%, F1 score of 0.92, and AUC of 0.98, demonstrating its potential for clinical application.

Key Contribution

Weakly supervised ViTs can achieve high accuracy in lymphoma classification, offering a practical alternative to fully supervised methods that require extensive manual annotation.

Abstract

Vision transformers (ViTs) have been shown to allow more flexible feature detection and can outperform convolutional neural networks (CNNs) when pre-trained on sufficient data. Because of these promising feature detection capabilities, we applied ViTs to the morphological classification of anaplastic large cell lymphoma (ALCL) versus classic Hodgkin lymphoma (cHL). We previously designed a ViT model that was trained on a small dataset of 1,200 image patches with fully supervised training. That model achieved a diagnostic accuracy of 100% and an F1 score of 1.0 on the independent test set. Because fully supervised training is impractical, given the limited expert resources available in both the training and testing phases, we conducted a new study with a modified approach to the training data (weakly supervised training). We show that automatically labeling each training image patch with the slide-level label of its whole-slide image is a more practical solution for clinical use of ViTs. Our ViT model, trained on a larger dataset of 100,000 image patches, achieves an accuracy of 91.85%, an F1 score of 0.92, and an area under the curve (AUC) of 0.98. These are respectable values that qualify this ViT model, with weakly supervised training, as a suitable deep learning module for clinical model development using automated image patch extraction.
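The weak labeling idea in the abstract, where each patch inherits the label of its whole-slide image, and the three reported metrics can be illustrated with a minimal sketch. This is not the authors' pipeline: the `Patch` type, the function names, the 0 = cHL / 1 = ALCL encoding, and the choice to score at the patch level are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class Patch:
    slide_id: str  # hypothetical identifier of the source whole-slide image
    index: int     # position of the patch within that slide


def label_patches(slide_labels: Dict[str, int],
                  patches: Sequence[Patch]) -> List[Tuple[Patch, int]]:
    """Weak labeling: every patch inherits the label of its slide.

    Assumed encoding for this sketch: 0 = cHL, 1 = ALCL.
    """
    return [(p, slide_labels[p.slide_id]) for p in patches]


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a random positive scores higher than a
    random negative, with ties counted as one half.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Because every patch takes its slide's label without pathologist review, some patches (for example, regions with no tumor cells) will carry a noisy label; that is the trade-off weak supervision accepts in exchange for skipping manual annotation.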

Architecture Design (Transformers, SSMs, MoE) | Computer Vision | Scientific Discovery & Drug Design

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
