Feb 18, 2026

arXiv:2602.16422

Automated Histopathology Report Generation via Pyramidal Feature Extraction and the UNI Foundation Model

Ahmet Halici, Ece Tugba Cebeci, Musa Balci, Mustafa Cini, Serkan Sokmen

AI Summary

This paper introduces a hierarchical vision-language framework for automated histopathology report generation from whole slide images (WSIs). The approach selects patches from a multi-resolution pyramid and extracts their features with a frozen UNI Vision Transformer. A Transformer decoder then generates the diagnostic text, which is tokenized with BioGPT. To improve reliability, a retrieval-based verification step compares each generated report with a reference corpus using Sentence-BERT embeddings and substitutes the ground-truth reference when a high-similarity match is found.

Key Contribution

Improves the reliability of generated histopathology reports by verifying them against a reference corpus and substituting the ground-truth reference when a close match is found.
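The verification step can be illustrated with a minimal sketch. It assumes the report embeddings have already been computed (for example with a Sentence-BERT model) and are passed in as plain lists of floats. The function names, the use of cosine similarity, and the 0.9 threshold are hypothetical choices for illustration; the source states neither the similarity measure nor the threshold.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; returns 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def verify_report(
    generated_text: str,
    generated_emb: Sequence[float],
    reference_texts: List[str],
    reference_embs: List[Sequence[float]],
    threshold: float = 0.9,  # assumed value, not taken from the paper
) -> Tuple[str, float, bool]:
    """Return (final_text, best_similarity, was_replaced).

    The generated report is swapped for the closest reference report
    only when the best similarity reaches the threshold.
    """
    best_idx, best_sim = -1, -1.0
    for i, ref_emb in enumerate(reference_embs):
        sim = cosine_similarity(generated_emb, ref_emb)
        if sim > best_sim:
            best_idx, best_sim = i, sim

    if best_idx >= 0 and best_sim >= threshold:
        return reference_texts[best_idx], best_sim, True
    return generated_text, best_sim, False
```

With a two-report corpus, a generated embedding that nearly coincides with a reference embedding triggers the substitution, while one that sits between both references leaves the generated text unchanged.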

Abstract

Generating diagnostic text from histopathology whole slide images (WSIs) is challenging due to the gigapixel scale of the input and the requirement for precise, domain-specific language. We propose a hierarchical vision-language framework that combines a frozen pathology foundation model with a Transformer decoder for report generation. To make WSI processing tractable, we perform multi-resolution pyramidal patch selection (downsampling factors 2^3 to 2^6) and remove background and artifacts using Laplacian-variance and HSV-based criteria. Patch features are extracted with the UNI Vision Transformer and projected to a 6-layer Transformer decoder that generates diagnostic text via cross-attention. To better represent biomedical terminology, we tokenize the output using BioGPT. Finally, we add a retrieval-based verification step that compares generated reports with a reference corpus using Sentence-BERT embeddings; if a high-similarity match is found, the generated report is replaced with the retrieved ground-truth reference to improve reliability.
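The patch-filtering idea can be sketched as follows. The abstract names Laplacian variance and HSV-based criteria but gives no thresholds or exact rules, so everything here is an assumption for illustration: patches are nested lists of 8-bit RGB tuples, sharpness is the variance of a 4-neighbour Laplacian, tissue content is approximated by mean HSV saturation, and the helper names and both threshold values are hypothetical.

```python
import colorsys
from statistics import mean, pvariance
from typing import List, Tuple

Pixel = Tuple[int, int, int]
Patch = List[List[Pixel]]


def pyramid_downsample_factors(min_exp: int = 3, max_exp: int = 6) -> List[int]:
    """Downsampling factors 2^min_exp .. 2^max_exp, as quoted in the abstract."""
    return [2 ** k for k in range(min_exp, max_exp + 1)]


def to_gray(patch: Patch) -> List[List[float]]:
    """Luma-weighted grayscale conversion."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for r, g, b in row] for row in patch]


def laplacian_variance(gray: List[List[float]]) -> float:
    """Variance of the 4-neighbour Laplacian over interior pixels.

    Low values indicate flat or blurred regions.
    """
    h, w = len(gray), len(gray[0])
    responses = [
        gray[y - 1][x] + gray[y + 1][x] + gray[y][x - 1] + gray[y][x + 1]
        - 4 * gray[y][x]
        for y in range(1, h - 1)
        for x in range(1, w - 1)
    ]
    return pvariance(responses) if responses else 0.0


def mean_saturation(patch: Patch) -> float:
    """Mean HSV saturation; near zero for white or gray background."""
    return mean(
        colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)[1]
        for row in patch
        for r, g, b in row
    )


def keep_patch(
    patch: Patch,
    min_lap_var: float = 50.0,     # assumed sharpness threshold
    min_saturation: float = 0.05,  # assumed tissue-colour threshold
) -> bool:
    """Keep a patch only if it is both sharp enough and coloured enough."""
    return (
        laplacian_variance(to_gray(patch)) >= min_lap_var
        and mean_saturation(patch) >= min_saturation
    )
```

Under these assumptions, a uniformly white patch is rejected on both criteria, while a patch with strong local contrast and stained-tissue colours is kept. In practice the thresholds would need tuning per scanner and stain.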

Computer Vision, Multimodal Models, Natural Language Processing

Citation Metrics

Citations: 0

Influential citations: 0

References: 38

Year: 2026

Venue: N/A
