This paper introduces a multi-agent reinforcement learning framework that integrates LLMs and LVMs for radiology report generation and evaluation, addressing the need for clinically reliable systems and rigorous evaluation protocols. The framework comprises ten specialized agents handling image analysis, report generation, review, and evaluation, enabling fine-grained assessment at both the agent and consensus levels. Experiments using ChatGPT-4o on public radiology datasets demonstrate the framework's ability to align evaluation protocols with the LLM development lifecycle, paving the way for trustworthy radiology report generation.
A multi-agent framework using LLMs and LVMs offers a new benchmark for radiology report generation, enabling fine-grained evaluation of clinical reliability and paving the way for trustworthy AI in medical imaging.
Automating radiology report generation poses a dual challenge: building clinically reliable systems and designing rigorous evaluation protocols. We introduce a multi-agent reinforcement learning framework that serves as both a benchmark and an evaluation environment for multimodal clinical reasoning in the radiology ecosystem. The proposed framework integrates large language models (LLMs) and large vision models (LVMs) within a modular architecture composed of ten specialized agents responsible for image analysis, feature extraction, report generation, review, and evaluation. This design enables fine-grained assessment at both the agent level (e.g., detection and segmentation accuracy) and the consensus level (e.g., report quality and clinical relevance). We demonstrate an implementation using ChatGPT-4o on public radiology datasets, where LLMs act as evaluators alongside feedback from medical radiologists. By aligning evaluation protocols with the LLM development lifecycle, including pretraining, finetuning, alignment, and deployment, the proposed benchmark establishes a path toward trustworthy, evidence-based radiology report generation.
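The modular pipeline the abstract describes, specialized agents chained from image analysis through report generation to review, with evaluation at both the agent and consensus levels, can be sketched as follows. This is a minimal illustration only: the agent names, roles, state format, and averaging consensus rule are assumptions for exposition, not the paper's actual implementation (real agents would wrap LLM/LVM calls).

```python
from dataclasses import dataclass


@dataclass
class Agent:
    """Hypothetical specialized agent; a real one would call an LLM or LVM."""
    name: str
    role: str  # e.g., "image_analysis", "report_generation", "review", "evaluation"

    def act(self, state: dict) -> dict:
        # Stand-in for model inference: record that this role processed the state.
        return {**state, f"{self.role}_by": self.name}


@dataclass
class Pipeline:
    """Runs the ordered chain of specialized agents over a shared state."""
    agents: list

    def run(self, image_id: str) -> dict:
        state = {"image_id": image_id}
        for agent in self.agents:
            state = agent.act(state)
        return state


def consensus_score(agent_scores: dict) -> float:
    """Consensus-level evaluation: here, a simple mean over per-agent scores."""
    return sum(agent_scores.values()) / len(agent_scores)


# Illustrative five-agent subset of the ten-agent architecture.
pipeline = Pipeline(agents=[
    Agent("detector", "image_analysis"),
    Agent("extractor", "feature_extraction"),
    Agent("writer", "report_generation"),
    Agent("reviewer", "review"),
    Agent("grader", "evaluation"),
])

report_state = pipeline.run("chest_xray_001")
overall = consensus_score({"reviewer": 0.8, "grader": 0.9})
```

The per-agent scores feed agent-level assessment (e.g., detection accuracy), while `consensus_score` aggregates them into a single report-quality signal; the paper's consensus mechanism may differ.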