This paper introduces a multi-agent reinforcement learning framework that integrates LLMs and LVMs for radiology report generation and evaluation, addressing the need for clinically reliable systems and rigorous evaluation protocols. The framework comprises ten specialized agents handling image analysis, report generation, review, and evaluation, enabling fine-grained assessment at both the agent and consensus levels. Experiments using ChatGPT-4o on public radiology datasets demonstrate the framework's ability to align evaluation protocols with the LLM development lifecycle, paving the way for trustworthy radiology report generation.
A multi-agent framework using LLMs and LVMs offers a new benchmark for radiology report generation, enabling fine-grained evaluation of clinical reliability and paving the way for trustworthy AI in medical imaging.
Automating radiology report generation poses a dual challenge: building clinically reliable systems and designing rigorous evaluation protocols. We introduce a multi-agent reinforcement learning framework that serves as both a benchmark and an evaluation environment for multimodal clinical reasoning in the radiology ecosystem. The proposed framework integrates large language models (LLMs) and large vision models (LVMs) within a modular architecture composed of ten specialized agents responsible for image analysis, feature extraction, report generation, review, and evaluation. This design enables fine-grained assessment at both the agent level (e.g., detection and segmentation accuracy) and the consensus level (e.g., report quality and clinical relevance). We demonstrate an implementation using ChatGPT-4o on public radiology datasets, where LLMs act as evaluators alongside feedback from medical radiologists. By aligning evaluation protocols with the LLM development lifecycle, including pretraining, finetuning, alignment, and deployment, the proposed benchmark establishes a path toward trustworthy, evidence-based radiology report generation.
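The modular pipeline the abstract describes, specialized agents chained from image analysis through report generation to review, with evaluation at both the agent and consensus levels, can be sketched as follows. This is a minimal illustration only: the agent names, roles, state format, and averaging consensus rule are assumptions for exposition, not the paper's actual implementation (real agents would wrap LLM/LVM calls).

```python
from dataclasses import dataclass


@dataclass
class Agent:
    """Hypothetical specialized agent; a real one would call an LLM or LVM."""
    name: str
    role: str  # e.g., "image_analysis", "report_generation", "review", "evaluation"

    def act(self, state: dict) -> dict:
        # Stand-in for model inference: record that this role processed the state.
        return {**state, f"{self.role}_by": self.name}


@dataclass
class Pipeline:
    """Runs the ordered chain of specialized agents over a shared state."""
    agents: list

    def run(self, image_id: str) -> dict:
        state = {"image_id": image_id}
        for agent in self.agents:
            state = agent.act(state)
        return state


def consensus_score(agent_scores: dict) -> float:
    """Consensus-level evaluation: here, a simple mean over per-agent scores."""
    return sum(agent_scores.values()) / len(agent_scores)


# Illustrative five-agent subset of the ten-agent architecture.
pipeline = Pipeline(agents=[
    Agent("detector", "image_analysis"),
    Agent("extractor", "feature_extraction"),
    Agent("writer", "report_generation"),
    Agent("reviewer", "review"),
    Agent("grader", "evaluation"),
])

report_state = pipeline.run("chest_xray_001")
overall = consensus_score({"reviewer": 0.8, "grader": 0.9})
```

The per-agent scores feed agent-level assessment (e.g., detection accuracy), while `consensus_score` aggregates them into a single report-quality signal; the paper's consensus mechanism may differ.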