This paper introduces a multi-agent framework built on Vision-Language Models (VLMs) that improves lung X-ray diagnosis by mitigating diagnostic hallucinations and incorporating external medical knowledge. The framework pairs a WebSearch Agent for retrieval-augmented generation (RAG) with a Central Agent for multimodal Chain-of-Thought (CoT) reasoning and an Evaluation Agent for adversarial error correction. Empirical results show that this collaborative approach outperforms single-agent systems in diagnostic accuracy, interpretability, and robustness, offering a scalable solution for complex medical diagnostics.
A trio of specialized AI agents—one for web search, one for reasoning, and one for adversarial evaluation—can significantly boost the accuracy of lung X-ray diagnoses compared to standalone systems.
This paper proposes a novel multi-agent framework based on Vision-Language Models (VLMs) to address challenges in medical lung X-ray diagnosis, such as diagnostic hallucinations and insufficient domain knowledge. The framework comprises three specialized agents: the WebSearch Agent for dynamic retrieval of medical knowledge, the Central Agent for multimodal reasoning and decision-making, and the Evaluation Agent for adversarial reflection and error correction. By integrating retrieval-augmented generation (RAG), toolchain functionalities, and Chain-of-Thought (CoT) optimization, the system enhances diagnostic accuracy, interpretability, and robustness. Empirical validation demonstrates the superiority of this collaborative approach over single-agent systems, offering a scalable solution for complex medical diagnostics. The contributions include a modular architecture, dynamic knowledge fusion, and reflection mechanisms, significantly reducing diagnostic errors and improving clinical reliability.
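The abstract describes a retrieve-reason-critique loop among three agents. A minimal sketch of how such a loop might be wired is shown below; all class names, method names, and the toy evidence check are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of the three-agent diagnostic loop described above.
# WebSearchAgent, CentralAgent, and EvaluationAgent are illustrative names;
# a real system would use a VLM, a retrieval backend, and richer critiques.

class WebSearchAgent:
    """Retrieves external medical knowledge (stand-in for RAG retrieval)."""
    def retrieve(self, query):
        # A real agent would query a search API or vector store here.
        return [f"guideline snippet relevant to '{query}'"]

class CentralAgent:
    """Fuses image input and retrieved text into a diagnosis via CoT steps."""
    def diagnose(self, image, evidence):
        steps = [f"observe image '{image}'",
                 f"consult {len(evidence)} evidence item(s)"]
        return {"diagnosis": "finding suspected", "reasoning": steps}

class EvaluationAgent:
    """Adversarially checks the diagnosis; requests revision if unsupported."""
    def critique(self, result, evidence):
        supported = bool(evidence)  # toy check: any supporting evidence at all
        return {"accepted": supported,
                "feedback": None if supported else "retrieve supporting evidence"}

def run_pipeline(image, query, max_rounds=2):
    searcher, central, evaluator = WebSearchAgent(), CentralAgent(), EvaluationAgent()
    evidence = searcher.retrieve(query)
    result = None
    for _ in range(max_rounds):
        result = central.diagnose(image, evidence)
        verdict = evaluator.critique(result, evidence)
        if verdict["accepted"]:
            break
        evidence += searcher.retrieve(verdict["feedback"])  # reflection round
    return result
```

The key design point this sketch captures is that the Evaluation Agent's critique can trigger further retrieval before the Central Agent re-diagnoses, which is the error-correction mechanism the abstract credits with reducing hallucinations.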