This review surveys Large Language Model (LLM) agents in biomedicine, covering their architectures, methodologies, and applications such as clinical decision-making and research automation. The authors analyze emerging benchmarks for evaluating agent performance in dynamic and interactive settings, and they address key challenges such as hallucinations and bias. The paper concludes by outlining future research directions, including continual learning and multi-agent coordination, intended to support the development of reliable and clinically deployable biomedical LLM agents.
LLM agents are poised to revolutionize biomedicine, but this review highlights the critical need to address challenges like hallucinations and bias before widespread clinical deployment.
Large language model (LLM)-based agents are rapidly emerging as transformative tools across biomedical research and clinical applications. By integrating reasoning, planning, memory, and tool-use capabilities, these agents go beyond static language models to operate autonomously or collaboratively within complex healthcare settings. This review provides a comprehensive survey of biomedical LLM agents, spanning their core system architectures, enabling methodologies, and real-world use cases such as clinical decision-making, biomedical research automation, and patient simulation. We further examine emerging benchmarks designed to evaluate agent performance under dynamic, interactive, and multimodal conditions. In addition, we systematically analyze key challenges, including hallucinations, interpretability, tool reliability, data bias, and regulatory gaps, and discuss corresponding mitigation strategies. Finally, we outline future directions in areas such as continual learning, federated adaptation, robust multi-agent coordination, and human-AI collaboration. This review aims to establish a foundational understanding of biomedical LLM agents and provide a forward-looking roadmap for building trustworthy, reliable, and clinically deployable intelligent systems.