JHU | Mar 9, 2026 | arXiv:2603.09018

Meissa: Multi-modal Medical Agentic Intelligence

Yixiong Chen, Xinyi Bai, Yue Pan, Zongwei Zhou, Alan L. Yuille

AI Summary

Meissa is a 4B-parameter multi-modal medical agent that learns to use tools and collaborate with other agents by distilling structured trajectories from frontier models. It combines unified trajectory modeling, three-tier stratified supervision, and prospective-retrospective supervision to learn effective interaction policies. Meissa matches or exceeds frontier agents in 10 of 16 evaluation settings across 13 medical benchmarks while running fully offline, with 22x lower end-to-end latency than API-based deployment.

Key Contribution

A 4B-parameter model, Meissa, rivals the performance of much larger proprietary models in medical agent tasks, offering a cost-effective and privacy-preserving alternative for clinical applications.

Abstract

Multi-modal large language models (MM-LLMs) have shown strong performance in medical image understanding and clinical reasoning. Recent medical agent systems extend them with tool use and multi-agent collaboration, enabling complex decision-making. However, these systems rely almost entirely on frontier models (e.g., GPT), whose API-based deployment incurs high cost, high latency, and privacy risks that conflict with on-premise clinical requirements. We present Meissa, a lightweight 4B-parameter medical MM-LLM that brings agentic capability offline. Instead of imitating static answers, Meissa learns both when to engage external interaction (strategy selection) and how to execute multi-step interaction (strategy execution) by distilling structured trajectories from frontier models. Specifically, we propose: (1) Unified trajectory modeling: trajectories (reasoning and action traces) are represented within a single state-action-observation formalism, allowing one model to generalize across heterogeneous medical environments. (2) Three-tier stratified supervision: the model's own errors trigger progressive escalation from direct reasoning to tool-augmented and multi-agent interaction, explicitly learning difficulty-aware strategy selection. (3) Prospective-retrospective supervision: pairing exploratory forward traces with hindsight-rationalized execution traces enables stable learning of effective interaction policies. Trained on 40K curated trajectories, Meissa matches or exceeds proprietary frontier agents in 10 of 16 evaluation settings across 13 medical benchmarks spanning radiology, pathology, and clinical reasoning. Using over 25x fewer parameters than typical frontier models like Gemini-3, Meissa operates fully offline with 22x lower end-to-end latency compared to API-based deployment. Data, models, and environments are released at https://github.com/Schuture/Meissa.
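The abstract describes trajectories as state-action-observation sequences and an escalation from direct reasoning to tool-augmented and then multi-agent interaction when the model errs. A minimal Python sketch of that idea follows. It is an illustration under stated assumptions, not the authors' implementation: the names `Step`, `Trajectory`, `TIERS`, and `escalate`, the exact-match correctness check, and the per-tier solver callables are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Step:
    """One state-action-observation triple (field names are assumptions)."""
    state: str        # context visible to the model before it acts
    action: str       # e.g. a reasoning step, a tool call, or a message to another agent
    observation: str  # what the environment returned


@dataclass
class Trajectory:
    """A sequence of steps in one shared format, whatever the environment."""
    tier: str
    steps: List[Step] = field(default_factory=list)
    answer: Optional[str] = None


# Assumed tier order, taken from the abstract's wording:
# direct reasoning -> tool-augmented -> multi-agent interaction.
TIERS = ("direct", "tool", "multi_agent")


def escalate(
    question: str,
    gold: str,
    solvers: Dict[str, Callable[[str], Trajectory]],
) -> Optional[Trajectory]:
    """Try each tier in order and return the first trajectory whose answer
    matches the gold label. An error at one tier triggers the next tier.
    Returns None if every tier fails (hypothetical policy)."""
    for tier in TIERS:
        trajectory = solvers[tier](question)
        if trajectory.answer == gold:
            return trajectory
    return None
```

In this sketch, the tier of the returned trajectory acts as a difficulty label: a question solved by the `direct` solver never produces tool or multi-agent supervision, which is one plausible reading of "difficulty-aware strategy selection".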

Multimodal Models | Open-Source Models & Weights | Tool Use & Agents

Citation Metrics

Citations: 0

Influential citations: 0

References: 67

Year: 2026

Venue: N/A
