North China Electric Power University, Tencent AI. Apr 22, 2026. arXiv:2604.20263

AROMA: Augmented Reasoning Over a Multimodal Architecture for Virtual Cell Genetic Perturbation Modeling

Zhenyu Wang, Geyan Ye, Wei Liu, Man Tat Alexander Ng

AI Summary

The paper introduces AROMA, a multimodal architecture for virtual cell genetic perturbation modeling that integrates textual evidence, graph-topology information, and protein sequence features. AROMA is trained using a two-stage optimization strategy to predict molecular state changes under genetic perturbations, improving both accuracy and interpretability. Experiments demonstrate that AROMA outperforms existing methods across multiple cell lines, exhibits robustness in zero-shot settings, and handles knowledge-sparse scenarios effectively.

Key Contribution

AROMA achieves more reliable and interpretable virtual cell perturbation predictions by combining knowledge-driven multimodal modeling with evidence retrieval.

Abstract

Virtual cell modeling predicts molecular state changes under genetic perturbations in silico, which is essential for biological mechanism studies. However, existing approaches suffer from unconstrained reasoning, uninterpretable predictions, and retrieval signals that are weakly aligned with regulatory topology. To address these limitations, we propose AROMA, an Augmented Reasoning Over a Multimodal Architecture for virtual cell genetic perturbation modeling. AROMA integrates textual evidence, graph-topology information, and protein sequence features to model perturbation-target dependencies, and is trained with a two-stage optimization strategy to yield predictions that are both accurate and interpretable. We also construct two knowledge graphs and a perturbation reasoning dataset, PerturbReason, containing more than 498k samples, as reusable resources for the virtual cell domain. Experiments show that AROMA outperforms existing methods across multiple cell lines, and remains robust under zero-shot evaluation on an unseen cell line, as well as in knowledge-sparse, long-tail scenarios. Overall, AROMA demonstrates that combining knowledge-driven multimodal modeling with evidence retrieval provides a promising pathway toward more reliable and interpretable virtual cell perturbation prediction. Model weights are available at https://huggingface.co/blazerye/AROMA. Code is available at https://github.com/blazerye/AROMA.

Multimodal Models, Reasoning & Chain-of-Thought, Scientific Discovery & Drug Design

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
