Nanyang Normal University | Apr 13, 2026 | arXiv:2604.10874

AOP-Smart: A RAG-Enhanced Large Language Model Framework for Adverse Outcome Pathway Analysis

AI Summary

The paper introduces AOP-Smart, a RAG-based framework tailored for Adverse Outcome Pathway (AOP) analysis. It uses AOP-Wiki's XML data to retrieve knowledge relevant to a question based on Key Events (KEs), Key Event Relationships (KERs), and AOP-specific information. By augmenting LLMs with the retrieved knowledge, AOP-Smart significantly reduces hallucinations and improves the accuracy of AOP-related question answering. Experiments on Gemini, DeepSeek, and ChatGPT show accuracy on a 20-question test set rising from 15-35% without RAG to 95-100% with AOP-Smart.

Key Contribution

LLMs armed with RAG can leap from 15-35% to 95-100% accuracy on complex toxicology reasoning tasks (DeepSeek: 35% to 100%), suggesting a potent recipe for reliable scientific knowledge processing.
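To make the reported percentages concrete, the short sketch below converts them into counts of correctly answered questions. It assumes each of the 20 test questions reported in the abstract is scored as simply correct or incorrect, which the page does not state explicitly; the "GPT" figures in the abstract are taken here to refer to ChatGPT.

```python
# Test-set size reported in the abstract.
N_QUESTIONS = 20

# (accuracy without RAG, accuracy with RAG), in percent, as reported.
accuracy_pct = {
    "ChatGPT": (15.0, 95.0),
    "DeepSeek": (35.0, 100.0),
    "Gemini": (20.0, 95.0),
}


def correct_count(pct, n=N_QUESTIONS):
    """Convert an accuracy percentage into a number of correct answers.

    Assumes binary (correct/incorrect) scoring per question.
    """
    return round(pct / 100 * n)


counts = {
    model: (correct_count(before), correct_count(after))
    for model, (before, after) in accuracy_pct.items()
}

for model, (before, after) in counts.items():
    print(f"{model}: {before}/{N_QUESTIONS} -> {after}/{N_QUESTIONS}")
```

Under that scoring assumption, the figures correspond to 3, 7, and 4 correct answers without RAG and 19, 20, and 19 with RAG, so each model answered at most one question incorrectly once retrieval was added.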

Abstract

Adverse Outcome Pathways (AOPs) are an important knowledge framework in toxicological research and risk assessment. In recent years, large language models (LLMs) have increasingly been applied to AOP-related question answering and mechanistic reasoning. However, their reliability remains limited by hallucination: a model may generate content that is inconsistent with the facts or unsupported by evidence. To address this issue, this study proposes AOP-Smart, an AOP-oriented Retrieval-Augmented Generation (RAG) framework. Using the official XML data from AOP-Wiki, the method retrieves knowledge relevant to a user's question based on Key Events (KEs), Key Event Relationships (KERs), and AOP-specific information, thereby improving the reliability of the LLM's output. To evaluate the method, the study constructed a test set of 20 AOP-related question answering tasks covering KE identification, upstream and downstream KE retrieval, and complex AOP retrieval. Experiments were conducted on three mainstream LLMs (Gemini, DeepSeek, and ChatGPT) under two settings: without RAG and with RAG. Without RAG, the accuracies of ChatGPT, DeepSeek, and Gemini were 15.0%, 35.0%, and 20.0%, respectively; with RAG, they rose to 95.0%, 100.0%, and 95.0%, respectively. The results indicate that AOP-Smart substantially alleviates hallucination in AOP knowledge tasks and greatly improves the accuracy and consistency of the models' answers.
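The retrieval idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the XML schema below is a simplified, hypothetical stand-in for the AOP-Wiki export (whose real element names and namespaces differ), the function names are invented for illustration, and plain substring matching stands in for whatever retrieval strategy AOP-Smart actually uses. It only shows the general pattern of looking up a KE, collecting its upstream and downstream neighbours through KERs, and placing those facts in the prompt.

```python
import xml.etree.ElementTree as ET

# Toy data in a simplified, HYPOTHETICAL schema (not the real AOP-Wiki XML).
SAMPLE_XML = """
<data>
  <key-event id="ke1"><title>Receptor activation</title></key-event>
  <key-event id="ke2"><title>Oxidative stress</title></key-event>
  <key-event id="ke3"><title>Liver fibrosis</title></key-event>
  <key-event-relationship id="ker1" upstream="ke1" downstream="ke2"/>
  <key-event-relationship id="ker2" upstream="ke2" downstream="ke3"/>
</data>
"""


def load_graph(xml_text):
    """Parse KEs into an id -> title map and KERs into (upstream, downstream) pairs."""
    root = ET.fromstring(xml_text.strip())
    titles = {ke.get("id"): ke.findtext("title") for ke in root.iter("key-event")}
    edges = [
        (ker.get("upstream"), ker.get("downstream"))
        for ker in root.iter("key-event-relationship")
    ]
    return titles, edges


def retrieve_context(question, titles, edges):
    """Return one fact line per KE whose title occurs in the question.

    Substring matching is an illustrative assumption, not the paper's method.
    """
    q = question.lower()
    facts = []
    for ke_id, title in titles.items():
        if title.lower() not in q:
            continue
        upstream = [titles[u] for u, d in edges if d == ke_id]
        downstream = [titles[d] for u, d in edges if u == ke_id]
        facts.append(
            f"KE: {title}; "
            f"upstream KEs: {', '.join(upstream) or 'none'}; "
            f"downstream KEs: {', '.join(downstream) or 'none'}"
        )
    return facts


def build_prompt(question, facts):
    """Assemble a prompt that asks the LLM to answer from retrieved facts only."""
    context = "\n".join(facts) if facts else "(no matching entries found)"
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


titles, edges = load_graph(SAMPLE_XML)
question = "Which key events are downstream of oxidative stress?"
facts = retrieve_context(question, titles, edges)
prompt = build_prompt(question, facts)
print(prompt)
```

The resulting prompt would then be sent to the chosen LLM. Grounding the answer in retrieved KE and KER facts, rather than in the model's parametric memory, is what the abstract credits for the reduction in hallucination.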

Natural Language Processing | Recommendation & Information Retrieval | Scientific Discovery & Drug Design

Citation Metrics

Citations: 0
Influential citations: 0
References: 0
Year: 2026
Venue: N/A
