Apr 30, 2026 | arXiv:2604.28061

Measuring research data reuse in scholarly publications using generative artificial intelligence: Open Science Indicator development and preliminary results

Lauren Cadwallader, Iain Hrynaszkiewicz, Parth Sarin, Timothy H. Vines

AI Summary

This paper introduces an LLM-based indicator, developed by PLOS and DataSeer, that quantifies research data reuse in scholarly publications, addressing the need to monitor the downstream impacts of open science rather than only the prevalence of open science practices. The indicator uses an LLM to identify instances of data reuse within publications. Preliminary results indicate a data reuse rate of 43%, higher than the rates obtained with established bibliometric techniques, suggesting that the positive effects of data sharing may currently be underestimated.

Key Contribution

LLM-based measurement finds a research data reuse rate of 43%, higher than established bibliometric techniques report, suggesting that the impact of open science may be underestimated.

Abstract

Numerous metascience studies and other initiatives have begun to monitor the prevalence of open science practices, when it is more important to understand the 'downstream' effects or impacts of open science. PLOS and DataSeer have developed a new LLM-based indicator to measure an important effect of open science: the reuse of research data. Our results show a data reuse rate of 43%, which is higher than the rates obtained with established bibliometric techniques. We show that data reuse can be measured at scale using LLMs and generative artificial intelligence. The positive effects of research data sharing and reuse may currently be underestimated.

Eval Frameworks & Benchmarks | Natural Language Processing | Scientific Discovery & Drug Design

Citation Metrics

Citations: 0

Influential citations: 0

References: 29

Year: 2026

Venue: N/A
