This paper introduces a novel LLM-based indicator developed by PLOS and DataSeer to quantify research data reuse in scholarly publications, addressing the need to monitor the downstream impacts of open science. The LLM was trained to identify instances of data reuse within publications. Results indicate a data reuse rate of 43%, surpassing estimates from traditional bibliometric methods, suggesting the positive effects of data sharing are underestimated.
LLMs reveal that research data is being reused far more often than previously thought, suggesting open science's impact is bigger than we realized.
Numerous metascience studies and other initiatives have begun to monitor the prevalence of open science practices, when it is more important to understand the 'downstream' effects or impacts of open science. PLOS and DataSeer have developed a new LLM-based indicator to measure an important effect of open science: the reuse of research data. Our results show a data reuse rate of 43%, which is higher than estimates from established bibliometric techniques. We show that data reuse can be measured at scale using LLMs and generative artificial intelligence. The positive effects of research data sharing and reuse may currently be underestimated.