Tsinghua AI | Mar 2, 2026 | arXiv:2603.01801

What Papers Don't Tell You: Recovering Tacit Knowledge for Automated Paper Reproduction

Lehui Li, Ruining Wang, Haochen Song, Yaoxin Mao, Tong Zhang, Yuyao Wang, Jiayi Fan, Yitong Zhang, Jieping Ye, Chengqi Zhang, Yongshun Gong

AI Summary

This paper addresses automated paper reproduction, focusing on recovering the tacit knowledge (relational, somatic, and collective) that papers leave unstated. The authors propose a graph-based agent framework (\method) that recovers each type with a dedicated mechanism: relation-aware aggregation, execution-feedback refinement, and graph-level knowledge induction. On an extended ReproduceBench, \method achieves an average performance gap of 10.04% against official implementations and improves over the strongest baseline by 24.68%.

Key Contribution

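Automating paper reproduction is not about finding code; it is about filling in the "missing manual" of tacit knowledge. This graph-based agent reaches an average performance gap of 10.04% against official implementations, improving over the strongest baseline by 24.68%.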

Abstract

Automated paper reproduction (generating executable code from academic papers) is bottlenecked not by information retrieval but by the tacit knowledge that papers inevitably leave implicit. We formalize this challenge as the progressive recovery of three types of tacit knowledge (relational, somatic, and collective) and propose \method, a graph-based agent framework with a dedicated mechanism for each: node-level relation-aware aggregation recovers relational knowledge by analyzing implementation-unit-level reuse and adaptation relationships between the target paper and its citation neighbors; execution-feedback refinement recovers somatic knowledge through iterative debugging driven by runtime signals; and graph-level knowledge induction distills collective knowledge from clusters of papers sharing similar implementations. On an extended ReproduceBench spanning 3 domains, 10 tasks, and 40 recent papers, \method achieves an average performance gap of 10.04% against official implementations, improving over the strongest baseline by 24.68%. The code will be publicly released upon acceptance; the repository link will be provided in the final version.
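The abstract describes execution-feedback refinement only at a high level: iterative debugging driven by runtime signals. The following is a minimal sketch of that general idea, not the authors' implementation. The names `run_candidate`, `refine`, and `toy_repair` are hypothetical, and the repair step, which in an agent would presumably be a model call, is replaced here by a hand-written stub.

```python
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Callable, Optional, Tuple


def run_candidate(source: str, timeout_s: float = 30.0) -> Tuple[bool, str]:
    """Run a candidate script in a subprocess; return (succeeded, runtime signal)."""
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "candidate.py"
        script.write_text(source, encoding="utf-8")
        try:
            proc = subprocess.run(
                [sys.executable, str(script)],
                capture_output=True,
                text=True,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            return False, "timeout"
    # stderr (typically a traceback) is the runtime signal fed back to the repairer.
    return proc.returncode == 0, proc.stderr


def refine(
    source: str,
    repair: Callable[[str, str], str],
    max_rounds: int = 3,
) -> Optional[str]:
    """Alternate execution and repair; return working code or None if rounds run out."""
    for round_index in range(max_rounds + 1):
        ok, signal = run_candidate(source)
        if ok:
            return source
        if round_index < max_rounds:
            source = repair(source, signal)
    return None


def toy_repair(source: str, signal: str) -> str:
    """Hypothetical stand-in for a model call: defines a name the traceback reports missing."""
    if "NameError" in signal:
        return "undefined_name = 1\n" + source
    return source


if __name__ == "__main__":
    fixed = refine("print(undefined_name)", toy_repair)
    print(fixed)
```

The loop stops at the first successful run, so a candidate that already executes is returned unchanged. Note that a zero exit code only shows the code runs; it says nothing about whether the reproduced metrics match the paper.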

Code Generation & Program Synthesis | Open-Source Models & Weights | Tool Use & Agents

Citation Metrics

Citations: 0
Influential citations: 0
References: 0
Year: 2026
Venue: N/A
