BAIR | Mar 15, 2026 | arXiv:2603.14473

AI Can Learn Scientific Taste

Jingqi Tong, Mingzhe Li, Hangcheng Li, Yongzhuo Yang, Yurong Mou, Weijie Ma, Zhiheng Xi, Hongji Chen, Xiaoran Liu, Qinyuan Cheng, Ming Zhang, Qiguang Chen, Weifeng Ge, Qipeng Guo, Tianlei Ying, Tianxiang Sun, Yining Zheng, Xinchi Chen, Jun Zhao, Ning Ding, Xuanjing Huang, Yu-Gang Jiang, Xipeng Qiu

AI Summary

This paper introduces Reinforcement Learning from Community Feedback (RLCF), a novel training paradigm that leverages large-scale citation data to instill "scientific taste" in AI agents. RLCF trains a Scientific Judge to discriminate between high- and low-impact research ideas based on 700K paper pairs, then uses it as a reward model to train a Scientific Thinker to generate promising research proposals. Experiments demonstrate that Scientific Judge surpasses state-of-the-art LLMs in judging research ideas and that Scientific Thinker can propose ideas with higher potential impact than baselines, suggesting AI can learn scientific taste.

Key Contribution

AI can learn "scientific taste": a reinforcement learning approach trained on citation data yields a judge that outperforms SOTA LLMs at comparing research ideas and a policy model that proposes ideas with higher potential impact than baselines.

Abstract

Great scientists have strong judgement and foresight, closely tied to what we call scientific taste. Here, we use the term to refer to the capacity to judge and propose research ideas with high potential impact. However, most related research focuses on improving an AI scientist's executive capability, while enhancing an AI's scientific taste remains underexplored. In this work, we propose Reinforcement Learning from Community Feedback (RLCF), a training paradigm that uses large-scale community signals as supervision, and formulate scientific taste learning as a preference modeling and alignment problem. For preference modeling, we train Scientific Judge on 700K field- and time-matched pairs of high- vs. low-citation papers to judge ideas. For preference alignment, using Scientific Judge as a reward model, we train a policy model, Scientific Thinker, to propose research ideas with high potential impact. Experiments show that Scientific Judge outperforms SOTA LLMs (e.g., GPT-5.2, Gemini 3 Pro) and generalizes to future-year test sets, unseen fields, and peer-review preferences. Furthermore, Scientific Thinker proposes research ideas with higher potential impact than baselines. Our findings show that AI can learn scientific taste, marking a key step toward reaching human-level AI scientists.
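
The abstract does not spell out how the field- and time-matched pairs are built or scored, so the following is only a minimal sketch under stated assumptions. It groups papers by (field, year), pairs a higher-cited paper with a lower-cited one, and measures how often a scoring function prefers the higher-cited paper. The names `Paper`, `build_matched_pairs`, `pairwise_accuracy`, and the `min_ratio` threshold are hypothetical illustrations, not the authors' pipeline.

```python
from collections import defaultdict
from itertools import combinations
from typing import Callable, NamedTuple


class Paper(NamedTuple):
    title: str
    field: str
    year: int
    citations: int


def build_matched_pairs(papers, min_ratio=2.0):
    """Return (higher, lower) pairs drawn from the same field and year.

    Assumption: a pair is kept only if the higher-cited paper has at
    least `min_ratio` times the citations of the lower-cited one.
    The paper's real selection rule may differ.
    """
    buckets = defaultdict(list)
    for paper in papers:
        buckets[(paper.field, paper.year)].append(paper)

    pairs = []
    for group in buckets.values():
        # Sort descending so each combination is ordered (higher, lower).
        ranked = sorted(group, key=lambda p: p.citations, reverse=True)
        for hi, lo in combinations(ranked, 2):
            if hi.citations >= min_ratio * max(lo.citations, 1):
                pairs.append((hi, lo))
    return pairs


def pairwise_accuracy(pairs, score: Callable[[Paper], float]) -> float:
    """Fraction of pairs where `score` ranks the higher-cited paper first."""
    if not pairs:
        return 0.0
    correct = sum(score(hi) > score(lo) for hi, lo in pairs)
    return correct / len(pairs)
```

In this sketch, `score` stands in for any judge model that maps a paper (or idea) to a scalar; a learned judge could be evaluated by passing its scoring function to `pairwise_accuracy` on held-out pairs, for example pairs from a later year or an unseen field.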

RLHF & Preference Learning | Scientific Discovery & Drug Design

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
