Microsoft Research; Cambridge; University of California; Virginia Tech
Jun 9, 2026
arXiv:2606.10587

Towards Diverse Scientific Hypothesis Search with Large Language Models

Haorui Wang, Parshin Shojaee, Kazem Meidani, Kunyang Sun, José Miguel Hernández-Lobato, Teresa Head-Gordon, Jiajun He, Chandan K. Reddy, Chao Zhang, Yuanqi Du

AI Summary

This paper argues that common evolutionary search recipes for hypothesis generation favor optimization over exploration, so selection pressure collapses diversity. The authors reformulate hypothesis search as a sampling problem and propose an evolutionary framework inspired by parallel tempering: it searches at multiple temperature levels and exchanges information across them to improve exploration without disrupting convergence. Across molecular discovery, equation discovery, and algorithm discovery, the method consistently improves both the quality and the diversity of generated hypotheses under the same fixed validation budget.

Key Contribution

By rethinking hypothesis generation as a sampling problem, this framework boosts both the quality and diversity of scientific hypotheses, challenging the status quo of optimization-focused methods.

Abstract

Large language models (LLMs) are on the rise for accelerating scientific discovery, most recently in advanced tasks such as generating valid scientific hypotheses. Yet in many discovery settings, the goal is not to identify a single best hypothesis since validation can be noisy and expensive, and scientists benefit from a set of high-quality alternative hypotheses that hedge against downstream uncertainty for the best solutions. Nevertheless, commonly used evolutionary search recipes tend to prioritize optimization over exploration in hypothesis generation, and the resulting selection pressure during the search process leads to diversity collapse. Motivated by these limitations, we formulate hypothesis search as a sampling problem, where the objective is to efficiently produce diverse, high-quality hypotheses under a fixed validation budget. Building on this perspective, we propose an evolutionary framework inspired by the classical parallel tempering algorithm that searches hypotheses at multiple temperature levels and enables principled information exchange across temperatures to improve exploration without disrupting convergence. Across domains including molecular discovery, equation discovery, and algorithm discovery, our approach consistently improves both hypothesis quality and diversity under the same validation budget, and produces candidates that remain robust under more expensive downstream computational validations.
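
For readers unfamiliar with the classical parallel tempering algorithm that the abstract refers to, the following is a minimal sketch of it on a toy one-dimensional density. It is not the paper's method: the function names, the bimodal target, the Gaussian random-walk proposal, and the temperature ladder are all assumptions chosen for illustration, and how the paper adapts these ideas to LLM-driven hypothesis search is not described on this page.

```python
import math
import random


def log_bimodal(x):
    """Toy target (an assumption for illustration): two Gaussian modes at -3 and +3."""
    a = -0.5 * ((x + 3.0) / 0.5) ** 2
    b = -0.5 * ((x - 3.0) / 0.5) ** 2
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))  # stable log-sum-exp


def parallel_tempering(log_density, init, temperatures, n_steps, step_size=0.5, seed=0):
    """Classical parallel tempering (replica exchange) on a 1-D target.

    One Metropolis chain runs per temperature T and targets density**(1/T).
    temperatures[0] is the coldest level; its states are returned as samples.
    """
    rng = random.Random(seed)
    states = [init for _ in temperatures]
    samples = []
    for _ in range(n_steps):
        # Local move: random-walk Metropolis step inside each replica.
        for i, temp in enumerate(temperatures):
            proposal = states[i] + rng.gauss(0.0, step_size)
            log_accept = (log_density(proposal) - log_density(states[i])) / temp
            # 1 - random() lies in (0, 1], so the logarithm is always defined.
            if math.log(1.0 - rng.random()) < log_accept:
                states[i] = proposal
        # Exchange move: propose swapping the states of one adjacent pair.
        i = rng.randrange(len(temperatures) - 1)
        beta_cold = 1.0 / temperatures[i]
        beta_hot = 1.0 / temperatures[i + 1]
        log_swap = (beta_cold - beta_hot) * (
            log_density(states[i + 1]) - log_density(states[i])
        )
        if math.log(1.0 - rng.random()) < log_swap:
            states[i], states[i + 1] = states[i + 1], states[i]
        samples.append(states[0])
    return samples


if __name__ == "__main__":
    draws = parallel_tempering(log_bimodal, 0.0, [1.0, 3.0, 9.0, 27.0], n_steps=20000)
    right = sum(1 for x in draws if x > 0) / len(draws)
    print(f"fraction of cold-chain samples in the right-hand mode: {right:.2f}")
```

The hot replicas see a flattened landscape and cross between modes easily, while the swap step passes promising states down to the cold replica. The swap acceptance rule keeps each replica's stationary distribution unchanged, which is the sense in which exchange improves exploration without disturbing convergence in the classical algorithm.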

Scientific Discovery & Drug Design

Citation Metrics

Citations: 0

Influential citations: 0

References: 67

Year: 2026

Venue: N/A
