The paper introduces a language-model-guided symbolic regression framework to discover interpretable physical laws from high-dimensional materials data. By leveraging the scientific knowledge embedded in large language models, the method efficiently searches for physically plausible formulas, mitigating the combinatorial explosion inherent in traditional symbolic regression. The approach is validated on perovskite materials, where it discovers novel and accurate formulas for bulk modulus, band gap, and oxygen evolution reaction activity while reducing the effective search space by a factor of approximately $10^5$.
LLMs can slash the search space for physical laws by roughly 100,000x, yielding simpler and more accurate formulas for materials properties.
Discovering interpretable physical laws from high-dimensional data is a fundamental challenge in scientific research. Traditional methods, such as symbolic regression, often produce complex, unphysical formulas when searching a vast space of possible forms. We introduce a framework that guides the search process by leveraging the embedded scientific knowledge of large language models, enabling efficient identification of physical laws in the data. We validate our approach by modeling key properties of perovskite materials. Our method mitigates the combinatorial explosion commonly encountered in traditional symbolic regression, reducing the effective search space by a factor of approximately $10^5$. We identify a set of novel formulas for bulk modulus, band gap, and oxygen evolution reaction activity that not only provide meaningful physical insights but also outperform previous formulas in accuracy and simplicity.
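To give a feel for the combinatorial explosion mentioned above, here is a minimal counting sketch. It is not the paper's accounting: the feature count, operator set, and formula size are made-up illustrative values, and `count_formulas` is a hypothetical helper. It counts ordered binary expression trees, ignoring commutativity and algebraic equivalences, and compares a search over many candidate features with one restricted to a handful of features, as a knowledge-guided proposer might suggest.

```python
from math import comb


def catalan(n: int) -> int:
    """Number of distinct binary tree shapes with n internal nodes."""
    return comb(2 * n, n) // (n + 1)


def count_formulas(n_features: int, n_binary_ops: int, n_internal: int) -> int:
    """Count expression trees with exactly n_internal binary operators.

    Each internal node picks one operator and each of the
    n_internal + 1 leaves picks one feature.
    """
    shapes = catalan(n_internal)
    operator_choices = n_binary_ops ** n_internal
    leaf_choices = n_features ** (n_internal + 1)
    return shapes * operator_choices * leaf_choices


# Illustrative values only (assumptions, not taken from the paper):
# 30 raw descriptors vs. 3 pre-selected ones, 4 operators (+, -, *, /),
# and formulas built from exactly 4 binary operators.
full = count_formulas(n_features=30, n_binary_ops=4, n_internal=4)
guided = count_formulas(n_features=3, n_binary_ops=4, n_internal=4)
reduction = full // guided

print(f"unrestricted: {full:,} candidate formulas")
print(f"restricted:   {guided:,} candidate formulas")
print(f"reduction:    {reduction:,}x")
```

Because the leaf term grows as the number of features raised to the number of leaves, cutting the feature set by 10x in a five-leaf formula shrinks this toy search space by $10^5$. The real reduction reported in the abstract depends on the authors' own search space definition, which this sketch does not attempt to reproduce.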