Max Planck | University of Tennessee | Jun 3, 2026 | arXiv:2606.05127

Non-covalent Interactions at cm$^{-1}$ Accuracy: Data Efficient Physics-Informed Distillation for Machine Learning Interatomic Potentials

Yulin Shen, Shahzad Akram, Louis Primeau, Gen Zu, Konstantinos D. Vogiatzis, Yang Zhang, Adrian Del Maestro

AI Summary

This study asks whether the knowledge encoded in a pretrained universal machine-learning interatomic potential (MLIP) can be transferred to specialist potentials with quantum-chemical accuracy, using knowledge distillation followed by coupled-cluster [CCSD(T)] fine-tuning. For He–benzene, fine-tuning on only 30% of the CCSD(T) data outperforms direct training on the full 80%, a 60% reduction in the high-fidelity compute budget. The choice of pretrained teacher also matters: across a circumarene series of polycyclic aromatic hydrocarbons (PAHs), swapping the teacher changes the coronene error by an order of magnitude while the larger PAHs remain stable, which the authors present as evidence that distillation transfers physical structure and not labels alone.

Key Contribution

For He–benzene, fine-tuning a distilled model on just 30% of the CCSD(T) data outperforms direct training on the full 80%, cutting the high-fidelity compute budget by 60%.
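
The 60% figure is consistent with the data fractions quoted in the abstract if it is read as the relative reduction in CCSD(T) training data, from 80% of the dataset down to 30%. That reading is an assumption here; the abstract does not spell out the calculation:

$$
\frac{80\% - 30\%}{80\%} = 62.5\% \approx 60\%
$$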

Abstract

Foundation models in atomistic machine learning encode interaction physics across diverse atomic environments, but whether that structure can be transferred when building specialist potentials at quantum-chemical accuracy remains open. Here we show that knowledge distillation from a pretrained universal machine-learning interatomic potential (MLIP), followed by coupled-cluster fine-tuning with single and double excitations and perturbative triples [CCSD(T)], transfers not only low-cost labels but a physically meaningful prior on interaction length scales, anisotropy, and the repulsive-dispersive balance, which CCSD(T) data then sharpens to quantum-chemical accuracy. For He–benzene, fine-tuning with 30% of the CCSD(T) data outperforms direct training using the full 80%, a 60% reduction in the high-fidelity compute budget. A symmetry-adapted perturbation theory (SAPT)-informed adaptive short-range/long-range architecture further lowers the validation MAE from 0.75 cm$^{-1}$ to 0.49 cm$^{-1}$. Across a circumarene series of polycyclic aromatic hydrocarbons (PAHs), swapping the MLIP teacher under an otherwise identical pipeline changes the coronene error by an order of magnitude while leaving the larger PAHs stable, direct evidence that distillation transfers physical structure, not labels alone. Together, these results identify the choice of pretrained teacher as a primary design axis for data-efficient quantum-chemical-accuracy potentials, alongside architecture and training protocol.
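
The two-stage protocol described in the abstract (distill from a cheap teacher, then fine-tune on a small amount of high-fidelity data) can be illustrated with a toy sketch. Everything below is an assumption made for illustration and not the authors' pipeline: the student is a ridge regression on Gaussian radial features instead of a neural MLIP, both the "teacher" and the "CCSD(T) reference" are Lennard-Jones-like curves with hypothetical parameters, and the 80/20 train/validation split is one guess at what "the full 80%" refers to. Fine-tuning is modelled as a ridge fit that is pulled towards the distilled weights instead of towards zero.

```python
import numpy as np


def lj(r, eps, sigma):
    """Lennard-Jones-like toy interaction energy (illustrative only)."""
    x = (sigma / r) ** 6
    return 4.0 * eps * (x * x - x)


def features(r, centers, width=0.15):
    """Gaussian radial basis features, one column per center."""
    return np.exp(-(((r[:, None] - centers[None, :]) / width) ** 2))


def ridge_fit(phi, y, lam, w_prior=None):
    """Minimise ||phi @ w - y||^2 + lam * ||w - w_prior||^2 in closed form.

    With w_prior=None the prior is zero (ordinary ridge regression).
    """
    if w_prior is None:
        w_prior = np.zeros(phi.shape[1])
    a = phi.T @ phi + lam * np.eye(phi.shape[1])
    return w_prior + np.linalg.solve(a, phi.T @ (y - phi @ w_prior))


def mae(phi, w, y):
    """Mean absolute error of the linear model phi @ w against y."""
    return float(np.mean(np.abs(phi @ w - y)))


def run_demo(seed=0):
    rng = np.random.default_rng(seed)
    centers = np.linspace(0.9, 2.5, 20)

    # Stage 1 (distillation): many cheap labels from a slightly biased teacher.
    r_cheap = rng.uniform(0.95, 2.5, 500)
    y_cheap = lj(r_cheap, eps=0.9, sigma=1.02)  # hypothetical teacher
    w_distilled = ridge_fit(features(r_cheap, centers), y_cheap, lam=1e-4)

    # High-fidelity pool: 100 points, assumed 80/20 train/validation split.
    r_pool = rng.uniform(0.95, 2.5, 100)
    y_pool = lj(r_pool, eps=1.0, sigma=1.0)  # stand-in for CCSD(T) labels
    r_train, y_train = r_pool[:80], y_pool[:80]
    phi_val, y_val = features(r_pool[80:], centers), y_pool[80:]

    # Stage 2 (fine-tuning): 30 high-fidelity points, prior = distilled weights.
    w_fine_tuned = ridge_fit(
        features(r_train[:30], centers), y_train[:30], lam=1e-2,
        w_prior=w_distilled,
    )

    # Baseline: direct training on all 80 high-fidelity points, zero prior.
    w_direct = ridge_fit(features(r_train, centers), y_train, lam=1e-2)

    return {
        "distilled_only": mae(phi_val, w_distilled, y_val),
        "fine_tuned_30": mae(phi_val, w_fine_tuned, y_val),
        "direct_80": mae(phi_val, w_direct, y_val),
    }


if __name__ == "__main__":
    for name, value in run_demo().items():
        print(f"{name}: validation MAE = {value:.4f}")
```

The three validation errors let the distilled-only, fine-tuned, and directly trained students be compared on the same held-out points. How they rank in this toy depends on the hypothetical teacher bias and regularisation strength, so it says nothing about the paper's reported numbers.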

Scientific Discovery & Drug Design

