This paper explores a label-free screening strategy for combinatorial electrocatalysts in which each composition is represented by embeddings derived from scientific texts. It compares Word2Vec and transformer-based embeddings, with compositions encoded either by element-wise mixing or by short prompts. Candidates are filtered by their similarity to two property concepts, conductivity and dielectric, and the approach is assessed on 15 materials libraries.
Surprisingly, a lightweight Word2Vec model often outperforms transformer-based embeddings at filtering electrocatalyst candidates, achieving the largest reduction in the number of candidate compositions while staying close to the best measured performance.
Compositionally complex solid solution electrocatalysts span vast composition spaces, and even a single materials system can contain more candidate compositions than can be measured exhaustively. Here we evaluate a label-free screening strategy that represents each composition using embeddings derived from scientific texts and prioritizes candidates based on similarity to two property concepts. We compare a corpus-trained Word2Vec baseline with transformer-based embeddings, where compositions are encoded either by linear element-wise mixing or by short composition prompts. Similarities to two "concept directions", the terms conductivity and dielectric, define a two-dimensional descriptor space, and a symmetric Pareto-front selection is used to filter candidate subsets without using electrochemical labels. Performance is assessed on 15 materials libraries including noble metal alloys and multicomponent oxides. In this setting, the lightweight Word2Vec baseline, which uses a simple linear combination of element embeddings, often achieves the largest reduction in the number of candidate compositions while staying close to the best measured performance.
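The pipeline described above can be illustrated with a minimal Python sketch. It assumes that element embeddings and the two concept vectors are already available as NumPy arrays (for example, looked up from a trained Word2Vec model), that "linear element-wise mixing" means a fraction-weighted sum of element vectors, and that similarity is cosine similarity. The abstract does not define the "symmetric" Pareto-front variant, so the sketch substitutes a plain non-dominated filter that maximizes both similarities. All function names here are hypothetical.

```python
import numpy as np

def mix_embedding(composition, element_vectors):
    """Fraction-weighted sum of element embeddings (assumed mixing rule).

    composition: dict mapping element symbol -> atomic fraction
    element_vectors: dict mapping element symbol -> 1-D np.ndarray
    """
    total = sum(composition.values())  # normalize in case fractions do not sum to 1
    return sum((frac / total) * element_vectors[el]
               for el, frac in composition.items())

def cosine(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def descriptors(compositions, element_vectors, concept_a, concept_b):
    """Map each composition to (similarity to concept A, similarity to concept B)."""
    rows = []
    for comp in compositions:
        v = mix_embedding(comp, element_vectors)
        rows.append((cosine(v, concept_a), cosine(v, concept_b)))
    return np.array(rows)

def pareto_front(points):
    """Indices of points that no other point dominates (maximizing both columns).

    This is a plain Pareto filter, used as a stand-in for the paper's
    symmetric selection, which is not specified in the abstract.
    """
    points = np.asarray(points, dtype=float)
    keep = []
    for i, p in enumerate(points):
        at_least_as_good = np.all(points >= p, axis=1)
        strictly_better = np.any(points > p, axis=1)
        if not np.any(at_least_as_good & strictly_better):
            keep.append(i)
    return keep
```

In use, `descriptors` would be called with the two concept vectors for conductivity and dielectric, and `pareto_front` applied to the resulting two-column array would return the indices of the compositions retained for measurement. No electrochemical labels enter at any step, which is the sense in which the screening is label-free.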