The paper introduces MMPT-RAG, a retrieval-augmented foundation model for generating matched molecular pair transformations (MMPTs), to address the limitations of existing ML methods for analog design. The authors train the foundation model on a large dataset of MMPTs using a variable-to-variable formulation, so that it generates diverse replacement variables, and hence diverse molecular analogs, conditioned on an input variable. The retrieval-augmented framework supplies external reference analogs as contextual guidance, which the authors report improves diversity, novelty, and controllability in practical drug discovery scenarios.
Forget end-to-end molecular design; this retrieval-augmented model lets you steer analog generation with prompts and external references, mimicking medicinal chemists' intuition.
Matched molecular pairs (MMPs) capture the local chemical edits that medicinal chemists routinely use to design analogs, but existing ML approaches either operate at the whole-molecule level with limited edit controllability or learn MMP-style edits from restricted settings and small models. We propose a variable-to-variable formulation of analog generation and train a foundation model on large-scale MMP transformations (MMPTs) to generate diverse variables conditioned on an input variable. To enable practical control, we develop prompting mechanisms that let users specify preferred transformation patterns during generation. We further introduce MMPT-RAG, a retrieval-augmented framework that uses external reference analogs as contextual guidance to steer generation and generalize from project-specific series. Experiments on general chemical corpora and patent-specific datasets demonstrate improved diversity, novelty, and controllability, and show that our method recovers realistic analog structures in practical discovery scenarios.
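To make the variable-to-variable and retrieval-augmented ideas concrete, here is a minimal Python sketch of how reference transformations might be retrieved and placed in front of a query variable as context. Everything in it is an assumption for illustration and none of it is taken from the paper: the `Transformation` class, the `source>>target` prompt layout, and the character-bigram Jaccard similarity (a crude stand-in for a real chemical fingerprint or learned retriever) are all hypothetical. The abstract does not specify the actual retriever or prompt format.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Transformation:
    """Hypothetical record of one MMP transformation: variable -> variable."""
    source_variable: str  # fragment SMILES with an attachment point, e.g. "[*:1]Cl"
    target_variable: str  # the fragment that replaces it, e.g. "[*:1]F"


def bigrams(text: str) -> set[str]:
    """Character bigrams of a string (toy feature set, not chemistry-aware)."""
    return {text[i:i + 2] for i in range(len(text) - 1)}


def similarity(a: str, b: str) -> float:
    """Jaccard similarity over character bigrams (assumed stand-in metric)."""
    ba, bb = bigrams(a), bigrams(b)
    if not ba and not bb:
        return 1.0
    return len(ba & bb) / len(ba | bb)


def retrieve(query: str, references: list[Transformation], k: int) -> list[Transformation]:
    """Return the k references whose source variable is most similar to the query."""
    ranked = sorted(
        references,
        key=lambda t: similarity(query, t.source_variable),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query: str, retrieved: list[Transformation]) -> str:
    """Lay out retrieved transformations as context, then the open query line."""
    lines = [f"{t.source_variable}>>{t.target_variable}" for t in retrieved]
    lines.append(f"{query}>>")  # a generative model would complete this line
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical project-specific reference series.
    refs = [
        Transformation("[*:1]Cl", "[*:1]F"),
        Transformation("[*:1]OC", "[*:1]OCC"),
        Transformation("[*:1]c1ccccc1", "[*:1]c1ccncc1"),
    ]
    print(build_prompt("[*:1]Cl", retrieve("[*:1]Cl", refs, k=1)))
```

In this sketch the model itself is left out: the prompt simply ends with an open `query>>` line that a trained generator would be asked to complete, with the retrieved pairs acting as the contextual guidance.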