EvoFlows is a variable-length sequence-to-sequence protein modeling approach for protein engineering that uses edit flows to predict insertions, deletions, and substitutions on a template protein sequence. It learns mutational trajectories between evolutionarily related protein sequences, modeling both the distributions of related natural proteins and the mutational paths connecting them. In silico evaluations on UNIREF and OAS show that EvoFlows captures protein sequence distributions with quality comparable to masked language models, while improving the generation of natural-like mutants from a template.
Unlike masked language models, EvoFlows offers controllable, edit-based protein sequence generation, predicting both the *what* and the *where* of mutations to produce more natural-like mutants from a template.
We introduce EvoFlows, a variable-length sequence-to-sequence protein modeling approach uniquely suited to protein engineering. Unlike autoregressive and masked language models, EvoFlows perform a limited, controllable number of insertions, deletions, and substitutions on a template protein sequence. In other words, EvoFlows predict not only _which_ mutation to perform, but also _where_ it should occur. Our approach leverages edit flows to learn mutational trajectories between evolutionarily related protein sequences, simultaneously modeling distributions of related natural proteins and the mutational paths connecting them. Through extensive _in silico_ evaluation on diverse protein communities from UNIREF and OAS, we demonstrate that EvoFlows capture protein sequence distributions with a quality comparable to leading masked language models commonly used in protein engineering, while showing improved ability to generate non-trivial yet natural-like mutants from a given template protein.
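To make the edit space concrete, the sketch below applies a bounded set of insertions, deletions, and substitutions to a template sequence. It is only an illustration of the three operations and of the "which and where" framing in the abstract; it is not the authors' model and involves no learning or flow. The names `Edit`, `apply_edits`, and `max_edits`, and the convention of indexing edits against the original template, are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List, Literal


@dataclass(frozen=True)
class Edit:
    """One hypothetical edit: what to do (kind, residue) and where (pos)."""
    kind: Literal["sub", "ins", "del"]
    pos: int            # 0-based index into the ORIGINAL template
    residue: str = ""   # new amino acid for "sub"/"ins"; unused for "del"


def apply_edits(template: str, edits: List[Edit], max_edits: int) -> str:
    """Apply at most `max_edits` edits to `template` and return the mutant.

    Assumption for this sketch: at most one edit per template position.
    """
    if len(edits) > max_edits:
        raise ValueError("edit budget exceeded")
    positions = [e.pos for e in edits]
    if len(set(positions)) != len(positions):
        raise ValueError("at most one edit per position in this sketch")

    seq = list(template)
    # Apply right to left so indices of not-yet-applied edits stay valid.
    for e in sorted(edits, key=lambda e: e.pos, reverse=True):
        if e.kind == "ins":
            if not 0 <= e.pos <= len(template):
                raise ValueError("insertion position out of range")
            seq.insert(e.pos, e.residue)   # insert before template index pos
        elif e.kind in ("sub", "del"):
            if not 0 <= e.pos < len(template):
                raise ValueError("edit position out of range")
            if e.kind == "sub":
                seq[e.pos] = e.residue
            else:
                del seq[e.pos]
        else:
            raise ValueError(f"unknown edit kind: {e.kind}")
    return "".join(seq)


# Example: one substitution, one deletion, one insertion on a toy template.
mutant = apply_edits(
    "MKTAYIAK",
    [Edit("sub", 1, "R"), Edit("del", 3), Edit("ins", 6, "G")],
    max_edits=3,
)
print(mutant)
```

Because insertions and deletions change the sequence length, the mutant need not have the same length as the template, which is the property that distinguishes this edit space from substitution-only masking.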