The paper discusses the emergence and capabilities of large language of life models (LLLMs) in the life sciences. It highlights their ability to process multiomic data, which standard multimodal LLMs are not built to handle. It emphasizes the accelerating development of foundation models pretrained on massive biomolecular datasets. These models enable tasks such as predicting variant effects, assessing gene essentiality, and generating novel DNA sequences. The paper uses Evo as a key example of how LLLMs trained on genomic data can help us understand and manipulate biological systems.
Forget text and images: AI now speaks the language of life, predicting protein structures, generating DNA, and revealing the secrets hidden in our genomes.
In 2021, a year before ChatGPT took the world by storm and fueled excitement about generative artificial intelligence (AI), AlphaFold 2 cracked the 50-year-old protein-folding problem. It predicted three-dimensional (3D) structures for more than 200 million proteins from their amino acid sequences. That accomplishment was a precursor to an unprecedented burgeoning of large language models (LLMs) in the life sciences, and it was just the beginning. In recent months, we have moved into a hyperaccelerated phase of new foundation models. These models are pretrained on massive datasets and can perform a wide range of tasks. Those tasks are helping us understand the structure, biology, evolution, and design of proteins, RNA, DNA, and ligands, as well as their biomolecular interactions.

Multimodal LLMs such as GPT-4, Gemini, and Claude process text, audio, and images. These large language of life models (LLLMs) are instead multiomic: they are not only multimodal but also span different layers of molecular biology, from DNA to RNA to protein. For example, Evo is a foundation model trained on 2.7 million diverse phage and prokaryotic genomes, about 300 billion DNA nucleotides in total. It predicts how variants in DNA, RNA, or proteins affect structure and function. It also predicts how essential genes are to cell function, and it can generate new DNA sequences.
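Genomic language models are commonly used to score variants without task-specific training. A model trained to predict the next nucleotide assigns every sequence a likelihood. Comparing the likelihood of a mutant sequence with that of the reference then gives a zero-shot estimate of the variant's impact. The same next-nucleotide predictions can be sampled one base at a time to generate new sequences.

The Python sketch below illustrates both ideas under stated assumptions. A hypothetical toy order-k Markov model (`KmerDNAModel`) stands in for Evo. The class, the `variant_effect` helper, and the training data are all illustrative. They are not Evo's architecture, API, or exact scoring procedure.

```python
import math
import random
from collections import defaultdict

ALPHABET = "ACGT"


class KmerDNAModel:
    """Hypothetical toy stand-in for a genomic language model such as Evo.

    An order-k Markov chain over nucleotides with add-one (Laplace) smoothing.
    Evo is a large neural network; only the idea of scoring a sequence as a
    sum of per-base log-probabilities carries over to this sketch.
    """

    def __init__(self, k: int = 3):
        self.k = k
        # counts[context][next_base] = times next_base followed context in training
        self.counts = defaultdict(lambda: defaultdict(int))

    def fit(self, sequences):
        for seq in sequences:
            for i in range(self.k, len(seq)):
                self.counts[seq[i - self.k:i]][seq[i]] += 1
        return self

    def prob(self, context: str, base: str) -> float:
        """Smoothed P(base | context); unseen contexts fall back to uniform 1/4."""
        ctx = self.counts.get(context, {})
        total = sum(ctx.values())
        return (ctx.get(base, 0) + 1) / (total + len(ALPHABET))

    def log_likelihood(self, seq: str) -> float:
        """Sum of log P(base_i | preceding k bases); the first k bases are not scored."""
        return sum(
            math.log(self.prob(seq[i - self.k:i], seq[i]))
            for i in range(self.k, len(seq))
        )

    def sample(self, prefix: str, length: int, rng: random.Random) -> str:
        """Autoregressively extend `prefix` one base at a time until it has `length` bases."""
        seq = prefix
        while len(seq) < length:
            context = seq[-self.k:]
            weights = [self.prob(context, b) for b in ALPHABET]
            seq += rng.choices(ALPHABET, weights=weights)[0]
        return seq


def variant_effect(model: KmerDNAModel, ref: str, pos: int, alt: str) -> float:
    """Log-likelihood ratio log P(mutant) - log P(reference) for a single-base change.

    Negative scores mean the model finds the mutant less plausible than the
    reference. Likelihood-based approaches treat this as a proxy for a
    potentially disruptive variant (an assumption of this sketch).
    """
    mut = ref[:pos] + alt + ref[pos + 1:]
    return model.log_likelihood(mut) - model.log_likelihood(ref)


if __name__ == "__main__":
    # Hypothetical training data: a repeated motif, not real genomes.
    model = KmerDNAModel(k=3).fit(["ATGGCG" * 16])
    ref = "ATGGCGATGGCGATGGCG"
    print(variant_effect(model, ref, pos=9, alt="T"))  # negative: the change breaks the motif
    print(model.sample("ATG", 18, random.Random(0)))   # an 18-base sequence starting with "ATG"
```

Swapping the Markov chain for a neural network would change how `prob` is computed, but not how `log_likelihood`, `variant_effect`, or `sample` use it. That is why a toy model is enough to show the scoring and generation logic, though not the accuracy a model trained on hundreds of billions of nucleotides can reach.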