Personalized Medicine: Evo 2 for Treatment Prediction
Fahad Kiani

May 03, 2025

Understanding Mutations: Evo 2 as a Genomic Analyst Agent

Evo 2 is trained on a massive dataset of genomic sequences, allowing it to learn the "grammar" of DNA and RNA. It can predict the functional impact of mutations by evaluating how "likely" a particular variant is within the broader genomic context. This includes distinguishing between harmful and benign mutations.
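
The likelihood idea behind this kind of variant scoring can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: a toy order-2 Markov model trained on a made-up sequence stands in for Evo 2, and `train_kmer_model`, `log_likelihood`, and `variant_effect_score` are hypothetical names, not part of any Evo 2 API. A variant is scored as the log-likelihood of the mutated sequence minus that of the reference; more negative scores mean the model finds the variant less "natural", which is the signal used to flag potentially harmful mutations.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"


def train_kmer_model(sequences, k=2, pseudocount=1.0):
    """Toy stand-in for Evo 2: P(next base | previous k bases), add-one smoothed."""
    counts = defaultdict(lambda: defaultdict(float))
    for seq in sequences:
        for i in range(k, len(seq)):
            counts[seq[i - k:i]][seq[i]] += 1.0

    def prob(context, base):
        ctx_counts = counts[context]
        total = sum(ctx_counts.values()) + pseudocount * len(ALPHABET)
        return (ctx_counts[base] + pseudocount) / total

    return prob, k


def log_likelihood(seq, model):
    """Sum of log-probabilities of each base given its preceding k bases."""
    prob, k = model
    return sum(math.log(prob(seq[i - k:i], seq[i])) for i in range(k, len(seq)))


def variant_effect_score(reference, position, alt_base, model):
    """Delta log-likelihood: log P(variant) - log P(reference).

    Negative values mean the variant looks less like the training data.
    """
    variant = reference[:position] + alt_base + reference[position + 1:]
    return log_likelihood(variant, model) - log_likelihood(reference, model)


if __name__ == "__main__":
    ref = "ATGGCCATTGTAATGGGCCGC"  # hypothetical "natural" sequence
    model = train_kmer_model([ref] * 20)
    print(variant_effect_score(ref, 4, "C", model))  # same as reference base: 0.0
    print(variant_effect_score(ref, 4, "T", model))  # creates unseen k-mers: negative
```

The real model replaces the k-mer table with a neural network that conditions on far longer context, but the scoring logic (compare variant and reference likelihoods) stays the same.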

1. Personalized Medicine: Evo 2 for Treatment Prediction

  • Scientific Basis: Evo 2 can predict the functional impact of mutations, which is crucial for understanding how individual patients might respond to therapies targeting those specific mutations. While Evo 2 isn't directly trained for drug response prediction, its understanding of genomic context can be leveraged.  

  • Technical Integration:
    • Treatment Prediction Agent: Develop a "Treatment Prediction Agent."
    • Input: This agent would take patient genomic data and information about available cancer therapies (including their target mechanisms) as input.
    • Evo 2 Integration: The agent can use the "Genomic Analyst Agent" to first identify the key mutations in the patient's tumor. Then, based on the known targets of different drugs, it can use Evo 2's predictions of the functional impact of these mutations to infer potential sensitivity or resistance.
    • Output: The agent would provide the oncologist with potential treatment options, highlighting those most likely to be effective based on the patient's tumor mutations and Evo 2's analysis.
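
A minimal sketch of how such an agent might combine per-mutation scores with therapy target information follows. Everything in it is hypothetical: the `Mutation` and `Therapy` types, the `rank_therapies` function, the -2.0 threshold, and the placeholder genes and drugs. The `delta_score` values stand in for Evo 2 likelihood scores that the Genomic Analyst Agent would supply, and the ranking rule is a deliberately simple heuristic, not a validated clinical method.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass
class Mutation:
    gene: str
    variant: str
    delta_score: float  # stand-in for an Evo 2 delta log-likelihood; lower = more disruptive


@dataclass
class Therapy:
    name: str
    target_genes: Set[str]  # genes whose disruption the therapy is assumed to exploit


DAMAGING_THRESHOLD = -2.0  # hypothetical cutoff; would need calibration on labeled variants


def rank_therapies(
    mutations: List[Mutation],
    therapies: List[Therapy],
    threshold: float = DAMAGING_THRESHOLD,
) -> List[Tuple[str, int, float, List[str]]]:
    """Rank therapies by how many of their target genes carry a predicted-damaging mutation."""
    damaging = [m for m in mutations if m.delta_score <= threshold]
    ranked = []
    for therapy in therapies:
        hits = [m for m in damaging if m.gene in therapy.target_genes]
        # More negative combined score = stronger predicted disruption of the targets.
        evidence = sum(m.delta_score for m in hits)
        ranked.append((therapy.name, len(hits), evidence, [f"{m.gene} {m.variant}" for m in hits]))
    # Most supporting mutations first, then the strongest (most negative) combined evidence.
    ranked.sort(key=lambda r: (-r[1], r[2]))
    return ranked


if __name__ == "__main__":
    patient = [
        Mutation("GENE1", "c.100A>G", -5.1),
        Mutation("GENE2", "c.250C>T", -0.3),
        Mutation("GENE3", "c.42G>A", -2.8),
    ]
    options = [
        Therapy("Drug A", {"GENE1"}),
        Therapy("Drug B", {"GENE2"}),
        Therapy("Drug C", {"GENE1", "GENE3"}),
    ]
    for name, n_hits, evidence, supporting in rank_therapies(patient, options):
        print(name, n_hits, round(evidence, 2), supporting)  # Drug C first, Drug B last
```

Treating every predicted-damaging hit in a target gene as support for sensitivity is a simplification; a fuller agent would also need to model resistance mutations.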

2. Drug Development Assistance: Evo 2 for Target Identification and Design

  • Scientific Basis: Evo 2 can identify gene variants associated with diseases like cancer and predict the functional consequences of these mutations. This can help in identifying novel therapeutic targets. Additionally, Evo 2 can design novel DNA sequences and even entire genomes with specific functionalities, which could aid in designing novel therapies.  
  • Technical Integration:
    • Drug Discovery Agent: Create a "Drug Discovery Agent" (potentially for research-focused users).
    • Input: This agent could take information about cancer types, dysregulated pathways, or specific mutations as input.
    • Evo 2 Integration:
      • The agent can use the "Genomic Analyst Agent" to identify key driver mutations.
      • It can then leverage Evo 2's generative capabilities to design candidate therapeutic sequences, such as gene therapies targeting these mutations. Because Evo 2 generates nucleic acid sequences, designing small-molecule drugs would require further integration with other drug design tools. Evo 2 can even design genetic elements for cell-type-specific gene therapy activation.
    • Output: The agent could provide researchers with potential drug targets or novel therapeutic sequences generated by Evo 2.
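
Evo 2 generates DNA autoregressively, sampling one nucleotide at a time conditioned on the sequence so far. The sketch below illustrates that loop, assuming a toy order-3 Markov model as a stand-in for Evo 2; `fit_next_base_counts`, `sample_next_base`, and `generate` are hypothetical helpers, not the Evo 2 API, and the training sequence is made up. Temperature controls how strongly sampling favors the most likely next base.

```python
import math
import random
from collections import Counter, defaultdict

ALPHABET = "ACGT"


def fit_next_base_counts(sequences, k=3):
    """Count which base follows each k-mer (toy stand-in for a trained DNA language model)."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for i in range(k, len(seq)):
            counts[seq[i - k:i]][seq[i]] += 1
    return counts, k


def sample_next_base(context_counts, temperature, rng):
    """Add-one smoothing, temperature scaling in log space, then a softmax draw."""
    logits = [math.log(context_counts[b] + 1) / temperature for b in ALPHABET]
    top = max(logits)
    weights = [math.exp(x - top) for x in logits]
    return rng.choices(ALPHABET, weights=weights, k=1)[0]


def generate(prompt, model, n_tokens, temperature=1.0, seed=0):
    """Extend `prompt` one base at a time, each conditioned on the preceding k bases."""
    counts, k = model
    rng = random.Random(seed)
    seq = prompt
    for _ in range(n_tokens):
        seq += sample_next_base(counts[seq[-k:]], temperature, rng)
    return seq


if __name__ == "__main__":
    model = fit_next_base_counts(["ATGAAACGCATTAGCACCACCATTACC"] * 5)
    print(generate("ATGAAA", model, n_tokens=30, temperature=0.7))
```

In a real design workflow the prompt would encode the desired context, and generated candidates would be screened with other tools before any laboratory testing.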

3. Identifying Key Mutations: Evo 2 for Driver vs. Passenger Distinction

  • Scientific Basis: Evo 2's ability to identify disease-causing mutations, together with its training across diverse species, may help it distinguish between mutations that drive cancer (driver mutations) and those that are merely present (passenger mutations).
  • Technical Integration:
    • Genomic Analyst Agent Enhancement: The "Genomic Analyst Agent" can be further enhanced to specifically identify and flag potential driver mutations based on Evo 2's predictions.
    • Output Prioritization: The agent's output to the oncologist can prioritize mutations that Evo 2 identifies as likely drivers, enabling more focused research and treatment strategies.
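
The flag-and-prioritize step can be sketched as simple threshold-based triage. This assumes that more disruptive predicted effects correlate with driver status, an assumption that would need validation, since likelihood scores measure how unusual a variant looks rather than whether it confers a growth advantage. The `ScoredMutation` type, the `prioritize` function, both cutoffs, and the example genes are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class ScoredMutation(NamedTuple):
    gene: str
    variant: str
    delta_score: float  # hypothetical Evo 2 delta log-likelihood; lower = more disruptive


def prioritize(
    mutations: List[ScoredMutation],
    driver_cutoff: float = -3.0,
    review_cutoff: float = -1.0,
) -> List[Tuple[str, str, float, str]]:
    """Label each mutation and return the list ordered most-disruptive first.

    Both cutoffs are placeholders that would need calibration against
    mutations with known driver/passenger status.
    """

    def label(m: ScoredMutation) -> str:
        if m.delta_score <= driver_cutoff:
            return "likely driver"
        if m.delta_score <= review_cutoff:
            return "needs review"
        return "likely passenger"

    ordered = sorted(mutations, key=lambda m: m.delta_score)
    return [(m.gene, m.variant, m.delta_score, label(m)) for m in ordered]


if __name__ == "__main__":
    tumor = [
        ScoredMutation("GENE_X", "p.A10V", -0.2),
        ScoredMutation("GENE_Y", "p.L20P", -4.5),
        ScoredMutation("GENE_Z", "p.E50K", -1.7),
    ]
    for row in prioritize(tumor):
        print(row)
```

A three-tier output keeps borderline calls visible to the oncologist instead of silently dropping them.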

By strategically integrating Evo 2's capabilities through specialized JEDI agents, Jedi Labs is building powerful applications that will assist oncologists in various aspects of cancer care, from understanding the underlying genomic drivers to potentially predicting treatment responses and even contributing to drug discovery efforts.

Further research by @asimov

Evo 2: Features, Architecture, and Availability

  • The model is available as an API endpoint and can be fine-tuned for free using NVIDIA's BioNeMo framework. Its large context window is one of its most important features, allowing it to capture information across long stretches of sequence and make more accurate predictions.
  • According to corresponding author Brian Hie, Evo 2 may not solve all questions in biology, but it will be helpful in answering many more questions than task-specific models, and its ability to learn the language of DNA and make predictions at a large scale has the potential to make bioengineering more predictable and efficient.
  • Language models such as ChatGPT are built on the transformer, a type of neural network that converts inputs into outputs by looking at an entire sequence at once and working out which features matter most. Evo 2 learns the language of DNA in a similar way, but its architecture, StripedHyena 2, is a hybrid that combines convolutional operators with attention rather than a pure transformer.
  • The release of Evo 2 includes open-source training code, inference code, model parameters, and the OpenGenome2 training data, making it a valuable tool for researchers in the field of biology and bioengineering.

The Challenge of Long Sequences: StripedHyena 2 and Training

  • The Evo 2 model has the capability to hold sequences up to one million nucleotides in its "working memory" at one time, which is eight times more than its predecessor, allowing it to answer questions about entire genes, regulatory regions, and distant gene interactions.
  • This technological leap is important for understanding how eukaryotic genomes work, as genomes are not just linear instruction manuals, but dynamic systems where a gene's behavior changes depending on its physical location and regulatory elements that can be hundreds of thousands of bases away.
  • The Evo 2 team solved the problem of expanding the context window by building an upgraded AI architecture called StripedHyena 2, which is designed to handle ultra-long genetic sequences at a lower computational cost, leveraging different convolutional and attention operators to model both short- and long-range dependencies.
  • StripedHyena 2 combines short explicit, medium regularized, and long implicit convolutions in a gated multi-hybrid architecture that improves computational efficiency. It was used to train Evo 2 on 2,000 NVIDIA H100 GPUs, about 150 times more compute than AlphaFold.
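
To make the convolution idea concrete, here is a minimal single-channel sketch of a gated causal convolution, a simplified version of the pattern Hyena-style operators build on: the input is projected into a value and a gate, the value is convolved with a filter that only looks backward in the sequence, and the result is multiplied elementwise by the gate. The functions `causal_conv` and `gated_conv` are hypothetical, the projections are reduced to scalars, and the filter is a short explicit one; StripedHyena 2's actual operators use learned multichannel projections and longer, differently parameterized filters.

```python
from typing import List


def causal_conv(x: List[float], h: List[float]) -> List[float]:
    """Direct causal convolution: y[t] = sum_j h[j] * x[t - j], using only x[0..t]."""
    return [
        sum(h[j] * x[t - j] for j in range(len(h)) if t - j >= 0)
        for t in range(len(x))
    ]


def gated_conv(x: List[float], w_value: float, w_gate: float, h: List[float]) -> List[float]:
    """Toy gated convolution: gate[t] * (h * value)[t].

    Scalar weights stand in for the learned linear projections a real layer would use.
    """
    value = [w_value * xi for xi in x]
    gate = [w_gate * xi for xi in x]
    return [g * c for g, c in zip(gate, causal_conv(value, h))]


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0]
    print(causal_conv(x, [1.0, 0.5]))  # [1.0, 2.5, 4.0, 5.5]
    print(gated_conv(x, w_value=1.0, w_gate=0.5, h=[1.0, 0.5]))
```

A direct convolution with a filter of length K costs on the order of L·K operations for a sequence of length L, which stays manageable for short filters even as L approaches a million, whereas full self-attention scales with L².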

Evo 2's Predictive Power and Development Context

  • Evo 2 was trained on DNA sequences but can make predictions about various aspects of biology, including RNA stability and protein structures, and has been shown to perform well across a broad range of predictive and generative tasks, such as predicting whether a genetic mutation in humans might cause disease.
  • In a test, Evo 2 accurately predicted harmful mutations in the BRCA1 gene more than 90 percent of the time, despite never being trained on any BRCA1 variant data. It also made predictions for "variants of unknown significance," mutations that have been observed but whose clinical effect has not yet been established, flagging some that the model suspects are pathogenic.
  • The development of Evo 2 involved collaboration with several individuals and organizations, including Greg Brockman, co-founder of OpenAI, who worked on the problem of reducing the quadratic cost of transformers during a sabbatical at the Arc Institute, and TogetherAI, which collaborated on the development of the original StripedHyena architecture.

Designing Genomes and Proteins with Evo 2

  • The Evo 2 model has been used to design entire genome sequences from scratch, including mitochondrial genomes, a yeast chromosome, and a small bacterial genome, with the goal of understanding its capabilities in generating functional DNA sequences.
  • In the case of mitochondrial genomes, Evo 2 was able to generate sequences that partly overlapped with natural sequences and encoded the same core genes as those found in real mitochondria, including ribosomal RNAs and tRNAs, and the designed proteins closely resembled their natural counterparts.
  • The team also used AlphaFold 3 to predict the structures of the AI-generated mitochondrial proteins, with promising results: pLDDT scores, AlphaFold's measure of confidence in a predicted structure, ranged from 0.67 to 0.83, and the predicted structures resembled those of their natural counterparts.

Controlling Gene Expression with AI-Designed DNA

  • Researchers also used Evo 2 to design DNA sequences likely to adopt an "open" or "closed" chromatin state within human cells, a crucial aspect of gene expression. They employed existing deep learning models, Enformer and Borzoi, to evaluate the generated sequences and determine how well they matched the desired chromatin pattern.
  • The results showed that Evo 2's designs matched the desired "open" or "closed" chromatin state more accurately as the number of sampled sequences increased, reaching an AUROC greater than 0.9. This suggests the model can design DNA sequences with specified chromatin accessibility patterns, a step toward controlling gene expression in different cell types.
  • The Arc Institute researchers, including Hie, are collaborating with DNA synthesis experts at the University of Washington to validate these AI-generated sequences in mouse cells, which could potentially lead to a better understanding of how to design DNA sequences with specific functions and behaviors.
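
AUROC summarizes how well a score separates two classes: it equals the probability that a randomly chosen positive example scores higher than a randomly chosen negative one, with ties counting as half. A minimal pairwise sketch follows; the `auroc` helper is illustrative, and the scores and labels in the example are made up, not data from the Evo 2 study.

```python
from typing import Sequence


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a random positive outscores a random negative; ties count as 0.5.

    Pairwise O(P * N) version, fine for small illustrative examples.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Made-up example: predicted accessibility for designs intended to be open (1) or closed (0).
    predicted = [0.92, 0.85, 0.40, 0.50, 0.15, 0.55]
    intended = [1, 1, 0, 1, 0, 0]
    print(auroc(predicted, intended))  # 8 of 9 pairs ordered correctly, about 0.889
```

An AUROC of 0.5 corresponds to chance-level separation and 1.0 to perfect separation, so a value above 0.9 means designed "open" and "closed" sequences were almost always ranked in the intended order.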

Medical Potential and Biosecurity Considerations

  • The Evo 2 model has the potential to revolutionize medicine by allowing for the design of genetic elements that can be precisely controlled to target specific cells, such as neurons or liver cells, which could lead to more targeted treatments with fewer side effects, as noted by co-author and computational biologist Hani Goodarzi.
  • However, the model's open-source nature also raises concerns about its potential misuse by "bad actors" to design harmful sequences, including bioweapons, which is why the developers have implemented biosecurity measures, such as excluding viruses that infect eukaryotic hosts from the model's training data and testing the model to ensure it would not respond meaningfully to pathogen-related queries.

Understanding Evo 2's Learning and Impact on Bioengineering

  • To address the question of what Evo 2 is actually "learning", the researchers trained a specialized model called a sparse autoencoder, which revealed that Evo 2 had discovered fundamental biological concepts, such as viral DNA signatures and protein secondary structures, solely by training on DNA sequences, and was not just memorizing examples from its training data.
  • The model can also predict complex biological features, such as the exon-intron architecture of the woolly mammoth genome. This points to its potential to make biological design more reliable, which has long been difficult because cellular behavior is probabilistic, and could lead to more predictable bioengineering breakthroughs comparable to prime editing, a technology that allows scientists to insert, delete, or swap DNA bases.
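
A sparse autoencoder reconstructs a model's internal activations through a wider layer of features while an L1 penalty keeps most features at zero, so each active feature tends to align with an interpretable concept. The sketch below shows only the forward pass and loss in plain Python with tiny hand-picked weights. It is a generic formulation (ReLU encoder, linear decoder, L1 penalty) assumed for illustration, not necessarily the exact variant the Evo 2 researchers trained, and `sae_forward` and `sae_loss` are hypothetical names.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def sae_forward(
    x: Vector, w_enc: Matrix, b_enc: Vector, w_dec: Matrix, b_dec: Vector
) -> Tuple[Vector, Vector]:
    """Encode activations x into features f, then decode back to x_hat."""
    centered = [xi - bi for xi, bi in zip(x, b_dec)]
    pre = [p + b for p, b in zip(matvec(w_enc, centered), b_enc)]
    f = [max(0.0, p) for p in pre]  # ReLU: features are non-negative, many exactly 0
    x_hat = [r + b for r, b in zip(matvec(w_dec, f), b_dec)]
    return f, x_hat


def sae_loss(x: Vector, f: Vector, x_hat: Vector, l1_coeff: float) -> float:
    """Squared reconstruction error plus an L1 sparsity penalty on the features."""
    recon = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    sparsity = sum(abs(v) for v in f)
    return recon + l1_coeff * sparsity


if __name__ == "__main__":
    # 2-dimensional activations expanded into 3 features (toy sizes).
    w_enc = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
    w_dec = [[1.0, 0.0, -1.0], [0.0, 1.0, 0.0]]
    x = [1.0, 0.0]
    f, x_hat = sae_forward(x, w_enc, [0.0, 0.0, 0.0], w_dec, [0.0, 0.0])
    print(f, x_hat, sae_loss(x, f, x_hat, l1_coeff=0.1))
```

Training would minimize this loss over many activation vectors with gradient descent; that part is omitted here.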

Future Perspectives and Comparison with AlphaFold

  • The future uses of Evo 2 are expected to mirror those of AlphaFold, an AI tool that transformed structural biology by "solving" the protein-folding problem. While some scientists may feel that Evo 2 could spell the end for certain fields, such as crystallography or structural biology, others see it as an opportunity for progress and optimism, as noted by structural biologist Mohammed AlQuraishi.
  • Evo 2 is expected to have an impact on biology comparable to that of solving the structures of individual proteins, and it will allow researchers to design original DNA sequences and test them in the laboratory.
  • Scientists will use Evo 2 to design bespoke enzymes and other biological systems, which will give them a glimpse of the future of bioengineering and potentially pave the way for breakthroughs in the field.

Evo 2 Interaction

  • NVIDIA BioNeMo: The most straightforward way to integrate Evo 2 is through the NVIDIA BioNeMo platform, where Evo 2 is available as an NVIDIA NIM microservice accessed via an API. This offers a managed and scalable solution.
  • Arc Institute GitHub: For more direct control and potential fine-tuning, you can access the Evo 2 code and model parameters in the Arc Institute's GitHub repository. This requires setting up the necessary environment with Python, PyTorch, and potentially NVIDIA GPUs.

Source Article Details

  • https://www.asimov.press/p/evo-2 (Second article)
  • The article "Evo 2 Can Design Entire Genomes" by Eryney Marrogi and Niko McCarty, with additional reporting by Alec Nielsen, discusses the potential of Evo 2 and its implications for biology and bioengineering, and is available on Asimov Press with a DOI of https://doi.org/10.62211/45yp-23jh.
  • Eryney Marrogi, a medical student at the University of Vermont with experience in biological engineering, and Niko McCarty, a founding editor of Asimov Press, are the article's authors.
