From Code to Comprehension: AI Captures the Language of Life
Luis E. Valentin-Alvarado, Gavin J. Knott

The CRISPR Journal, Journal Year: 2025, Volume and Issue: unknown

Published: Jan. 29, 2025

Language: English

How to build the virtual cell with artificial intelligence: Priorities and opportunities
Charlotte Bunne, Yusuf Roohani, Yanay Rosen et al.

Cell, Journal Year: 2024, Volume and Issue: 187(25), P. 7045 - 7063

Published: Dec. 1, 2024

Cells are essential to understanding health and disease, yet traditional models fall short of modeling and simulating their function and behavior. Advances in AI and omics offer groundbreaking opportunities to create an AI virtual cell (AIVC), a multi-scale, multi-modal large-neural-network-based model that can represent and simulate the behavior of molecules, cells, and tissues across diverse states. This Perspective provides a vision on their design and how collaborative efforts to build AIVCs will transform biological research by allowing high-fidelity simulations, accelerating discoveries, and guiding experimental studies, offering new opportunities for understanding cellular functions and fostering interdisciplinary collaborations in open science.

Language: English

Citations: 23

Genome modeling and design across all domains of life with Evo 2
Garyk Brixi, Matthew G. Durrant, Ja‐Lok Ku et al.

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2025, Volume and Issue: unknown

Published: Feb. 21, 2025

All of life encodes information with DNA. While tools for sequencing, synthesis, and editing genomic code have transformed biological research, intelligently composing new biological systems would also require a deep understanding of the immense complexity encoded by genomes. We introduce Evo 2, a biological foundation model trained on 9.3 trillion DNA base pairs from a highly curated genomic atlas spanning all domains of life. We train Evo 2 with 7B and 40B parameters to have an unprecedented 1 million token context window with single-nucleotide resolution. Evo 2 learns from DNA sequence alone to accurately predict the functional impacts of genetic variation, from noncoding pathogenic mutations to clinically significant BRCA1 variants, without task-specific finetuning. Applying mechanistic interpretability analyses, we reveal that Evo 2 autonomously learns a breadth of biological features, including exon–intron boundaries, transcription factor binding sites, protein structural elements, and prophage genomic regions. Beyond its predictive capabilities, Evo 2 generates mitochondrial, prokaryotic, and eukaryotic sequences at genome scale with greater naturalness and coherence than previous methods. Guiding Evo 2 via inference-time search enables controllable generation of epigenomic structure, for which we demonstrate the first inference-time scaling results in biology. We make Evo 2 fully open, including model parameters, training code, inference code, and the OpenGenome2 dataset, to accelerate the exploration and design of biological complexity.

Language: English

Citations: 7

Genomic language models: opportunities and challenges
Gonzalo Benegas, Chengzhong Ye, Carlos Albors et al.

Trends in Genetics, Journal Year: 2025, Volume and Issue: unknown

Published: Jan. 1, 2025

Language: English

Citations: 5

Artificial intelligence in clinical genetics
Dat Duong, Benjamin D. Solomon

European Journal of Human Genetics, Journal Year: 2025, Volume and Issue: unknown

Published: Jan. 13, 2025

Artificial intelligence (AI) has been growing more powerful and accessible, and will increasingly impact many areas, including virtually all aspects of medicine and biomedical research. This review focuses on previous, current, and especially emerging applications of AI in clinical genetics. Topics covered include a brief explanation of different general categories of AI, including machine learning, deep learning, and generative AI. After introductory explanations and examples, the review discusses AI in clinical genetics in three main categories: clinical diagnostics; management and therapeutics; clinical support. The review concludes with short, medium, and long-term predictions about the ways that AI may affect the field of clinical genetics. Overall, while the precise speed at which AI will continue to change clinical genetics is unclear, as are the overall ramifications for patients, families, clinicians, researchers, and others, it is likely that AI will result in dramatic evolution in clinical genetics. It will be important for all those involved in clinical genetics to prepare accordingly in order to minimize the risks and maximize benefits related to the use of AI in the field.

Language: English

Citations: 5

Machine learning for synthetic gene circuit engineering
Sebastian Palacios, James J. Collins, Domitilla Del Vecchio et al.

Current Opinion in Biotechnology, Journal Year: 2025, Volume and Issue: 92, P. 103263 - 103263

Published: Jan. 27, 2025

Language: English

Citations: 3

Evaluating the representational power of pre-trained DNA language models for regulatory genomics
Ziqi Tang, Nikunj V. Somia, Yiyang Yu et al.

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown

Published: March 4, 2024

The emergence of genomic language models (gLMs) offers an unsupervised approach to learning a wide diversity of cis-regulatory patterns in the non-coding genome without requiring labels of functional activity generated by wet-lab experiments. Previous evaluations have shown that pre-trained gLMs can be leveraged to improve predictive performance across a broad range of regulatory genomics tasks, albeit using relatively simple benchmark datasets and baseline models. Since the gLMs in these studies were tested upon fine-tuning their weights for each downstream task, determining whether gLM representations embody a foundational understanding of cis-regulatory biology remains an open question. Here we evaluate the representational power of pre-trained gLMs to predict and interpret cell-type-specific functional genomics data that span DNA and RNA regulation. Our findings suggest that probing the representations of pre-trained gLMs does not offer substantial advantages over conventional machine learning approaches that use one-hot encoded sequences. This work highlights a major gap with current gLMs, raising potential issues in conventional pre-training strategies for the non-coding genome.

Language: English

Citations: 14

Foundation models in bioinformatics
Fei Guo, Renchu Guan, Yaohang Li et al.

National Science Review, Journal Year: 2025, Volume and Issue: unknown

Published: Jan. 25, 2025

With the adoption of foundation models (FMs), artificial intelligence (AI) has become increasingly significant in bioinformatics and has successfully addressed many historical challenges, such as pre-training frameworks, model evaluation and interpretability. FMs demonstrate notable proficiency in managing large-scale, unlabeled datasets, because experimental procedures are costly and labor intensive. In various downstream tasks, FMs have consistently achieved noteworthy results, demonstrating high levels of accuracy in representing biological entities. A new era in computational biology has been ushered in by the application of FMs, focusing on both general and specific biological issues. In this review, we introduce recent advancements in bioinformatics FMs employed in a variety of downstream tasks, including genomics, transcriptomics, proteomics, drug discovery and single-cell analysis. Our aim is to assist scientists in selecting appropriate FMs in bioinformatics, according to four model types: language FMs, vision FMs, graph FMs and multimodal FMs. In addition to understanding molecular landscapes, AI technology can establish the theoretical and practical foundation for continued innovation in molecular biology.

Language: English

Citations: 2

Teaching AI to speak protein
Michael Heinzinger, Burkhard Rost

Current Opinion in Structural Biology, Journal Year: 2025, Volume and Issue: 91, P. 102986 - 102986

Published: Feb. 21, 2025

Language: English

Citations: 2

SegmentNT: annotating the genome at single-nucleotide resolution with DNA foundation models
Bernardo P. de Almeida, Hugo Dalla-Torre, Guillaume Richard et al.

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown

Published: March 15, 2024

Foundation models have achieved remarkable success in several fields such as natural language processing, computer vision and more recently biology. DNA foundation models in particular are emerging as a promising approach for genomics. However, so far no model has delivered on granular, nucleotide-level predictions across a wide range of genomic and regulatory elements, limiting their practical usefulness. In this paper, we build on our previous work on the Nucleotide Transformer (NT) to develop a segmentation model, SegmentNT, that processes input DNA sequences up to 30kb-long to predict 14 different classes of genomics elements at single nucleotide resolution. By utilizing pre-trained weights from NT, SegmentNT surpasses the performance of several ablation models, including convolution networks with one-hot encoded nucleotide sequences and models trained from scratch. SegmentNT can process multiple sequence lengths with zero-shot generalization for sequences of up to 50kb. We show improved performance on the detection of splice sites throughout the genome and demonstrate strong nucleotide-level precision. Because it evaluates all gene elements simultaneously, SegmentNT can predict the impact of sequence variants not only on splice site changes but also on exon and intron rearrangements in transcript isoforms. Finally, we show that a SegmentNT model trained on human genomics elements can generalize to elements of different human and plant species and that a trained multispecies SegmentNT model achieves stronger generalization for all genic elements on unseen species. In summary, SegmentNT demonstrates that DNA foundation models can tackle complex, granular tasks in genomics at a single-nucleotide resolution. SegmentNT can be easily extended to additional genomics elements and species, thus representing a new paradigm on how we analyze and interpret DNA. We make our SegmentNT-30kb human and multispecies models available on our github repository in Jax and HuggingFace space in Pytorch.

Language: English

Citations: 9

Synthetic genomes unveil the effects of synonymous recoding
Ákos Nyerges, Anush Chiappino-Pepe, Bogdan Budnik et al.

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown

Published: June 16, 2024

Engineering the genetic code of an organism provides the basis for (i) making any organism safely resistant to natural viruses and (ii) preventing genetic information flow into and out of genetically modified organisms while (iii) allowing the biosynthesis of genetically encoded unnatural polymers (refs. 1–4). Achieving these three goals requires the reassignment of multiple of the 64 codons nature uses to encode proteins. However, synonymous codon replacement (recoding) is frequently lethal, and how recoding impacts fitness remains poorly explored. Here, we explore these effects using whole-genome synthesis, multiplexed directed evolution, and genome-transcriptome-translatome-proteome co-profiling on multiple recoded genomes. Using this information, we assemble a synthetic Escherichia coli genome in seven sections using only 57 codons to encode proteins. By discovering the rules responsible for the lethality of synonymous recoding and developing a data-driven multi-omics-based genome construction workflow that troubleshoots synthetic genomes, we overcome the lethal effects of 62,007 synonymous codon swaps and 11,108 additional genomic edits. We show that synonymous recoding induces transcriptional noise including new antisense RNAs, leading to drastic transcriptome and proteome perturbation. As the elimination of select codons from an organism’s genetic code results in the widespread appearance of cryptic promoters, we show that synonymous codon choice may naturally evolve to minimize transcriptional noise. Our work provides the first genome-scale description of how synonymous codon changes influence organismal fitness and paves the way for the construction of functional genomes that provide genetic firewalls from natural ecosystems and safely produce biopolymers, drugs, and enzymes with an expanded chemistry.

Language: English

Citations: 7