The CRISPR Journal, Journal Year: 2025, Volume and Issue: unknown
Published: Jan. 29, 2025
Language: English
Cell, Journal Year: 2024, Volume and Issue: 187(25), P. 7045 - 7063
Published: Dec. 1, 2024
Cells are essential to understanding health and disease, yet traditional models fall short of modeling and simulating their function and behavior. Advances in AI and omics offer groundbreaking opportunities to create an AI virtual cell (AIVC), a multi-scale, multi-modal large-neural-network-based model that can represent and simulate the behavior of molecules, cells, and tissues across diverse states. This Perspective provides a vision on their design and on how collaborative efforts to build AIVCs will transform biological research by allowing high-fidelity simulations, accelerating discoveries, guiding experimental studies, offering new opportunities for understanding cellular functions, and fostering interdisciplinary collaborations in open science.
Language: English
Citations
23
bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2025, Volume and Issue: unknown
Published: Feb. 21, 2025
Abstract: All of life encodes information with DNA. While tools for sequencing, synthesis, and editing of genomic code have transformed biological research, intelligently composing new systems would also require a deep understanding of the immense complexity encoded by genomes. We introduce Evo 2, a foundation model trained on 9.3 trillion DNA base pairs from a highly curated atlas spanning all domains of life. We train Evo 2 with 7B and 40B parameters to have an unprecedented 1 million token context window at single-nucleotide resolution. Evo 2 learns from DNA sequence alone to accurately predict the functional impacts of genetic variation, from noncoding pathogenic mutations to clinically significant BRCA1 variants, without task-specific finetuning. Applying mechanistic interpretability analyses, we reveal that Evo 2 autonomously learns a breadth of features, including exon–intron boundaries, transcription factor binding sites, protein structural elements, and prophage regions. Beyond its predictive capabilities, Evo 2 generates mitochondrial, prokaryotic, and eukaryotic sequences at genome scale with greater naturalness and coherence than previous methods. Guiding Evo 2 via inference-time search enables controllable generation of epigenomic structure, for which we demonstrate the first inference-time scaling results in biology. We make Evo 2 fully open, including model parameters, training code, inference code, and the OpenGenome2 dataset, to accelerate the exploration and design of biological complexity.
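The abstract says Evo 2 predicts variant effects "without task-specific finetuning" but does not spell out the scoring rule here. A common zero-shot approach for sequence models is to compare the log-likelihood of the variant sequence with that of the reference. The sketch below illustrates that idea only; a toy k-mer Markov model stands in for Evo 2, and `train_markov`, `log_likelihood`, and `variant_score` are hypothetical helpers, not part of any Evo 2 API.

```python
import math
from collections import Counter


def train_markov(seqs, k=2):
    """Count next-base frequencies for each k-base context (toy stand-in for a DNA language model)."""
    counts = {}
    for s in seqs:
        for i in range(k, len(s)):
            counts.setdefault(s[i - k:i], Counter())[s[i]] += 1
    return counts


def log_likelihood(seq, counts, k=2, alpha=1.0):
    """Sum of log P(base | previous k bases), with add-alpha smoothing over the 4 bases."""
    total = 0.0
    for i in range(k, len(seq)):
        c = counts.get(seq[i - k:i], Counter())
        denom = sum(c.values()) + 4 * alpha
        total += math.log((c[seq[i]] + alpha) / denom)
    return total


def variant_score(ref_seq, pos, alt, counts, k=2):
    """Delta log-likelihood (alt minus ref); negative values mean the variant looks less 'natural'."""
    alt_seq = ref_seq[:pos] + alt + ref_seq[pos + 1:]
    return log_likelihood(alt_seq, counts, k) - log_likelihood(ref_seq, counts, k)


counts = train_markov(["ACGT" * 16])
print(variant_score("ACGTACGTACGT", 5, "T", counts))  # negative: C->T breaks the learned pattern
```

With a real model, only `log_likelihood` would change; the ref-versus-alt comparison stays the same.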
Language: English
Citations
7
Trends in Genetics, Journal Year: 2025, Volume and Issue: unknown
Published: Jan. 1, 2025
Language: English
Citations
5
European Journal of Human Genetics, Journal Year: 2025, Volume and Issue: unknown
Published: Jan. 13, 2025
Abstract: Artificial intelligence (AI) has been growing more powerful and accessible, and will increasingly impact many areas, including virtually all aspects of medicine and biomedical research. This review focuses on previous, current, and especially emerging applications of AI in clinical genetics. Topics covered include a brief explanation of different general categories of AI, including machine learning, deep learning, and generative AI. After introductory explanations and examples, the review discusses AI in clinical genetics in three main categories: diagnostics; management and therapeutics; and clinical support. The review concludes with short-, medium-, and long-term predictions about the ways that AI may affect the field. Overall, while the precise speed at which AI will continue to change clinical genetics is unclear, as are the overall ramifications for patients, families, clinicians, researchers, and others, AI is likely to result in a dramatic evolution of clinical genetics. It will be important for those involved to prepare accordingly in order to minimize the risks and maximize the benefits related to the use of AI in the field.
Language: English
Citations
5
Current Opinion in Biotechnology, Journal Year: 2025, Volume and Issue: 92, P. 103263 - 103263
Published: Jan. 27, 2025
Language: English
Citations
3
bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown
Published: March 4, 2024
Abstract: The emergence of genomic language models (gLMs) offers an unsupervised approach to learning a wide diversity of cis-regulatory patterns in the non-coding genome without requiring labels of functional activity generated by wet-lab experiments. Previous evaluations have shown that pre-trained gLMs can be leveraged to improve predictive performance across a broad range of regulatory genomics tasks, albeit using relatively simple benchmark datasets and baseline models. Since these studies tested gLMs after fine-tuning their weights for each downstream task, whether gLM representations embody a foundational understanding of regulatory biology remains an open question. Here we evaluate the representational power of pre-trained gLMs to predict and interpret cell-type-specific data that span DNA and RNA regulation. Our findings suggest that probing the representations of pre-trained gLMs does not offer substantial advantages over conventional machine learning approaches that use one-hot encoded sequences. This work highlights a major gap with current gLMs, raising potential issues in conventional pre-training strategies for the genome.
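The baseline this abstract compares against feeds models one-hot encoded sequences rather than gLM embeddings. A minimal sketch of that encoding, assuming A/C/G/T column order and all-zero rows for ambiguous bases such as N (both conventions are assumptions; tools differ), might look like:

```python
BASES = "ACGT"  # assumed column order; other tools may use a different ordering


def one_hot(seq):
    """Encode a DNA string as a list of length-4 rows (A, C, G, T).

    Characters outside ACGT (e.g. N) become all-zero rows.
    """
    index = {b: i for i, b in enumerate(BASES)}
    rows = []
    for base in seq.upper():
        row = [0, 0, 0, 0]
        if base in index:
            row[index[base]] = 1
        rows.append(row)
    return rows


print(one_hot("ACGTN"))
```

A probing experiment would then train the same downstream model once on these rows and once on frozen gLM embeddings, and compare the two.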
Language: English
Citations
14
National Science Review, Journal Year: 2025, Volume and Issue: unknown
Published: Jan. 25, 2025
With the adoption of foundation models (FMs), artificial intelligence (AI) has become increasingly significant in bioinformatics and has successfully addressed many historical challenges, such as pre-training frameworks, model evaluation and interpretability. FMs demonstrate notable proficiency in managing large-scale, unlabeled datasets, an important capability because experimental procedures are costly and labor intensive. In various downstream tasks, FMs have consistently achieved noteworthy results, demonstrating high levels of accuracy in representing biological entities. A new era of computational biology has been ushered in by the application of FMs, focusing on both general and specific issues. In this review, we introduce recent advancements in FMs employed in a variety of downstream tasks, including genomics, transcriptomics, proteomics, drug discovery and single-cell analysis. Our aim is to assist scientists in selecting appropriate FMs in bioinformatics, according to four model types: language FMs, vision FMs, graph FMs and multimodal FMs. In addition to understanding molecular landscapes, AI technology can establish the theoretical and practical foundation for continued innovation in biology.
Language: English
Citations
2
Current Opinion in Structural Biology, Journal Year: 2025, Volume and Issue: 91, P. 102986 - 102986
Published: Feb. 21, 2025
Language: English
Citations
2
bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown
Published: March 15, 2024
Foundation models have achieved remarkable success in several fields such as natural language processing, computer vision and, more recently, biology. DNA foundation models in particular are emerging as a promising approach for genomics. However, so far no model has delivered granular, nucleotide-level predictions across a wide range of genomic and regulatory elements, limiting their practical usefulness. In this paper, we build on our previous work on the Nucleotide Transformer (NT) to develop a segmentation model, SegmentNT, that processes input sequences up to 30kb long to predict 14 different classes of genomic elements at single-nucleotide resolution. By utilizing pre-trained weights from NT, SegmentNT surpasses the performance of several ablation models, including convolutional networks with one-hot encoded sequences and models trained from scratch. SegmentNT can process multiple sequence lengths, with zero-shot generalization to sequences of up to 50kb. We show improved detection of splice sites throughout the genome and demonstrate strong nucleotide-level precision. Because it evaluates all gene elements simultaneously, SegmentNT can predict the impact of sequence variants not only on splice site changes but also on exon and intron rearrangements in transcript isoforms. Finally, we show that a SegmentNT model trained on human genomic elements can generalize to plant species, and that a multispecies SegmentNT model achieves stronger generalization for all genic elements on unseen species. In summary, SegmentNT demonstrates that DNA foundation models can tackle complex, granular tasks in genomics at single-nucleotide resolution. SegmentNT can be easily extended to additional genomic elements and species, thus representing a new paradigm for how we analyze and interpret DNA. We make SegmentNT-30kb available on our GitHub repository in Jax and on HuggingFace space in Pytorch.
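A segmentation model of this kind emits, for each nucleotide, a probability per element class. The abstract does not describe how those outputs are post-processed, so the following is only a sketch assuming a simple fixed threshold; `probs_to_segments`, `segment_all`, the 0.5 cutoff, and the class names in the usage line are all hypothetical.

```python
def probs_to_segments(probs, threshold=0.5):
    """Turn per-nucleotide probabilities for one element class into half-open [start, end) intervals."""
    segments = []
    start = None
    for i, p in enumerate(probs):
        if p >= threshold and start is None:
            start = i  # entering an element
        elif p < threshold and start is not None:
            segments.append((start, i))  # leaving an element
            start = None
    if start is not None:  # element runs to the end of the sequence
        segments.append((start, len(probs)))
    return segments


def segment_all(class_probs, threshold=0.5):
    """Apply the same threshold independently to each class (classes may overlap)."""
    return {name: probs_to_segments(p, threshold) for name, p in class_probs.items()}


print(segment_all({"exon": [0.1, 0.9, 0.8, 0.2, 0.7], "intron": [0.9, 0.1, 0.1, 0.9, 0.2]}))
```

Thresholding each class independently keeps the sketch short; a real pipeline might smooth probabilities or enforce gene-structure constraints instead.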
Language: English
Citations
9
bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown
Published: June 16, 2024
Abstract: Engineering the genetic code of an organism provides the basis for (i) making any organism safely resistant to natural viruses and (ii) preventing genetic information flow into and out of genetically modified organisms while (iii) allowing the biosynthesis of genetically encoded unnatural polymers (1–4). Achieving these three goals requires the reassignment of multiple codons out of the 64 that nature uses to encode proteins. However, synonymous codon replacement (recoding) is frequently lethal, and how recoding impacts fitness remains poorly explored. Here, we explore these effects using whole-genome synthesis, multiplexed directed evolution, and genome-transcriptome-translatome-proteome co-profiling on recoded genomes. Using this information, we assemble a synthetic Escherichia coli genome in seven sections using only 57 codons. By discovering the rules responsible for the lethality of recoding and developing a data-driven, multi-omics-based genome construction workflow that troubleshoots synthetic genomes, we overcome the lethal effects of 62,007 codon swaps and 11,108 additional genomic edits. We show that recoding induces transcriptional noise, including new antisense RNAs, leading to drastic transcriptome and proteome perturbation. As the elimination of select codons from an organism's genome results in the widespread appearance of cryptic promoters, codon choice may naturally evolve to minimize transcriptional noise. Our work provides the first genome-scale description of how codon changes influence organismal fitness and paves the way for functional genomes that provide genetic firewalls from natural ecosystems and produce biopolymers, drugs, and enzymes with an expanded chemistry.
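At its simplest, synonymous recoding swaps every in-frame occurrence of a codon for another codon encoding the same amino acid, so the protein sequence is unchanged. The sketch below shows only that bookkeeping under the standard genetic code (NCBI translation table 1); the swap table in the usage line is an arbitrary illustration, not the codon reassignment scheme used in this work, and `recode`/`translate` are hypothetical helpers.

```python
from itertools import product

# Standard genetic code (NCBI table 1); codons enumerated with bases in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(cds):
    """Translate complete codons of a coding sequence; '*' marks stop codons."""
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds) - len(cds) % 3, 3))


def recode(cds, swaps):
    """Replace codons in-frame using `swaps`, refusing any non-synonymous substitution."""
    for old, new in swaps.items():
        if CODON_TABLE[old] != CODON_TABLE[new]:
            raise ValueError(f"{old}->{new} is not synonymous")
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    return "".join(swaps.get(c, c) for c in codons)


# Illustrative swaps only: TCG->AGC (both Ser) and TAG->TAA (both stop).
recoded = recode("ATGTCGAAATAG", {"TCG": "AGC", "TAG": "TAA"})
print(recoded, translate(recoded))
```

Preserving the protein sequence is necessary but, as the abstract stresses, not sufficient: swaps can still create cryptic promoters or antisense transcripts, which this sketch does not model.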
Language: English
Citations
7