Deep generative modeling of the human proteome reveals over a hundred novel genes involved in rare genetic disorders
Rose Orenbuch, Aaron W. Kollasch, Hansen Spinner, et al.

medRxiv (Cold Spring Harbor Laboratory), Journal Year: 2023, Volume and Issue: unknown

Published: Nov. 28, 2023

Identifying causal mutations accelerates genetic disease diagnosis and therapeutic development. Missense variants present a bottleneck in genetic diagnoses, as their effects are less straightforward than truncations or nonsense mutations. While computational prediction methods are increasingly successful at prediction for variants in known disease genes, they do not generalize well to other genes, as the scores are not calibrated across the proteome. To address this, we developed a deep generative model, popEVE, that combines evolutionary information with population sequence data and achieves state-of-the-art performance at ranking variants by severity to distinguish patients with severe developmental disorders from potentially healthy individuals. popEVE identifies 442 genes in a cohort of developmental disorder cases, including evidence of 119 novel genetic disorders, without the need for gene-level enrichment and without overestimating the prevalence of pathogenic variants in the population. By placing variants on a unified scale, our model offers a comprehensive perspective on the distribution of fitness effects across the entire proteome and the broader human population. popEVE provides compelling evidence for genetic diagnoses even in exceptionally rare single-patient disorders where conventional techniques relying on repeated observations may not be applicable. An interactive web viewer and downloads are available at pop.evemodel.org.

Language: English

CADD v1.7: using protein language models, regulatory CNNs and other nucleotide-level scores to improve genome-wide variant predictions
Max Schubach, Thorben Maaß, Lusiné Nazaretyan, et al.

Nucleic Acids Research, Journal Year: 2024, Volume and Issue: 52(D1), P. D1143 - D1154

Published: Jan. 5, 2024

Machine learning-based scoring and classification of genetic variants aids the assessment of clinical findings and is employed to prioritize variants in diverse genetic studies and analyses. Combined Annotation-Dependent Depletion (CADD) is one of the first methods for the genome-wide prioritization of variants across different molecular functions and has been continuously developed and improved since its original publication. Here, we present our most recent release, CADD v1.7. We explored and integrated new annotation features, among them state-of-the-art protein language model scores (Meta ESM-1v), regulatory variant effect predictions (from sequence-based convolutional neural networks) and sequence conservation scores (Zoonomia). We evaluated the new version on data sets derived from ClinVar, ExAC/gnomAD and 1000 Genomes variants. For coding effects, we tested CADD on 31 Deep Mutational Scanning (DMS) data sets from ProteinGym and, for regulatory effect prediction, we used saturation mutagenesis reporter assay data of promoter and enhancer sequences. The inclusion of new features further improved the overall performance of CADD. As with previous releases, all data sets, genome-wide CADD v1.7 scores, scripts for on-site scoring and an easy-to-use webserver are readily provided to the community via https://cadd.bihealth.org/ or https://cadd.gs.washington.edu/.

Language: English

Citations: 117

Machine learning-guided co-optimization of fitness and diversity facilitates combinatorial library design in enzyme engineering
Kerr Ding, M. A. Chin, Yunlong Zhao, et al.

Nature Communications, Journal Year: 2024, Volume and Issue: 15(1)

Published: July 29, 2024

The effective design of combinatorial libraries to balance fitness and diversity facilitates the engineering of useful enzyme functions, particularly those that are poorly characterized or unknown in biology. We introduce MODIFY, a machine learning (ML) algorithm that learns from natural protein sequences to infer evolutionarily plausible mutations and predict enzyme fitness. MODIFY co-optimizes the predicted fitness and sequence diversity of starting libraries, prioritizing high-fitness variants while ensuring broad sequence coverage. In silico evaluation shows that MODIFY outperforms state-of-the-art unsupervised methods in zero-shot fitness prediction and enables ML-guided directed evolution with enhanced efficiency. Using MODIFY, we engineer generalist biocatalysts derived from a thermostable cytochrome c to achieve enantioselective C-B and C-Si bond formation via a new-to-nature carbene transfer mechanism, leading to biocatalysts six mutations away from previously developed enzymes while exhibiting superior or comparable activities. These results demonstrate MODIFY’s potential in solving challenging enzyme engineering problems beyond the reach of classic directed evolution.

Language: English

Citations: 16

A DNA language model based on multispecies alignment predicts the effects of genome-wide variants
Gonzalo Benegas, Carlos Albors, Alan J. Aw, et al.

Nature Biotechnology, Journal Year: 2025, Volume and Issue: unknown

Published: Jan. 2, 2025

Language: English

Citations: 4

Identification of constrained sequence elements across 239 primate genomes
Lukas F. K. Kuderna, Jacob C. Ulirsch, Sabrina Mohd Rashid, et al.

Nature, Journal Year: 2023, Volume and Issue: 625(7996), P. 735 - 742

Published: Nov. 29, 2023

Noncoding DNA is central to our understanding of human gene regulation and complex diseases [1,2], and measuring the evolutionary sequence constraint can establish the functional relevance of putative regulatory elements in the human genome [3–9]. Identifying the genomic elements that have become constrained specifically in primates has been hampered by the faster evolution of noncoding DNA compared to protein-coding DNA [10], the relatively short timescales separating primate species [11], and the previously limited availability of whole-genome sequences [12]. Here we construct a whole-genome alignment of 239 species, representing nearly half of all extant species in the primate order. Using this resource, we identified human regulatory elements that are under selective constraint across primates and other mammals at a 5% false discovery rate. We detected 111,318 DNase I hypersensitivity sites and 267,410 transcription factor binding sites that are constrained specifically in primates but not across other placental mammals, and validate their cis-regulatory effects on gene expression. These regulatory elements are enriched for human genetic variants that affect gene expression and complex traits and diseases. Our results highlight the important role of recent evolution in regulatory sequence elements differentiating primates, including humans, from other placental mammals.

Language: English

Citations: 38

Rare penetrant mutations confer severe risk of common diseases
Petko Fiziev, Jeremy F. McRae, Jacob C. Ulirsch, et al.

Science, Journal Year: 2023, Volume and Issue: 380(6648)

Published: June 1, 2023

We examined 454,712 exomes for genes associated with a wide spectrum of complex traits and common diseases and observed that rare, penetrant mutations in genes implicated by genome-wide association studies confer ~10-fold larger effects than common variants in the same genes. Consequently, an individual at the phenotypic extreme and at the greatest risk for severe, early-onset disease is better identified by a few rare penetrant variants than by the collective action of many common variants with weak effects. By combining rare variants across phenotype-associated genes into a unified genetic risk model, we demonstrate superior portability across diverse global populations compared with common-variant polygenic risk scores, greatly improving the clinical utility of genetic-based risk prediction.

Language: English

Citations: 37

Will variants of uncertain significance still exist in 2030?
Douglas M. Fowler, Heidi L. Rehm

The American Journal of Human Genetics, Journal Year: 2023, Volume and Issue: 111(1), P. 5 - 10

Published: Dec. 11, 2023

Language: English

Citations: 32

COVID-19 annual update: a narrative review
Michela Biancolella, Vito Luigi Colona, Lucio Luzzatto, et al.

Human Genomics, Journal Year: 2023, Volume and Issue: 17(1)

Published: July 24, 2023

Three and a half years after the pandemic outbreak, now that WHO has formally declared that the emergency is over, COVID-19 is still a significant global issue. Here, we focus on recent developments in genetic and genomic research on COVID-19, and we give an outlook on state-of-the-art therapeutical approaches, as the pandemic is gradually transitioning to an endemic situation. The sequencing and characterization of rare alleles in different populations has made it possible to identify numerous genes that affect either susceptibility to COVID-19 or the severity of the disease. These findings provide a beginning to new avenues and pan-ethnic therapeutic approaches, as well as to potential genetic screening protocols. The causative virus, SARS-CoV-2, is still in the spotlight, but a novel threatening virus could appear anywhere at any time. Therefore, continued vigilance and further research are warranted. We also note emphatically that, to prevent future pandemics and other world-wide health crises, it is imperative to capitalize on what we have learnt from COVID-19: specifically, regarding its origins, the world’s response, and insufficient preparedness. This requires unprecedented international collaboration and timely data sharing for the coordination of an effective response and the rapid implementation of containment measures.

Language: English

Citations: 28

Harnessing deep learning for population genetic inference
Xin Huang, Aigerim Rymbekova, Olga Dolgova, et al.

Nature Reviews Genetics, Journal Year: 2023, Volume and Issue: 25(1), P. 61 - 78

Published: Sept. 4, 2023

Language: English

Citations: 27

Applications of artificial intelligence in clinical laboratory genomics
Swaroop Aradhya, Flavia M. Facio, Hillery C. Metz, et al.

American Journal of Medical Genetics Part C Seminars in Medical Genetics, Journal Year: 2023, Volume and Issue: 193(3)

Published: July 28, 2023

The transition from analog to digital technologies in clinical laboratory genomics is ushering in an era of “big data” in ways that will exceed human capacity to rapidly and reproducibly analyze those data using conventional approaches. Accurately evaluating complex molecular data to facilitate timely diagnosis and management of genomic disorders will require supportive artificial intelligence methods. These are already being introduced into clinical laboratory genomics to identify variants in DNA sequencing data, predict the effects of DNA variants on protein structure and function to inform clinical interpretation of pathogenicity, link phenotype ontologies to genetic variants identified through exome or genome sequencing to help clinicians reach diagnostic answers faster, correlate genomic data with tumor staging and treatment approaches, utilize natural language processing to identify critical published medical literature during analysis of genomic data, and use interactive chatbots to identify individuals who qualify for genetic testing or to provide pre‐test and post‐test education. With careful and ethical development and validation of artificial intelligence for clinical laboratory genomics, these advances are expected to significantly enhance the abilities of geneticists to translate complex data into clearly synthesized information for clinicians to use in managing the care of their patients at scale.

Language: English

Citations: 26

A foundational large language model for edible plant genomes
Javier Mendoza‐Revilla, Evan Trop, Liam Gonzalez, et al.

Communications Biology, Journal Year: 2024, Volume and Issue: 7(1)

Published: July 9, 2024

Significant progress has been made in the field of plant genomics, as demonstrated by the increased use of high-throughput methodologies that enable the characterization of multiple genome-wide molecular phenotypes. These findings have provided valuable insights into plant traits and their underlying genetic mechanisms, particularly in model plant species. Nonetheless, effectively leveraging them to make accurate predictions represents a critical step in crop genomic improvement. We present AgroNT, a foundational large language model trained on genomes from 48 plant species with a predominant focus on crop species. We show that AgroNT can obtain state-of-the-art predictions for regulatory annotations, promoter/terminator strength, and tissue-specific gene expression, and can prioritize functional variants. We conduct a large-scale in silico saturation mutagenesis analysis on cassava to evaluate the regulatory impact of over 10 million mutations and provide their predicted effects as a resource for variant characterization. Finally, we propose the use of the diverse datasets compiled here as the Plants Genomic Benchmark (PGB), providing a comprehensive benchmark for deep learning-based methods in plant genomic research. The pre-trained AgroNT model is publicly available on HuggingFace at https://huggingface.co/InstaDeepAI/agro-nucleotide-transformer-1b for future research purposes.

Language: English

Citations: 11