Variant Effect Prediction in the Age of Machine Learning
Yana Bromberg, R. Prabakaran, Anowarul Kabir, et al.

Cold Spring Harbor Perspectives in Biology, Journal Year: 2024, Volume and Issue: 16(7), P. a041467 - a041467

Published: April 15, 2024

Over the years, many computational methods have been created for the analysis of the impact of single amino acid substitutions resulting from single-nucleotide variants in genome coding regions. Historically, all methods have been supervised and thus limited by the inadequate sizes of experimentally curated data sets and by the lack of a standardized definition of variant effect. The emergence of unsupervised, deep learning (DL)-based methods raised an important question: Can machines learn the language of life from unannotated protein sequence data well enough to identify significant errors in the protein "sentences"? Our analysis suggests that some unsupervised methods perform as well as or better than existing supervised methods. Unsupervised methods are also faster and can, thus, be useful in large-scale variant evaluations. For all other methods, however, their performance varies by both evaluation metrics and the type of variant effect being predicted. We also note that the evaluation of method performance is still lacking on less-studied, nonhuman proteins, where unsupervised methods hold the most promise.

Language: English

MetaRNN: differentiating rare pathogenic and rare benign missense SNVs and InDels using deep learning (Creative Commons)
Chang Li, Degui Zhi, Kai Wang, et al.

Genome Medicine, Journal Year: 2022, Volume and Issue: 14(1)

Published: Oct. 8, 2022

Multiple computational approaches have been developed to improve our understanding of genetic variants. However, their ability to identify rare pathogenic variants from rare benign ones is still lacking. Using context annotations and deep learning methods, we present pathogenicity prediction models, MetaRNN and MetaRNN-indel, to help identify and prioritize rare nonsynonymous single nucleotide variants (nsSNVs) and non-frameshift insertion/deletions (nfINDELs). We use independent test sets to demonstrate that these new models outperform state-of-the-art competitors and achieve a more interpretable score distribution. Importantly, prediction scores from both models are comparable, enabling easy adoption of integrated genotype-phenotype association analysis methods. All pre-computed nsSNV scores are available at http://www.liulab.science/MetaRNN. The stand-alone program is also available at https://github.com/Chang-Li2019/MetaRNN

Language: English

Citations

102

Mendelian inheritance revisited: dominance and recessiveness in medical genetics
Johannes Zschocke, Peter H. Byers, Andrew O.M. Wilkie, et al.

Nature Reviews Genetics, Journal Year: 2023, Volume and Issue: 24(7), P. 442 - 463

Published: Feb. 20, 2023

Language: English

Citations

53

Analysis of AlphaMissense data in different protein groups and structural context (Creative Commons)
Hedvig Tordai, Odalys Torres, Máté Csepi, et al.

Scientific Data, Journal Year: 2024, Volume and Issue: 11(1)

Published: May 14, 2024

Single amino acid substitutions can profoundly affect protein folding, dynamics, and function. The ability to discern between benign and pathogenic substitutions is pivotal for therapeutic interventions and research directions. Given the limitations in experimental examination of these variants, AlphaMissense has emerged as a promising predictor of the pathogenicity of missense variants. Since heterogeneous performance on different types of proteins can be expected, we assessed the efficacy of AlphaMissense across several protein groups (e.g. soluble, transmembrane, and mitochondrial proteins) and regions (e.g. intramembrane, membrane interacting, and high confidence AlphaFold segments) using ClinVar data for validation. Our comprehensive evaluation showed that AlphaMissense delivers outstanding performance, with MCC scores predominantly between 0.6 and 0.74. We observed low performance on disordered datasets and on ClinVar data related to the CFTR ABC protein. However, superior performance was shown when benchmarked against the high quality CFTR2 database. Our results emphasize AlphaMissense’s potential in pinpointing functional hot spots, with its performance likely surpassing benchmarks calculated from ClinVar and ProteinGym datasets.

Language: English

Citations

22
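
The MCC range quoted in the AlphaMissense entry above refers to the Matthews correlation coefficient, which condenses a binary confusion matrix into a single value between -1 and 1. The following is a minimal sketch of that calculation from raw counts. The function name `mcc` and the zero-denominator convention are choices made here for illustration; nothing below reproduces the paper's evaluation pipeline or data.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts.

    tp/fn: pathogenic variants predicted pathogenic/benign.
    tn/fp: benign variants predicted benign/pathogenic.
    Returns 0.0 when a row or column of the matrix is empty
    (a common convention, assumed here).
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return numerator / denominator
```

Unlike plain accuracy, this value stays near zero for a predictor that labels everything with the majority class, which matters when pathogenic and benign labels are imbalanced.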

Critical assessment of missense variant effect predictors on disease-relevant variant data (Creative Commons)
Ruchir Rastogi, Ryan Chung, Sindy Li, et al.

Human Genetics, Journal Year: 2025, Volume and Issue: unknown

Published: March 21, 2025

Regular, systematic, and independent assessments of computational tools that are used to predict the pathogenicity of missense variants are necessary to evaluate their clinical and research utility and guide future improvements. The Critical Assessment of Genome Interpretation (CAGI) conducts the ongoing Annotate-All-Missense (Missense Marathon) challenge, in which missense variant effect predictors (also called variant impact predictors) are evaluated on missense variants added to disease-relevant databases following the prediction submission deadline. Here we assess predictors submitted to the CAGI 6 Annotate-All-Missense challenge, predictors commonly used in clinical genetics, and recently developed deep learning methods. We examine performance across a range of settings relevant for clinical and research applications, focusing on different subsets of the evaluation data as well as high-specificity and high-sensitivity regimes. Our evaluations reveal notable advances in current methods relative to older, well-cited tools in the field. While meta-predictors tend to outperform their constituent individual predictors, several newer individual predictors perform comparably to commonly used meta-predictors. Predictor performance varies between high-specificity and high-sensitivity regimes, highlighting that different methods may be optimal for different use cases. We also characterize two potential sources of bias. Predictors that incorporate allele frequency as a predictive feature tend to have reduced performance when distinguishing pathogenic variants from very rare benign variants, and predictors trained on pathogenicity labels from curated variant databases often inherit gene-level label imbalances. Our findings help illuminate the clinical and research utility of modern missense variant effect predictors and identify potential areas for future development.

Language: English

Citations

2
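
The CAGI entry above compares predictors in high-specificity and high-sensitivity regimes. One common way to evaluate a high-specificity regime is to fix a minimum specificity on the benign variants and report the sensitivity reached at the corresponding score threshold. The sketch below assumes higher scores mean "more likely pathogenic" and labels of 1 (pathogenic) and 0 (benign); the function name and this thresholding rule are illustrative assumptions, not the procedure used in the assessment.

```python
import math


def sensitivity_at_specificity(scores, labels, min_specificity):
    """Return (threshold, sensitivity) for a required minimum specificity.

    A variant is called pathogenic when its score is strictly greater
    than the threshold. The threshold is the smallest benign score that
    leaves at least `min_specificity` of benign variants at or below it.
    """
    benign = sorted(s for s, y in zip(scores, labels) if y == 0)
    pathogenic = [s for s, y in zip(scores, labels) if y == 1]
    if not benign or not pathogenic:
        raise ValueError("need at least one benign and one pathogenic variant")
    if not 0.0 <= min_specificity <= 1.0:
        raise ValueError("min_specificity must lie in [0, 1]")

    # Number of benign variants that must be classified correctly.
    # The small offset guards against floating-point overshoot in the product.
    required = math.ceil(min_specificity * len(benign) - 1e-9)
    threshold = benign[required - 1] if required > 0 else float("-inf")

    detected = sum(1 for s in pathogenic if s > threshold)
    return threshold, detected / len(pathogenic)
```

Swapping the roles of the two classes gives the mirror-image measure, specificity at a fixed minimum sensitivity, for the high-sensitivity regime.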

Artificial intelligence enables comprehensive genome interpretation and nomination of candidate diagnoses for rare genetic diseases (Creative Commons)
Francisco M. De La Vega, Shimul Chowdhury, Barry Moore, et al.

Genome Medicine, Journal Year: 2021, Volume and Issue: 13(1)

Published: Oct. 14, 2021

Clinical interpretation of genetic variants in the context of the patient's phenotype is becoming the largest component of cost and time expenditure for genome-based diagnosis of rare genetic diseases. Artificial intelligence (AI) holds promise to greatly simplify and speed genome interpretation by integrating predictive methods with the growing knowledge of genetic disease. Here we assess the diagnostic performance of Fabric GEM, a new, AI-based, clinical decision support tool for expediting genome interpretation. We benchmarked GEM in a retrospective cohort of 119 probands, mostly NICU infants, diagnosed with rare genetic diseases, who received whole-genome or whole-exome sequencing (WGS, WES). We replicated our analyses in a separate cohort of 60 cases collected from five academic medical centers. For comparison, we also analyzed these cases with current state-of-the-art variant prioritization tools. Included in the comparisons were trio, duo, and singleton cases. Variants underpinning diagnoses spanned diverse modes of inheritance and types, including structural variants (SVs). Patient phenotypes were extracted from clinical notes by two means: manually and using an automated clinical natural language processing (CNLP) tool. Finally, 14 previously unsolved cases were reanalyzed. GEM ranked over 90% of the causal genes among the top or second candidate and prioritized for review a median of 3 candidate genes per case, using either manually curated or CNLP-derived phenotype descriptions. Ranking of trios and duos was unchanged when analyzed as singletons. In 17 of 20 cases with diagnostic SVs, GEM identified the causal SVs as the top candidate and in 19/20 within the top five, irrespective of whether SV calls were provided or inferred ab initio by GEM using its own internal SV detection algorithm. GEM showed similar performance in absence of parental genotypes. Analysis of the 14 previously unsolved cases resulted in a novel finding for one case, candidates ultimately not advanced upon manual review in 3 cases, and no new findings in 10 cases. GEM enabled diagnostic interpretation inclusive of all variant types through automated nomination of a very short list of candidate genes and disorders for final review and reporting. In combination with deep phenotyping by CNLP, GEM enables substantial automation of genetic disease diagnosis, potentially decreasing cost and expediting case review.

Language: English

Citations

103

Analysis of missense variants in the human genome reveals widespread gene-specific clustering and improves prediction of pathogenicity (Creative Commons)
Mathieu Quinodoz, Virginie G. Peter, Katarina Cisarova, et al.

The American Journal of Human Genetics, Journal Year: 2022, Volume and Issue: 109(3), P. 457 - 470

Published: Feb. 3, 2022

We used a machine learning approach to analyze the within-gene distribution of missense variants observed in hereditary conditions and cancer. When applied to 840 genes from the ClinVar database, this approach detected a significant non-random distribution of pathogenic and benign variants in 387 (46%) and 172 (20%) genes, respectively, revealing that variant clustering is widespread across the human exome. This clustering likely occurs as a consequence of mechanisms shaping pathogenicity at the protein level, as illustrated by the overlap of some clusters with known functional domains. We then took advantage of these findings to develop a pathogenicity predictor, MutScore, that integrates qualitative features of DNA substitutions with the new additional information derived from this positional clustering. Using a random forest approach, MutScore was able to identify pathogenic missense mutations with very high accuracy, outperforming existing predictive tools, especially for variants associated with autosomal-dominant disease and cancer. Thus, the within-gene clustering of pathogenic and benign DNA changes is an important and previously underappreciated feature of the human exome, which can be harnessed to improve the prediction of pathogenicity and the disambiguation of DNA variants of uncertain significance.

Language: English

Citations

58

Genome interpretation using in silico predictors of variant impact (Creative Commons)
Panagiotis Katsonis, Kevin Wilhelm, Amanda M. Williams, et al.

Human Genetics, Journal Year: 2022, Volume and Issue: 141(10), P. 1549 - 1577

Published: April 30, 2022

Estimating the effects of variants found in disease driver genes opens the door to personalized therapeutic opportunities. Clinical associations and laboratory experiments can only characterize a tiny fraction of all the available variants, leaving the majority as variants of unknown significance (VUS). In silico methods bridge this gap by providing instant estimates on a large scale, most often based on the numerous genetic differences between species. Despite concerns that these methods may lack reliability in individual subjects, their numerous practical applications over cohorts suggest they are already helpful and have a role to play in genome interpretation when used at the proper scale and context. In this review, we aim to gain insights into the training and validation of these variant effect predicting methods and illustrate representative types of experimental and clinical applications. Objective performance assessments using various datasets that are not yet published indicate the strengths and limitations of each method. These show that cautious use of in silico variant impact predictors is essential for addressing genome interpretation challenges.

Language: English

Citations

54

Predicting functional effect of missense variants using graph attention neural networks
Haicang Zhang, Michelle S. Xu, Xiao Fan, et al.

Nature Machine Intelligence, Journal Year: 2022, Volume and Issue: 4(11), P. 1017 - 1028

Published: Nov. 15, 2022

Language: English

Citations

49

Interpreting protein variant effects with computational predictors and deep mutational scanning (Creative Commons)
Benjamin Livesey, Joseph A. Marsh

Disease Models & Mechanisms, Journal Year: 2022, Volume and Issue: 15(6)

Published: June 1, 2022

Computational predictors of genetic variant effect have advanced rapidly in recent years. These programs provide clinical and research laboratories with a rapid and scalable method to assess the likely impacts of novel variants. However, it can be difficult to know to what extent we can trust their results. To benchmark their performance, predictors are often tested against large datasets of known pathogenic and benign variants. These benchmarking data may overlap with the data used to train some supervised predictors, which leads to data re-use or circularity, resulting in inflated performance estimates for those predictors. Furthermore, new predictors are usually found by their authors to be superior to all previous predictors, which suggests some degree of computational bias in their benchmarking. Large-scale functional assays known as deep mutational scans provide one possible solution to this problem, providing independent datasets of variant effect measurements. In this Review, we discuss some of the key advances in predictor methodology, current benchmarking strategies and how data derived from deep mutational scans can be used to overcome the issue of data circularity. We also discuss the ability of such functional assays to directly predict clinical impacts of mutations and how this might affect the future need for variant effect predictors.

Language: English

Citations

41

Relating pathogenic loss-of-function mutations in humans to their evolutionary fitness costs (Creative Commons)
Ipsita Agarwal, Zachary L. Fuller, Simon Myers, et al.

eLife, Journal Year: 2023, Volume and Issue: 12

Published: Jan. 17, 2023

Causal loss-of-function (LOF) variants for Mendelian and severe complex diseases are enriched in 'mutation intolerant' genes. We show how such observations can be interpreted in light of a model of mutation-selection balance and use the model to relate the pathogenic consequences of LOF mutations at present to their evolutionary fitness effects. To this end, we first infer posterior distributions for the fitness costs of LOF mutations in 17,318 autosomal and 679 X-linked genes from exome sequences in 56,855 individuals. Estimated fitness costs for the loss of a gene copy are typically above 1%; they tend to be largest for X-linked genes, whether or not they have a Y homolog, followed by autosomal genes and genes in the pseudoautosomal region. We compare inferred fitness effects for all possible de novo LOF mutations to those of de novo mutations identified in individuals diagnosed with one of six severe, complex diseases or developmental disorders. Probands carry an excess of mutations with estimated fitness effects above 10%; as we show by simulation, when sampled in the population, such highly deleterious mutations are typically only a couple of generations old. Moreover, the proportion of highly deleterious mutations carried by probands reflects the typical age of onset of the disease. The study design also has a discernible influence: a greater proportion of highly deleterious mutations is detected in pedigree than case-control studies, and for autism, in simplex than multiplex families and in female versus male probands. Thus, anchoring observations in human genetics to a population genetic model allows us to learn about the fitness effects of mutations identified by different mapping strategies and for different traits.

Language: English

Citations

35
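
The entry above rests on mutation-selection balance. In the textbook deterministic approximation for a rare deleterious allele, new copies arise at mutation rate u per generation and are removed at rate hs (the fitness cost in heterozygotes), so the equilibrium frequency is roughly q ≈ u / hs. The sketch below illustrates only that simplified relationship; the paper itself infers posterior distributions with a fuller model, and the function names and parameter values here are hypothetical.

```python
def equilibrium_frequency(mutation_rate: float, het_fitness_cost: float) -> float:
    """Approximate equilibrium allele frequency q = u / hs (rare-allele limit)."""
    if mutation_rate < 0:
        raise ValueError("mutation_rate must be non-negative")
    if not 0 < het_fitness_cost <= 1:
        raise ValueError("het_fitness_cost must lie in (0, 1]")
    return mutation_rate / het_fitness_cost


def implied_fitness_cost(mutation_rate: float, allele_frequency: float) -> float:
    """Invert q = u / hs to get hs from an observed frequency, capped at 1."""
    if allele_frequency <= 0:
        raise ValueError("allele_frequency must be positive")
    return min(1.0, mutation_rate / allele_frequency)


def iterate_frequency(mutation_rate: float, het_fitness_cost: float,
                      generations: int, start: float = 0.0) -> float:
    """Iterate q' = q * (1 - hs) + u, the linearized recursion for a rare allele."""
    q = start
    for _ in range(generations):
        q = q * (1.0 - het_fitness_cost) + mutation_rate
    return q
```

Running `iterate_frequency` for enough generations converges to the value returned by `equilibrium_frequency`, which shows why genes with fewer observed LOF alleles than the mutation rate predicts are read as carrying larger fitness costs.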