Identifying non-coding variant effects at scale via machine learning models of cis-regulatory reporter assays
John C. Butts, Stephen Rong, Sager J. Gosai, et al.

Published: April 18, 2025

Abstract: The inability to interpret the functional impact of non-coding variants has been a major impediment to the promise of precision medicine. While high-throughput experimental approaches such as Massively Parallel Reporter Assays (MPRAs) have made progress in identifying causal variants and their underlying molecular mechanisms, these tools cannot exhaustively measure variant effects genome-wide. Here we present MPAC, an ensemble of machine-learning models trained on MPRA data that provides accurate and scalable prediction of cis-regulatory variant effects. Using MPAC, we predict allelic effects for 575M single nucleotide variants (SNVs) across diverse applications, including complex trait genetics, clinical tumor sequencing, evolutionary analyses, and saturation mutagenesis. We find that MPAC predictions match the performance of empirical MPRAs for trait-associated alleles. We demonstrate MPAC's utility by applying it to ClinVar, identifying pathogenic variation with higher accuracy than other sequence-to-function models. We also nominate 1,892 candidate cancer drivers by predicting the effects of somatic SNVs in the COSMIC database. Next, we evaluate population-level genetic variation for all 514M SNVs in gnomAD, quantifying the relationship between regulatory function and constraint. Finally, we generate prospective maps using in-silico mutagenesis of 18,658 human promoters, observing widespread selection against variants predicted to disrupt promoter activity. Collectively, this study establishes the value of MPAC as a comprehensive, publicly available resource for non-coding variant interpretation.

Language: English

2024 ASHG Scientific Achievement Award
Nadav Ahituv

The American Journal of Human Genetics, Journal Year: 2025, Volume and Issue: 112(3), P. 473 - 477

Published: March 1, 2025

Language: English

Citations: 0
