
Published: April 18, 2025
Abstract: The inability to interpret the functional impact of non-coding variants has been a major impediment to the promise of precision medicine. While high-throughput experimental approaches such as Massively Parallel Reporter Assays (MPRAs) have made progress in identifying causal variants and their underlying molecular mechanisms, these tools cannot exhaustively measure variant effects genome-wide. Here we present MPAC, an ensemble of machine-learning models trained on MPRA data that provides accurate and scalable prediction of cis-regulatory variants. Using MPAC, we predict allelic effects for 575M single nucleotide variants (SNVs) across diverse applications, including complex trait genetics, clinical tumor sequencing, evolutionary analyses, and saturation mutagenesis. We find that MPAC predictions match the performance of empirical MPRAs for trait-associated alleles. We demonstrate the utility of MPAC by applying it to ClinVar, identifying pathogenic variation with higher accuracy than other sequence-to-function models. We also nominate 1,892 candidate cancer drivers by predicting the effects of somatic SNVs in the COSMIC database. Next, we evaluate population-level genetic variation across all 514M SNVs in gnomAD, quantifying the relationship between regulatory function and constraint. Finally, we generate prospective maps using in-silico mutagenesis of 18,658 human promoters, observing widespread selection against variants predicted to disrupt promoter activity. Collectively, this study establishes the value of MPAC as a comprehensive, publicly available resource for the interpretation of non-coding variants.
Language: English