From a genomic risk model to clinical trial implementation in a learning health system: the ProGRESS Study DOI Creative Commons
Jason L. Vassy, Anna Dornisch, Roshan Karunamuni

et al.

medRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown

Published: Nov. 4, 2024

ABSTRACT Background: As healthcare moves from a one-size-fits-all approach towards precision care, individual risk prediction is an important step in disease prevention and early detection. Biobank-linked healthcare systems can generate knowledge about genomic risk and test the impact of implementing that knowledge in care. Risk-stratified prostate cancer screening is one clinical application that might benefit from such an approach. Methods: We developed a clinical translation pipeline for genomics-informed prostate cancer screening in a national healthcare system. We used data from 585,418 male participants of the Veterans Affairs (VA) Million Veteran Program (MVP), among whom 101,920 self-identify as Black/African-American, to develop and validate the Prostate CAncer integrated Risk Evaluation (P-CARE) model, a prostate cancer risk prediction model based on a polygenic score, family history, and genetic principal components. The model was externally validated in data from 18,457 PRACTICAL Consortium participants. A novel blended genome-exome (BGE) platform was used to develop a clinical laboratory assay for both the P-CARE model and rare variants in prostate cancer-associated genes, including additional validation in 74,331 samples from the All of Us Research Program. Results: In overall and ancestry-stratified analyses, the polygenic score of 601 variants was associated with any, metastatic, and fatal prostate cancer in MVP and PRACTICAL. Values of P-CARE at the ≥80th percentile in the multiancestry cohort were associated with hazard ratios (HR) of 2.75 (95% CI 2.66-2.84), 2.78 (95% CI 2.54-2.99), and 2.59 (95% CI 2.22-2.97) for any, metastatic, and fatal prostate cancer in MVP, respectively, compared to the median. When high- and low-risk groups were defined as P-CARE HR>1.5 and HR<0.75 for metastatic prostate cancer, the 220,062 (37.6%) high-risk vs. 146,826 (25.1%) low-risk participants had 47.9% vs. 14.1%, 9.3% vs. 2.0%, and 3.6% vs. 0.8% cumulative cause-specific incidence of any, metastatic, and fatal prostate cancer by age 90, respectively. P-CARE model reports are now being implemented in a clinical trial of precision prostate cancer screening in the VA healthcare system (Clinicaltrials.gov NCT05926102). Conclusions: A model consisting of a polygenic score, family history, and genetic principal components describes a clinically important gradient of prostate cancer risk in a diverse patient population and demonstrates the potential of a learning health system to implement and evaluate precision health care approaches.

Language: English

Causal interpretations of family GWAS in the presence of heterogeneous effects DOI Creative Commons
Carl Veller, Molly Przeworski, Graham Coop

et al.

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2023, Volume and Issue: unknown

Published: Nov. 16, 2023

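Family-based genome-wide association studies (GWAS) have emerged as a gold standard for assessing causal effects of alleles and polygenic scores. Notably, family studies are often claimed to provide an unbiased estimate of the average causal effect (or average treatment effect; ATE) of an allele, on the basis of an analogy between the random transmission of alleles from parents to children and a randomized controlled trial. Here, we show that this interpretation does not hold in general. Because Mendelian segregation only randomizes alleles among children of heterozygotes, the effects of alleles in the children of homozygotes are not observable. Consequently, if an allele has different average effects in the children of homozygotes and heterozygotes, as can arise in the presence of gene-by-environment interactions, gene-by-gene interactions, or differences in LD patterns, family studies provide a biased estimate of the average effect in the sample. At a single locus, family-based association studies can be thought of as providing an unbiased estimate of the average effect in the children of heterozygotes (i.e., a local average treatment effect; LATE). This interpretation does not extend to polygenic scores, however, because different sets of SNPs are heterozygous in each family. Therefore, other than under specific conditions, the within-family regression slope of a PGS cannot be assumed to provide an unbiased estimate of the average effect for any subset or weighted average of families. Instead, family-based studies can be reinterpreted as enabling an unbiased estimate of the extent to which Mendelian segregation at loci in the PGS contributes to the population-level variance in the trait. Because this estimate does not include the between-family variance, however, this interpretation applies to only (roughly) half of the sample PGS variance. In practice, the potential biases of a family-based GWAS are likely smaller than those arising from confounding in a standard, population-based GWAS, and so family studies remain important for the dissection of genetic contributions to phenotypic variation. Nonetheless, the causal interpretation of family-based GWAS estimates is less straightforward than has been widely appreciated.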

Language: English

Citations

6

On blockwise and reference panel-based estimators for genetic data prediction in high dimensions DOI

Bingxin Zhao, Shurong Zheng, Hongtu Zhu

et al.

The Annals of Statistics, Journal Year: 2024, Volume and Issue: 52(3)

Published: June 1, 2024

Genetic prediction holds immense promise for translating genetic discoveries into medical advances. As the high-dimensional covariance matrix (or the linkage disequilibrium (LD) pattern) of genetic variants often presents a block-diagonal structure, numerous methods account for the dependence among variants in predetermined local LD blocks. Moreover, due to privacy considerations and data protection concerns, the genetic variant dependence in each LD block is typically estimated from external reference panels rather than the original training data set. This paper presents a unified analysis of blockwise and reference panel-based estimators in a high-dimensional prediction framework without sparsity restrictions. We find that, surprisingly, even when the covariance matrix has a block-diagonal structure with well-defined boundaries, blockwise estimation methods adjusting for local dependence can be substantially less accurate than methods controlling for the whole covariance matrix. Further, estimators built on the original training data set and on external reference panels are likely to have varying performance in high dimensions, which may reflect the cost of having only access to summary-level data from the training data set. This analysis is based on novel results in random matrix theory for block-diagonal covariance matrices. We numerically evaluate our results using extensive simulations and real data analysis in the UK Biobank.

Language: English

Citations

2

Genomic landscape of cancer in racially and ethnically diverse populations DOI
Claire E. Thomas, Ulrike Peters

Nature Reviews Genetics, Journal Year: 2024, Volume and Issue: unknown

Published: Nov. 28, 2024

Language: English

Citations

2

scAI-SNP: a method for inferring ancestry from single-cell data DOI Creative Commons

Sung Chul Hong, Francesc Muyas, Isidro Cortés‐Ciriano

et al.

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown

Published: May 17, 2024

Collaborative efforts, such as the Human Cell Atlas, are rapidly accumulating large amounts of single-cell data. To ensure that single-cell atlases are representative of human genetic diversity, we need to determine the ancestry of the donors from whom single-cell data are generated. Self-reporting of race and ethnicity, although important, can be biased and is not always available for the datasets already collected. Here, we introduce scAI-SNP, a tool to infer ancestry directly from single-cell genomics data. To train scAI-SNP, we identified 4.5 million ancestry-informative single-nucleotide polymorphisms (SNPs) in the 1000 Genomes Project dataset across 3201 individuals from 26 population groups. For a query single-cell data set, scAI-SNP uses these ancestry-informative SNPs to compute the contribution of each of the 26 population groups to the ancestry of the donor from whom the cells were obtained. Using diverse single-cell data sets with matched whole-genome sequencing data, we show that scAI-SNP is robust to the sparsity of single-cell data, can accurately and consistently infer ancestry from samples derived from diverse types of tissues and cancer cells, and can be applied to different modalities of single-cell profiling assays, such as single-cell RNA-seq and single-cell ATAC-seq. Finally, we argue that ensuring that single-cell atlases represent diverse ancestry, ideally alongside race and ethnicity, is ultimately important for improved and equitable health outcomes by accounting for human diversity.

Language: English

Citations

1

Population Performance and Individual Agreement of Coronary Artery Disease Polygenic Risk Scores DOI Creative Commons
Sarah Abramowitz, Kristin Boulier, Karl Keat

et al.

medRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown

Published: July 26, 2024

Abstract Importance: Polygenic risk scores (PRSs) for coronary artery disease (CAD) are a growing clinical and commercial reality. Whether existing scores provide similar individual-level assessments of disease liability is a critical consideration for clinical implementation that remains uncharacterized. Objective: Characterize the reliability of CAD PRSs that perform equivalently at the population level at predicting individual-level risk. Design: Cross-sectional study. Setting: All of Us Research Program (AOU), Penn Medicine Biobank (PMBB), and UCLA ATLAS Precision Health Biobank. Participants: Volunteers of diverse genetic backgrounds enrolled in AOU, PMBB, and UCLA with available electronic health record and genotyping data. Exposures: Polygenic risk for CAD from previously published PRSs and new PRSs developed separately from the testing cohorts. Main Outcomes and Measures: Sets of PRSs that perform population prediction equivalently were identified by comparing calibration and discrimination (Brier score and AUROC) of generalized linear models of prevalent CAD using Bayesian analysis of variance. Among equivalently performing scores, individual-level agreement between risk estimates was tested with intraclass correlation (ICC) and Light’s Kappa, measures of inter-rater reliability. Results: 50 PRSs were calculated for 171,095 AOU participants. When included in a model of prevalent CAD, 48 scores had practically equivalent Brier scores and AUROCs (region of practical equivalence = 0.02). Across these scores, 84% of participants had at least one score in both the top and bottom quintile. Continuous agreement of individual predictions was poor, with an ICC of 0.351 (95% CI; 0.349, 0.352). Agreement between two statistically equivalent scores was moderate, with an ICC of 0.649 (95% CI; 0.646, 0.652). Light’s Kappa, used to evaluate consistency of assignment to high-risk thresholds, did not exceed 0.56 (interpreted as ‘fair’) across scores. Repeating the analysis among 41,193 PMBB and 50,748 UCLA participants yielded different sets of equivalent scores, which also lacked strong individual agreement. Conclusions and Relevance: Across three diverse biobanks, CAD PRSs that performed equivalently at the population level produced unreliable individual risk estimates. Approaches to clinical implementation of CAD PRSs must consider the potential for discordant individual risk estimates from otherwise indistinguishable scores.
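Light's Kappa, the agreement statistic named in the abstract above, is the mean of Cohen's kappa over all pairs of raters. The sketch below shows that calculation under the assumption that each PRS acts as a "rater" assigning every participant to a high-risk (1) or not-high-risk (0) group. The function names and toy assignments are illustrative only; they are not the authors' code or data.

```python
from itertools import combinations
from typing import Sequence


def cohen_kappa(a: Sequence[int], b: Sequence[int]) -> float:
    """Cohen's kappa for two raters labelling the same items."""
    n = len(a)
    labels = set(a) | set(b)
    # Observed agreement: fraction of items given the same label.
    p_obs = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    p_exp = sum((list(a).count(l) / n) * (list(b).count(l) / n) for l in labels)
    if p_exp == 1.0:  # both raters use one identical label throughout
        return 1.0
    return (p_obs - p_exp) / (1.0 - p_exp)


def lights_kappa(ratings: Sequence[Sequence[int]]) -> float:
    """Light's kappa: mean Cohen's kappa over all rater pairs."""
    pairs = list(combinations(ratings, 2))
    return sum(cohen_kappa(a, b) for a, b in pairs) / len(pairs)


# Toy high-risk assignments from three hypothetical scores for four people.
score_a = [1, 1, 0, 0]
score_b = [1, 0, 0, 0]
score_c = [1, 1, 0, 0]
print(cohen_kappa(score_a, score_b))               # 0.5
print(lights_kappa([score_a, score_b, score_c]))   # about 0.667
```

Averaging pairwise kappas keeps the statistic interpretable on the familiar Cohen's kappa scale while summarising agreement across many scores at once.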

Language: English

Citations

1

Dual exposure-by-polygenic score interactions highlight disparities across social groups in the proportion needed to benefit DOI Creative Commons
Sini Nagpal, Greg Gibson

medRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown

Published: July 30, 2024

The transferability of polygenic scores across population groups is a major concern with respect to the equitable clinical implementation of genomic medicine. Since genetic associations are identified relative to the population mean, inevitably differences in disease or trait prevalence among social strata influence the relationship between PGS and risk. Here we quantify the magnitude of PGS-by-Exposure (PGS×E) interactions for seven human diseases (coronary artery disease, type 2 diabetes, obesity thresholded to body mass index or waist-to-hip ratio, inflammatory bowel disease, chronic kidney disease, and asthma) and pairs of 75 exposures in the White-British subset of the UK Biobank study (n=408,801). Across 24,198 PGS×E models, 746 (3.1%) were significant by two criteria, at least three-fold more than expected by chance under each criterion. Predictive accuracy is significantly improved in the high-risk exposures by including interaction terms, with effects as large as those documented for low transferability of PGS across ancestries. The predominant mechanism for PGS×E interactions is shown to be amplification of genetic effects in the presence of adverse exposures such as low polyunsaturated fatty acids, mediators of obesity, and social determinants of ill health. We introduce the notion of the proportion needed to benefit (PNB), which is the cumulative number needed to treat across the range of the PGS, and show that typically this is halved 70

Language: English

Citations

1

Real-time dynamic polygenic prediction for streaming data DOI
Justin D. Tubbs, Yu Chen, Rui Duan

et al.

medRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown

Published: July 14, 2024

Abstract Polygenic risk scores (PRSs) are promising tools for advancing precision medicine. However, existing PRS construction methods rely on static summary statistics derived from genome-wide association studies (GWASs), which are often updated at lengthy intervals. As genetic data and health outcomes are continuously being generated at an ever-increasing pace, the current PRS training and deployment paradigm is suboptimal in maximizing the prediction accuracy of PRSs for incoming patients in healthcare settings. Here, we introduce real-time PRS-CS (rtPRS-CS), which enables online, dynamic refinement and calibration of PRS as each new sample is collected, without the need to perform intermediate GWASs. Through extensive simulation studies, we evaluate the performance of rtPRS-CS across various genetic architectures and training sample sizes. Leveraging quantitative traits from the Mass General Brigham Biobank and UK Biobank, we show that rtPRS-CS can integrate massive streaming data to enhance PRS prediction over time. We further apply rtPRS-CS to 22 schizophrenia cohorts in 7 Asian regions, demonstrating the clinical utility of rtPRS-CS in dynamically predicting and stratifying disease risk across diverse genetic ancestries.

Language: English

Citations

0

Investigating the Role of Neighborhood Socioeconomic Status and Germline Genetics on Prostate Cancer Risk DOI Creative Commons
Jonathan Judd, Jeffrey P. Spence, Jonathan K. Pritchard

et al.

medRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown

Published: Aug. 2, 2024

Background: Genetic factors play an important role in prostate cancer (PCa) development, with polygenic risk scores (PRS) predicting disease risk across genetic ancestries. However, there are few convincing modifiable factors for PCa, and little is known about their potential interaction with genetic risk. We analyzed incident PCa cases (n=6,155) and controls (n=98,257) of European and African ancestry from the UK Biobank (UKB) cohort to evaluate the role of neighborhood socioeconomic status (nSES), and how it may interact with PRS, on PCa risk. Methods: We evaluated a multi-ancestry PCa PRS containing 269 genetic variants to understand the association of germline genetics with PCa in UKB. Using the English Indices of Deprivation, a set of validated metrics that quantify lack of resources within geographical areas, we performed logistic regression to investigate the main effects and interactions between nSES deprivation, PCa PRS, and PCa. Results: The PCa PRS was strongly associated with PCa (OR=2.04; 95%CI=2.00-2.09; P<0.001). Additionally, nSES deprivation indices were inversely associated with PCa: employment (OR=0.91; 95%CI=0.86-0.96; P<0.001), education (OR=0.94; 95%CI=0.83-0.98; P<0.001), health, and income. The PRS effects showed little heterogeneity across nSES deprivation indices, except for the Townsend Index (P=0.03). Conclusions: We reaffirmed genetics as a risk factor for PCa and identified nSES deprivation domains that influence PCa detection and are potentially correlated with environmental exposures that are risk factors for PCa. These findings also suggest that nSES and genetic risk factors for PCa act independently.

Language: English

Citations

0

SPLENDID incorporates continuous genetic ancestry in biobank-scale data to improve polygenic risk prediction across diverse populations DOI
Tony Chen, Haoyu Zhang, Rahul Mazumder

et al.

Published: Oct. 17, 2024

Polygenic risk scores are widely used in disease risk stratification, but their accuracy varies across diverse populations. Recent large-scale methods leverage multi-ancestry data to improve accuracy in under-represented populations but require labelling individuals by ancestry for prediction. This poses challenges for practical use, as clinical practices are typically not based on ancestry. We propose SPLENDID, a novel penalized regression framework for diverse biobank-scale data. Our method utilizes ancestry principal component interactions to model genetic ancestry as a continuum within a single prediction model for all ancestries, eliminating the need for discrete labels. In extensive simulations and analyses of 9 traits from the All of Us Research Program (N=224,364) and UK Biobank (N=340,140), SPLENDID significantly outperformed existing methods in prediction accuracy and model sparsity. By directly incorporating continuous ancestry in model training, SPLENDID stands as a valuable tool for robust risk prediction across diverse populations and fairer clinical implementation.

Language: English

Citations

0

Genetic Prediction Modeling in Large Cohort Studies via Boosting Targeted Loss Functions DOI Creative Commons
Hannah Klinkhammer, Christian Staerk, Carlo Maj

et al.

Statistics in Medicine, Journal Year: 2024, Volume and Issue: unknown

Published: Oct. 23, 2024

ABSTRACT Polygenic risk scores (PRS) aim to predict a trait from genetic information, relying on common genetic variants with low to medium effect sizes. As genotype data are high‐dimensional in nature, it is crucial to develop methods that can be applied to large‐scale data (large n and large p). Many PRS tools aggregate univariate summary statistics from genome‐wide association studies into a single score. Recent advancements allow simultaneous modeling of variant effects from individual‐level genotype data. In this context, we introduced snpboost, an algorithm that applies statistical boosting on individual‐level genotype data to estimate PRS via multivariable regression models. By processing variants iteratively in batches, snpboost can deal with large‐scale cohort data. Having solved the technical obstacles due to data dimensionality, the methodological scope can now be broadened, focusing on key objectives for the clinical application of PRS. Similar to most methods in this context, snpboost has, so far, been restricted to quantitative and binary traits. Now, we incorporate more advanced alternatives, targeted to the particular aim and outcome. Adapting the loss function extends the snpboost framework to further data situations such as time‐to‐event and count data. Furthermore, alternative loss functions for continuous outcomes allow us to focus not only on the mean of the conditional distribution but also on other aspects that may be more helpful in the risk stratification of individual patients and can quantify prediction uncertainty, for example, median or quantile regression. This work enhances PRS fitting across multiple model classes previously unfeasible for this data type.
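The abstract above describes changing the loss function so that boosting targets a median or another quantile instead of the mean. As a minimal sketch of that idea, and not the snpboost implementation, the quantile ("pinball") loss and the negative gradient that a gradient-boosting step would fit can be written as follows. All function names are hypothetical.

```python
from typing import List, Sequence


def pinball_loss(y: Sequence[float], pred: Sequence[float], tau: float) -> float:
    """Mean quantile (pinball) loss for quantile level tau in (0, 1)."""
    total = 0.0
    for yi, pi in zip(y, pred):
        r = yi - pi
        # Under-prediction is weighted by tau, over-prediction by 1 - tau.
        total += tau * r if r >= 0 else (tau - 1.0) * r
    return total / len(y)


def negative_gradient(y: Sequence[float], pred: Sequence[float], tau: float) -> List[float]:
    """Negative (sub)gradient of the pinball loss with respect to pred.

    A boosting iteration would fit its base learner to these values.
    """
    return [tau if yi > pi else tau - 1.0 for yi, pi in zip(y, pred)]


def best_constant(y: Sequence[float], tau: float) -> float:
    """Observed value that minimises the pinball loss as a constant prediction."""
    return min(sorted(set(y)), key=lambda c: pinball_loss(y, [c] * len(y), tau))


y = [1.0, 2.0, 3.0, 4.0, 100.0]
print(best_constant(y, 0.5))  # 3.0, the sample median, unaffected by the outlier
```

With tau = 0.5 the loss is half the mean absolute error, which is why the minimiser is the median; other values of tau shift the target to the corresponding quantile.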

Language: English

Citations

0