Genomic insights for personalised care in lung cancer and smoking cessation: motivating at-risk individuals toward evidence-based health practices DOI Creative Commons
Tony Chen, Giang Pham, Louis Fox

et al.

EBioMedicine, Journal Year: 2024, Volume and Issue: 110, P. 105441 - 105441

Published: Nov. 8, 2024

Language: English

PennPRS: a centralized cloud computing platform for efficient polygenic risk score training in precision medicine DOI Creative Commons
Jin Jin, Bingxuan Li, Xiyao Wang

et al.

medRxiv (Cold Spring Harbor Laboratory), Journal Year: 2025, Volume and Issue: unknown

Published: Feb. 10, 2025

Polygenic risk scores (PRS) are becoming increasingly vital for prediction and stratification in precision medicine. However, PRS model training presents significant challenges for broader adoption of PRS, including limited access to computational resources, difficulties in implementing advanced methods, and availability and privacy concerns over individual-level genetic data. Cloud computing provides a promising solution with centralized computing and data resources. Here we introduce PennPRS ( https://pennprs.org ), a scalable cloud computing platform for online PRS model training. We developed novel pseudo-training algorithms for multiple PRS methods and ensemble approaches, enabling model training without requiring individual-level data. These methods were rigorously validated through extensive simulations and large-scale real data analyses involving 6,000 phenotypes across various data sources. PennPRS supports single- and multi-ancestry PRS training with seven methods, allowing users to upload their own data or query from more than 27,000 datasets in the GWAS Catalog, submit jobs, and download trained models. Additionally, we applied our pipeline to train PRS models for 8,000 phenotypes and made the PRS weights publicly accessible. In summary, PennPRS aims to improve the accessibility of PRS applications and reduce disparities in computational resources for the global research community.

Language: English

Citations

0

Opportunities and challenges of local ancestry in genetic association analyses DOI
Quan Sun, Andréa R. V. R. Horimoto, Brian Chen

et al.

The American Journal of Human Genetics, Journal Year: 2025, Volume and Issue: 112(4), P. 727 - 740

Published: April 1, 2025

Language: English

Citations

0

The PRIMED Consortium: Reducing disparities in polygenic risk assessment DOI Creative Commons
Iftikhar J. Kullo, Matthew P. Conomos, Sarah C. Nelson

et al.

The American Journal of Human Genetics, Journal Year: 2024, Volume and Issue: unknown

Published: Nov. 1, 2024

Language: English

Citations

3

Analysis-ready VCF at Biobank scale using Zarr DOI Creative Commons
Eric Czech, Timothy R. Millar, Will Tyler

et al.

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown

Published: June 12, 2024

Abstract. Background: Variant Call Format (VCF) is the standard file format for interchanging genetic variation data and associated quality control metrics. The usual row-wise encoding of the VCF data model (either as text or packed binary) emphasises efficient retrieval of all data for a given variant, but accessing data on a field or sample basis is inefficient. Biobank scale datasets currently available consist of hundreds of thousands of whole genomes and terabytes of compressed VCF. Row-wise data storage is fundamentally unsuitable and a more scalable approach is needed. Results: Zarr is a format for storing multi-dimensional data that is widely used across the sciences and is ideally suited to massively parallel processing. We present the VCF Zarr specification, an encoding of the VCF data model using Zarr, along with fundamental software infrastructure for efficient and reliable conversion at scale. We show how this format is far more efficient than standard VCF-based approaches, and competitive with specialised methods for storing genotype data in terms of compression ratios and single-threaded calculation performance. We present case studies on subsets of three large human datasets (Genomics England: n = 78,195; Our Future Health: n = 651,050; All of Us: n = 245,394), along with whole genome datasets for Norway Spruce (n = 1,063) and SARS-CoV-2 (n = 4,484,157). We demonstrate the potential for VCF Zarr to enable a new generation of high-performance and cost-effective applications via illustrative examples using cloud computing and GPUs. Conclusions: Large row-encoded VCF files are a major bottleneck for current research, and storing and processing these files incurs a substantial cost. The VCF Zarr specification, building on widely-used, open-source technologies, has the potential to greatly reduce these costs, and may enable a diverse ecosystem of next-generation tools for analysing genetic variation data directly from cloud-based object stores, while maintaining compatibility with existing file-oriented workflows. Key Points: VCF is widely supported, and the underlying data model is entrenched in bioinformatics pipelines. The standard row-wise encoding as text (or binary) is inherently inefficient for large-scale data processing. The Zarr format provides an efficient solution, by encoding fields in the VCF separately in chunk-compressed binary format.

Language: English

Citations

1
