SQUiD: ultra-secure storage and analysis of genetic data for the advancement of precision medicine DOI Creative Commons
Jacob Blindenbach, Jiayi Kang, Seungwan Hong

et al.

Genome Biology, Journal Year: 2024, Volume and Issue: 25(1)

Published: Dec. 18, 2024

Cloud computing allows storing the ever-growing genotype-phenotype datasets crucial for precision medicine. Due to the sensitive nature of this data and varied laws and regulations, additional security measures are needed to ensure data privacy. We develop SQUiD, a secure queryable database for storing and analyzing genotype-phenotype data. SQUiD allows storage and secure querying of data in a low-security, low-cost public cloud using homomorphic encryption in a multi-client setting. We demonstrate SQUiD's practical usability and scalability using synthetic and UK Biobank data.

Language: English

The NHGRI-EBI GWAS Catalog: standards for reusability, sustainability and diversity DOI Creative Commons
María Cerezo, Elliot Sollis, Yue Ji

et al.

Nucleic Acids Research, Journal Year: 2024, Volume and Issue: 53(D1), P. D998 - D1005

Published: Nov. 12, 2024

The NHGRI-EBI GWAS Catalog serves as a vital resource for the genetic research community, providing access to the most comprehensive database of human GWAS results. Currently, it contains close to 7 000 publications for >15 000 traits, from which more than 625 000 lead associations have been curated. Additionally, 85 000 full genome-wide summary statistics datasets, containing association data for all variants in the analysis, are available for downstream analyses such as meta-analysis, fine-mapping, Mendelian randomisation or development of polygenic risk scores. As a centralised repository for GWAS results, the GWAS Catalog sets and implements standards for data submission and harmonisation, and encourages the use of consistent descriptors for traits, samples and methodologies. We share processes and vocabulary with the PGS Catalog, improving interoperability for a growing user group. Here, we describe the latest changes in data content, improvements in our user interface, and the implementation of the GWAS-SSF standard format for summary statistics. We address the challenges of handling the rapid increase in large-scale molecular quantitative trait GWAS and the need for sensitivity in the use of population and cohort descriptors while maintaining data interoperability and reusability.

Language: English

Citations

13

Evaluating Performance and Agreement of Coronary Heart Disease Polygenic Risk Scores DOI
Sarah Abramowitz, Kristin Boulier, Karl Keat

et al.

JAMA, Journal Year: 2024, Volume and Issue: unknown

Published: Nov. 16, 2024

Importance: Polygenic risk scores (PRSs) for coronary heart disease (CHD) are a growing clinical and commercial reality. Whether existing scores provide similar individual-level assessments of disease susceptibility remains incompletely characterized. Objective: To characterize the individual-level agreement of CHD PRSs that perform similarly at the population level. Design, Setting, and Participants: Cross-sectional study of participants from diverse backgrounds enrolled in the All of Us Research Program (AOU), Penn Medicine BioBank (PMBB), and University of California, Los Angeles (UCLA) ATLAS Precision Health Biobank with electronic health record and genotyping data. Exposures: Polygenic risk for CHD from published PRSs and new PRSs developed separately from testing samples. Main Outcomes and Measures: PRSs that performed similarly on population-level prediction were identified by comparing calibration and discrimination of models of prevalent CHD. Individual-level agreement was tested with the intraclass correlation coefficient (ICC) and Light κ. Results: A total of 48 PRSs were calculated for 171 095 AOU participants. The mean (SD) age was 56.4 (16.8) years. A total of 104 947 participants (61.3%) were female. A total of 35 590 participants (20.8%) were most genetically similar to an African reference population, 29 801 (17.4%) to an admixed American reference population, 100 493 (58.7%) to a European reference population, and the remaining participants to Central/South Asian, East Asian, and Middle Eastern reference populations. There were 17 589 participants (10.3%) with CHD and 153 506 participants (89.7%) without CHD. When included in a model of prevalent CHD, 46 scores had practically equivalent Brier scores and area under the receiver operator curves (region of practical equivalence ±0.02). Twenty percent of participants had at least 1 score in both the top and bottom 5% of risk. Continuous agreement of individual predictions was poor (ICC, 0.373 [95% CI, 0.372-0.375]). Light κ, used to evaluate consistency of risk assignment, did not exceed 0.56. Analysis among 41 193 PMBB and 53 092 ATLAS participants yielded different sets of equivalent scores, which also lacked individual-level agreement. Conclusions and Relevance: CHD PRSs that performed similarly at the population level demonstrated highly variable individual-level estimates of risk. Recognizing that CHD PRSs may generate incongruent individual-level risk estimates, effective clinical implementation will require refined statistical methods to quantify uncertainty and new strategies to communicate this uncertainty to patients and clinicians.
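
As an illustration of the agreement statistic named in the abstract above, the following minimal Python sketch computes Light's κ as the mean of pairwise Cohen's κ values across all pairs of scores. The function names and the toy risk categories are assumptions made for this example; they are not taken from the study, which may have computed the statistic with different software or settings.

```python
from collections import Counter
from itertools import combinations


def cohen_kappa(a, b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("label sequences must be non-empty and equal in length")
    n = len(a)
    # Observed agreement: fraction of individuals given the same label.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    count_a, count_b = Counter(a), Counter(b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        # Both raters used a single identical label; treat as full agreement.
        return 1.0
    return (observed - expected) / (1.0 - expected)


def light_kappa(ratings):
    """Light's kappa: mean of Cohen's kappa over all pairs of raters."""
    pairs = list(combinations(ratings, 2))
    if not pairs:
        raise ValueError("need at least two raters")
    return sum(cohen_kappa(a, b) for a, b in pairs) / len(pairs)


# Toy data (hypothetical): three scores assigning four people to
# high-risk (1) or not high-risk (0).
score_a = [1, 1, 0, 0]
score_b = [1, 1, 0, 0]
score_c = [1, 0, 1, 0]
print(light_kappa([score_a, score_b, score_c]))
```

In this toy case the first two scores agree perfectly while the third agrees with them only at chance level, so the averaged statistic is low even though one pair is in full agreement.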

Language: English

Citations

8

Unsupervised Ensemble Learning for Efficient Integration of Pre-trained Polygenic Risk Scores DOI Open Access
Chenyin Gao, Justin D. Tubbs, Yi Han

et al.

medRxiv (Cold Spring Harbor Laboratory), Journal Year: 2025, Volume and Issue: unknown

Published: Jan. 6, 2025

The growing availability of pre-trained polygenic risk score (PRS) models has enabled their integration into real-world applications, reducing the need for extensive data labeling, training, and calibration. However, selecting the most suitable PRS model for a specific target population remains challenging, due to issues such as limited transferability, data heterogeneity, and the scarcity of observed phenotype data in real-world settings. Ensemble learning offers a promising avenue to enhance the predictive accuracy of genetic risk assessments, but existing methods often rely on observed phenotype data or additional genome-wide association studies (GWAS) from the target population to optimize ensemble weights, limiting their utility in real-time implementation. Here, we present the UNsupervised enSemble PRS (UNSemblePRS), an unsupervised ensemble learning framework that combines pre-trained PRS models without requiring phenotype data or summaries from the target population. Unlike traditional approaches, UNSemblePRS aggregates models based on prediction concordance across a curated subset of candidate PRS models. We evaluated UNSemblePRS using both continuous and binary traits in the All of Us database, demonstrating its scalability and robust performance across diverse populations. These results underscore UNSemblePRS as an accessible tool for integrating PRS models into real-world contexts, offering broad applicability as the availability of PRS models continues to expand.

Language: English

Citations

0

pandasPGS: a Python package for easy retrieval of Polygenic Score Catalog data DOI Creative Commons
Zheyu Zhang, Jintong Zhou, Tianze Cao

et al.

PeerJ, Journal Year: 2025, Volume and Issue: 13, P. e18985 - e18985

Published: Feb. 12, 2025

Background: The Polygenic Score (PGS) Catalog is a public database dedicated to storing polygenic risk scores. To date, the PGS Catalog has included 5,022 polygenic scores associated with 656 different traits. Although the PGS Catalog offers an official resource representational state transfer (REST) application programming interface (API), there is no ready-made data client tailored for any specific programming language. Researchers are thus required to invest time in becoming familiar with the structure of the REST API and to implement the corresponding data client in their programming language of choice to integrate the data into their analytical workflows. Methods: In this work we introduce pandasPGS, a Python package that provides programmatic access to the PGS Catalog data. After being called by the researcher, pandasPGS will automatically select the appropriate uniform resource locator (URL) to request the data based on the name and parameters of the function, and merge the data obtained by pagination. In addition, pandasPGS also provides further data pre-processing functions. According to the structure of the data, it can convert the data into several hierarchical pandas.DataFrame objects, which is convenient for analysis by researchers. Results: This tool allows researchers to easily analyze the PGS Catalog data using Python. It alleviates the cost for researchers to learn the REST APIs of the PGS Catalog. The source codes of pandasPGS can be found at https://github.com/tianzelab/pandaspgs, and the documentations can be found at https://tianzelab.github.io/pandaspgs/.
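
The retrieval pattern described in the abstract above (follow paginated responses, merge them, then flatten nested records into table-ready rows) can be sketched as follows. This is not the pandasPGS implementation: the function names, the `results` and `next` response keys, the page identifiers, and the record fields are all assumptions chosen for illustration, and the network call is replaced by an injected `fetch_page` function so the example runs offline.

```python
from typing import Callable, Dict, Iterator, List


def iter_results(first_url: str, fetch_page: Callable[[str], dict]) -> Iterator[dict]:
    """Yield every record, following the 'next' link until it is empty.

    Assumes each page is a dict with a 'results' list and a 'next' URL
    (or None on the last page); this response shape is hypothetical.
    """
    url = first_url
    while url:
        page = fetch_page(url)
        yield from page.get("results", [])
        url = page.get("next")


def flatten(record: dict, parent_key: str = "", sep: str = ".") -> Dict[str, object]:
    """Flatten nested dicts into one level, joining keys with `sep`."""
    flat: Dict[str, object] = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


def fetch_all_rows(first_url: str, fetch_page: Callable[[str], dict]) -> List[dict]:
    """Merge all pages and return one flat row per record."""
    return [flatten(record) for record in iter_results(first_url, fetch_page)]


# Hypothetical two-page response standing in for a real HTTP client.
FAKE_PAGES = {
    "page-1": {
        "results": [{"id": "A", "publication": {"year": 2024}}],
        "next": "page-2",
    },
    "page-2": {
        "results": [{"id": "B", "publication": {"year": 2025}}],
        "next": None,
    },
}

rows = fetch_all_rows("page-1", FAKE_PAGES.__getitem__)
print(rows)
```

Flat dictionaries like these can be passed directly to `pandas.DataFrame(rows)`. A real client would replace `FAKE_PAGES.__getitem__` with a function that issues an HTTP GET and decodes the JSON body.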

Language: English

Citations

0

Atlas of genetic and phenotypic associations across 42 female reproductive health diagnoses DOI
Natàlia Pujol‐Gualdo, Jelisaveta Džigurski, Valentina Rukins

et al.

Nature Medicine, Journal Year: 2025, Volume and Issue: unknown

Published: March 11, 2025

Language: English

Citations

0

Somatic and Stem Cell Bank to study the contribution of African ancestry to dementia: African iPSC Initiative DOI Creative Commons
Mahmoud Bukar Maina, Murtala Bindawa Isah, Jacob Marsh

et al.

Alzheimer's & Dementia, Journal Year: 2025, Volume and Issue: 21(4)

Published: April 1, 2025

Africa, home to 1.4 billion people and the highest genetic diversity globally, harbors unique genetic variants crucial for understanding complex diseases like neurodegenerative disorders. However, African populations remain underrepresented in induced pluripotent stem cell (iPSC) collections, limiting the exploration of population-specific disease mechanisms and therapeutic discoveries. To address this gap, we established an open-access African Somatic and Stem Cell Bank. In this initial phase, we generated 10 rigorously characterized iPSC lines from fibroblasts representing five Nigerian ethnic groups and both sexes. These lines underwent extensive profiling for pluripotency, genetic stability, differentiation potential, and Alzheimer's disease and Parkinson's disease risk variants. Clustered regularly interspaced short palindromic repeats (CRISPR)/CRISPR-associated protein 9 technology was used to introduce frontotemporal dementia-associated MAPT mutations (P301L and R406W). This collection offers a renewable, genetically diverse resource to investigate disease pathogenicity in African populations, facilitating breakthroughs in neurodegenerative research, drug discovery, and regenerative medicine. We established an open-access African Somatic and Stem Cell Bank. Ten iPSC lines from five Nigerian ethnic groups were rigorously characterized. CRISPR/CRISPR-associated protein 9 technology was used to introduce frontotemporal dementia-causing MAPT mutations. The Bank is an open-access resource for neurodegenerative research.

Language: English

Citations

0

The NHGRI-EBI GWAS Catalog: standards for reusability, sustainability and diversity DOI Creative Commons
María Cerezo, Elliot Sollis, Yue Ji

et al.

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown

Published: Oct. 23, 2024

The NHGRI-EBI GWAS Catalog serves as a vital resource for the genetic research community, providing access to the most comprehensive database of human GWAS results. Currently, it contains close to 7,000 publications for more than 15,000 traits, from which more than 625,000 lead associations have been curated. Additionally, 85,000 full genome-wide summary statistics datasets - containing association data for all variants in the analysis - are available for downstream analyses such as meta-analysis, fine-mapping, Mendelian randomisation or development of polygenic risk scores. As a centralised repository for GWAS results, the GWAS Catalog sets and implements standards for data submission and harmonisation, and encourages the use of consistent descriptors for traits, samples and methodologies. We share processes and vocabulary with the PGS Catalog, improving interoperability for a growing user group. Here, we describe the latest changes in data content, improvements in our user interface, and the implementation of the GWAS-SSF standard format for summary statistics. We address the challenges of handling the rapid increase in large-scale molecular quantitative trait GWAS and the need for sensitivity in the use of population and cohort descriptors while maintaining data interoperability and reusability.

Language: English

Citations

1

Influence of BMI-associated genetic variants and metabolic risk factors on weight loss with semaglutide: a longitudinal clinico-genomic cohort study DOI Creative Commons
Matthew E. Levy, Natalie Telis, Kelly M. Schiabor Barrett

et al.

medRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown

Published: Nov. 2, 2024

Background: Individual weight loss response to the GLP-1 receptor agonist semaglutide varies considerably, with many possible contributing factors. Leveraging multiple clinico-genomic cohorts, we analyzed differences in weight trajectories according to patient characteristics, including a BMI polygenic score (PGS) and metabolic risk factors, in semaglutide initiators with BMI ≥27 kg/m². Methods: This longitudinal study utilized clinical-grade exome sequencing and electronic health record data from six U.S. cohorts within the Helix Research Network (n=134,806). A BMI PGS was calculated using 26,941 variants. Twelve-month weight trajectories were modeled using mixed effects models, and associations with demographics, PGS, comorbidities, medications, and laboratory results were evaluated. Findings: Among 1,923 semaglutide users, mean pretreatment BMI was 38.4 kg/m². For those on doses ≥1.7 mg, mean body weight reduction was 7.3% at 6 months and 9.9% at 12 months. Over 12 months, low PGS was associated with an adjusted 1.5% and 1.8% additional weight loss compared to intermediate and high PGS, respectively (both p<0.01). Male sex, type 2 diabetes, hypertension, obstructive sleep apnea, and non-alcoholic fatty liver disease were each associated with 1.2%-1.9% less weight loss (all p<0.05). A 1% increase in hemoglobin A1c was associated with a 0.6% difference in weight loss (p=0.0019). Interpretation: In adults with overweight or obesity, lower genetic predisposition to obesity is linked to greater weight loss with semaglutide. Additionally, metabolic health significantly impacts the drug's effectiveness. These findings underscore the importance of precision medicine in weight management. Funding: Renown Health Foundation. Nevada Governor's Office of Economic Development. HealthPartners.

Language: English

Citations

1

EMBL’s European Bioinformatics Institute (EMBL-EBI) in 2024 DOI Creative Commons
Matthew Thakur, Cath Brooksbank, Robert Finn

et al.

Nucleic Acids Research, Journal Year: 2024, Volume and Issue: 53(D1), P. D10 - D19

Published: Nov. 28, 2024

The European Molecular Biology Laboratory's European Bioinformatics Institute (EMBL-EBI) is one of the world's leading sources of public biomolecular data. Based at the Wellcome Genome Campus in Hinxton, UK, EMBL-EBI is one of six sites of the European Molecular Biology Laboratory, Europe's only intergovernmental life sciences organization. This overview summarizes the latest developments in services that EMBL-EBI data resources provide to scientific communities globally (https://www.ebi.ac.uk/services).

Language: English

Citations

1

Improving GWAS performance in underrepresented groups by appropriate modeling of genetics, environment, and sociocultural factors DOI Creative Commons
Chelsea C. Cataldo‐Ramirez, Meng Lin, Andrew P. McMahon

et al.

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown

Published: Oct. 29, 2024

Genome-wide association studies (GWAS) and polygenic score (PGS) development are typically constrained by the data available in biobank repositories in which European cohorts are vastly overrepresented. Here, we increase the utility of non-European participant data within the UK Biobank (UKB) by characterizing the genetic affinities of UKB participants who self-identify as Bangladeshi, Indian, Pakistani, "White and Asian" (WA), and "Any Other Asian" (AOA), towards creating a more robust South Asian sample size for future genetic analyses. We assess the relationships between genetic structure and self-selected ethnic identities, resulting in consistent patterns of clustering used to train a support vector machine (SVM). The SVM model was utilized to reassign n = 1,853 AOA and WA participants at the subcontinental level, resulting in a group of 1,381 additional participants. We then leverage these additional samples to assess GWAS performance and PGS development. We further include environmental covariates in the height GWAS by implementing a rigorous covariate selection procedure, and compare the outputs of two GWAS models: null and env. We show that the PGS derived from the environmentally adjusted GWAS yields comparable prediction to models developed with an order of magnitude larger training dataset (R² = 0.021 vs 0.026). Models 7 - 8 double the variance explained by the PGS alone. In summary, we demonstrate how GWAS performance can be improved by leveraging ambiguous ethnicity codes, ancestry matched imputation panels, and including environmental covariates.

Language: English

Citations

0