Cited by Single Nucleotide Polymorphism relevance learning with Random Forests for Type 2 diabetes risk prediction

Analysis of polygenic risk score usage and performance in diverse human populations

Laramie E. Duncan, Hanyang Shen, Bizu Gelaye, et al.

Nature Communications, Journal Year: 2019, Volume and Issue: 10(1)

Published: July 25, 2019

A historical tendency to use European ancestry samples hinders medical genetics research, including the use of polygenic scores, which are individual-level metrics of genetic risk. We analyze the first decade of polygenic scoring studies (2008–2017, inclusive), and find that 67% of studies included exclusively European ancestry participants and another 19% included only East Asian ancestry participants. Only 3.8% of studies were among cohorts of African, Hispanic, or Indigenous peoples. We find that predictive performance of European ancestry-derived polygenic scores is lower in non-European ancestry samples (e.g. African ancestry samples: t = −5.97, df = 24, p = 3.7 × 10⁻⁶), and we demonstrate the effects of methodological choices in polygenic score distributions for worldwide populations. These findings highlight the need for improved treatment of linkage disequilibrium and variant frequencies when applying polygenic scoring to cohorts of non-European ancestry, and bolster the rationale for large-scale GWAS in diverse human populations.

Language: English

Citations

894

A Review of Feature Selection Methods for Machine Learning-Based Disease Risk Prediction

Nicholas Pudjihartono, Tayaza Fadason, Andreas W. Kempa-Liehr, et al.

Frontiers in Bioinformatics, Journal Year: 2022, Volume and Issue: 2

Published: June 27, 2022

Machine learning has shown utility in detecting patterns within large, unstructured, and complex datasets. One of the promising applications of machine learning is in precision medicine, where disease risk is predicted using patient genetic data. However, creating an accurate prediction model based on genotype data remains challenging due to the so-called “curse of dimensionality” (i.e., extensively larger number of features compared to the number of samples). Therefore, the generalizability of machine learning models benefits from feature selection, which aims to extract only the most “informative” features and remove noisy “non-informative,” irrelevant and redundant features. In this article, we provide a general overview of the different feature selection methods, their advantages, disadvantages, and use cases, focusing on the detection of relevant features (i.e., SNPs) for disease risk prediction.

Language: English

Citations

406

Psychiatric genetics and the structure of psychopathology

Jordan W. Smoller, Ole A. Andreassen, Howard J. Edenberg, et al.

Molecular Psychiatry, Journal Year: 2018, Volume and Issue: 24(3), P. 409 - 420

Published: Jan. 9, 2018

Language: English

Citations

360

Machine Learning SNP Based Prediction for Precision Medicine

Daniel Ho, William Schierding, Melissa Wake, et al.

Frontiers in Genetics, Journal Year: 2019, Volume and Issue: 10

Published: March 27, 2019

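In the past decade, precision genomics based medicine has emerged to provide tailored and effective healthcare for patients depending upon their genetic features. Genome Wide Association Studies have also identified population based risk genetic variants for common and complex diseases. In order to meet the full promise of precision medicine, research is attempting to leverage our increasing genomic understanding and further develop personalized medical healthcare through ever more accurate disease risk prediction models. Polygenic risk scoring and machine learning are two primary approaches for disease risk prediction. Despite recent improvements, the results of polygenic risk scoring remain limited due to the approaches that are currently used. By contrast, machine learning algorithms have increased predictive abilities for complex disease risk. This increase in predictive abilities results from the ability of machine learning algorithms to handle multi-dimensional data. Here, we provide an overview of polygenic risk scoring and machine learning in complex disease risk prediction. We highlight recent machine learning application developments and describe how machine learning approaches can lead to improved complex disease prediction, which will help to incorporate genetic features into future personalized healthcare. Finally, we discuss how the future application of machine learning prediction models might help manage complex disease by providing tissue-specific targets for customized, preventive interventions.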

Language: English

Citations

181

Genomic Prediction of Breeding Values Using a Subset of SNPs Identified by Three Machine Learning Methods

Bo Li, Nanxi Zhang, You‐Gan Wang, et al.

Frontiers in Genetics, Journal Year: 2018, Volume and Issue: 9

Published: July 4, 2018

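The analysis of large genomic data is hampered by issues such as a small number of observations and a large number of predictive variables (commonly known as "large P small N"), high dimensionality or highly correlated data structures. Machine learning methods are renowned for dealing with these problems. To date machine learning methods have been applied in Genome-Wide Association Studies for identification of candidate genes, epistasis detection, gene network pathway analyses and genomic prediction of phenotypic values. However, the utility of two machine learning methods, Gradient Boosting Machine (GBM) and Extreme Gradient Boosting Method (XgBoost), in identifying a subset of SNP markers for genomic prediction of breeding values has never been explored before. In this study, using 38,082 SNP markers and body weight phenotypes from 2,093 Brahman cattle (1,097 bulls as a discovery population and 996 cows as a validation population), we examined the efficiency of three machine learning methods, namely Random Forests (RF), GBM and XgBoost, in (a) the identification of top 400, 1,000, and 3,000 ranked SNPs; (b) using the subsets of SNPs to construct genomic relationship matrices (GRMs) for the estimation of genomic breeding values (GEBVs). For comparison purposes, we also calculated the GEBVs from (1) SNPs that were randomly selected and evenly spaced across the genome, and (2) all SNPs. We found that RF and especially GBM are efficient methods in identifying a subset of SNPs with direct links to candidate genes affecting the growth trait. In comparison to the estimate of prediction accuracy of GEBVs from using all SNPs (0.43), the 3,000 top SNPs identified by RF (0.42) and GBM (0.46) had similar values to those of the whole SNP panel. The performance of the subsets of SNPs from RF and GBM was substantially better than that of evenly spaced subsets across the genome (0.18-0.29). Of the three methods, RF and GBM consistently outperformed the XgBoost in genomic prediction accuracy.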

Language: English

Citations

168

Association mapping in plants in the post-GWAS genomics era

P. K. Gupta, Pawan L. Kulwal, Vandana Jaiswal, et al.

Advances in genetics, Journal Year: 2019, Volume and Issue: unknown, P. 75 - 154

Published: Jan. 1, 2019

Language: English

Citations

149

An atlas of genetic scores to predict multi-omic traits

Yu Xu, Scott C. Ritchie, Yujian Liang, et al.

Nature, Journal Year: 2023, Volume and Issue: 616(7955), P. 123 - 131

Published: March 29, 2023

Language: English

Citations

Machine learning in rare disease

Jineta Banerjee, Jaclyn Taroni, Robert J. Allaway, et al.

Nature Methods, Journal Year: 2023, Volume and Issue: 20(6), P. 803 - 814

Published: May 29, 2023

Language: English

Citations

A comparative study of 11 non-linear regression models highlighting autoencoder, DBN, and SVR, enhanced by SHAP importance analysis in soybean branching prediction

Wei Zhou, Zhengxiao Yan, Liting Zhang, et al.

Scientific Reports, Journal Year: 2024, Volume and Issue: 14(1)

Published: March 11, 2024

To explore a robust tool for advancing digital breeding practices through an artificial intelligence-driven phenotype prediction expert system, we undertook a thorough analysis of 11 non-linear regression models. Our investigation specifically emphasized the significance of Support Vector Regression (SVR) and SHapley Additive exPlanations (SHAP) in predicting soybean branching. By using branching data (phenotype) of 1918 soybean accessions and 42 k SNP (Single Nucleotide Polymorphism) polymorphic data (genotype), this study systematically compared 11 non-linear regression AI models, including four deep learning models (DBN (deep belief network) regression, ANN (artificial neural network) regression, Autoencoders regression, and MLP (multilayer perceptron) regression) and seven machine learning models (e.g., SVR (support vector regression), XGBoost (eXtreme Gradient Boosting) regression, Random Forest regression, LightGBM regression, GPs (Gaussian processes) regression, Decision Tree regression, and Polynomial regression). After being evaluated by four valuation metrics: R² (R-squared), MAE (Mean Absolute Error), MSE (Mean Squared Error), and MAPE (Mean Absolute Percentage Error), it was found that the SVR, Polynomial Regression, DBN, and Autoencoder outperformed other models and could obtain a better prediction accuracy when they were used for phenotype prediction. In the assessment of deep learning approaches, we exemplified the SVR model, conducting analyses on feature importance and gene ontology (GO) enrichment to provide comprehensive support. After comprehensively comparing four feature importance algorithms, no notable distinction was observed in the feature importance ranking scores across the four algorithms, namely Variable Ranking, Permutation, SHAP, and Correlation Matrix, but the SHAP value could provide rich information on genes with negative contributions, and SHAP importance was chosen for feature selection. The results of this study offer valuable insights into AI-mediated plant breeding, addressing challenges faced by traditional breeding programs. The method developed has broad applicability in phenotype prediction, minor QTL (quantitative trait loci) mining, and plant smart-breeding systems, contributing significantly to the advancement of AI-based breeding practices and transitioning from experience-based to data-based breeding.

Language: English

Citations

Prediction of overall survival for patients with metastatic castration-resistant prostate cancer: development of a prognostic model through a crowdsourced challenge with open clinical trial data

Justin Guinney, Tao Wang, Teemu D. Laajala, et al.

The Lancet Oncology, Journal Year: 2016, Volume and Issue: 18(1), P. 132 - 142

Published: Nov. 16, 2016

Language: English

Citations

139