Cited by Single Nucleotide Polymorphism relevance learning with Random Forests for Type 2 diabetes risk prediction

Analysis of polygenic risk score usage and performance in diverse human populations

Laramie E. Duncan, Hanyang Shen, Bizu Gelaye

et al.

Nature Communications, Year: 2019, Volume 10(1)

Published: July 25, 2019

A historical tendency to use European ancestry samples hinders medical genetics research, including the use of polygenic scores, which are individual-level metrics of genetic risk. We analyze the first decade of polygenic scoring studies (2008–2017, inclusive), and find that 67% of studies included exclusively European ancestry participants and another 19% included only East Asian ancestry participants. Only 3.8% of studies were among cohorts of African, Hispanic, or Indigenous peoples. We find that predictive performance of European ancestry-derived polygenic scores is lower in non-European ancestry samples (e.g. African ancestry samples: t = −5.97, df = 24, p = 3.7 × 10⁻⁶), and we demonstrate the effects of methodological choices in polygenic score distributions for worldwide populations. These findings highlight the need for improved treatment of linkage disequilibrium and variant frequencies when applying polygenic scoring to cohorts of non-European ancestry, and bolster the rationale for large-scale GWAS in diverse human populations.

Language: English

Cited by

901

A Review of Feature Selection Methods for Machine Learning-Based Disease Risk Prediction

Nicholas Pudjihartono, Tayaza Fadason, Andreas W. Kempa-Liehr

et al.

Frontiers in Bioinformatics, Year: 2022, Volume 2

Published: June 27, 2022

Machine learning has shown utility in detecting patterns within large, unstructured, and complex datasets. One of the promising applications of machine learning is in precision medicine, where disease risk is predicted using patient genetic data. However, creating an accurate prediction model based on genotype data remains challenging due to the so-called “curse of dimensionality” (i.e., an extensively larger number of features compared to the number of samples). Therefore, the generalizability of machine learning models benefits from feature selection, which aims to extract only the most “informative” features and remove noisy “non-informative,” irrelevant and redundant features. In this article, we provide a general overview of the different feature selection methods, their advantages, disadvantages, and use cases, focusing on the detection of relevant features (i.e., SNPs) for disease risk prediction.

Language: English

Cited by

429

Psychiatric genetics and the structure of psychopathology

Jordan W. Smoller, Ole A. Andreassen, Howard J. Edenberg

et al.

Molecular Psychiatry, Year: 2018, Volume 24(3), pp. 409-420

Published: Jan. 9, 2018

Language: English

Cited by

363

Machine Learning SNP Based Prediction for Precision Medicine

Daniel Ho, William Schierding, Melissa Wake

et al.

Frontiers in Genetics, Year: 2019, Volume 10

Published: March 27, 2019

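In the past decade, precision genomics based medicine has emerged to provide tailored and effective healthcare for patients depending upon their genetic features. Genome Wide Association Studies have also identified population based risk genetic variants for common and complex diseases. In order to meet the full promise of precision medicine, research is attempting to leverage our increasing genomic understanding and further develop personalized medical healthcare through ever more accurate disease risk prediction models. Polygenic risk scoring and machine learning are two primary approaches for disease risk prediction. Despite recent improvements, the results of polygenic risk scoring remain limited due to the approaches that are currently used. By contrast, machine learning algorithms have increased predictive abilities for complex disease risk. This increase in predictive abilities results from the ability of machine learning algorithms to handle multi-dimensional data. Here, we provide an overview of polygenic risk scoring and machine learning in complex disease risk prediction. We highlight recent machine learning application developments and describe how machine learning approaches can lead to improved complex disease prediction, which will help to incorporate genetic features into future personalized healthcare. Finally, we discuss how the future application of machine learning prediction models might help manage complex disease by providing tissue-specific targets for customized, preventive interventions.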

Language: English

Cited by

181

Genomic Prediction of Breeding Values Using a Subset of SNPs Identified by Three Machine Learning Methods

Bo Li, Nanxi Zhang, You‐Gan Wang

et al.

Frontiers in Genetics, Year: 2018, Volume 9

Published: July 4, 2018

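The analysis of large genomic data is hampered by issues such as a small number of observations and a large number of predictive variables (commonly known as "large P small N"), high dimensionality or highly correlated data structures. Machine learning methods are renowned for dealing with these problems. To date machine learning methods have been applied in Genome-Wide Association Studies for identification of candidate genes, epistasis detection, gene network pathway analyses and genomic prediction of phenotypic values. However, the utility of two machine learning methods, Gradient Boosting Machine (GBM) and Extreme Gradient Boosting Method (XgBoost), in identifying a subset of SNP markers for genomic prediction of breeding values has never been explored before. In this study, using 38,082 SNP markers and body weight phenotypes from 2,093 Brahman cattle (1,097 bulls as a discovery population and 996 cows as a validation population), we examined the efficiency of three machine learning methods, namely Random Forests (RF), GBM and XgBoost, in (a) the identification of the top 400, 1,000, and 3,000 ranked SNPs; (b) using the subsets of SNPs to construct genomic relationship matrices (GRMs) for the estimation of genomic breeding values (GEBVs). For comparison purposes, we also calculated the GEBVs from (1) 400, 1,000, and 3,000 SNPs that were randomly selected and evenly spaced across the genome, and (2) from all the SNPs. We found that RF and especially GBM are efficient methods in identifying a subset of SNPs with direct links to candidate genes affecting the growth trait. In comparison to the estimate of prediction accuracy of GEBVs from using all SNPs (0.43), the 3,000 top SNPs identified by RF (0.42) and GBM (0.46) had similar values to those of the whole SNP panel. The performance of the subsets of SNPs from RF and GBM was substantially better than that of evenly spaced subsets across the genome (0.18-0.29). Of the three methods, RF and GBM consistently outperformed XgBoost in genomic prediction accuracy.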

Language: English

Cited by

171

Association mapping in plants in the post-GWAS genomics era

P. K. Gupta, Pawan L. Kulwal, Vandana Jaiswal

et al.

Advances in Genetics, Year: 2019, Volume unknown, pp. 75-154

Published: Jan. 1, 2019

Language: English

Cited by

151

An atlas of genetic scores to predict multi-omic traits

Yu Xu, Scott C. Ritchie, Yujian Liang

et al.

Nature, Year: 2023, Volume 616(7955), pp. 123-131

Published: March 29, 2023

Language: English

Cited by

Machine learning in rare disease

Jineta Banerjee, Jaclyn Taroni, Robert J. Allaway

et al.

Nature Methods, Year: 2023, Volume 20(6), pp. 803-814

Published: May 29, 2023

Language: English

Cited by

A comparative study of 11 non-linear regression models highlighting autoencoder, DBN, and SVR, enhanced by SHAP importance analysis in soybean branching prediction

Wei Zhou, Zhengxiao Yan, Liting Zhang

et al.

Scientific Reports, Year: 2024, Volume 14(1)

Published: March 11, 2024

To explore a robust tool for advancing digital breeding practices through an artificial intelligence-driven phenotype prediction expert system, we undertook a thorough analysis of 11 non-linear regression models. Our investigation specifically emphasized the significance of Support Vector Regression (SVR) and SHapley Additive exPlanations (SHAP) in predicting soybean branching. By using branching data (phenotype) of 1918 soybean accessions and 42 k SNP (Single Nucleotide Polymorphism) polymorphic data (genotype), this study systematically compared 11 non-linear regression AI models, including four deep learning models (DBN (deep belief network) regression, ANN (artificial neural network) regression, Autoencoders regression, and MLP (multilayer perceptron) regression) and seven machine learning models (e.g., SVR (support vector regression), XGBoost (eXtreme Gradient Boosting) regression, Random Forest regression, LightGBM regression, GPs (Gaussian processes) regression, Decision Tree regression, and Polynomial regression). After being evaluated by four valuation metrics: R² (R-squared), MAE (Mean Absolute Error), MSE (Mean Squared Error), and MAPE (Mean Absolute Percentage Error), it was found that the SVR, Polynomial Regression, DBN, and Autoencoder outperformed other models and could obtain a better prediction accuracy when they were used for phenotype prediction. In the assessment of deep learning approaches, we exemplified the SVR model, conducting analyses on feature importance and gene ontology (GO) enrichment to provide comprehensive support. After comprehensively comparing four feature importance algorithms, no notable distinction was observed in the feature importance ranking scores across the four algorithms, namely Variable Ranking, Permutation, SHAP, and Correlation Matrix, but the SHAP value could provide rich information on genes with negative contributions, and SHAP importance was chosen for feature selection. The results of this study offer valuable insights into AI-mediated plant breeding, addressing challenges faced by traditional breeding programs. The method developed has broad applicability in phenotype prediction, minor QTL (quantitative trait loci) mining, and plant smart-breeding systems, contributing significantly to the advancement of AI-based breeding practices and transitioning from experience-based to data-based breeding.

Language: English

Cited by

Prediction of overall survival for patients with metastatic castration-resistant prostate cancer: development of a prognostic model through a crowdsourced challenge with open clinical trial data

Justin Guinney, Tao Wang, Teemu D. Laajala

et al.

The Lancet Oncology, Year: 2016, Volume 18(1), pp. 132-142

Published: Nov. 16, 2016

Language: English

Cited by

139