Single Nucleotide Polymorphism relevance learning with Random Forests for Type 2 diabetes risk prediction
Beatriz López, Ferran Torrent‐Fontbona, Ramón Viñas et al.

Artificial Intelligence in Medicine, Journal Year: 2017, Volume and Issue: 85, P. 43 - 49

Published: Sept. 22, 2017

Language: English

An evaluation of machine-learning for predicting phenotype: studies in yeast, rice, and wheat (Creative Commons)
Nastasiya F. Grinberg, Oghenejokpeme I. Orhobor, Ross D. King et al.

Machine Learning, Journal Year: 2019, Volume and Issue: 109(2), P. 251 - 277

Published: Oct. 23, 2019

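In phenotype prediction the physical characteristics of an organism are predicted from knowledge of its genotype and environment. Such studies, often called genome-wide association studies, are of the highest societal importance, as they are of central importance to medicine, crop-breeding, etc. We investigated three phenotype prediction problems: one simple and clean (yeast), and the other two complex and real-world (rice and wheat). We compared standard machine learning methods (elastic net, ridge regression, lasso regression, random forest, gradient boosting machines (GBM), and support vector machines (SVM)) with two state-of-the-art classical statistical genetics methods: genomic BLUP and a two-step sequential method based on linear regression. Additionally, using the clean yeast data, we investigated how performance varied with the complexity of the biological mechanism, the amount of observational noise, the number of examples, the amount of missing data, and the use of different data representations. We found that for almost all the phenotypes considered, standard machine learning methods outperformed the methods from classical statistical genetics. On the yeast problem, the most successful method was GBM, followed by lasso regression and the two statistical genetics methods; with greater mechanistic complexity GBM was best, while in simpler cases lasso was superior. In the wheat and rice studies the best two methods were SVM and BLUP. The most robust method in the presence of noise, missing data, etc. was random forests. The classical statistical genetics method of genomic BLUP was found to perform well on problems where there was population structure. This suggests that standard machine learning methods need to be refined to include population structure information when this is present. We conclude that the application of machine learning methods to phenotype prediction problems holds great promise, but that determining which methods are likely to perform well on any given problem is elusive and non-trivial.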

Language: English

Citations

125

Machine learning for genetic prediction of psychiatric disorders: a systematic review
Matthew Bracher‐Smith, Karen Crawford, Valentina Escott‐Price et al.

Molecular Psychiatry, Journal Year: 2020, Volume and Issue: 26(1), P. 70 - 79

Published: June 26, 2020

Language: English

Citations

120

Comparative performances of machine learning methods for classifying Crohn Disease patients using genome-wide genotyping data (Creative Commons)
Alberto Romagnoni, Simon Jégou, Kristel Van Steen et al.

Scientific Reports, Journal Year: 2019, Volume and Issue: 9(1)

Published: July 17, 2019

Crohn Disease (CD) is a complex genetic disorder for which more than 140 genes have been identified using genome wide association studies (GWAS). However, the genetic architecture of the trait remains largely unknown. The recent development of machine learning (ML) approaches incited us to apply them to classify healthy and diseased people according to their genomic information. The Immunochip dataset containing 18,227 CD patients and 34,050 healthy controls enrolled and genotyped by the international Inflammatory Bowel Disease genetic consortium (IIBDGC) has been re-analyzed using a set of ML methods: penalized logistic regression (LR), gradient boosted trees (GBT) and artificial neural networks (NN). The main score used to compare the methods was the Area Under the ROC Curve (AUC) statistic. The impact of quality control (QC), imputing and coding methods on LR results showed that QC methods and imputation of missing genotypes may artificially increase the scores. In contrast, neither the patient/control ratio nor marker preselection or coding strategies significantly affected the results. LR methods, including Lasso, Ridge and ElasticNet, provided similar results with a maximum AUC of 0.80. GBT methods like XGBoost, LightGBM and CatBoost, together with dense NN with one or more hidden layers, provided similar AUC values, suggesting limited epistatic effects in the genetic architecture of the trait. ML methods detected nearly all the genetic variants previously identified by GWAS among the best predictors, plus additional predictors with lower effects. The robustness and complementarity of the different methods are also studied. Compared to LR, non-linear models such as GBT or NN may provide robust complementary approaches to identify and classify genetic markers.

Language: English

Citations

95

Mining the Unknown: Assigning Function to Noncoding Single Nucleotide Polymorphisms
Sierra S. Nishizaki, Alan P. Boyle

Trends in Genetics, Journal Year: 2016, Volume and Issue: 33(1), P. 34 - 45

Published: Dec. 6, 2016

Language: English

Citations

92

Leveraging Multi-ethnic Evidence for Risk Assessment of Quantitative Traits in Minority Populations (Creative Commons)
Marc Coram, Huaying Fang, Sophie I. Candille et al.

The American Journal of Human Genetics, Journal Year: 2017, Volume and Issue: 101(2), P. 218 - 226

Published: July 27, 2017

Language: English

Citations

90

Systematic Review of Polygenic Risk Scores for Type 1 and Type 2 Diabetes (Open Access)
Felipe Padilla-Martínez, François Collin, Mirosław Kwaśniewski et al.

International Journal of Molecular Sciences, Journal Year: 2020, Volume and Issue: 21(5), P. 1703 - 1703

Published: March 2, 2020

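Recent studies have led to considerable advances in the identification of genetic variants associated with type 1 and type 2 diabetes. An approach for converting genetic data into a predictive measure of disease susceptibility is to add the risk effects of loci into a polygenic risk score. In order to summarize the recent findings, we conducted a systematic review comparing the accuracy of polygenic risk scores developed during the last two decades. We selected 15 risk scores from three databases (Scopus, Web of Science and PubMed) enrolled in this systematic review. We identified three polygenic risk scores that discriminate between type 1 diabetes patients and healthy people, one that discriminates between type 1 and type 2 diabetes, two that discriminate between type 1 and monogenic diabetes, and nine polygenic risk scores that discriminate between type 2 diabetes patients and healthy people. Prediction accuracy of polygenic risk scores was assessed by comparing the area under the curve. The actual benefits, potential obstacles and possible solutions for the implementation of polygenic risk scores in clinical practice were also discussed. Developing strategies to establish the clinical validity of polygenic risk scores, by creating a framework for the interpretation of findings and their translation into actual evidence, is the way to demonstrate their utility in medical practice.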

Language: English

Citations

82

Genomic Prediction of 16 Complex Disease Risks Including Heart Attack, Diabetes, Breast and Prostate Cancer (Creative Commons)
Louis Lello, Timothy G. Raben, Soke Yuen Yong et al.

Scientific Reports, Journal Year: 2019, Volume and Issue: 9(1)

Published: Oct. 25, 2019

We construct risk predictors using polygenic scores (PGS) computed from common Single Nucleotide Polymorphisms (SNPs) for a number of complex disease conditions, using L1-penalized regression (also known as LASSO) on case-control data from UK Biobank. Among the disease conditions studied are Hypothyroidism, (Resistant) Hypertension, Type 1 and 2 Diabetes, Breast Cancer, Prostate Cancer, Testicular Cancer, Gallstones, Glaucoma, Gout, Atrial Fibrillation, High Cholesterol, Asthma, Basal Cell Carcinoma, Malignant Melanoma, and Heart Attack. We obtain values for the area under the receiver operating characteristic curves (AUC) in the range ~0.58–0.71 using SNP data alone. Substantially higher predictor AUCs are obtained when incorporating additional variables such as age and sex. Some SNP predictors alone are sufficient to identify outliers (e.g., in the 99th percentile of polygenic score, or PGS) with 3–8 times higher risk than typical individuals. We validate predictors out-of-sample using the eMERGE dataset, and also with different ancestry subgroups within the UK Biobank population. Our results indicate that substantial improvements in predictive power are attainable using training sets with larger case populations. We anticipate rapid improvement in genomic prediction as more case-control data become available for analysis.

Language: English

Citations

78

Computational Models for Clinical Applications in Personalized Medicine—Guidelines and Recommendations for Data Integration and Model Validation (Open Access)
Catherine Bjerre Collin, Tom Gebhardt, Martin Golebiewski et al.

Journal of Personalized Medicine, Journal Year: 2022, Volume and Issue: 12(2), P. 166 - 166

Published: Jan. 26, 2022

The future development of personalized medicine depends on a vast exchange of data from different sources, as well as harmonized integrative analysis of large-scale clinical health and sample data. Computational-modelling approaches play a key role in the analysis of the underlying molecular processes and pathways that characterize human biology, but they also lead to a more profound understanding of the mechanisms and factors that drive diseases; hence, they allow personalized treatment strategies that are guided by central clinical questions. However, despite the growing popularity of computational-modelling approaches in different stakeholder communities, there are still many hurdles to overcome for their clinical routine implementation in the future. In particular, the integration of heterogeneous data from multiple sources and types is a challenging task that requires clear guidelines that also have to comply with high ethical and legal standards. Here, we discuss the most relevant computational models for personalized medicine in detail that can be considered as best-practice guidelines for application in clinical care. We define specific challenges and provide applicable guidelines and recommendations for study design, data acquisition, and operation as well as for model validation and clinical translation and other research areas.

Language: English

Citations

62

Assessing Predictive Properties of Genome-Wide Selection in Soybeans (Creative Commons)
Alencar Xavier, William M. Muir, Katy Martin Rainey et al.

G3 Genes Genomes Genetics, Journal Year: 2016, Volume and Issue: 6(8), P. 2611 - 2616

Published: June 25, 2016

Many economically important traits in plant breeding have low heritability or are difficult to measure. For these traits, genomic selection has attractive features and may boost genetic gains. Our goal was to evaluate alternative scenarios to implement genomic selection for yield components in soybean (Glycine max L. merr). We used a nested association panel with cross validation to evaluate the impacts of training population size, genotyping density, and prediction model on the accuracy of genomic prediction. Our results indicate that training population size was the factor most relevant to improvement in genome-wide prediction, with greatest improvement observed in training sets up to 2000 individuals. We discuss assumptions that influence the choice of the prediction model. Although prediction models had minor impacts on prediction accuracy, the most robust prediction model was the combination of reproducing kernel Hilbert space regression and BayesB. Higher genotyping density marginally improved accuracy. Our study finds that breeding programs seeking efficient genomic selection in soybeans would best allocate resources by investing in a representative training set.

Language: English

Citations

86

Crowdsourced assessment of common genetic contribution to predicting anti-TNF treatment response in rheumatoid arthritis (Creative Commons)
Solveig K. Sieberts, Fan Zhu, Javier García-García et al.

Nature Communications, Journal Year: 2016, Volume and Issue: 7(1)

Published: Aug. 23, 2016

Rheumatoid arthritis (RA) affects millions world-wide. While anti-TNF treatment is widely used to reduce disease progression, treatment fails in ∼one-third of patients. No biomarker currently exists that identifies non-responders before treatment. A rigorous community-based assessment of the utility of SNP data for predicting anti-TNF treatment efficacy in RA patients was performed in the context of a DREAM Challenge (http://www.synapse.org/RA_Challenge). An open challenge framework enabled the comparative evaluation of predictions developed by 73 research groups using the most comprehensive available data and covering a wide range of state-of-the-art modelling methodologies. Despite a significant genetic heritability estimate of the treatment non-response trait (h² = 0.18, P value = 0.02), no significant genetic contribution to prediction accuracy is observed. Results formally confirm the expectations of the rheumatology community that SNP information does not significantly improve predictive performance relative to standard clinical traits, thereby justifying a refocusing of future efforts on collection of other data.

Language: English

Citations

83