Artificial Intelligence in Medicine, Journal Year: 2017, Volume and Issue: 85, P. 43 - 49
Published: Sept. 22, 2017
Language: English
Machine Learning, Journal Year: 2019, Volume and Issue: 109(2), P. 251 - 277
Published: Oct. 23, 2019
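In phenotype prediction the physical characteristics of an organism are predicted from knowledge of its genotype and environment. Such studies, often called genome-wide association studies, are of the highest societal importance, as they are of central importance to medicine, crop-breeding, etc. We investigated three phenotype prediction problems: one simple and clean (yeast), and the other two complex and real-world (rice and wheat). We compared standard machine learning methods (elastic net, ridge regression, lasso regression, random forest, gradient boosting machines (GBM), and support vector machines (SVM)) with two state-of-the-art classical statistical genetics methods: genomic BLUP and a two-step sequential method based on linear regression. Additionally, using the clean yeast data, we investigated how performance varied with the complexity of the biological mechanism, the amount of observational noise, the number of examples, the amount of missing data, and the use of different data representations. We found that for almost all the phenotypes considered, standard machine learning methods outperformed the methods from classical statistical genetics. On the yeast problem, the most successful method was GBM, followed by lasso regression and the two statistical genetics methods; with greater mechanistic complexity GBM was best, while in simpler cases lasso was superior. In the wheat and rice studies the best two methods were SVM and BLUP. The most robust method in the presence of noise, missing data, etc. was random forests. The classical statistical genetics method of genomic BLUP was found to perform well on problems where there was population structure. This suggests that standard machine learning methods need to be refined to include population structure information when this is present. We conclude that the application of machine learning methods to phenotype prediction problems holds great promise, but that determining which method is likely to perform well on any given problem is elusive and non-trivial.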
Language: English
Citations
125
Molecular Psychiatry, Journal Year: 2020, Volume and Issue: 26(1), P. 70 - 79
Published: June 26, 2020
Language: English
Citations
120
Scientific Reports, Journal Year: 2019, Volume and Issue: 9(1)
Published: July 17, 2019
Crohn Disease (CD) is a complex genetic disorder for which more than 140 genes have been identified using genome wide association studies (GWAS). However, the genetic architecture of the trait remains largely unknown. The recent development of machine learning (ML) approaches incited us to apply them to classify healthy and diseased people according to their genomic information. The Immunochip dataset containing 18,227 CD patients and 34,050 healthy controls enrolled and genotyped by the international Inflammatory Bowel Disease genetic consortium (IIBDGC) has been re-analyzed using a set of ML methods: penalized logistic regression (LR), gradient boosted trees (GBT) and artificial neural networks (NN). The main score used to compare the methods was the Area Under the ROC Curve (AUC) statistic. The impact of quality control (QC), imputing and coding methods on LR results showed that QC methods and imputation of missing genotypes may artificially increase the scores. In contrast, neither the patient/control ratio nor marker preselection or coding strategies significantly affected the results. LR methods, including Lasso, Ridge and ElasticNet, provided similar results with a maximum AUC of 0.80. GBT methods like XGBoost, LightGBM and CatBoost, together with dense NN with one or more hidden layers, provided similar AUC values, suggesting limited epistatic effects in the genetic architecture of the trait. ML methods detected nearly all the genetic variants previously identified by GWAS among the best predictors, plus additional predictors with lower effects. The robustness and complementarity of the different methods are also studied. Compared to LR, non-linear models such as GBT or NN may provide robust complementary approaches to identify and classify genetic markers.
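The AUC used in the abstract above to compare methods can be read as the probability that a randomly chosen patient receives a higher score than a randomly chosen control. The following is a minimal sketch of that rank-based definition, not the study's own code; the function name `auc` and the toy scores and labels are hypothetical.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Rank-based AUC: chance that a random case outscores a random control.

    Ties between a case and a control count as half a win.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                wins += 1.0
            elif case_score == control_score:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Hypothetical toy data: 1 = patient, 0 = control (not taken from the study).
scores = [0.9, 0.8, 0.4, 0.35, 0.1]
labels = [1, 0, 1, 0, 0]
example_auc = auc(scores, labels)  # 5 of the 6 case/control pairs are ordered correctly
```

The pairwise loop is quadratic in the number of samples, so for datasets of the size described above a sort-based computation would normally be preferred; the loop is kept here only because it mirrors the definition directly.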
Language: English
Citations
95
Trends in Genetics, Journal Year: 2016, Volume and Issue: 33(1), P. 34 - 45
Published: Dec. 6, 2016
Language: English
Citations
92
The American Journal of Human Genetics, Journal Year: 2017, Volume and Issue: 101(2), P. 218 - 226
Published: July 27, 2017
Language: English
Citations
90
International Journal of Molecular Sciences, Journal Year: 2020, Volume and Issue: 21(5), P. 1703 - 1703
Published: March 2, 2020
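Recent studies have led to considerable advances in the identification of genetic variants associated with type 1 and type 2 diabetes. An approach for converting genetic data into a predictive measure of disease susceptibility is to add the risk effects of loci into a polygenic risk score. In order to summarize the recent findings, we conducted a systematic review of studies comparing the accuracy of polygenic risk scores developed during the last two decades. We selected 15 risk scores from three databases (Scopus, Web of Science and PubMed) for this systematic review. We identified three polygenic risk scores that discriminate between type 1 diabetes patients and healthy people, one that discriminates between type 1 and type 2 diabetes, two that discriminate between type 1 and monogenic diabetes, and nine that discriminate between type 2 diabetes patients and healthy people. Prediction accuracy of the polygenic risk scores was assessed by comparing the area under the curve. The actual benefits, potential obstacles and possible solutions for the implementation of polygenic risk scores in clinical practice were also discussed. Developing strategies to establish the clinical validity of polygenic risk scores, by creating a framework for the interpretation of findings and their translation into actual evidence, is the way to demonstrate their utility in medical practice.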
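The idea of adding the risk effects of loci into a single score, as described in the abstract above, is commonly written as a weighted sum of risk-allele counts. The sketch below illustrates only that arithmetic under stated assumptions: the function name `polygenic_score`, the per-locus weights, and the two genotypes are hypothetical and are not taken from any of the reviewed studies.

```python
from typing import Sequence


def polygenic_score(dosages: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of risk-allele counts (0, 1 or 2 copies per locus)."""
    if len(dosages) != len(weights):
        raise ValueError("one weight per locus is required")
    return sum(d * w for d, w in zip(dosages, weights))


# Hypothetical per-locus effect sizes (for example, log odds ratios).
weights = [0.25, 0.5, 0.125]

# Hypothetical genotypes: number of risk alleles carried at each locus.
person_a = [2, 0, 1]
person_b = [0, 1, 0]

score_a = polygenic_score(person_a, weights)  # 2*0.25 + 0*0.5 + 1*0.125
score_b = polygenic_score(person_b, weights)  # 0*0.25 + 1*0.5 + 0*0.125
```

How well such a score separates patients from healthy people is then typically summarized with the area under the curve, the measure the review uses for comparison.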
Language: English
Citations
82
Scientific Reports, Journal Year: 2019, Volume and Issue: 9(1)
Published: Oct. 25, 2019
We construct risk predictors using polygenic scores (PGS) computed from common Single Nucleotide Polymorphisms (SNPs) for a number of complex disease conditions, using L1-penalized regression (also known as LASSO) on case-control data from UK Biobank. Among the disease conditions studied are Hypothyroidism, (Resistant) Hypertension, Type 1 and 2 Diabetes, Breast Cancer, Prostate Cancer, Testicular Cancer, Gallstones, Glaucoma, Gout, Atrial Fibrillation, High Cholesterol, Asthma, Basal Cell Carcinoma, Malignant Melanoma, and Heart Attack. We obtain values for the area under the receiver operating characteristic curves (AUC) in the range ~0.58–0.71 using SNP data alone. Substantially higher predictor AUCs are obtained when incorporating additional variables such as age and sex. Some SNP predictors alone are sufficient to identify outliers (e.g., in the 99th percentile of polygenic score, or PGS) with 3–8 times higher risk than typical individuals. We validate predictors out-of-sample using the eMERGE dataset, and also with different ancestry subgroups within the UK Biobank population. Our results indicate that substantial improvements in predictive power are attainable using training sets with larger case populations. We anticipate rapid improvement in genomic prediction as more case-control data become available for analysis.
Language: English
Citations
78
Journal of Personalized Medicine, Journal Year: 2022, Volume and Issue: 12(2), P. 166 - 166
Published: Jan. 26, 2022
The future development of personalized medicine depends on a vast exchange of data from different sources, as well as harmonized integrative analysis of large-scale clinical health and sample data. Computational-modelling approaches play a key role in the analysis of the underlying molecular processes and pathways that characterize human biology, but they also lead to a more profound understanding of the mechanisms and factors that drive diseases; hence, they allow personalized treatment strategies that are guided by central clinical questions. However, despite the growing popularity of computational-modelling approaches in different stakeholder communities, there are still many hurdles to overcome for their routine clinical implementation in the future. In particular, the integration of heterogeneous data from multiple sources and types is a challenging task that requires clear guidelines, which also have to comply with high ethical and legal standards. Here, we discuss the most relevant computational models for personalized medicine in detail that can be considered as best-practice guidelines for application in clinical care. We define specific challenges and provide applicable guidelines and recommendations for study design, data acquisition, and operation, as well as for model validation and clinical translation and other research areas.
Language: English
Citations
62
G3 Genes Genomes Genetics, Journal Year: 2016, Volume and Issue: 6(8), P. 2611 - 2616
Published: June 25, 2016
Many economically important traits in plant breeding have low heritability or are difficult to measure. For these traits, genomic selection has attractive features and may boost genetic gains. Our goal was to evaluate alternative scenarios to implement genomic selection for yield components in soybean (Glycine max L. merr). We used a nested association panel with cross validation to evaluate the impacts of training population size, genotyping density, and prediction model on the accuracy of genomic prediction. Our results indicate that training population size was the factor most relevant to improvement in genome-wide prediction, with the greatest improvement observed in training sets of up to 2000 individuals. We discuss assumptions that influence the choice of the prediction model. Although prediction models had a minor impact on prediction accuracy, the most robust prediction model was the combination of reproducing kernel Hilbert space regression and BayesB. Higher genotyping density marginally improved accuracy. Our study finds that breeding programs seeking efficient genomic selection in soybeans would best allocate resources by investing in a representative training set.
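The abstract above relies on cross validation to estimate prediction accuracy but does not describe the exact scheme. As a generic illustration only, a plain k-fold split might be sketched as follows; the function name `k_fold_indices`, the fold count, the sample size, and the seed are all hypothetical and do not reproduce the study's design.

```python
import random
from typing import List, Tuple


def k_fold_indices(n_samples: int, k: int, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Return k (train, test) index pairs; every sample is tested exactly once."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # fixed seed keeps the split reproducible
    folds = [indices[i::k] for i in range(k)]  # near-equal, non-overlapping folds
    splits = []
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits


# Hypothetical example: 10 individuals split into 5 folds.
splits = k_fold_indices(10, 5)
for train, test in splits:
    # A prediction model would be fitted on `train` and scored on `test` here.
    pass
```

Shrinking the `train` list before fitting is one simple way to examine how accuracy changes with training population size, the factor the abstract identifies as most relevant.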
Language: English
Citations
86
Nature Communications, Journal Year: 2016, Volume and Issue: 7(1)
Published: Aug. 23, 2016
Rheumatoid arthritis (RA) affects millions world-wide. While anti-TNF treatment is widely used to reduce disease progression, treatment fails in about one-third of patients. No biomarker currently exists that identifies non-responders before treatment. A rigorous community-based assessment of the utility of SNP data for predicting anti-TNF treatment efficacy in RA patients was performed in the context of a DREAM Challenge (http://www.synapse.org/RA_Challenge). An open challenge framework enabled the comparative evaluation of predictions developed by 73 research groups using the most comprehensive available data and covering a wide range of state-of-the-art modelling methodologies. Despite a significant genetic heritability estimate of the treatment non-response trait (h² = 0.18, P value = 0.02), no significant genetic contribution to prediction accuracy is observed. Results formally confirm the expectations of the rheumatology community that SNP information does not significantly improve predictive performance relative to standard clinical traits, thereby justifying a refocusing of future efforts on the collection of other data.
Language: English
Citations
83