A Comparative Study of Classification-Based Machine Learning Methods for Novel Disease Gene Prediction DOI
Duc‐Hau Le, Nguyễn Xuân Hoài, Yung‐Keun Kwon

et al.

Advances in intelligent systems and computing, Journal Year: 2014, Volume and Issue: unknown, P. 577 - 588

Published: Sept. 29, 2014

Language: English

Machine learning applications in genetics and genomics DOI
Maxwell W. Libbrecht, William Stafford Noble

Nature Reviews Genetics, Journal Year: 2015, Volume and Issue: 16(6), P. 321 - 332

Published: May 7, 2015

Language: English

Citations

1635

What is Machine Learning? A Primer for the Epidemiologist DOI

Qifang Bi, Katherine E. Goodman, Joshua Kaminsky

et al.

American Journal of Epidemiology, Journal Year: 2019, Volume and Issue: unknown

Published: Aug. 15, 2019

Abstract. Machine learning is a branch of computer science that has the potential to transform epidemiologic sciences. Amid a growing focus on “Big Data,” it offers epidemiologists new tools to tackle problems for which classical methods are not well-suited. In order to critically evaluate the value of integrating machine learning algorithms and existing methods, however, it is essential to address the language and technical barriers between the two fields that can make it difficult for epidemiologists to read and assess machine learning studies. Here, we provide an overview of the concepts and terminology used in the machine learning literature, which encompasses a diverse set of tools with goals ranging from prediction to classification to clustering. We provide a brief introduction to 5 common machine learning algorithms and 4 ensemble-based approaches. We then summarize epidemiologic applications of machine learning techniques in the published literature. We recommend approaches to incorporate machine learning in epidemiologic research and discuss opportunities and challenges for integrating machine learning and existing epidemiologic research methods.

Language: English

Citations

446

Developing a practical suicide risk prediction model for targeting high‐risk patients in the Veterans health Administration DOI Open Access
Ronald C. Kessler, Irving Hwang, Claire A. Hoffmire

et al.

International Journal of Methods in Psychiatric Research, Journal Year: 2017, Volume and Issue: 26(3)

Published: July 4, 2017

The US Veterans Health Administration (VHA) has begun using predictive modeling to identify Veterans at high suicide risk to target care. Initial analyses are reported here. A penalized logistic regression model was compared with an earlier proof-of-concept logistic model. Exploratory analyses then considered commonly-used machine learning algorithms. Analyses were based on electronic medical records for all 6,360 individuals classified in the National Death Index as having died by suicide in fiscal years 2009-2011 who used VHA services the year of their death or prior year and a 1% probability sample of time-matched VHA service users alive at the index date (n = 2,112,008). A penalized logistic model with 61 predictors had sensitivity comparable to the proof-of-concept model (which had 381 predictors) at target thresholds. The machine learning algorithms had relatively similar sensitivities, the highest being for Bayesian additive regression trees; 10.7% of suicides occurred among the 1.0% of Veterans with highest predicted risk and 28.1% among the 5.0% with highest predicted risk. Based on these results, penalized logistic regression is being used for initial intervention implementation. The paper concludes with a discussion of other practical issues that might be explored to increase model performance.
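The abstract above reports performance as the share of all suicides that fall within the top 1.0% or 5.0% of predicted risk. A minimal sketch of that kind of calculation follows; the function name `sensitivity_at_top` and the toy scores and labels are invented for illustration and are not taken from the paper.

```python
def sensitivity_at_top(scores, labels, fraction):
    """Share of all positive cases found in the top `fraction` of predicted risk."""
    # Number of individuals flagged as high risk (at least one).
    n_top = max(1, round(len(scores) * fraction))
    # Rank individuals from highest to lowest predicted risk.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    captured = sum(label for _, label in ranked[:n_top])
    total = sum(labels)
    return captured / total if total else 0.0


# Hypothetical toy data: 10 individuals, 4 of whom are positive cases.
scores = [0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10, 0.05]
labels = [1, 0, 1, 0, 0, 0, 1, 0, 0, 1]

top20 = sensitivity_at_top(scores, labels, 0.20)
print(f"Sensitivity in top 20% of predicted risk: {top20:.2f}")
```

In this toy example the top 20% contains one of the four positive cases, so the sensitivity at that threshold is 0.25.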

Language: English

Citations

173

Using patient self-reports to study heterogeneity of treatment effects in major depressive disorder DOI Open Access
Ronald C. Kessler, Hanna M. van Loo, Klaas J. Wardenaar

et al.

Epidemiology and Psychiatric Sciences, Journal Year: 2016, Volume and Issue: 26(1), P. 22 - 36

Published: Jan. 26, 2016

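Backgrounds. Clinicians need guidance to address the heterogeneity of treatment responses of patients with major depressive disorder (MDD). While prediction schemes based on symptom clustering and biomarkers have so far not yielded results of sufficient strength to inform clinical decision-making, prediction schemes based on big data predictive analytic models might be more practically useful. Method. We review evidence suggesting that prediction equations based on symptoms and other easily-assessed clinical features found in previous research to predict MDD treatment outcomes might provide a foundation for developing predictive analytic clinical decision support models that could help clinicians select optimal (personalised) MDD treatments. These methods could also be useful in targeting patient subsamples for more expensive biomarker assessments. Results. Approximately two dozen baseline variables obtained from medical records or patient reports have been found repeatedly in MDD treatment trials to predict overall treatment outcomes (i.e., intervention v. control) or differential treatment outcomes (i.e., intervention A v. intervention B). Similar evidence has been found in observational studies of MDD persistence-severity. However, no treatment studies have yet attempted to develop treatment outcome equations using the full set of these predictors. Promising preliminary empirical results coupled with recent developments in statistical methodology suggest that models could be developed to provide useful clinical decision support in personalised treatment selection. These tools could also provide a strong foundation to increase statistical power in focused studies of biomarkers and MDD heterogeneity of treatment response in subsequent controlled trials. Conclusions. Coordinated efforts are needed to develop a protocol for systematically collecting information about established predictors of heterogeneity of MDD treatment response in large observational studies, applying and refining these models in subsequent pragmatic trials, carrying out pooled secondary analyses to extract the maximum amount of information from these coordinated studies, and using this information to focus future discovery efforts in the segment of the patient population in which continued uncertainty about treatment response exists.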

Language: English

Citations

170

Regularized Machine Learning in the Genetic Prediction of Complex Traits DOI Creative Commons

Sebastian Okser, Tapio Pahikkala, Antti Airola

et al.

PLoS Genetics, Journal Year: 2014, Volume and Issue: 10(11), P. e1004754 - e1004754

Published: Nov. 13, 2014

Compared to univariate analysis of genome-wide association (GWA) studies, machine learning-based models have been shown to provide improved means of learning such multilocus panels of genetic variants and their interactions that are most predictive of complex phenotypic traits. Many applications of predictive modeling rely on effective variable selection, often implemented through model regularization, which penalizes the model complexity and enables predictions in individuals outside of the training dataset. However, the different regularization approaches may also lead to considerable differences, especially in the number of genetic variants needed for maximal predictive accuracy, as illustrated here in examples from both disease classification and quantitative trait prediction. We also highlight the potential pitfalls of the regularized machine learning models, related to issues such as model overfitting to the training data, which may lead to overoptimistic prediction results, as well as identifiability of the predictive variants, which is important in many medical applications. While genetic risk prediction for human diseases is used as a motivating use case, we argue that these models are also widely applicable in nonhuman applications, such as animal and plant breeding, where accurate genotype-to-phenotype modeling is needed. Finally, we discuss some key future advances, open questions and challenges in this developing field, when moving toward low-frequency variants and cross-phenotype interactions.

Language: English

Citations

147

An unsupervised machine learning method for discovering patient clusters based on genetic signatures DOI Creative Commons
Christian López, Scott Tucker, T. Salameh

et al.

Journal of Biomedical Informatics, Journal Year: 2018, Volume and Issue: 85, P. 30 - 39

Published: July 29, 2018

Language: English

Citations

105

Discovering symptom patterns of COVID-19 patients using association rule mining DOI Open Access
Meera Tandan, Yogesh Acharya, Suresh Pokharel

et al.

Computers in Biology and Medicine, Journal Year: 2021, Volume and Issue: 131, P. 104249 - 104249

Published: Feb. 2, 2021

Language: English

Citations

93

Quantification of continuous flood hazard using random forest classification and flood insurance claims at large spatial scales: a pilot study in southeast Texas DOI Creative Commons
William H. Mobley, Antonia Sebastian, Russell Blessing

et al.

Natural hazards and earth system sciences, Journal Year: 2021, Volume and Issue: 21(2), P. 807 - 822

Published: March 1, 2021

Abstract. Pre-disaster planning and mitigation necessitate detailed spatial information about flood hazards and their associated risks. In the US, the Federal Emergency Management Agency (FEMA) Special Flood Hazard Area (SFHA) provides important information about areas subject to flooding during the 1 % riverine or coastal event. The binary nature of flood hazard maps obscures the distribution of property risk inside of the SFHA and the residual risk outside of the SFHA, which can undermine mitigation efforts. Machine learning techniques provide an alternative approach to estimating flood hazards across large spatial scales at low computational expense. This study presents a pilot study for the Texas Gulf Coast region using random forest classification to predict flood probability across a 30 523 km² area. Using a record of National Flood Insurance Program (NFIP) claims dating back to 1976 and high-resolution geospatial data, we generate a continuous flood hazard map for 12 US Geological Survey (USGS) eight-digit hydrologic unit code (HUC) watersheds. Results indicate that the random forest model predicts flooding with a high sensitivity (area under curve, AUC: 0.895), especially compared to the existing FEMA regulatory floodplain. Our model identifies 649 000 structures with at least a 1 % annual chance of flooding, roughly 3 times more than are currently identified by FEMA as flood-prone.

Language: English

Citations

57

An information-gain approach to detecting three-way epistatic interactions in genetic association studies DOI Creative Commons

Ting Hu, Yuanzhu Chen, Jeff Kiralis

et al.

Journal of the American Medical Informatics Association, Journal Year: 2013, Volume and Issue: 20(4), P. 630 - 636

Published: Feb. 9, 2013

Epistasis has been historically used to describe the phenomenon that the effect of a given gene on a phenotype can be dependent on one or more other genes, and is an essential element for understanding the association between genetic and phenotypic variations. Quantifying epistasis of orders higher than two is very challenging due to both the computational complexity of enumerating all possible combinations in genome-wide data and the lack of efficient and effective methodologies. In this study, we propose a fast, non-parametric, and model-free measure of three-way epistasis. Such a measure is based on information gain, and is able to separate all lower order effects from pure three-way epistasis. Our method was verified on synthetic data and applied to real data from a candidate-gene study of tuberculosis in a West African population. In the tuberculosis data, we found a statistically significant pure three-way epistatic interaction effect that was stronger than any lower-order associations. Our study provides a methodological basis for detecting and characterizing high-order gene-gene interactions in genetic association studies.
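The abstract above describes an information-gain measure that separates lower-order effects from pure three-way epistasis. One plausible way to express that idea is to subtract all main effects and pairwise gains from the joint mutual information of three loci with the phenotype. The sketch below assumes that formulation; it may differ in detail from the authors' definition, and the function names and toy data are invented for illustration.

```python
from collections import Counter
from itertools import product
from math import log2


def entropy(columns):
    """Shannon entropy (bits) of the joint distribution of the given columns."""
    rows = list(zip(*columns))
    n = len(rows)
    return -sum((count / n) * log2(count / n) for count in Counter(rows).values())


def mutual_info(xs, d):
    """I(X1..Xk ; D) = H(X) + H(D) - H(X, D)."""
    return entropy(xs) + entropy([d]) - entropy(xs + [d])


def pairwise_gain(a, b, d):
    """Joint information of two loci about D minus their main effects."""
    return mutual_info([a, b], d) - mutual_info([a], d) - mutual_info([b], d)


def three_way_gain(a, b, c, d):
    """Joint information of three loci about D minus all lower-order terms."""
    joint = mutual_info([a, b, c], d)
    pairs = pairwise_gain(a, b, d) + pairwise_gain(a, c, d) + pairwise_gain(b, c, d)
    mains = mutual_info([a], d) + mutual_info([b], d) + mutual_info([c], d)
    return joint - pairs - mains


# Hypothetical toy data: the phenotype is the XOR of three binary loci,
# so no single locus or pair carries any information on its own.
combos = list(product([0, 1], repeat=3))
a, b, c = (list(col) for col in zip(*combos))
d = [x ^ y ^ z for x, y, z in combos]

gain = three_way_gain(a, b, c, d)
print(f"Three-way information gain: {gain:.3f} bits")
```

For this XOR example every main effect and pairwise gain is zero, so the full 1 bit of phenotype information is attributed to the three-way term.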

Language: English

Citations

78

A Machine Learning Approach to Predicting Case Duration for Robot-Assisted Surgery DOI
Beiqun Zhao, Ruth S. Waterman, Richard D. Urman

et al.

Journal of Medical Systems, Journal Year: 2019, Volume and Issue: 43(2)

Published: Jan. 5, 2019

Language: English

Citations

71