Epi-GTBN: an approach of epistasis mining based on genetic Tabu algorithm and Bayesian network DOI Creative Commons
Yang Guo, Zhiman Zhong, Chen Yang, et al.

BMC Bioinformatics, Journal Year: 2019, Volume and Issue: 20(1)

Published: Aug. 28, 2019

Mining epistatic loci that affect specific phenotypic traits is an important research issue in the field of biology. A Bayesian network (BN) is a graphical model that can express the relationship between genetic loci and phenotype. Until now, it has been widely used for epistasis mining in many works. However, this method has two disadvantages: low learning efficiency and a tendency to fall into a local optimum. The genetic algorithm excels at rapid global search and at avoiding falling into a local optimum. It is also scalable and easy to integrate with other algorithms. This work proposes an epistasis mining approach based on a genetic tabu algorithm and a Bayesian network (Epi-GTBN). It uses the genetic algorithm as the heuristic search strategy of the Bayesian network. The individual structure can be evolved through the genetic operations of selection, crossover and mutation. This can help to find the optimal network structure, and then to further mine the epistatic loci effectively. In order to enhance the diversity of the population and obtain a more effective solution, we use the tabu search strategy in the crossover and mutation operations of the genetic algorithm. It can also help to accelerate the convergence of the algorithm. We compared Epi-GTBN with other recent algorithms using both simulated and real datasets. The experimental results demonstrate that our method has much better epistasis detection accuracy without affecting efficiency for different datasets. The presented methodology (Epi-GTBN) is an effective method for epistasis detection, and it can be seen as an interesting addition to the arsenal used in complex trait analyses.

Language: English

Evaluation of Breast Cancer Susceptibility Using Improved Genetic Algorithms to Generate Genotype SNP Barcodes DOI
Cheng‐Hong Yang, Yu‐Da Lin, Li‐Yeh Chuang, et al.

IEEE/ACM Transactions on Computational Biology and Bioinformatics, Journal Year: 2013, Volume and Issue: 10(2), P. 361 - 371

Published: March 1, 2013

Genetic association is a challenging task for the identification and characterization of genes that increase susceptibility to common complex multifactorial diseases. To fully execute genetic studies of complex diseases, modern geneticists face the challenge of detecting interactions between loci. A genetic algorithm (GA) is developed to detect the association of genotype frequencies between cancer cases and noncancer cases based on statistical analysis. An improved genetic algorithm (IGA) is proposed to improve the reliability of the GA method for high-dimensional SNP-SNP interactions. The strategy offers the top five results to the random population process, in which they guide the GA toward a significant search course. The IGA increases the likelihood of quickly detecting the maximum ratio difference between cancer cases and noncancer cases. The study systematically evaluates the joint effect of 23 SNP combinations of six steroid hormone metabolism and signaling-related genes involved in breast carcinogenesis pathways, with the IGA successfully detecting significant ratio differences between breast cancer cases and noncancer cases. The possible breast cancer risks were subsequently analyzed by odds-ratio (OR) and risk-ratio analysis. The estimated OR of the best SNP barcode is significantly higher than 1 (between 1.15 and 7.01) for specific combinations of two to 13 SNPs. The analysis results support that the IGA provides higher ratio difference values than the GA over 3-SNP to 13-SNP interactions. A more specific SNP-SNP interaction profile for breast cancer risk is also provided.
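The odds-ratio analysis mentioned in this abstract can be illustrated with a minimal sketch. It assumes a plain 2x2 table (carriers and non-carriers of a SNP barcode among cases and controls); the function name `odds_ratio` and the counts are hypothetical and are not taken from the study.

```python
def odds_ratio(cases_with, cases_without, controls_with, controls_without):
    """Odds ratio for carrying a barcode: (a/b) / (c/d) = (a*d) / (b*c)."""
    if cases_without == 0 or controls_with == 0:
        # The ratio is undefined when either denominator cell is empty.
        raise ValueError("odds ratio is undefined when a denominator cell is zero")
    return (cases_with * controls_without) / (cases_without * controls_with)


# Hypothetical counts for illustration only.
print(round(odds_ratio(30, 70, 15, 85), 2))
```

An OR above 1 suggests the barcode is more common among cases, and an OR below 1 suggests it is more common among controls. A real analysis would also report a confidence interval and a p-value.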

Language: English

Citations: 56

A Review on Methods for Detecting SNP Interactions in High-Dimensional Genomic Data DOI
Suneetha Uppu, Aneesh Krishna, Raj P. Gopalan, et al.

IEEE/ACM Transactions on Computational Biology and Bioinformatics, Journal Year: 2016, Volume and Issue: 15(2), P. 599 - 612

Published: Dec. 2, 2016

In this era of genome-wide association studies (GWAS), the quest for understanding the genetic architecture of complex diseases is rapidly increasing more than ever before. The development of high throughput genotyping and next generation sequencing technologies enables genetic epidemiological analysis of large scale data. These advances have led to the identification of a number of single nucleotide polymorphisms (SNPs) responsible for disease susceptibility. The interactions between SNPs associated with complex diseases are increasingly being explored in the current literature. These interaction studies are mathematically challenging and computationally complex. These challenges have been addressed by a number of data mining and machine learning approaches. This paper reviews the current methods and the related software packages to detect the SNP interactions that contribute to diseases. The issues that need to be considered when developing these models are addressed in this review. The paper also reviews the achievements in data simulation to evaluate the performance of these models. Further, it discusses the future of SNP interaction analysis.

Language: English

Citations: 53

Machine learning approaches for the prediction of obesity using publicly available genetic profiles DOI
Casimiro Curbelo Montañez, Paul Fergus, Abir Hussain, et al.

2017 International Joint Conference on Neural Networks (IJCNN), Journal Year: 2017, Volume and Issue: unknown, P. 2743 - 2750

Published: May 1, 2017

This paper presents a novel approach based on the analysis of genetic variants from publicly available genetic profiles and a manually curated database, the National Human Genome Research Institute Catalog. Using data science techniques, genetic variants are identified in the collected participant profiles and then indexed as risk variants. Indexed genetic variants, or Single Nucleotide Polymorphisms (SNPs), are used as inputs to various machine learning algorithms for the prediction of obesity. The body mass index status of participants is divided into two classes, Normal Class and Risk Class. Dimensionality reduction tasks are performed to generate a set of principal variables (13 SNPs) for the application of the various machine learning methods. The models are evaluated using receiver operator characteristic curves and the area under the curve. Machine learning techniques including gradient boosting, generalized linear model, classification and regression trees, k-nearest neighbours, support vector machines, random forest and multilayer perceptron neural network are comparatively assessed in terms of their ability to identify the most important factors among the initial 6622 variables describing genetic variants, age and gender, and to classify a subject into one of the body mass index related classes defined in this study. Our simulation results indicated a highest area under the curve value of 90.5%.

Language: English

Citations: 49

Gene-gene and gene-environment interactions: new insights into the prevention, detection and management of coronary artery disease DOI Creative Commons
Matthew B. Lanktree, Robert A. Hegele

Genome Medicine, Journal Year: 2009, Volume and Issue: 1(2), P. 28 - 28

Published: Jan. 1, 2009

Despite the recent success of genome-wide association studies (GWASs) in identifying loci consistently associated with coronary artery disease (CAD), a large proportion of the genetic components of CAD and its metabolic risk factors, including plasma lipids, type 2 diabetes and body mass index, remain unattributed. Gene-gene and gene-environment interactions might produce a meaningful improvement in the quantification of the genetic determinants of CAD. Testing for gene-gene and gene-environment interactions is thus a new frontier for large-scale GWASs of CAD. There are several anecdotal examples of monogenic susceptibility to CAD in which the phenotype was worsened by an adverse environment. In addition, small-scale candidate gene association studies with functional hypotheses have identified gene-environment interactions. For future evaluation of gene-gene and gene-environment interactions to achieve the same success as the single gene associations reported in recent GWASs, it will be important to pre-specify agreed standards of study design and statistical power, environmental exposure measurement, phenomic characterization and analytical strategies. Here we discuss these issues, particularly in relation to the investigation and potential clinical utility of gene-gene and gene-environment interactions in CAD.

Language: English

Citations: 66

An Improved PSO Algorithm for Generating Protective SNP Barcodes in Breast Cancer DOI Creative Commons
Li‐Yeh Chuang, Yu‐Da Lin, Hsueh‐Wei Chang, et al.

PLoS ONE, Journal Year: 2012, Volume and Issue: 7(5), P. e37018 - e37018

Published: May 18, 2012

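Possible single nucleotide polymorphism (SNP) interactions in breast cancer are usually not investigated in genome-wide association studies. Previously, we proposed a particle swarm optimization (PSO) method to compute these kinds of SNP interactions. However, this PSO does not guarantee finding the best result in every run, especially when high-dimensional data is investigated for SNP-SNP interactions. In this study, we propose an IPSO algorithm to improve the reliability of PSO for the identification of the best protective SNP barcodes (SNP combinations and genotypes with maximum difference between cases and controls) associated with breast cancer. SNP barcodes containing different numbers of SNPs were computed. The top five SNP barcode results are retained for computing the next SNP barcode with a one-SNP increase at each processing step. Based on simulated data for 23 SNPs of six steroid hormone metabolism and signalling-related genes, the performance of our proposed IPSO algorithm is evaluated. Among the 23 SNPs, 13 SNPs displayed significant odds ratio (OR) values (1.268 to 0.848; p<0.05) for breast cancer. Based on the IPSO algorithm, the joint effect in terms of SNP barcodes with two to seven SNPs shows significantly decreasing OR values (0.84 to 0.57; p<0.05 to 0.001). Using the PSO algorithm, SNP barcodes of up to four SNPs show significantly decreasing OR values (down to 0.77). Based on the results of 20 simulations, the medians of the maximum differences for each SNP barcode generated by IPSO are higher than those by PSO. The interquartile ranges of the boxplot, as well as the upper and lower hinges for each n-SNP barcode (n = 3∼10), are narrower in IPSO than in PSO, suggesting that IPSO is highly reliable for SNP barcode identification. Overall, the proposed IPSO algorithm is robust and provides exact identification of the best protective SNP barcodes for breast cancer.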
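The retention step described in this abstract (keep the top five barcodes, then grow each by one SNP) can be sketched as a simple beam-style search. This is not the authors' IPSO: the particle swarm update is omitted, and the function `grow_barcodes`, the integer SNP identifiers and the toy scoring function are all hypothetical stand-ins used only to show the retention idea.

```python
from itertools import combinations


def grow_barcodes(snps, score, max_size, keep=5):
    """Keep the `keep` best barcodes of each size and extend each by one SNP."""
    # Start from all 2-SNP barcodes; sorted() is stable, so ties keep lexicographic order.
    frontier = sorted(combinations(snps, 2), key=score, reverse=True)[:keep]
    best = {2: frontier[0]}
    for size in range(3, max_size + 1):
        # Extend every retained barcode by one SNP it does not already contain.
        candidates = {
            tuple(sorted(barcode + (snp,)))
            for barcode in frontier
            for snp in snps
            if snp not in barcode
        }
        frontier = sorted(sorted(candidates), key=score, reverse=True)[:keep]
        best[size] = frontier[0]
    return best


# Toy score: the sum of the SNP identifiers (a placeholder for a case-control difference).
result = grow_barcodes(range(6), score=sum, max_size=4)
print(result)
```

Retaining only a handful of candidates per step keeps the search tractable, at the cost of possibly missing barcodes whose smaller subsets score poorly.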

Language: English

Citations: 51

A Survey of Computational Intelligence Techniques in Protein Function Prediction DOI Open Access
Arvind Kumar Tiwari, Rajeev Srivastava

International Journal of Proteomics, Journal Year: 2014, Volume and Issue: 2014, P. 1 - 22

Published: Dec. 11, 2014

In the past, there was a massive growth of knowledge of unknown proteins with the advancement of high throughput microarray technologies. Protein function prediction is the most challenging problem in bioinformatics. In the past, homology based approaches were used to predict protein function, but they failed when a new protein was different from the previous one. Therefore, to alleviate the problems associated with the traditional homology based approaches, numerous computational intelligence techniques have been proposed in the recent past. This paper presents a state-of-the-art comprehensive review of various computational intelligence techniques for protein function predictions using sequence, structure, protein-protein interaction network, and gene expression data, used in wide areas of applications such as prediction of DNA and RNA binding sites, subcellular localization, enzyme functions, signal peptides, catalytic residues, nuclear/G-protein coupled receptors, membrane proteins, and pathway analysis from gene expression datasets. This paper also summarizes the results obtained by many researchers to solve these problems by using computational intelligence techniques with appropriate datasets to improve the prediction performance. The summary shows that ensemble classifiers and the integration of multiple heterogeneous data are useful for protein function prediction.

Language: English

Citations: 44

Big Data and Causality DOI
Hossein Hassani, Xu Huang, Mansi Ghodsi, et al.

Annals of Data Science, Journal Year: 2017, Volume and Issue: 5(2), P. 133 - 156

Published: Aug. 1, 2017

Language: English

Citations: 40

Gene-gene interaction: the curse of dimensionality DOI Open Access
Amrita Chattopadhyay, Tzu‐Pin Lu

Annals of Translational Medicine, Journal Year: 2019, Volume and Issue: 7(24), P. 813 - 813

Published: Dec. 1, 2019

Identified genetic variants from genome wide association studies frequently show only modest effects on the disease risk, leading to the "missing heritability" problem. An avenue to account for a part of this "missingness" is to evaluate gene-gene interactions (epistasis), thereby elucidating their effect on complex diseases. This can potentially help with identifying gene functions, pathways, and drug targets. However, the exhaustive evaluation of all possible genetic interactions among millions of single nucleotide polymorphisms (SNPs) raises several issues, otherwise known as the "curse of dimensionality". The dimensionality involved in the epistatic analysis of such exponentially growing SNPs diminishes the usefulness of traditional, parametric statistical methods. The immense popularity of multifactor dimensionality reduction (MDR), a non-parametric method proposed in 2001 that classifies multi-dimensional genotypes into one-dimensional binary approaches, led to the emergence of a fast-growing collection of methods that were based on the MDR approach. Moreover, machine-learning (ML) methods such as random forests and neural networks (NNs), deep-learning (DL) approaches, and hybrid approaches have also been applied profusely in recent years to tackle the dimensionality issue associated with whole genome gene-gene interaction studies. However, exhaustive searching in MDR based approaches, or variable selection in ML methods, still poses the risk of missing out on relevant SNPs. Furthermore, interpretability issues are a major hindrance for DL methods. To minimize this loss of information, Python based tools such as PySpark can potentially take advantage of distributed computing resources in the cloud, to bring back smaller subsets of data for further local analysis. Parallel computing can be a powerful resource that stands to fight this "curse". PySpark supports all standard Python libraries and C extensions, thus making it convenient to write codes that deliver dramatic improvements in processing speed for extraordinarily large sets of data.
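The scale of the exhaustive search described in this abstract can be made concrete with a short calculation. The sketch below simply counts the unordered SNP combinations of a given interaction order; the function name `n_combinations` and the panel size of one million SNPs are illustrative assumptions, not figures from the article.

```python
from math import comb


def n_combinations(n_snps, order):
    """Number of distinct unordered SNP sets of size `order` among `n_snps` SNPs."""
    return comb(n_snps, order)


# Illustrative panel size; real studies vary.
for order in (2, 3, 4):
    print(order, n_combinations(1_000_000, order))
```

Even pairwise testing on a panel of this size requires roughly half a trillion evaluations, and each additional interaction order multiplies the count by several orders of magnitude.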

Language: English

Citations: 40

Considerations in the search for epistasis DOI Creative Commons
Marleen Balvert, Johnathan Cooper‐Knock, Julian Stamp, et al.

Genome Biology, Journal Year: 2024, Volume and Issue: 25(1)

Published: Nov. 19, 2024

Epistasis refers to changes in the effect on phenotype of a unit of genetic information, such as a single nucleotide polymorphism or a gene, dependent on the context of other genetic units. Such interactions are both biologically plausible and good candidates to explain observations which are not fully explained by an additive heritability model. However, the search for epistasis has so far largely failed to recover this missing heritability. We identify key challenges and propose that future works need to leverage idealized systems, known biology and even previously identified epistatic interactions, in order to guide the search for new interactions.

Language: English

Citations: 4

How can media attention reveal ESG improvement opportunities? A multi-algorithm machine learning-based approach for Taiwan’s electronics industry DOI
Shili Lin, Yueqian Lin, Xiaojun Jin, et al.

The North American Journal of Economics and Finance, Journal Year: 2025, Volume and Issue: unknown, P. 102431 - 102431

Published: April 1, 2025

Language: English

Citations: 0