GFPrint™: A machine learning tool for transforming genetic data into clinical insights

Guillermo Sanz-Martín, Daniela Migliore, Pablo Gómez del Campo

et al.

PLoS ONE, Journal Year: 2024, Volume and Issue: 19(11), P. e0311370 - e0311370

Published: Nov. 27, 2024

The increasing availability of massive genetic sequencing data in the clinical setting has triggered the need for appropriate tools to help fully exploit the wealth of information these data possess. GFPrint™ is a proprietary streaming algorithm designed to meet that need. By extracting the most relevant functional features, GFPrint™ transforms high-dimensional, noisy genetic sequencing data into an embedded representation, allowing unsupervised models to create data clusters that can be re-mapped to the original clinical information. Ultimately, this allows the identification of genes and pathways relevant to disease onset and progression. GFPrint™ has been tested and validated using two publicly available cancer genomic datasets. Analysis of the TCGA dataset identified panels of genes whose mutations appear to negatively influence survival in non-metastatic colorectal cancer (15 genes), epidermoid non-small cell lung cancer (167 genes) and pheochromocytoma (313 genes) patients. Likewise, analysis of the Broad Institute dataset identified 75 genes involved in pathways related to extracellular matrix reorganization whose mutations appear to dictate a worse prognosis for breast cancer patients. GFPrint™ is accessible through a secure web portal and can be used in any therapeutic area where the genetic profile of patients influences disease evolution.

Language: English

An explainable ensemble approach for advanced brain tumor classification applying Dual-GAN mechanism and feature extraction techniques over highly imbalanced data
Priyanka Roy, Fahim Mohammad Sadique Srijon, Pankaj Bhowmik

et al.

PLoS ONE, Journal Year: 2024, Volume and Issue: 19(9), P. e0310748 - e0310748

Published: Sept. 27, 2024

Brain tumors are one of the leading diseases imposing a huge morbidity rate across the world every year. Classifying brain tumors accurately plays a crucial role in clinical diagnosis and improves the overall healthcare process. ML techniques have shown promise in classifying brain tumors based on medical imaging data such as MRI scans. These techniques aid in detecting tumors and planning treatment early, improving patient outcomes. However, medical image datasets are frequently affected by significant class imbalance, especially when benign tumors outnumber malignant tumors. This study presents an explainable ensemble-based pipeline for brain tumor classification that integrates a Dual-GAN mechanism with feature extraction techniques, specifically designed for highly imbalanced data. The Dual-GAN mechanism facilitates the generation of synthetic minority class samples, addressing the class imbalance issue without compromising the original quality of the data. Additionally, the integration of different feature extraction methods facilitates capturing precise and informative features. This study proposes a novel deep ensemble feature extraction (DeepEFE) framework that surpasses other benchmark ML and deep learning models with an accuracy of 98.15%. This study focuses on achieving high classification accuracy while prioritizing stable performance. By incorporating Grad-CAM, it enhances the transparency and interpretability of the overall classification process. This research identifies the most relevant and contributing parts of the input images toward accurate outcomes, enhancing the reliability of the proposed pipeline. The significantly improved Precision, Sensitivity and F1-Score demonstrate the effectiveness of the proposed mechanism in handling class imbalance and improving the overall accuracy. Furthermore, the integration of explainability enhances the transparency of the classification process to establish a reliable model for brain tumor classification, encouraging its adoption in clinical practice and promoting trust in decision-making processes.
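The abstract reports Precision, Sensitivity and F1-Score alongside accuracy because accuracy alone can look strong on imbalanced data. The following is a minimal sketch of those standard metric definitions, not the authors' evaluation code; the function name `imbalance_metrics` and the toy labels are illustrative assumptions.

```python
def imbalance_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, sensitivity (recall) and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    correct = sum(t == p for t, p in pairs)

    # Guard against empty denominators (e.g. no positive predictions at all).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "accuracy": correct / len(pairs),
        "precision": precision,
        "sensitivity": sensitivity,
        "f1": f1,
    }


# Hypothetical toy data: 95 majority-class cases (0) and 5 minority-class cases (1).
y_true = [0] * 95 + [1] * 5
# A degenerate classifier that always predicts the majority class.
y_pred = [0] * 100
metrics = imbalance_metrics(y_true, y_pred)
```

On this toy data the always-majority classifier is right on 95 of 100 cases yet detects none of the minority cases, which is the failure mode that sensitivity and F1 expose.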

Language: English

Citations

0

Multicategory Survival Outcomes Classification via Overlapping Group Screening Process Based on Multinomial Logistic Regression Model With Application to TCGA Transcriptomic Data
Jie-Huei Wang, Po-Lin Hou, Yi-Hau Chen

et al.

Cancer Informatics, Journal Year: 2024, Volume and Issue: 23

Published: Jan. 1, 2024

In classifying the multicategory survival outcomes of cancer patients, it is crucial to identify biomarkers that affect specific outcome categories. The classification of multicategory survival outcomes from transcriptomic data has been thoroughly investigated in computational biology. Nevertheless, several challenges must be addressed, including the ultra-high-dimensional feature space, feature contamination, and data imbalance, all of which contribute to the instability of the diagnostic model. Furthermore, although most methods achieve accurate predictive performance for binary classification with high-dimensional transcriptomic data, their extension to multi-class classification is not straightforward.

Language: English

Citations

0

Data-adaptive binary classifiers in high dimensions using random partitioning
Vahid Andalib, Seungchul Baek

Journal of Statistical Computation and Simulation, Journal Year: 2024, Volume and Issue: unknown, P. 1 - 24

Published: Oct. 18, 2024

Classification in high dimensions has been highlighted for the past two decades, since Fisher's linear discriminant analysis (LDA) is not optimal when the sample size n is smaller than the number of covariates p, i.e. p>n, which is mostly due to the singularity of the sample covariance matrix. Rather than modifying how to estimate the sample covariance matrix and sample mean vector in constructing a classifier, we build two types of high-dimensional classifiers using data splitting: single data splitting (SDS) and multiple data splitting (MDS). Moreover, we introduce a weighted version of the MDS classifier that improves classification performance, as illustrated in numerical studies. Each of the split data sets has a smaller number of covariates compared to the sample size so that LDA is applicable, and the classification results can be combined with respect to minimizing the misclassification rate. We present theoretical justification backing up our proposed methods by comparing misclassification rates with LDA in high dimension. We also conduct a wide range of simulations and analyse four microarray data sets, which demonstrates that our proposed methods outperform some existing methods or at least yield comparable performances.
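The splitting idea described in this abstract can be illustrated with a short NumPy sketch: the p covariates are randomly partitioned into blocks small enough that ordinary LDA is well defined on each block, and the per-block results are then combined. This is not the authors' SDS or MDS procedure. The function names, the equal-prior assumption, the block size, and the unweighted sum of discriminant scores are illustrative assumptions, whereas the paper combines results by minimizing the misclassification rate.

```python
import numpy as np


def fit_lda(X, y):
    """Two-class Fisher LDA (labels 0/1), equal priors assumed.

    Requires more samples than columns so the pooled covariance is invertible.
    """
    X0, X1 = X[y == 0], X[y == 1]
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pooled within-class covariance estimate.
    S = ((X0 - mu0).T @ (X0 - mu0) + (X1 - mu1).T @ (X1 - mu1)) / (len(X) - 2)
    w = np.linalg.solve(S, mu1 - mu0)   # discriminant direction
    b = -0.5 * w @ (mu0 + mu1)          # threshold at the midpoint of the means
    return w, b


def fit_partitioned_lda(X, y, block_size, rng):
    """Randomly partition the covariates into blocks and fit one LDA per block."""
    perm = rng.permutation(X.shape[1])
    blocks = [perm[i:i + block_size] for i in range(0, len(perm), block_size)]
    return [(cols, *fit_lda(X[:, cols], y)) for cols in blocks]


def predict_partitioned_lda(models, X):
    """Combine blocks by summing discriminant scores (an unweighted choice)."""
    scores = sum(X[:, cols] @ w + b for cols, w, b in models)
    return (scores > 0).astype(int)


# Hypothetical simulated data with p > n: 40 samples, 200 covariates.
rng = np.random.default_rng(0)
n, p = 40, 200
y = np.repeat([0, 1], n // 2)
X = rng.normal(size=(n, p))
X[y == 1, :20] += 1.0  # only the first 20 covariates carry class signal

models = fit_partitioned_lda(X, y, block_size=10, rng=rng)
pred = predict_partitioned_lda(models, X)
train_accuracy = (pred == y).mean()
```

With a block size of 10 and 40 samples, each block's pooled covariance matrix is invertible even though the full 200-column covariance matrix would be singular. The training accuracy computed here is only a sanity check; a fair comparison would use held-out data.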

Language: English

Citations

0
