Survival prediction from imbalanced colorectal cancer dataset using hybrid sampling methods and tree-based classifiers

Sadegh Soleimani, Mahsa Bahrami, Mansour Vali, et al.

Scientific Reports, Journal Year: 2025, Volume and Issue: 15(1)

Published: April 25, 2025

Colorectal cancer is a high-mortality cancer, with a mortality rate of 64.5% for all stages combined. Clinical data analysis plays a crucial role in predicting the survival of colorectal cancer patients, enabling clinicians to make informed treatment decisions. However, utilizing clinical data can be challenging, especially when dealing with imbalanced outcomes, an aspect often overlooked in this context. This paper focuses on developing algorithms to predict 1-, 3-, and 5-year survival of colorectal cancer patients using clinical datasets, with particular emphasis on the highly imbalanced 1-year survival prediction task. We utilized the colorectal cancer dataset from the Surveillance, Epidemiology, and End Results (SEER) database, which exhibits high imbalance in the 1-year (1:10) and imbalance in the 3-year (2:10) survival analysis, while achieving balance in the 5-year survival analysis. The pre-processing step consists of removing records with missing values and merging categories with less than a 2% share of each categorical feature, to limit the number of classes of each component. Edited Nearest Neighbor, Repeated Edited Nearest Neighbor (RENN), Synthetic Minority Over-sampling Technique (SMOTE), and pipelines of SMOTE and RENN approaches were used for balancing the data, with tree-based classifiers including Decision Tree, Random Forest, Extra Tree, eXtreme Gradient Boosting, and Light Gradient Boosting Machine (LGBM). The performance evaluation uses a 5-fold cross-validation approach. In the highly imbalanced 1-year case, our proposed method with LGBM significantly outperforms other sampling methods, with a sensitivity of 72.30%. For the 3-year survival task, the combination of RENN and LGBM achieves a sensitivity of 80.81%, indicating that the proposed method works best for highly imbalanced datasets. Additionally, for 5-year survival, the sensitivity reaches 63.03% with LGBM. Our proposed method significantly improves mortality prediction for the minority class of colorectal cancer patients. RENN followed by SMOTE yields better sensitivity, with LGBM as the best-performing predictor for 1- and 3-year survival. In the 5-year survival prediction task, LGBM outperforms other models in terms of F1-score.
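The hybrid sampling idea in this abstract (cleaning the majority class with Repeated Edited Nearest Neighbor, then over-sampling the minority class with SMOTE) can be illustrated with a small pure-Python sketch. This is not the authors' implementation: the toy points, the function names, the majority-vote rule, `k=3`, and the clean-then-oversample order are all assumptions made for illustration.

```python
import math
import random
from collections import Counter


def k_nearest(idx, points, k):
    """Indices of the k points closest to points[idx], excluding itself."""
    dists = [(math.dist(points[idx], p), j) for j, p in enumerate(points) if j != idx]
    return [j for _, j in sorted(dists)[:k]]


def edited_nearest_neighbours(X, y, majority_label, k=3):
    """One ENN pass: drop majority samples whose k-NN majority vote disagrees."""
    keep = []
    for i, label in enumerate(y):
        if label == majority_label:
            votes = Counter(y[j] for j in k_nearest(i, X, k))
            if votes.most_common(1)[0][0] != label:
                continue  # looks like noise inside the other class: remove it
        keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]


def repeated_enn(X, y, majority_label, k=3, max_iter=100):
    """RENN: repeat ENN until a pass removes nothing (or max_iter is reached)."""
    for _ in range(max_iter):
        X_new, y_new = edited_nearest_neighbours(X, y, majority_label, k)
        if len(X_new) == len(X):
            break
        X, y = X_new, y_new
    return X, y


def smote(X, y, minority_label, n_new, k=3, seed=0):
    """SMOTE-style over-sampling: interpolate between minority neighbours.

    Assumes at least two minority samples are present.
    """
    rng = random.Random(seed)
    minority = [x for x, label in zip(X, y) if label == minority_label]
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        j = rng.choice(k_nearest(i, minority, min(k, len(minority) - 1)))
        gap = rng.random()  # position along the segment between the two points
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(minority[i], minority[j])))
    return X + synthetic, y + [minority_label] * n_new


# Toy data (hypothetical): label 0 is the majority class, label 1 the minority.
# The point (5.1, 5.0) is a majority sample sitting inside the minority cluster.
X = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (5.1, 5.0),
     (5, 5), (5, 6), (6, 5), (6, 6)]
y = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]

X_clean, y_clean = repeated_enn(X, y, majority_label=0)
X_bal, y_bal = smote(X_clean, y_clean, minority_label=1, n_new=1)
print(Counter(y_clean), Counter(y_bal))
```

In practice a library such as imbalanced-learn would normally be used for both steps. The sketch only shows the mechanics: cleaning removes majority samples that sit among minority neighbours, and over-sampling adds synthetic minority points along segments between existing ones.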

Language: English

Hybrid Ensemble Architecture for Brain Tumor Segmentation Using EfficientNetB4-MobileNetV3 with Multi-Path Decoders

Suhaila Abuowaida, Yazan Alnsour, Zaher Salah, et al.

Data & Metadata, Journal Year: 2025, Volume and Issue: 4, P. 374 - 374

Published: Feb. 26, 2025

Brain tumor segmentation based on multi-modal magnetic resonance imaging is a challenging medical problem due to tumor heterogeneity, irregular boundaries, and inconsistent appearances. For this purpose, we propose a hybrid primal-dual ensemble architecture leveraging EfficientNetB4 and MobileNetV3 through a novel cross-network feature interaction mechanism and an adaptive ensemble learning approach. The proposed method enables segmentation through recent attention mechanisms, dedicated decoders, and uncertainty estimation techniques. The model was extensively evaluated using the BraTS2019-2021 datasets, achieving outstanding performance with mean Dice scores of 0.91, 0.87, and 0.83 for the whole tumor, tumor core, and enhancing tumor regions, respectively. The proposed architecture achieves stable performance over a range of tumor types and sizes, with low relative computational cost.
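For readers unfamiliar with the Dice score reported above, the following minimal sketch computes it for two flat binary masks as 2|P ∩ T| / (|P| + |T|). The function name, the toy masks, and the convention of returning 1.0 when both masks are empty are assumptions for illustration, not details taken from the paper.

```python
def dice_score(pred, target):
    """Dice coefficient between two flat binary masks of equal length."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same length")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Toy masks (hypothetical): 1 marks a voxel labelled as tumor.
pred_mask = [1, 1, 0, 0, 1, 0]
true_mask = [1, 0, 0, 0, 1, 1]
score = dice_score(pred_mask, true_mask)
print(round(score, 4))
```

Here the masks share two positive voxels and each contains three, so the score is 4/6. A score of 1.0 means the prediction and the reference overlap exactly.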

Language: English

Citations

0

Polynomial-SHAP as a SMOTE alternative in conglomerate neural networks for realistic data augmentation in cardiovascular and breast cancer diagnosis

Chukwuebuka Joseph Ejiyi, Dongsheng Cai, Francis Ofoma Eze, et al.

Journal of Big Data, Journal Year: 2025, Volume and Issue: 12(1)

Published: April 18, 2025

Language: English

Citations

0

Trade-offs between machine learning and deep learning for mental illness detection on social media

Z. P. Ding, Zhongyan Wang, Yeyubei Zhang, et al.

Scientific Reports, Journal Year: 2025, Volume and Issue: 15(1)

Published: April 25, 2025

Social media platforms provide valuable insights into mental health trends by capturing user-generated discussions on conditions such as depression, anxiety, and suicidal ideation. Machine learning (ML) and deep learning (DL) models have been increasingly applied to classify mental health conditions from textual data, but selecting the most effective model involves trade-offs in accuracy, interpretability, and computational efficiency. This study evaluates multiple ML models, including logistic regression, random forest, and LightGBM, alongside DL architectures such as ALBERT and Gated Recurrent Units (GRUs), for both binary and multi-class classification of mental health conditions. Our findings indicate that ML and DL models achieve comparable classification performance on medium-sized datasets, with ML models offering greater interpretability through variable importance scores, while DL models are more robust to complex linguistic patterns. Additionally, ML models require explicit feature engineering, whereas DL models learn hierarchical representations directly from text. Logistic regression provides the advantage of capturing both positive and negative associations between features and mental health conditions, whereas tree-based models prioritize decision-making power through split-based feature selection. This study offers empirical insights into the advantages and limitations of different modeling approaches and provides recommendations for selecting appropriate methods based on dataset size, interpretability needs, and computational constraints.
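The point about logistic regression exposing signed associations can be seen in a small pure-Python sketch that fits a logistic model on a toy bag-of-words matrix and prints the coefficient for each word. The vocabulary, the labels, and the training settings are entirely hypothetical and are not drawn from the study's data or code.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.5, epochs=500):
    """Batch gradient descent on the mean logistic loss; returns (weights, bias)."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


# Hypothetical vocabulary and binary bag-of-words rows (1 = word present).
vocab = ["hopeless", "tired", "happy", "great"]
X = [
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 0],
    [0, 1, 1, 0],
]
y = [1, 1, 1, 0, 0, 0]  # 1 = post flagged for the condition (toy labels)

weights, bias = fit_logistic(X, y)
coef = dict(zip(vocab, weights))
for word, value in coef.items():
    print(f"{word}: {value:+.2f}")
```

A positive coefficient pushes the prediction toward the flagged class and a negative one pushes away from it. Split-based importance scores from tree models, by contrast, rank features by how useful they are for splitting and carry no sign.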

Language: English

Citations

0
