Explainable machine learning to compare the overall survival status between patients receiving mastectomy and breast conserving surgeries
Betelhem Bizuneh Asfaw, Eyachew Misganew Tegaw

Scientific Reports, Journal Year: 2025, Volume and Issue: 15(1)

Published: March 28, 2025

Breast cancer is the most prevalent malignancy among women; hence, treatment approaches must consider not only tumor characteristics and disease stage but also patient preference. Two surgical options, Mastectomy and Breast Conserving Surgery (BCS), share the same survival outcomes, clinical or molecular factors; explainable Machine Learning (ML) techniques like SHapley Additive exPlanations (SHAP) offer further insights. This study aimed to compare the overall survival status of breast cancer patients undergoing Mastectomy versus BCS using ML models and SHAP values, identifying key predictors of survival. The study used the Molecular Taxonomy of Breast Cancer International Consortium (METABRIC) dataset, which contains records and features for 2509 patients. The preprocessing steps included imputation of missing values, class balancing with the Synthetic Minority Over-sampling Technique (SMOTE), and feature selection. Gradient Boosting was identified as the best model, considering metrics such as accuracy, precision, and Area Under the Receiver Operating Characteristic Curve (ROC-AUC). SHAP values were used to assess feature importance, detailing the contribution of each feature to survival outcomes in both groups. Gradient Boosting achieved a training accuracy of 95.4% and a test accuracy of 86.4% for Mastectomy, and 94.6% and 82.8%, respectively, for BCS. Strong predictors were Relapse Free Status, Nottingham Prognostic Index, and Age at Diagnosis. The SHAP analysis indicated that Relapse Free Status was an important predictor across both surgeries, though there were specific influences of Menopausal State. Younger patients benefited more, while older ones faced higher risks from Mastectomy. The performance for BCS was significantly higher (3.73) than for Mastectomy (1.21). SHAP-driven insights pointed toward a personalized approach to treatment, depending on the key predictors. This will justify tailored adjuvant therapies for achieving optimized ...

Language: English
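
The class-balancing step named in the abstract above, SMOTE, creates synthetic minority samples by interpolating between a minority sample and one of its nearest minority neighbours. The following is a minimal sketch of that interpolation only, not the authors' pipeline: the function name `smote_sample`, its parameters, and the toy points are hypothetical, and it assumes numeric features and at least two minority samples.

```python
import random


def smote_sample(minority, k=2, n_new=5, seed=0):
    """Return n_new synthetic points interpolated between minority samples.

    Hypothetical helper for illustration; `minority` is a list of
    equal-length lists of floats with at least two entries.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance to x.
        others = [p for p in minority if p is not x]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)))
        # Pick one of the k nearest neighbours at random.
        neighbour = rng.choice(others[:k])
        # Interpolate at a random position on the segment x -> neighbour.
        gap = rng.random()
        synthetic.append([a + gap * (b - a) for a, b in zip(x, neighbour)])
    return synthetic


# Toy minority class (assumed data, not from METABRIC).
minority_points = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote_sample(minority_points, k=2, n_new=6, seed=1)
```

Every synthetic point lies on a segment between two existing minority points, so it stays inside the bounding box of the minority class. In practice, oversampling of this kind is normally applied to the training split only, so that synthetic points do not leak into the test set.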

An Integrated Stacking Ensemble Model for Natural Gas Purchase Prediction Incorporating Multiple Features
Junjie Wang, Lei Jiang, Le Zhang, et al.

Applied Sciences, Journal Year: 2025, Volume and Issue: 15(2), P. 778 - 778

Published: Jan. 14, 2025

Accurate prediction of natural gas purchase volumes is crucial for both the economy and the environment. It not only facilitates the rational allocation of resources for companies but also helps to reduce operational costs. Although existing methods have achieved some success in addressing the nonlinear relationships in natural gas purchases, there remains potential for further improvement. To address this issue, a stacking ensemble learning model was developed to enhance the ability to handle complex nonlinear problems. This model integrates diverse algorithms and incorporates weather factors, while regionalizing the characteristics of natural gas usage, thereby achieving accurate forecasts of purchase volumes. We selected three distinctly different base models for our research: Informer, multiple linear regression (MLR), and support vector regression (SVR). By conducting four feature combination experiments for each base model, including weather, time, regional, and usage features, we constructed 12 foundational models. Subsequently, we integrated these models using a meta-learner to form the final stacking model. The experimental results indicate that the stacking model outperforms the individual models across key metrics, including R², MRE, and RMSE. Notably, the R² values improved by 4–15% compared to the individual models. The model was subsequently applied to predict natural gas purchase volumes in Pi County, Chengdu, China. In November 2024, a side-by-side comparison of the predicted and actual data revealed a maximum error of just 5.39%. This exceptional accuracy effectively meets forecasting requirements, underscoring the model’s predictive strength in the energy sector.

Language: English

Citations

0
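
The stacking idea in the abstract above (base models feeding a meta-learner) can be illustrated with a very small sketch. This is not the paper's model: the two single-feature linear base models stand in for Informer, MLR, and SVR, the meta-learner is an assumed one-parameter least-squares blend, and all names and data values are hypothetical.

```python
from statistics import mean


def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x on one feature; returns a predictor."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    return lambda x: a + b * x


def fit_blend_weight(p1, p2, ys):
    """Meta-learner: least-squares weight w for y ~ w*p1 + (1 - w)*p2."""
    den = sum((a - b) ** 2 for a, b in zip(p1, p2))
    if den == 0:
        return 0.5  # base predictions are identical; any weight is equivalent
    return sum((a - b) * (y - b) for a, b, y in zip(p1, p2, ys)) / den


def sse(pred, ys):
    """Sum of squared errors."""
    return sum((p - y) ** 2 for p, y in zip(pred, ys))


# Assumed toy data: temperature and day index versus purchase volume.
temp = [2.0, 5.0, 9.0, 12.0, 15.0, 7.0, 3.0, 11.0]
day = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
volume = [98.0, 91.0, 80.0, 74.0, 66.0, 88.0, 99.0, 79.0]

# Base models are fitted on the first five rows only.
train, hold = slice(0, 5), slice(5, 8)
model_temp = fit_line(temp[train], volume[train])
model_day = fit_line(day[train], volume[train])

# The meta-learner is fitted on held-out predictions of the base models.
hold_p1 = [model_temp(x) for x in temp[hold]]
hold_p2 = [model_day(x) for x in day[hold]]
w = fit_blend_weight(hold_p1, hold_p2, volume[hold])
stacked = [w * a + (1 - w) * b for a, b in zip(hold_p1, hold_p2)]
```

Fitting the meta-learner on predictions for rows the base models did not see is what distinguishes stacking from simple averaging. Because the blend weight is fitted on the hold-out rows, the blended error on those same rows cannot exceed that of either base model; a fair evaluation would need a further, untouched test split.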

LD-SMOTE: A Novel Local Density Estimation-Based Oversampling Method for Imbalanced Datasets
Jing Lyu, Jie Yang, Zhixun Su, et al.

Symmetry, Journal Year: 2025, Volume and Issue: 17(2), P. 160 - 160

Published: Jan. 22, 2025

Imbalanced data have become an essential stumbling block in the field of machine learning. In this paper, a novel oversampling method based on local density estimation, namely LD-SMOTE, is presented to address the constraints of the popular rebalancing technique SMOTE. LD-SMOTE begins with k-means clustering to quantitatively measure the classification contribution of each feature. Subsequently, a distance metric grounded in Jaccard similarity is defined, which accentuates the features that are more intricately linked to the minority class. Utilizing this distance metric, we estimate the local density with a Gaussian-like function to control the quantity of synthetic samples around every minority sample, thus simulating the distribution of the minority class. Additionally, the generation of synthetic samples occurs within the triangular region constructed by the sample and its two chosen neighbors instead of on the line connecting the sample and one of its neighbors. Experimental comparisons between LD-SMOTE and 16 existing resampling methods on 19 datasets reveal a significant average increase of 6.4% in accuracy, 4.4% in F-measure, 5.4% in G-mean, and 4.0% in AUC. This result indicates that LD-SMOTE can be an alternative oversampling method for imbalanced datasets.

Language: English

Citations

0
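
The abstract above says LD-SMOTE generates synthetic samples inside the triangle formed by a minority sample and two chosen neighbours, rather than on a line segment. A minimal sketch of drawing a point inside such a triangle follows. It assumes uniform sampling, which the abstract does not state, and it omits the density estimation and Jaccard-based metric entirely; the function name `sample_in_triangle` is hypothetical.

```python
import random


def sample_in_triangle(x, n1, n2, rng=random):
    """Draw one point uniformly from the triangle with vertices x, n1, n2.

    Illustration only: uniform sampling is an assumption, not necessarily
    the distribution used by LD-SMOTE.
    """
    u, v = rng.random(), rng.random()
    # Reflect points that fall in the far half of the parallelogram
    # back into the triangle, which keeps the draw uniform.
    if u + v > 1.0:
        u, v = 1.0 - u, 1.0 - v
    # Move from x along both edge directions by the weights u and v.
    return [a + u * (b - a) + v * (c - a) for a, b, c in zip(x, n1, n2)]


# Toy minority sample and two neighbours (assumed data).
rng = random.Random(0)
points = [
    sample_in_triangle([0.0, 0.0], [1.0, 0.0], [0.0, 1.0], rng)
    for _ in range(100)
]
```

Setting `v` to zero recovers the classic SMOTE case, where the new point lies on the segment between the sample and a single neighbour.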
