NATE: Non-pArameTric approach for Explainable credit scoring on imbalanced class
Sang-Woong Han, Haemin Jung

PLoS ONE, Journal Year: 2024, Volume and Issue: 19(12), P. e0316454 - e0316454

Published: Dec. 31, 2024

Credit scoring models play a crucial role for financial institutions in evaluating borrower risk and sustaining profitability. Logistic regression is widely used in credit scoring due to its robustness, interpretability, and computational efficiency; however, its predictive power decreases when applied to complex or non-linear datasets, resulting in reduced accuracy. In contrast, tree-based machine learning models often provide enhanced predictive performance but struggle with interpretability. Furthermore, imbalanced class distributions, which are prevalent in credit scoring, can adversely impact model accuracy, as the majority class tends to dominate. Despite these challenges, research that comprehensively addresses both predictive performance and explainability within the credit scoring domain remains limited. This paper introduces the Non-pArameTric oversampling approach for Explainable credit scoring (NATE), a framework designed to address these challenges by combining oversampling techniques with tree-based classifiers to enhance model performance and interpretability. NATE incorporates class balancing methods to mitigate the impact of imbalanced data distributions and integrates interpretability features to elucidate the model’s decision-making process. Experimental results show that NATE substantially outperforms traditional logistic regression in credit risk classification, with improvements of 19.33% in AUC, 71.56% in MCC, and 85.33% in F1 Score. Oversampling approaches, particularly when combined with gradient boosting, demonstrated superior effectiveness compared to undersampling, achieving optimal metrics of AUC: 0.9649, MCC: 0.8104, and F1 Score: 0.9072. Moreover, NATE enhances interpretability by providing detailed insights into feature contributions, aiding the understanding of individual predictions. These findings highlight NATE’s capability in managing class imbalance, improving predictive performance, and enhancing model interpretability, demonstrating its potential as a reliable and transparent tool for credit scoring applications.
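The abstract reports MCC and F1 Score alongside AUC because plain accuracy can look strong on imbalanced data even when the minority class is never detected. The following is a minimal sketch of how these two metrics follow from a binary confusion matrix; the function names and the example counts are illustrative assumptions, not taken from the paper.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall; 0.0 when undefined."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; 0.0 when any marginal is empty."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical portfolio: 990 good borrowers, 10 defaulters.
# A model that predicts "good" for everyone never finds a defaulter.
always_negative = dict(tp=0, tn=990, fp=0, fn=10)
print(accuracy(**always_negative))             # high despite missing every defaulter
print(f1_score(tp=0, fp=0, fn=10))             # zero
print(mcc(**always_negative))                  # zero

# A model that finds 8 of the 10 defaulters at the cost of 5 false alarms.
print(f1_score(tp=8, fp=5, fn=2), mcc(tp=8, tn=985, fp=5, fn=2))
```

In this sketch the degenerate classifier scores 0 on both F1 and MCC while its accuracy stays at 99%, which is the kind of majority-class dominance the abstract describes.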

Language: English

pACP-HybDeep: predicting anticancer peptides using binary tree growth based transformer and structural feature encoding with deep-hybrid learning
Muhammad Khalil Shahid, Maqsood Hayat, Wajdi Alghamdi

et al.

Scientific Reports, Journal Year: 2025, Volume and Issue: 15(1)

Published: Jan. 2, 2025

Worldwide, cancer remains a significant health concern due to its high mortality rates. Despite numerous traditional therapies and wet-laboratory methods for treating cancer-affected cells, these approaches often face limitations, including high costs and substantial side effects. Recently, the high selectivity of peptides has garnered significant attention from scientists due to their reliable targeted actions and minimal adverse effects. Furthermore, keeping in view the outcomes of existing computational models, we propose a highly reliable and effective model, namely pACP-HybDeep, for the accurate prediction of anticancer peptides. In this model, the training peptides are numerically encoded using an attention-based ProtBERT-BFD encoder to extract semantic features along with CTDT-based structural information. A k-nearest neighbor-based binary tree growth (BTG) algorithm is employed to select an optimal feature set from the multi-perspective vector. The selected feature vector is subsequently trained using a CNN + RNN-based deep learning model. Our proposed model demonstrated an accuracy of 95.33% and an AUC of 0.97. To validate its generalization capabilities, our model achieved accuracies of 94.92%, 92.26%, and 91.16% on the independent datasets Ind-S1, Ind-S2, and Ind-S3, respectively. The efficacy and reliability of our model on the test datasets establish it as a valuable tool for researchers in academia and pharmaceutical drug design.

Language: English

Citations: 8

XGBoost-enhanced ensemble model using discriminative hybrid features for the prediction of sumoylation sites
Salman Khan, Sumaiya Noor, Tahir Javed

et al.

BioData Mining, Journal Year: 2025, Volume and Issue: 18(1)

Published: Feb. 3, 2025

Language: English

Citations: 2

Explainable AI-driven prediction of APE1 inhibitors: enhancing cancer therapy with machine learning models and feature importance analysis
Aga Basit Iqbal, Tariq Masoodi, Ajaz A. Bhat

et al.

Molecular Diversity, Journal Year: 2025, Volume and Issue: unknown

Published: Feb. 21, 2025

Language: English

Citations: 2

Addressing imbalanced data classification with Cluster-Based Reduced Noise SMOTE
Javad Hemmatian, Rassoul Hajizadeh, Fakhroddin Nazari

et al.

PLoS ONE, Journal Year: 2025, Volume and Issue: 20(2), P. e0317396 - e0317396

Published: Feb. 10, 2025

In recent years, the challenge of imbalanced data has become increasingly prominent in machine learning, affecting the performance of classification algorithms. This study proposes a novel data-level oversampling method called Cluster-Based Reduced Noise SMOTE (CRN-SMOTE) to address this issue. CRN-SMOTE combines SMOTE for oversampling minority classes with a novel cluster-based noise reduction technique. In this cluster-based noise reduction approach, it is crucial that samples from each category form one or two clusters, a feature that conventional noise reduction methods do not achieve. The proposed method is evaluated on four imbalanced datasets (ILPD, QSAR, Blood, and Maternal Health Risk) using five metrics: Cohen’s kappa, Matthews correlation coefficient (MCC), F1-score, precision, and recall. Results demonstrate that CRN-SMOTE consistently outperformed the state-of-the-art Reduced Noise SMOTE (RN-SMOTE), SMOTE-Tomek Link, and SMOTE-ENN methods across all datasets, with particularly notable improvements observed on the QSAR and Maternal Health Risk datasets, indicating its effectiveness in enhancing imbalanced classification performance. Overall, the experimental findings indicate that CRN-SMOTE outperformed RN-SMOTE in 100% of the cases, achieving average improvements of 6.6% in Kappa, 4.01% in MCC, 1.87% in F1-score, 1.7% in precision, and 2.05% in recall, with SMOTE’s number of neighbors set to 5.
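For readers unfamiliar with the oversampling step that CRN-SMOTE builds on, the sketch below illustrates plain SMOTE interpolation only: each synthetic sample is placed on the line segment between a minority sample and one of its k nearest minority neighbors. The cluster-based noise reduction described in the abstract is not reproduced here, and the function name, the toy data, and the brute-force neighbor search are assumptions made for illustration.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbors (plain SMOTE)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Brute-force k nearest neighbors among the other minority samples.
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical 2-D minority class with six samples; k=5 matches the
# neighbor count quoted in the abstract.
minority_points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0),
                   (1.0, 1.0), (0.5, 0.5), (0.2, 0.8)]
new_points = smote(minority_points, n_new=10, k=5)
print(len(new_points))
```

Because every synthetic point is a convex combination of two real minority samples, it always stays inside the region spanned by the minority class; noisy minority samples therefore propagate into the synthetic data, which is the motivation for pairing SMOTE with a noise reduction step.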

Language: English

Citations: 1

DeepAIPs-Pred: Predicting Anti-Inflammatory Peptides Using Local Evolutionary Transformation Images and Structural Embedding-Based Optimal Descriptors with Self-Normalized BiTCNs
Shahid Akbar, Matee Ullah, Ali Raza

et al.

Journal of Chemical Information and Modeling, Journal Year: 2024, Volume and Issue: unknown

Published: Dec. 3, 2024

Inflammation is a biological response to harmful stimuli, playing a crucial role in facilitating tissue repair by eradicating pathogenic microorganisms. However, when inflammation becomes chronic, it leads to numerous serious disorders, particularly autoimmune diseases. Anti-inflammatory peptides (AIPs) have emerged as promising therapeutic agents due to their high specificity, potency, and low toxicity. However, identifying AIPs using traditional in vivo methods is time-consuming and expensive. Recent advancements in computational-based intelligent models for peptides have offered a cost-effective alternative for various inflammatory diseases, owing to their selectivity toward targeted cells with low side effects. In this paper, we propose a novel computational model, namely, DeepAIPs-Pred, for the accurate prediction of AIP sequences. The training samples are represented using LBP-PSSM- and LBP-SMR-based evolutionary image transformation methods. Additionally, to capture contextual semantic features, we employed attention-based ProtBERT-BFD embedding and QLC for structural features. Furthermore, differential evolution (DE)-based weighted feature integration is utilized to produce a multiview feature vector. SMOTE-Tomek Links are introduced to address the class imbalance problem, and a two-layer feature selection technique is proposed to reduce and select the optimal features. Finally, self-normalized bidirectional temporal convolutional networks (SnBiTCN) are trained using the optimal features, achieving a significant predictive accuracy of 94.92% and an AUC of 0.97. The generalization of our proposed model is validated using two independent datasets, demonstrating higher performance, with accuracy improvements of ∼2% and ∼10% over existing state-of-the-art models on Ind-I and Ind-II, respectively. The efficacy and reliability of DeepAIPs-Pred highlight its potential as a valuable tool for drug development and research in academia.

Language: English

Citations: 6

Deep-m5U: a deep learning-based approach for RNA 5-methyluridine modification prediction using optimized feature integration
Sumaiya Noor, Afshan Naseem, Hamid Hussain Awan

et al.

BMC Bioinformatics, Journal Year: 2024, Volume and Issue: 25(1)

Published: Nov. 19, 2024

RNA 5-methyluridine (m5U) modifications play a crucial role in biological processes, making their accurate identification a key focus in computational biology. This paper introduces Deep-m5U, a robust predictor designed to enhance the prediction of m5U modifications. The proposed method utilizes a hybrid pseudo-K-tuple nucleotide composition (PseKNC) for sequence formulation, a Shapley Additive exPlanations (SHAP) algorithm for discriminant feature selection, and a deep neural network (DNN) as the classifier. The model was evaluated using two benchmark datasets, i.e., Full Transcript and Mature mRNA. Deep-m5U achieved overall accuracies of 91.47% and 95.86% on the Full Transcript and Mature mRNA datasets with 10-fold cross-validation, and on independent samples it attained 92.94% and 95.17% accuracy. Compared to existing models, Deep-m5U showed approximately 5.23% and 3.73% higher accuracy on the training data and 3.95% and 3.26% higher accuracy on independent samples, respectively. The reliability and effectiveness of Deep-m5U make it a valuable tool for scientists and a potential asset in pharmaceutical design and research.
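To make the sequence formulation step more concrete, the sketch below computes a plain K-tuple nucleotide composition, i.e. the normalized frequency of every length-k substring over the alphabet A, C, G, U. This is a simplification: PseKNC additionally appends pseudo components derived from physicochemical correlations between nucleotides, which are omitted here, and the function name and example sequence are assumptions for illustration rather than the paper's implementation.

```python
from itertools import product


def kmer_composition(seq, k=2, alphabet="ACGU"):
    """Return the normalized frequency of each k-mer, in lexicographic
    order of the alphabet (4**k values for RNA)."""
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # skip windows containing unknown symbols
            counts[kmer] += 1
    total = sum(counts.values())
    return [counts[m] / total if total else 0.0 for m in kmers]


# Hypothetical RNA fragment; k=2 yields a 16-dimensional feature vector.
features = kmer_composition("ACGUACGUUU", k=2)
print(len(features))
```

A fixed-length vector of this kind is what allows variable-length sequences to be passed to a feature selection step and a dense classifier.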

Language: English

Citations: 4

Sequential recommendation via agent-based irrelevancy skipping
Yu Cheng, Jiawei Zheng, B.H. Wu

et al.

Neural Networks, Journal Year: 2025, Volume and Issue: 185, P. 107134 - 107134

Published: Jan. 9, 2025

Language: English

Citations: 0

FL-W3S: Cross-domain federated learning for weakly supervised semantic segmentation of white blood cells
Hussain Ahmad Madni, Rao Muhammad Umer, Silvia Zottin

et al.

International Journal of Medical Informatics, Journal Year: 2025, Volume and Issue: 195, P. 105806 - 105806

Published: Jan. 23, 2025

Segmentation models for clinical data experience severe performance degradation when trained on a single client from one domain and distributed to other clients from a different domain. Federated Learning (FL) provides a solution by enabling multi-party collaborative learning without compromising the confidentiality of clients' private data. In this paper, we propose a cross-domain FL method for Weakly Supervised Semantic Segmentation (FL-W3S) of white blood cells in microscopic images. We perform collaborative model training on multiple clients with different data distributions to obtain a global aggregated model using only image-level class labels for semantic segmentation of white blood cells. A multi-class token transformer learns the relationship between class and patch tokens during training and generates class-specific localization maps and mask predictions. To rectify the localization maps, we use patch-level pairwise affinity obtained from patch-to-patch attention. We evaluate the proposed method on two datasets from different domains. Our experimental results show that, on the two datasets, there are increases of 2.56% and 1.39% over existing state-of-the-art methods. The combination of federated learning, which preserves privacy, with white blood cell segmentation techniques for precise cell identification enhances diagnostic accuracy and personalized treatment strategies in clinical applications, particularly in hematology and pathology. More specifically, it involves isolating white blood cells from a blood smear for further analysis such as automated cell counting, morphological analysis, cell classification, and disease diagnosis and monitoring.
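The abstract does not say how the global aggregated model is computed. One common choice in federated learning is federated averaging, where the server averages client parameters weighted by each client's number of training samples. The sketch below shows that rule on flat parameter lists; treating FL-W3S as using this rule is an assumption, and the function name and toy values are hypothetical.

```python
def federated_average(client_weights, client_sizes):
    """Sample-size-weighted average of client parameter vectors.

    client_weights: list of equal-length lists of floats, one per client.
    client_sizes:   number of local training samples for each client.
    """
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]


# Two hypothetical clients from different domains; only parameters are
# shared with the server, never the underlying images.
global_model = federated_average(
    client_weights=[[1.0, 2.0], [3.0, 4.0]],
    client_sizes=[1, 3],
)
print(global_model)  # [2.5, 3.5]
```

Weighting by sample count lets larger clients influence the global model more, which matters when client data distributions differ as described in the abstract.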

Language: English

Citations: 0

Enhanced ResNet-50 for garbage classification: Feature fusion and depth-separable convolutions
Lingbo Li, Runpu Wang, Miaojie Zou

et al.

PLoS ONE, Journal Year: 2025, Volume and Issue: 20(1), P. e0317999 - e0317999

Published: Jan. 27, 2025

As people’s material living standards continue to improve, the types and quantities of household garbage they generate rapidly increase. Therefore, it is urgent to develop a reasonable and effective method for garbage classification. This is important for resource recycling and environmental improvement and contributes to the sustainable development of production and the economy. However, existing deep learning-based garbage image classification models generally suffer from low classification accuracy, insufficient robustness, and slow detection speed due to the large number of model parameters. To this end, a new garbage image classification model is proposed, with the ResNet-50 network as the core architecture. Specifically, first, a redundancy-weighted feature fusion module is proposed, enabling the model to fully leverage valuable feature information, thereby improving its performance. At the same time, the module filters out redundant information from multi-scale features, reducing the number of model parameters. Second, the standard 3×3 convolutions in ResNet-50 are replaced with depth-separable convolutions, significantly improving the model’s computational efficiency while preserving the feature extraction capability of the original convolutional structure. Finally, to address the issue of class imbalance, a weighting factor is added to the Focal Loss, aiming to mitigate the negative impact of class imbalance on model performance and enhance the model’s robustness. Experimental results on the TrashNet dataset show that the proposed model effectively reduces the number of parameters, improves detection speed, and achieves an accuracy of 94.13%, surpassing the vast majority of existing deep learning-based waste classification models and demonstrating its solid practical value.
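Two of the ideas in this abstract reduce to short arithmetic, sketched below under stated assumptions. The first pair of functions compares the weight count of a standard k×k convolution with that of a depthwise-separable one (a depthwise k×k layer followed by a 1×1 pointwise layer, biases ignored). The last function is the widely used alpha-weighted binary focal loss; the paper's exact weighting factor is not given in the abstract, so the alpha term here stands in for it as an assumption, and all names and default values are illustrative.

```python
import math


def conv_params(c_in, c_out, k=3):
    """Weights in a standard k x k convolution (no bias)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in, c_out, k=3):
    """Weights in a depthwise k x k conv plus a 1 x 1 pointwise conv."""
    return k * k * c_in + c_in * c_out


def weighted_focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Alpha-weighted focal loss for one binary prediction.

    p is the predicted probability of class 1 (must be strictly between
    0 and 1), y is the true label (0 or 1).
    """
    p_t = p if y == 1 else 1.0 - p            # probability of the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha  # class weighting factor
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# Hypothetical 64-channel layer.
print(conv_params(64, 64))                 # 36864
print(depthwise_separable_params(64, 64))  # 4672

# A confidently correct prediction contributes less loss than an
# uncertain one, so training focuses on hard examples.
print(weighted_focal_loss(0.9, 1) < weighted_focal_loss(0.6, 1))  # True
```

For this hypothetical layer the separable form needs roughly one eighth of the weights, which is consistent with the parameter reduction the abstract attributes to the replacement.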

Language: English

Citations: 0

SurvBeNIM: The Beran-Based Neural Importance Model for Explaining Survival Models
Lev V. Utkin, Danila Y. Eremenko, Andrei V. Konstantinov

et al.

IEEE Access, Journal Year: 2025, Volume and Issue: 13, P. 24137 - 24157

Published: Jan. 1, 2025

Language: English

Citations: 0