Predicting compressive strength of pervious concrete with fly ash: a machine learning approach and analysis of fly ash compositional influence
Navaratnarajah Sathiparan, Pratheeba Jeyananthan, Daniel Niruban Subramaniam, et al.

Multiscale and Multidisciplinary Modeling Experiments and Design, Year: 2024, No. 7(6), pp. 5651 - 5671

Published: July 25, 2024

Language: English

Enhancing network intrusion detection: a dual-ensemble approach with CTGAN-balanced data and weak classifiers
Mohammad Reza Abbaszadeh Bavil Soflaei, Arash Salehpour, Karim Samadzamini, et al.

The Journal of Supercomputing, Year: 2024, No. 80(11), pp. 16301 - 16333

Published: April 10, 2024

Language: English

Cited by: 9

Learning from Imbalanced Data: Integration of Advanced Resampling Techniques and Machine Learning Models for Enhanced Cancer Diagnosis and Prognosis (Open Access)
Fatih Gürcan, Ahmet Soylu

Cancers, Year: 2024, No. 16(19), pp. 3417 - 3417

Published: Oct. 8, 2024

This study aims to evaluate the performance of various classification algorithms and resampling methods across multiple diagnostic and prognostic cancer datasets, addressing the challenges of class imbalance.

Language: English

Cited by: 9

Resampling Techniques Study on Class Imbalance Problem in Credit Risk Prediction (Creative Commons)
Zixue Zhao, Tianxiang Cui, Shusheng Ding, et al.

Mathematics, Year: 2024, No. 12(5), pp. 701 - 701

Published: Feb. 28, 2024

Credit risk prediction heavily relies on historical data provided by financial institutions. The goal is to identify commonalities among defaulting users based on existing information. However, data on defaulters is often limited, leading to a concentration of credit data where positive samples (defaults) are significantly fewer than negative samples (nondefaults). This poses a serious challenge known as the class imbalance problem, which can substantially impact data quality and predictive model effectiveness. To address the problem, various resampling techniques have been proposed and studied extensively. However, despite ongoing research, there is no consensus on the most effective technique. The choice of resampling technique is closely related to the dataset size and imbalance ratio, and its effectiveness varies across different classifiers. Moreover, there is a notable gap in research concerning suitable techniques for extremely imbalanced datasets. Therefore, this study aims to compare popular resampling techniques across different datasets and classifiers while also proposing a novel hybrid sampling method tailored for extremely imbalanced datasets. Our experimental results demonstrate that this new technique enhances classifier performance, shedding light on strategies for managing the class imbalance problem in credit risk prediction.

Language: English

Cited by: 8
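
The abstract above refers to resampling as the standard remedy for class imbalance. As a rough illustration only, the sketch below shows plain random oversampling of the minority class; it is not the hybrid sampling method proposed in the paper, and the function name `random_oversample` and the toy data are assumptions made for this example.

```python
import random
from collections import Counter


def random_oversample(features, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until every class
    has as many rows as the largest class (hypothetical helper)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_x, out_y = list(features), list(labels)
    for cls, n in counts.items():
        # Indices of the original rows that belong to this class.
        idx = [i for i, y in enumerate(labels) if y == cls]
        for _ in range(target - n):
            j = rng.choice(idx)
            out_x.append(features[j])
            out_y.append(cls)
    return out_x, out_y


# Toy data: four non-defaults (0) and one default (1).
X = [[0.1], [0.2], [0.3], [0.4], [0.9]]
y = [0, 0, 0, 0, 1]
X_bal, y_bal = random_oversample(X, y)
```

Oversampling should be applied to the training split only, so that duplicated rows do not leak into validation or test data.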

Deep learning framework with Bayesian data imputation for modelling and forecasting groundwater levels
Eric Chen, Martin S. Andersen, Rohitash Chandra, et al.

Environmental Modelling & Software, Year: 2024, No. 178, pp. 106072 - 106072

Published: May 19, 2024

Language: English

Cited by: 8

EfficientNet-ECA: A lightweight network based on efficient channel attention for class-imbalanced welding defects classification
Yue Zhang, Qiang Zhan, Ma Zhi, et al.

Advanced Engineering Informatics, Year: 2024, No. 62, pp. 102737 - 102737

Published: July 27, 2024

Language: English

Cited by: 8

Predictive analytics of wear performance in high entropy alloy coatings through machine learning
S. Sivaraman, N. Radhika

Physica Scripta, Year: 2024, No. 99(7), pp. 076014 - 076014

Published: June 10, 2024

High-entropy alloys (HEAs) are increasingly renowned for their distinct microstructural compositions and exceptional properties. These HEAs, employed for surface modification as coatings, exhibit phenomenal mechanical characteristics, including wear and corrosion resistance, and are extensively utilized in various industrial applications. However, assessing the wear behaviour of HEA coatings through conventional methods remains challenging and time-consuming due to the complexity of their structures. In this study, a novel methodology has been proposed for predicting the wear behaviour of HEA coatings using Machine Learning (ML) algorithms such as Support Vector Machine (SVM), Linear Regression (LR), Gaussian Process Regression (GPR), Least Absolute Shrinkage and Selection Operator (LASSO), Bagging Regression (BR), Gradient Boosting Regression Tree (GBRT), and Robust Regression (RR). The analysis integrates 75 combinations of HEA coatings with processing parameters and wear test results from peer-reviewed journals for model training and validation. Among the ML models utilized, GBRT was found to be more effective in predicting wear rate and Coefficient of Friction (COF), with the highest correlation coefficient (R² value of 0.95 ∼ 0.97) and minimal errors. The optimum model is used to predict the unknown wear properties of HEA coatings, and experiments were conducted to validate the results, making it a crucial resource for engineers in the materials sector.

Language: English

Cited by: 7

Optimizing Multiclass Classification Using Convolutional Neural Networks with Class Weights and Early Stopping for Imbalanced Datasets (Open Access)
Nazim Razali, Nureize Arbaiy, Pei‐Chun Lin, et al.

Electronics, Year: 2025, No. 14(4), pp. 705 - 705

Published: Feb. 12, 2025

Multiclass classification in machine learning often faces significant challenges due to unbalanced datasets. This situation leads to biased predictions and reduced model performance. This research addresses this issue by proposing a novel approach that combines convolutional neural networks (CNNs) with class weights and early-stopping techniques. The motivation behind this study stems from the need to improve model performance, especially for minority classes, which are often neglected in existing methodologies. Although various strategies such as resampling, ensemble methods, and data augmentation have been explored, they frequently have limitations based on the characteristics of the specific data type. Our approach focuses on optimizing the loss function via class weights to give greater importance to minority classes. Therefore, it reduces bias and improves overall accuracy. Furthermore, we implement early stopping to avoid overfitting and improve generalization by continuously monitoring validation performance during training. This study contributes to the body of knowledge by demonstrating the effectiveness of this combined technique in improving multiclass classification in imbalanced scenarios. The proposed model is tested on oil palm leaves analysis to identify deficiencies in nitrogen (N), boron (B), magnesium (Mg), and potassium (K). A CNN model with three layers and a SoftMax activation function was trained for 200 epochs each. The analysis compared three scenarios: training with the imbalanced dataset, training with class weights, and training with class weights and early stopping. The results showed that applying class weights significantly improved classification accuracy, with a trade-off in other class predictions. This indicates that, while class weighting has a positive impact, further strategies are necessary to improve model performance across all categories in this study.

Language: English

Cited by: 1
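
The abstract above combines two ideas: weighting the loss so that minority classes count more, and stopping training once validation performance stops improving. The sketch below illustrates both in plain Python under stated assumptions: the inverse-frequency weighting formula, the patience rule, and all function names are illustrative choices, not details taken from the paper.

```python
import math
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count).
    This formula is an assumption; the paper does not state its weighting."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * m) for c, m in counts.items()}


def weighted_cross_entropy(probs, labels, weights):
    """Class-weighted mean of -log p(true class); probs is a list of
    dicts mapping class name to predicted probability."""
    total = sum(weights[y] * -math.log(p[y]) for p, y in zip(probs, labels))
    norm = sum(weights[y] for y in labels)
    return total / norm


def early_stop_epoch(val_losses, patience=3):
    """Return the epoch index at which training would stop: the first epoch
    that is `patience` epochs past the best validation loss seen so far."""
    best, best_epoch = float("inf"), -1
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return epoch
    return len(val_losses) - 1


# Toy labels using the nutrient names from the abstract (made-up counts).
labels = ["N"] * 6 + ["K", "Mg"]
weights = balanced_class_weights(labels)
probs = [{"N": 0.7, "K": 0.2, "Mg": 0.1}] * len(labels)
loss = weighted_cross_entropy(probs, labels, weights)
stop = early_stop_epoch([1.0, 0.8, 0.7, 0.75, 0.72, 0.71, 0.9], patience=3)
```

With these toy counts the minority classes K and Mg receive a weight six times larger than N, so a misclassified minority sample contributes correspondingly more to the loss.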

AE-XGBoost: A Novel Approach for Machine Tool Machining Size Prediction Combining XGBoost, AE and SHAP (Creative Commons)
Mu Gu, Sung-Kwan Kang, Zishuo Xu, et al.

Mathematics, Year: 2025, No. 13(5), pp. 835 - 835

Published: March 2, 2025

To achieve intelligent manufacturing and improve the machining quality of machine tools, this paper proposes an interpretable machining size prediction model combining eXtreme Gradient Boosting (XGBoost), autoencoder (AE), and Shapley additive explanation (SHAP) analysis. In this study, XGBoost was used to establish an evaluation system for the actual machining size of computer numerical control (CNC) machine tools. The XGBoost model was combined with SHAP approximation to effectively capture local and global features in the data, using autoencoders to transform preprocessed data into more representative feature vectors. Grey correlation analysis (GRA) and principal component analysis (PCA) were used to reduce the dimensions of the original data features, and the synthetic minority over-sampling technique for regression with Gaussian noise (SMOGN) method was used to deal with the problem of data imbalance. Taking the actual machining size of the machine tool as the response parameter, based on the parameters of the milling process of the CNC machine tool, the effectiveness of the model is verified. The experimental results show that the proposed AE-XGBoost model is superior to the traditional XGBoost method, and the prediction accuracy of the model is 7.11% higher than that of the traditional method. The subsequent SHAP analysis reveals the importance and interrelationship of features and provides reliable decision support for processing personnel, helping to achieve intelligent manufacturing.

Language: English

Cited by: 1

TransECA-Net: A Transformer-Based Model for Encrypted Traffic Classification (Creative Commons)
Z. Liu, Yuanyuan Xie, Yanyan Luo, et al.

Applied Sciences, Year: 2025, No. 15(6), pp. 2977 - 2977

Published: March 10, 2025

Encrypted network traffic classification remains a critical component in network security monitoring. However, existing approaches face two fundamental limitations: (1) conventional methods rely on manual feature engineering and are inadequate in handling high-dimensional features; (2) they lack the capability to capture dynamic temporal patterns. This paper introduces TransECA-Net, a novel hybrid deep learning architecture that addresses these limitations through two key innovations. First, we integrate ECA-Net modules with a CNN architecture to enable automated feature extraction and efficient dimension reduction via channel selection. Second, we incorporate a Transformer encoder to model global temporal dependencies through multi-head self-attention, supplemented by residual connections for optimal gradient flow. Extensive experiments on the ISCX VPN-nonVPN dataset demonstrate the superiority of our approach. TransECA-Net achieved an average accuracy of 98.25% in classifying 12 types of encrypted traffic, outperforming classical baseline models such as 1D-CNN, CNN + LSTM, and TFE-GNN by 6.2–14.8%. Additionally, it demonstrated a 37.44–48.84% improvement in convergence speed during the training process. Our proposed framework presents a new paradigm for encrypted traffic feature disentanglement and representation learning. This paradigm enables cybersecurity systems to achieve fine-grained service identification of encrypted traffic (e.g., 98.9% accuracy in VPN traffic detection) and real-time responsiveness (48.8% faster than conventional methods), providing technical support for combating emerging cybercrimes, such as monitoring illegal transactions on darknet networks, and contributing significantly to adaptive network security monitoring systems.

Language: English

Cited by: 1

Improving GBDT performance on imbalanced datasets: An empirical study of class-balanced loss functions
Jiaqi Luo, Yuan Yuan, Shixin Xu, et al.

Neurocomputing, Year: 2025, No. unknown, pp. 129896 - 129896

Published: March 1, 2025

Language: English

Cited by: 1