Computers & Industrial Engineering, Year: 2024, No. unknown, pp. 110754 - 110754
Published: Nov. 1, 2024
Language: English
Multimedia Tools and Applications, Year: 2024, No. 83(23), pp. 63243 - 63290
Published: Jan. 11, 2024
Language: English
Cited by: 11
Information Sciences, Year: 2024, No. 662, pp. 120263 - 120263
Published: Feb. 1, 2024
Language: English
Cited by: 4
Expert Systems, Year: 2024, No. 41(11)
Published: July 30, 2024
Abstract: Class imbalance and class overlap create difficulties in the training phase of standard machine learning algorithms. Their performance is poor on minority classes, especially when there is a high class imbalance and significant class overlap. Researchers have recently observed that the joint effects of class overlap and imbalance are more harmful than their direct impact. To handle these problems, many methods have been proposed in past years that can be broadly categorized as data-level, algorithm-level, ensemble learning, and hybrid methods. Existing data-level methods often suffer from problems like information loss and overfitting. To overcome these problems, we introduce a novel entropy-based hybrid sampling (EHS) method to handle class overlap in highly imbalanced datasets. The EHS method eliminates less informative majority instances from the overlap region during undersampling and regenerates synthetic minority instances during oversampling near the borderline. The proposed EHS achieved improvement in F1-score, G-mean, and AUC metric values with DT, NB, and SVM classifiers compared with well-established state-of-the-art methods. Classifier performance was tested on 28 datasets with extreme ranges of imbalance and overlap.
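The abstract above reports F1-score and G-mean, two metrics commonly used when plain accuracy is misleading on imbalanced data. The following is a minimal sketch of their standard textbook definitions, not code from the cited paper; the function names and the toy labels are assumptions made for illustration.

```python
import math

def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, TN, FN, treating `positive` as the minority class label."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn

def f1_score(y_true, y_pred, positive=1):
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 when undefined."""
    tp, fp, _, fn = confusion_counts(y_true, y_pred, positive)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity (TPR) and specificity (TNR)."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)

# Toy example (hypothetical labels): 2 minority samples out of 10.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
f1_value = f1_score(y_true, y_pred)
g_value = g_mean(y_true, y_pred)
```

In this toy case accuracy is 0.8, yet only half of the minority samples are recovered, which both metrics reflect more directly than accuracy does.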
Language: English
Cited by: 3
Expert Systems with Applications, Year: 2025, No. 276, pp. 126942 - 126942
Published: Mar. 16, 2025
Language: English
Cited by: 0
Concurrency and Computation Practice and Experience, Year: 2024, No. unknown
Published: June 30, 2024
Summary: Imbalanced samples are widespread, which impairs the generalization and fairness of models. Semi-supervised learning can overcome the deficiency of rare labeled samples, but it is challenging to select high-quality pseudo-label data. Unlike discrete labels, which can be matched one-to-one with points on a numerical axis, the consecutive labels in regression tasks cannot be chosen directly. Besides, the distribution of unlabeled data is imbalanced, which easily leads to imbalanced pseudo-label data, exacerbating the imbalance of the semi-supervised dataset. To solve this problem, this article proposes a semi-supervised imbalanced regression network (SIRN), which consists of two components: A, designed to learn the relationship between features and labels (targets), and B, dedicated to target deviations. To measure target deviations under an imbalanced distribution, a deviation function is introduced. To select continuous pseudo-labels, a matching strategy is designed. Furthermore, adaptive selection is developed to mitigate the risk of skewed distributions due to pseudo-label data. Finally, the effectiveness of the proposed method is validated through evaluations on regression tasks. The results show a great reduction in predicted value error, particularly in few-shot regions. This empirical evidence confirms the efficacy of our method in addressing this issue.
Language: English
Cited by: 1
Neurocomputing, Year: 2024, No. unknown, pp. 128959 - 128959
Published: Nov. 1, 2024
Language: English
Cited by: 1
Information Sciences, Year: 2024, No. 675, pp. 120752 - 120752
Published: May 18, 2024
Language: English
Cited by: 0
International Journal of Advanced Computer Science and Applications, Year: 2024, No. 15(6)
Published: Jan. 1, 2024
Anomaly detection aims to build a decision model that estimates the class of new data based on historical sample features. However, samples are sometimes very close together in feature space, which makes them hard to distinguish and results in the class overlap problem. To address this issue, an anomaly detection method based on the Pearson correlation coefficient and a gradient booster mechanism is proposed in this paper. Unlike traditional resampling methods, the method first groups and sorts features along different dimensions such as correlation, importance, and exclusivity. It then selects features with higher correlation and lower importance for deletion to improve the training accuracy of the detector. Furthermore, through a unilateral sampling mechanism, ineffective or inefficient samples can be further reduced to improve training efficiency. Finally, the method was compared with three feature selection methods and six ensemble models on several datasets. The experimental results showed that it has significant advantages in feature selection, performance, stability, and computational cost.
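The feature-deletion step described above (drop features that are highly correlated with another feature and less important) can be illustrated roughly as follows. This is only a sketch under stated assumptions, not the cited paper's algorithm: the importance scores are taken as given (the paper derives them from its gradient booster mechanism), and `select_features`, the `threshold` value, and the toy columns are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column has no defined correlation
    return cov / math.sqrt(var_x * var_y)

def select_features(columns, importance, threshold=0.9):
    """Keep features after removing the less important one of each
    highly correlated pair (|r| >= threshold).

    columns:    dict mapping feature name -> list of values
    importance: dict mapping feature name -> score (assumed precomputed)
    """
    names = list(columns)
    dropped = set()
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if a in dropped or b in dropped:
                continue
            if abs(pearson(columns[a], columns[b])) >= threshold:
                dropped.add(a if importance[a] < importance[b] else b)
    return [name for name in names if name not in dropped]

# Toy data (hypothetical): "b" nearly duplicates "a" and is less important.
columns = {"a": [1, 2, 3, 4], "b": [2, 4, 6, 8.1], "c": [4, 1, 3, 2]}
importance = {"a": 0.5, "b": 0.2, "c": 0.3}
kept = select_features(columns, importance)
```

Here `kept` retains "a" and "c": the pair ("a", "b") exceeds the threshold, and "b" has the lower importance score.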
Language: English
Cited by: 0
Foods, Year: 2024, No. 13(20), pp. 3300 - 3300
Published: Oct. 17, 2024
Imbalanced data situations exist in most fields of endeavor. The problem has been identified as a major bottleneck in machine learning/data mining and is becoming a serious issue of concern in food processing applications. Inappropriate analysis of agricultural and food processing data has limited the robustness of predictive models built from agri-food applications. Because rare cases occur infrequently, classification rules that detect small groups are scarce, so samples belonging to small classes are largely misclassified. Most existing machine learning algorithms, including K-means, decision trees, and support vector machines (SVMs), are not optimal for handling imbalanced data. Consequently, models developed from such data are very prone to rejection and non-adoptability in real industrial and commercial settings. This paper showcases the reality of the imbalanced data problem in agri-food applications and therefore proposes some state-of-the-art artificial intelligence algorithm approaches for handling it, using methods including resampling, one-class learning, ensemble methods, feature selection, and deep learning techniques. It further evaluates newer metrics that are well suited to imbalanced data. Rightly analyzing imbalanced data in agri-food application research works will improve the accuracy of results and model developments. This will consequently enhance the acceptability and adoptability of innovations/inventions.
Language: English
Cited by: 0
Information Sciences, Year: 2024, No. unknown, pp. 121548 - 121548
Published: Oct. 1, 2024
Language: English
Cited by: 0