Information Processing & Management, Journal Year: 2024, Volume and Issue: 62(2), P. 103975 - 103975
Published: Nov. 23, 2024
Language: English
Scientific Reports, Journal Year: 2025, Volume and Issue: 15(1)
Published: March 4, 2025
For most classifiers, overlapping regions, where various classes are difficult to distinguish, affect the classifier's overall performance on multi-class imbalanced data more than the imbalance itself. In the problem-data space, overlapped samples share similar characteristics, resulting in a complex boundary that makes the classes difficult to separate from each other and causes performance degradation. The research community agrees that class overlapping issues are related to classifier performance, but how much performance is affected is still unanswered. There is also a gap in the literature in demonstrating the effect of different levels of these problems. Accordingly, in this paper, four algorithms are implemented to synthetically generate controlled overlapping samples, to be used with multiclass datasets under different schemes, to show the worst effect of overlapping. Experiments involve state-of-the-art non-parametric classifiers (support vector machines, k-nearest neighbor, and random forest) to classify these datasets and validate the effect of overlapping on their learning. The models test the suitability, stability, and versatility of the proposed algorithms and highlight the growing problems of having an imbalanced distribution together with overlapping issues. Experimental results on 20 real-world datasets show the effect of the overlapping level on the underlying classifiers.
Language: English
Citations
0
Journal Of Big Data, Journal Year: 2025, Volume and Issue: 12(1)
Published: March 23, 2025
Language: English
Citations
0
Applied Sciences, Journal Year: 2024, Volume and Issue: 14(13), P. 5845 - 5845
Published: July 4, 2024
An innovative strategy for organizations to obtain value from their large datasets, allowing them to guide future strategic actions and improve their initiatives, is the use of machine learning algorithms. This has led to a growing and rapid application of various machine learning algorithms, with a predominant focus on building and improving the performance of these models. However, this data-centric approach ignores the fact that data quality is crucial for building robust and accurate models. Several dataset issues, such as class imbalance, high dimensionality, and class overlapping, affect data quality, introducing bias to machine learning models. Therefore, adopting a data-centric approach is essential to constructing better datasets and producing effective models. Besides data quality, Big Data imposes new challenges, such as scalability. This paper proposes a scalable hybrid approach for jointly addressing class imbalance, high dimensionality, and class overlapping in Big Data domains. The proposal is based on well-known data-level solutions whose main operation is calculating the nearest neighbor using the Euclidean distance as the similarity metric. However, these strategies may lose their effectiveness on datasets with high dimensionality. Hence, data quality is achieved by combining a data transformation approach using fractional norms and SMOTE to obtain a balanced and reduced dataset. Experiments carried out on nine two-class imbalanced and high-dimensional datasets showed that our methodology implemented in Spark outperforms the traditional approach.
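The role of the distance metric in the nearest-neighbor step can be illustrated with a small sketch. It is not the authors' Spark implementation: the function names `minkowski_distance` and `nearest_neighbor` and the toy points are assumptions made for illustration. It only shows that a Minkowski-form dissimilarity with a fractional exponent (0 < p < 1) can select a different nearest neighbor than the Euclidean distance (p = 2).

```python
def minkowski_distance(a, b, p):
    """Minkowski-form dissimilarity between two equal-length points.

    p = 2 gives the Euclidean distance; 0 < p < 1 gives a fractional norm
    (which is not a true metric, since the triangle inequality can fail).
    """
    if p <= 0:
        raise ValueError("p must be positive")
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def nearest_neighbor(query, points, p=2.0):
    """Return the index of the point closest to `query` under exponent p."""
    return min(
        range(len(points)),
        key=lambda i: minkowski_distance(query, points[i], p),
    )


# Hypothetical toy data: one point differs a little in every feature,
# the other differs a lot in a single feature.
points = [(3.0, 3.0), (0.0, 5.0)]
query = (0.0, 0.0)

euclid_idx = nearest_neighbor(query, points, p=2.0)  # Euclidean choice
frac_idx = nearest_neighbor(query, points, p=0.5)    # fractional-norm choice
```

In this toy case the Euclidean distance prefers the point that differs slightly in both features, while the fractional norm prefers the point that matches exactly on one feature. This is one reason the choice of exponent matters for neighbor-based methods such as SMOTE.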
Language: English
Citations
3
Neurocomputing, Journal Year: 2024, Volume and Issue: 609, P. 128492 - 128492
Published: Aug. 28, 2024
Language: English
Citations
3
Expert Systems, Journal Year: 2024, Volume and Issue: 41(11)
Published: July 30, 2024
Class imbalance and class overlap create difficulties in the training phase of standard machine learning algorithms. Their performance is poor on minority classes, especially when there is high class imbalance and significant class overlap. Researchers have recently observed that the joint effects of class imbalance and class overlap are more harmful than their direct impact. To handle these problems, many methods have been proposed in past years that can be broadly categorized as data-level, algorithm-level, ensemble learning, and hybrid methods. Existing data-level methods often suffer from problems like information loss and overfitting. To overcome these problems, we introduce a novel entropy-based hybrid sampling (EHS) method for highly imbalanced datasets. EHS eliminates less informative majority instances from the overlap region during undersampling and regenerates synthetic minority instances near the borderline during oversampling. EHS achieved an improvement in F1-score, G-mean, and AUC metric values with DT, NB, and SVM classifiers compared to well-established state-of-the-art methods. Classifier performance is tested on 28 datasets with extreme ranges of imbalance and overlap.
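Two of the metrics named above can be sketched for the binary case using their commonly used definitions: F1 as the harmonic mean of precision and recall, and G-mean as the square root of sensitivity times specificity. The abstract does not state the exact formulas the authors used, so treat this as an assumption. The function name `f1_and_gmean` and the confusion-matrix counts are hypothetical.

```python
import math


def f1_and_gmean(tp, fp, fn, tn):
    """Return (F1, G-mean) for the minority (positive) class.

    Assumes the common binary definitions:
      F1     = 2 * precision * recall / (precision + recall)
      G-mean = sqrt(sensitivity * specificity)
    Undefined ratios (zero denominators) are treated as 0.0.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0  # sensitivity
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    gmean = math.sqrt(recall * specificity)
    return f1, gmean


# Hypothetical counts: 10 minority and 90 majority test samples.
f1, gmean = f1_and_gmean(tp=8, fp=2, fn=2, tn=88)

# A classifier that always predicts the majority class is 90% accurate
# on this data but scores zero on both metrics.
f1_trivial, gmean_trivial = f1_and_gmean(tp=0, fp=0, fn=10, tn=90)
```

Both metrics drop to zero when the minority class is never predicted, which is why they are preferred over plain accuracy on highly imbalanced data.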
Language: English
Citations
2
Neurocomputing, Journal Year: 2024, Volume and Issue: unknown, P. 128959 - 128959
Published: Nov. 1, 2024
Language: English
Citations
1
Information Sciences, Journal Year: 2024, Volume and Issue: 675, P. 120752 - 120752
Published: May 18, 2024
Language: English
Citations
0
Neurocomputing, Journal Year: 2024, Volume and Issue: 610, P. 128538 - 128538
Published: Sept. 3, 2024
Language: English
Citations
0
Information Sciences, Journal Year: 2024, Volume and Issue: unknown, P. 121548 - 121548
Published: Oct. 1, 2024
Language: English
Citations
0
Ore Geology Reviews, Journal Year: 2024, Volume and Issue: 175, P. 106329 - 106329
Published: Nov. 16, 2024
Language: English
Citations
0