Overlap to equilibrium: Oversampling imbalanced datasets using overlapping degree
Sidra Jubair, Jie Yang, Bilal Ali et al.

Information Processing & Management, Journal Year: 2024, Volume and Issue: 62(2), P. 103975 - 103975

Published: Nov. 23, 2024

Language: English

Algorithmic and mathematical modeling for synthetically controlled overlapping (Creative Commons)
Zafar Mahmood, Mejdl Safran, Abdussamad et al.

Scientific Reports, Journal Year: 2025, Volume and Issue: 15(1)

Published: March 4, 2025

For most classifiers, overlapping regions, where various classes are difficult to distinguish, affect overall performance on multi-class imbalanced data more than the imbalance itself. In the problem-data space, overlapped samples share similar characteristics, resulting in a complex boundary that makes the classes difficult to separate from each other and causes performance degradation. The research community agrees on the relationship between class overlapping issues and classifier performance, but how much performance is affected is still unanswered. There is also a gap in the literature in demonstrating the effect of different levels of overlapping problems. Accordingly, in this paper, four algorithms are implemented to synthetically generate controlled overlapping samples to be used with multiclass datasets, using different schemes to show the worst effect of overlapping. Experiments involve state-of-the-art non-parametric classifiers (support vector machines, k-nearest neighbor, and random forest) to classify these datasets and validate the effect of overlapping on their learning. The models test the suitability, stability, and versatility of the proposed algorithms and highlight the growing problems of datasets having both imbalanced distribution and overlapping issues. Experimental results on 20 real-world datasets relate the overlapping level to the performance of the underlying classifiers.

Language: English

Citations: 0

Resampling approaches to handle class imbalance: a review from a data perspective (Creative Commons)
Miguel Martins Carvalho, Armando J. Pinho, Susana Brás et al.

Journal of Big Data, Journal Year: 2025, Volume and Issue: 12(1)

Published: March 23, 2025

Language: English

Citations: 0

Data-Centric Solutions for Addressing Big Data Veracity with Class Imbalance, High Dimensionality, and Class Overlapping (Creative Commons)
Armando Bolívar, Vicente García, R. Alejo et al.

Applied Sciences, Journal Year: 2024, Volume and Issue: 14(13), P. 5845 - 5845

Published: July 4, 2024

An innovative strategy for organizations to obtain value from their large datasets, allowing them to guide future strategic actions and improve their initiatives, is the use of machine learning algorithms. This has led to a growing and rapid application of various machine learning algorithms, with a predominant focus on building and improving the performance of these models. However, this approach ignores the fact that data quality is crucial for robust and accurate models. Several dataset issues, such as class imbalance, high dimensionality, and class overlapping, affect data quality and introduce bias into the models. Therefore, adopting a data-centric approach is essential for constructing better datasets and producing effective models. Besides, Big Data imposes new challenges, such as scalability. This paper proposes a scalable hybrid approach for jointly addressing class imbalance, high dimensionality, and class overlapping in Big Data domains. The proposal is based on well-known data-level solutions whose main operation is calculating the nearest neighbor using the Euclidean distance as a similarity metric. However, these strategies may lose their effectiveness under high dimensionality. Hence, data quality is achieved by combining a data transformation using fractional norms with SMOTE to obtain a balanced and reduced dataset. Experiments carried out on nine two-class imbalanced and high-dimensional datasets showed that our methodology, implemented in Spark, outperforms the traditional approach.

Language: English

Citations: 3
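
The abstract above combines fractional norms with SMOTE because Euclidean nearest-neighbor search can lose effectiveness in high dimensions. The following is a minimal single-machine sketch of that idea, not the authors' Spark implementation: it assumes a Minkowski-style dissimilarity with exponent 0 < p < 1 for the neighbor search, followed by standard SMOTE interpolation. The function names, `p=0.5`, `k=3`, and the toy data are all illustrative assumptions.

```python
import random

def fractional_distance(x, y, p=0.5):
    """Minkowski-style dissimilarity with 0 < p < 1 (assumed 'fractional norm')."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)

def nearest_neighbors(point, candidates, k, p=0.5):
    """Return the k candidates closest to `point`, excluding the point itself."""
    others = [c for c in candidates if c is not point]
    return sorted(others, key=lambda c: fractional_distance(point, c, p))[:k]

def smote_like(minority, n_new, k=3, p=0.5, seed=0):
    """Create n_new synthetic samples by interpolating between a minority
    sample and one of its k nearest minority neighbors (SMOTE-style)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbor = rng.choice(nearest_neighbors(base, minority, k, p))
        gap = rng.random()  # position along the segment base -> neighbor
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
    return synthetic

# Hypothetical toy minority class: the four corners of the unit square.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote_like(minority, n_new=5)
```

Because every synthetic point lies on a segment between two existing minority samples, the new points stay inside the convex hull of the minority class; only the choice of neighbors changes when the exponent `p` changes.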

Nearest neighbors and density-based undersampling for imbalanced data classification with class overlap
Peiqi Sun, Yanhui Du, Siyun Xiong et al.

Neurocomputing, Journal Year: 2024, Volume and Issue: 609, P. 128492 - 128492

Published: Aug. 28, 2024

Language: English

Citations: 3

Entropy‐based hybrid sampling (EHS) method to handle class overlap in highly imbalanced dataset (Open Access)
Anil Kumar, Dinesh Singh, Rama Shankar Yadav et al.

Expert Systems, Journal Year: 2024, Volume and Issue: 41(11)

Published: July 30, 2024

Class imbalance and class overlap create difficulties in the training phase of standard machine learning algorithms. Their performance is poor on minority classes, especially when there is high class imbalance and significant class overlap. Recently, researchers have observed that the joint effects of class overlap and imbalance are more harmful than their direct impact. To handle these problems, many methods have been proposed in past years that can be broadly categorized as data‐level, algorithm‐level, ensemble learning, and hybrid methods. Existing data‐level methods often suffer from problems like information loss and overfitting. To overcome these problems, we introduce a novel entropy‐based hybrid sampling (EHS) method to handle class overlap in highly imbalanced datasets. The EHS eliminates less informative majority instances from the overlap region during undersampling and regenerates synthetic minority instances during oversampling near the borderline. The proposed EHS achieved improvement in F1‐score, G‐mean, and AUC metrics with DT, NB, and SVM classifiers compared to well‐established state‐of‐the‐art methods. Classifier performance is tested on 28 datasets with extreme ranges

Language: English

Citations: 2
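
The EHS abstract above does not define its entropy measure, so the following is only a sketch under a stated assumption: the overlap region is approximated by the Shannon entropy of the class labels in each point's local neighborhood (the point plus its k nearest neighbors), and majority instances whose neighborhood entropy exceeds a threshold are candidates for removal. The function names, `k=2`, the threshold of 0.5, and the one-dimensional toy data are hypothetical and not taken from the paper.

```python
import math
from collections import Counter

def knn_indices(i, X, k):
    """Indices of the k points closest to X[i] (Euclidean), excluding i."""
    dists = sorted((math.dist(X[i], X[j]), j) for j in range(len(X)) if j != i)
    return [j for _, j in dists[:k]]

def neighborhood_entropy(i, X, y, k=2):
    """Shannon entropy (bits) of labels over point i and its k nearest neighbors."""
    labels = [y[i]] + [y[j] for j in knn_indices(i, X, k)]
    total = len(labels)
    counts = Counter(labels)
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())

def overlapping_majority(X, y, majority_label, k=2, threshold=0.5):
    """Majority-class indices whose neighborhood mixes classes (assumed criterion)."""
    return [i for i in range(len(X))
            if y[i] == majority_label
            and neighborhood_entropy(i, X, y, k) > threshold]

# Hypothetical toy data: class 0 on the left, class 1 on the right,
# with one class-0 point (index 3) sitting among class-1 points.
X = [[0.0], [1.0], [2.1], [5.1], [5.0], [5.3], [9.0], [10.2]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
to_remove = overlapping_majority(X, y, majority_label=0)
```

In this toy setup only the majority point at index 3 has a mixed neighborhood, so it is the only undersampling candidate; an oversampling step near the borderline would then be applied to the minority class separately.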

Newton cooling theorem-based local overlapping regions cleaning and oversampling techniques for imbalanced datasets
Liangliang Tao, Qingya Wang, Fen Yu et al.

Neurocomputing, Journal Year: 2024, Volume and Issue: unknown, P. 128959 - 128959

Published: Nov. 1, 2024

Language: English

Citations: 1

Generative adversarial networks for overlapped and imbalanced problems in impact damage classification
Quoc Hoan Doan, Behrooz Keshtegar, Seung-Eock Kim et al.

Information Sciences, Journal Year: 2024, Volume and Issue: 675, P. 120752 - 120752

Published: May 18, 2024

Language: English

Citations: 0

NCLWO: Newton’s cooling law-based weighted oversampling algorithm for imbalanced datasets with feature noise
Liangliang Tao, Qingya Wang, Zhicheng Zhu et al.

Neurocomputing, Journal Year: 2024, Volume and Issue: 610, P. 128538 - 128538

Published: Sept. 3, 2024

Language: English

Citations: 0

PRO-SMOTEBoost: An Adaptive SMOTEBoost Probabilistic Algorithm for Rebalancing and Improving Imbalanced Data Classification
Laouni Djafri

Information Sciences, Journal Year: 2024, Volume and Issue: unknown, P. 121548 - 121548

Published: Oct. 1, 2024

Language: English

Citations: 0

A balanced mineral prospectivity model of Canadian magmatic Ni (± Cu ± Co ± PGE) sulphide mineral systems using conditional variational autoencoders (Creative Commons)
Lahiru M.A. Nagasingha, Charles L. Bérubé, C J M Lawley et al.

Ore Geology Reviews, Journal Year: 2024, Volume and Issue: 175, P. 106329 - 106329

Published: Nov. 16, 2024

Language: English

Citations: 0