
IEEE Access, Year: 2024, Volume: 12, pp. 62341-62357
Published: Jan. 1, 2024
The identification of suitable feature subsets from High-Dimensional Low-Sample-Size (HDLSS) data is of paramount importance because this type of dataset often contains numerous redundant and irrelevant features, leading to poor classification performance. However, the selection of an optimal feature subset from a vast feature space creates a significant computational challenge. In the domain of HDLSS data, conventional feature selection methods face challenges in achieving a balance between reducing the number of features and preserving high classification accuracy. Addressing these issues, this study introduces an effective framework that employs a filter and wrapper-based strategy specifically designed to address the challenges inherent in HDLSS data. The framework adopts a multi-step approach where an ensemble filter integrates five feature ranking approaches: Chi-square (χ²), Gini index (GI), F-score, Mutual Information (MI), and Symmetric Uncertainty (SU) to identify the top-ranking features. In the subsequent stage, a wrapper-based search method is utilized, which employs the Differential Evolution (DE) metaheuristic algorithm as the search strategy. The fitness of each candidate feature subset during the search is assessed based on a weighted combination of the error rate of the Support Vector Machine (SVM) classifier and the cardinality of the feature subset. The datasets, now with reduced dimensionality, are subsequently employed to build classification models with SVM, K-Nearest Neighbors (KNN), and Logistic Regression (LR). The proposed framework was evaluated on 13 HDLSS datasets to assess its efficacy in selecting appropriate feature subsets and improving Classification Accuracy (ACC) and Area Under the Curve (AUC). The proposed framework produces smaller feature subsets (ranging from 2 to 9 features for all datasets), while maintaining a commendable average AUC and ACC (between 98% and 100%). The comparative results demonstrate that the proposed framework outperforms both feature selection and non-feature selection approaches in terms of ACC. Furthermore, when compared with several other state-of-the-art approaches, the proposed framework exhibits
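The two stages described in the abstract can be illustrated with a minimal sketch. The abstract does not state how the five filter rankings are combined or which weights the fitness uses, so everything below is an assumption: the mean-rank aggregation rule, the function names `aggregate_ranks` and `fitness`, and the weight `alpha = 0.9` are hypothetical and are not taken from the paper.

```python
def aggregate_ranks(score_lists):
    """Combine several filter scores into one feature ordering.

    Assumption: mean-rank aggregation, with higher score = more relevant.
    Returns feature indices ordered from best to worst.
    """
    n_features = len(score_lists[0])
    rank_sums = [0.0] * n_features
    for scores in score_lists:
        # Rank 0 goes to the highest-scoring feature under this filter.
        order = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
        for rank, j in enumerate(order):
            rank_sums[j] += rank
    mean_ranks = [s / len(score_lists) for s in rank_sums]
    return sorted(range(n_features), key=lambda j: mean_ranks[j])


def fitness(error_rate, n_selected, n_candidates, alpha=0.9):
    """Weighted combination of classifier error and subset cardinality.

    Lower is better. alpha is a hypothetical weight, not the paper's value.
    """
    if n_selected == 0:
        return float("inf")  # an empty subset cannot be classified on
    return alpha * error_rate + (1 - alpha) * n_selected / n_candidates


# Two hypothetical filter score vectors over three features.
ranking = aggregate_ranks([[0.9, 0.1, 0.5], [0.8, 0.2, 0.3]])
score = fitness(error_rate=0.1, n_selected=2, n_candidates=10)
```

In a full pipeline, `error_rate` would come from cross-validating an SVM on the candidate subset, and a Differential Evolution search would minimize `fitness` over binary feature masks drawn from the top-ranked features.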
Language: English