Integrated risk assessment framework for transformation products of emerging contaminants: what we know and what we should know
Shengqi Zhang, Qian Yin, Siqin Wang et al.

Frontiers of Environmental Science & Engineering, Year: 2023, No. 17(7)

Published: May 10, 2023

Language: English

Cited by: 11

GC × GC and computational strategies for detecting and analyzing environmental contaminants
Teruyo Ieda, Shunji Hashimoto

TrAC Trends in Analytical Chemistry, Year: 2023, No. 165, pp. 117118 - 117118

Published: June 1, 2023

Language: English

Cited by: 11

Using machine learning to explore oxyanion adsorption ability of goethite with different specific surface area
Kai Chen, Chuling Guo, Chaoping Wang et al.

Environmental Pollution, Year: 2023, No. 343, pp. 123162 - 123162

Published: Dec. 16, 2023

Language: English

Cited by: 10

POPs identification using simple low-code machine learning
Xin Lei, Haiying Yu, Sisi Liu et al.

The Science of The Total Environment, Year: 2024, No. 921, pp. 171143 - 171143

Published: Feb. 20, 2024

Language: English

Cited by: 4

Feature Subset Selection for High-Dimensional, Low Sampling Size Data Classification Using Ensemble Feature Selection With a Wrapper-Based Search
Ashis Kumar Mandal, Md Nadim, Hasi Saha et al.

IEEE Access, Year: 2024, No. 12, pp. 62341 - 62357

Published: Jan. 1, 2024

The identification of suitable feature subsets from High-Dimensional Low-Sample-Size (HDLSS) data is of paramount importance because such datasets often contain numerous redundant and irrelevant features, leading to poor classification performance. However, the selection of an optimal subset from a vast search space creates a significant computational challenge. In the domain of HDLSS data, conventional methods face challenges in achieving a balance between reducing the number of features and preserving high accuracy. Addressing these issues, this study introduces an effective framework that employs a filter and wrapper-based strategy specifically designed for the challenges inherent in HDLSS data. It adopts a multi-step approach in which an ensemble integrates five ranking approaches: Chi-square (χ²), Gini index (GI), F-score, Mutual Information (MI), and Symmetric Uncertainty (SU), to identify top-ranking features. In a subsequent stage, a wrapper-based search method is utilized, which uses the Differential Evolution (DE) metaheuristic algorithm as its search strategy. The fitness during the search is assessed based on a weighted combination of the error rate of a Support Vector Machine (SVM) classifier and the cardinality of the feature subset. The datasets, now with reduced dimensionality, are subsequently employed to build classification models with SVM, K-Nearest Neighbors (KNN), and Logistic Regression (LR). The proposed framework was evaluated on 13 datasets to assess its efficacy in selecting appropriate features and improving Classification Accuracy (ACC) and Area Under the Curve (AUC). The framework produces smaller feature subsets (ranging from 2 to 9 for all datasets) while maintaining commendable average AUC and ACC (between 98% and 100%). The comparative results demonstrate that it outperforms non-feature-selection approaches in terms of ACC. Furthermore, when compared with several other state-of-the-art approaches, it exhibits …

Language: English

Cited by: 4
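The abstract above describes a two-stage pipeline: an ensemble of filter rankers (χ², Gini index, F-score, mutual information, symmetric uncertainty) first retains the top-ranked features, and a Differential Evolution (DE) wrapper then searches that reduced space, scoring each candidate subset by a weighted combination of SVM error rate and subset cardinality. The sketch below illustrates that flow with scikit-learn and SciPy; it is not the authors' implementation. The retained-feature count k, the weight w, the mean-rank aggregation, and the thresholded continuous DE encoding are assumptions made for this example, and the Gini-index and symmetric-uncertainty rankers are omitted for brevity.

```python
# Minimal sketch of an ensemble-filter + DE-wrapper feature selection pipeline.
# Not the paper's code; k, w, mean-rank aggregation and the continuous DE
# encoding (thresholded at 0.5) are assumptions made for illustration.
import numpy as np
from scipy.optimize import differential_evolution
from scipy.stats import rankdata
from sklearn.datasets import make_classification
from sklearn.feature_selection import chi2, f_classif, mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

# Synthetic HDLSS-like data: few samples, many features.
X, y = make_classification(n_samples=60, n_features=500, n_informative=10,
                           random_state=0)
X = MinMaxScaler().fit_transform(X)          # chi2 requires non-negative inputs

# Stage 1: ensemble filter ranking (chi-square, F-score, mutual information).
scores = [chi2(X, y)[0],
          f_classif(X, y)[0],
          mutual_info_classif(X, y, random_state=0)]
mean_rank = np.mean([rankdata(-s) for s in scores], axis=0)  # lower = better
k = 30                                       # assumed number of retained features
top_idx = np.argsort(mean_rank)[:k]
X_top = X[:, top_idx]

# Stage 2: DE wrapper search over the retained features.
w = 0.9                                      # assumed weight on the SVM error rate

def fitness(v):
    """Weighted sum of SVM CV error and relative subset size for one candidate."""
    mask = v > 0.5                           # threshold the continuous DE vector
    if not mask.any():                       # penalise empty subsets
        return 1.0
    acc = cross_val_score(SVC(kernel="linear"), X_top[:, mask], y, cv=3).mean()
    return w * (1.0 - acc) + (1.0 - w) * mask.sum() / k

result = differential_evolution(fitness, bounds=[(0.0, 1.0)] * k,
                                maxiter=10, popsize=5, seed=0, polish=False)
selected = top_idx[result.x > 0.5]
print("selected feature indices:", selected)
print("best fitness:", round(result.fun, 4))
```

The selected indices would then feed the downstream SVM, KNN, and LR classifiers the abstract mentions; in practice, the filter ranking and the DE search should be run inside each cross-validation fold to avoid selection bias on such small samples.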