Integrated risk assessment framework for transformation products of emerging contaminants: what we know and what we should know
Shengqi Zhang, Qian Yin, Siqin Wang

et al.

Frontiers of Environmental Science & Engineering, Journal Year: 2023, Volume and Issue: 17(7)

Published: May 10, 2023

Language: English

Citations: 11

GC × GC and computational strategies for detecting and analyzing environmental contaminants
Teruyo Ieda, Shunji Hashimoto

TrAC Trends in Analytical Chemistry, Journal Year: 2023, Volume and Issue: 165, P. 117118 - 117118

Published: June 1, 2023

Language: English

Citations: 11

Using machine learning to explore oxyanion adsorption ability of goethite with different specific surface area
Kai Chen, Chuling Guo, Chaoping Wang

et al.

Environmental Pollution, Journal Year: 2023, Volume and Issue: 343, P. 123162 - 123162

Published: Dec. 16, 2023

Language: English

Citations: 10

POPs identification using simple low-code machine learning
Xin Lei, Haiying Yu, Sisi Liu

et al.

The Science of The Total Environment, Journal Year: 2024, Volume and Issue: 921, P. 171143 - 171143

Published: Feb. 20, 2024

Language: English

Citations: 4

Feature Subset Selection for High-Dimensional, Low Sampling Size Data Classification Using Ensemble Feature Selection With a Wrapper-Based Search
Ashis Kumar Mandal, Md Nadim, Hasi Saha

et al.

IEEE Access, Journal Year: 2024, Volume and Issue: 12, P. 62341 - 62357

Published: Jan. 1, 2024

The identification of suitable feature subsets from High-Dimensional Low-Sample-Size (HDLSS) data is of paramount importance because such datasets often contain numerous redundant and irrelevant features, leading to poor classification performance. However, selecting an optimal subset from a vast feature space poses a significant computational challenge. In the HDLSS domain, conventional methods struggle to balance reducing the number of features against preserving high accuracy. Addressing these issues, this study introduces an effective framework that employs a combined filter and wrapper-based strategy specifically designed for the challenges inherent in HDLSS data. The framework adopts a multi-step approach in which an ensemble filter integrates five ranking approaches: Chi-square (χ²), Gini index (GI), F-score, Mutual Information (MI), and Symmetric Uncertainty (SU) to identify the top-ranking features. In the subsequent stage, a wrapper-based search is applied, using the Differential Evolution (DE) metaheuristic algorithm as its search strategy. Fitness during the search is assessed by a weighted combination of the error rate of a Support Vector Machine (SVM) classifier and the cardinality of the feature subset. The datasets, now with reduced dimensionality, are subsequently used to build classification models with SVM, K-Nearest Neighbors (KNN), and Logistic Regression (LR). The proposed framework was evaluated on 13 datasets to assess its efficacy in selecting appropriate feature subsets and improving Classification Accuracy (ACC) and Area Under the Curve (AUC). The framework produces smaller feature subsets (ranging from 2 to 9 features across all datasets) while maintaining commendable average AUC and ACC (between 98% and 100%). Comparative results demonstrate that it outperforms both feature selection and non-feature selection approaches in terms of ACC. Furthermore, when compared with several other state-of-the-art approaches, it exhibits competitive performance.
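The wrapper stage described in this abstract scores each candidate feature subset by a weighted combination of an SVM's error rate and the subset's cardinality. Below is a minimal sketch of such a fitness function, assuming scikit-learn, 5-fold cross-validation, and a weighting parameter alpha; these names and choices are illustrative assumptions, not details taken from the paper.

import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def subset_fitness(X, y, mask, alpha=0.9):
    # Lower is better: weighted sum of the SVM's cross-validated error
    # rate on the selected features and the relative subset size.
    # The value of alpha and the 5-fold CV are illustrative assumptions.
    selected = np.flatnonzero(mask)
    if selected.size == 0:
        return 1.0  # penalize empty subsets
    accuracy = cross_val_score(SVC(), X[:, selected], y, cv=5).mean()
    error_rate = 1.0 - accuracy
    cardinality_ratio = selected.size / X.shape[1]
    return alpha * error_rate + (1.0 - alpha) * cardinality_ratio

In the full framework, a fitness of this kind would be minimized by the Differential Evolution search over feature masks restricted to the top-ranked features produced by the ensemble filter.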

Language: English

Citations: 4