Feature Subset Selection for High-Dimensional, Low Sampling Size Data Classification Using Ensemble Feature Selection With a Wrapper-Based Search
Ashis Kumar Mandal, Md Nadim, Hasi Saha

et al.

IEEE Access, Journal Year: 2024, Volume and Issue: 12, P. 62341 - 62357

Published: Jan. 1, 2024

The identification of suitable feature subsets from High-Dimensional Low-Sample-Size (HDLSS) data is of paramount importance because such datasets often contain numerous redundant and irrelevant features, leading to poor classification performance. However, selecting an optimal subset from a vast feature space creates a significant computational challenge. In the domain of HDLSS data, conventional methods struggle to balance reducing the number of features against preserving high accuracy. To address these issues, this study introduces an effective framework that employs a filter and wrapper-based strategy specifically designed for the inherent challenges of HDLSS data. The framework adopts a multi-step approach in which an ensemble filter integrates five ranking approaches, Chi-square (χ²), Gini index (GI), F-score, Mutual Information (MI), and Symmetric Uncertainty (SU), to identify the top-ranking features. In the subsequent stage, a wrapper-based search method is used, with the Differential Evolution (DE) metaheuristic algorithm as the search strategy. Fitness during the search is assessed as a weighted combination of the error rate of a Support Vector Machine (SVM) classifier and the cardinality of the feature subset. The datasets, now with reduced dimensionality, are subsequently used to build classification models with SVM, K-Nearest Neighbors (KNN), and Logistic Regression (LR). The proposed framework was evaluated on 13 datasets to assess its efficacy in selecting appropriate feature subsets and improving Classification Accuracy (ACC) and the analogous Area Under the Curve (AUC). The framework produces smaller feature subsets (ranging from 2 to 9 features across all datasets) while maintaining commendable average AUC and ACC (between 98% and 100%). The comparative results demonstrate that the framework outperforms both non-feature-selection and feature-selection approaches in terms of ACC. Furthermore, when compared with several other state-of-the-art approaches, the framework exhibits
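The two-stage pipeline in this abstract (ensemble filter, then a DE wrapper scored by error plus subset size) can be sketched in Python using the standard library only. This is an illustration under assumptions, not the authors' implementation. The abstract does not say how the five rankings are combined, so mean-rank aggregation is assumed. The weight `alpha = 0.99`, the 0.5 threshold that turns a continuous DE vector into a feature mask, and the DE settings (`F`, `CR`, population size) are hypothetical choices. A stand-in `error_rate` callable replaces the SVM cross-validation error, and all function names are illustrative.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple


def mean_rank_aggregate(scores: Dict[str, Sequence[float]], top_k: int) -> List[int]:
    """Ensemble filter (assumed scheme): sum each feature's rank over all methods."""
    n = len(next(iter(scores.values())))
    total_rank = [0] * n
    for method_scores in scores.values():
        # Rank 0 goes to the feature this method scores highest.
        order = sorted(range(n), key=lambda j: -method_scores[j])
        for rank, j in enumerate(order):
            total_rank[j] += rank
    # Lowest summed rank = most consistently top-ranked feature.
    return sorted(range(n), key=lambda j: total_rank[j])[:top_k]


def fitness(mask: Sequence[bool], error_rate: Callable[[List[int]], float],
            alpha: float = 0.99) -> float:
    """Weighted sum of classifier error and subset-size ratio (lower is better)."""
    selected = [j for j, keep in enumerate(mask) if keep]
    if not selected:
        return float("inf")  # an empty subset cannot train a classifier
    return alpha * error_rate(selected) + (1 - alpha) * len(selected) / len(mask)


def to_mask(vector: Sequence[float], threshold: float = 0.5) -> List[bool]:
    """Binarize a continuous DE vector: genes above the threshold select features."""
    return [x > threshold for x in vector]


def de_feature_search(n_features: int, error_rate: Callable[[List[int]], float],
                      pop_size: int = 10, generations: int = 30, F: float = 0.5,
                      CR: float = 0.9, seed: int = 0) -> Tuple[List[int], float]:
    """DE/rand/1/bin over [0, 1]^n; returns the best feature subset and its fitness."""
    rng = random.Random(seed)
    pop = [[rng.random() for _ in range(n_features)] for _ in range(pop_size)]
    fit = [fitness(to_mask(v), error_rate) for v in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Mutation: three distinct individuals, all different from target i.
            a, b, c = rng.sample([k for k in range(pop_size) if k != i], 3)
            j_rand = rng.randrange(n_features)  # at least one gene comes from the mutant
            trial = []
            for j in range(n_features):
                if j == j_rand or rng.random() < CR:
                    mutant = pop[a][j] + F * (pop[b][j] - pop[c][j])
                    trial.append(min(1.0, max(0.0, mutant)))  # clip to [0, 1]
                else:
                    trial.append(pop[i][j])
            trial_fit = fitness(to_mask(trial), error_rate)
            if trial_fit <= fit[i]:  # greedy one-to-one replacement
                pop[i], fit[i] = trial, trial_fit
    best = min(range(pop_size), key=lambda k: fit[k])
    return [j for j, keep in enumerate(to_mask(pop[best])) if keep], fit[best]


if __name__ == "__main__":
    # Hypothetical stand-in for the SVM error: only features 0 and 2 are informative.
    def toy_error(selected: List[int]) -> float:
        return 0.0 if {0, 2} <= set(selected) else 0.5

    print(mean_rank_aggregate({"chi2": [0.9, 0.1, 0.5], "mi": [0.8, 0.2, 0.6]}, top_k=2))  # prints [0, 2]
    print(de_feature_search(6, toy_error))
```

With `alpha` close to 1, accuracy dominates the fitness, and the size term only breaks ties between subsets with similar error. This is one plausible reading of "weighted combination", not a value taken from the paper.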

Language: English

Data-Driven Machine Learning in Environmental Pollution: Gains and Problems
Xian Liu, Dawei Lü, Aiqian Zhang

et al.

Environmental Science & Technology, Journal Year: 2022, Volume and Issue: 56(4), P. 2124 - 2133

Published: Jan. 27, 2022

The complexity and dynamics of the environment make it extremely difficult to directly predict and trace the temporal and spatial changes in pollution. In the past decade, the unprecedented accumulation of data, the development of high-performance computing power, and the rise of diverse machine learning (ML) methods have provided new opportunities for environmental pollution research. ML methodology has been used in satellite data processing to obtain ground-level concentrations of atmospheric pollutants, in source apportionment, and in distribution modeling of water pollutants. However, unlike the active practice of ML in chemical toxicity prediction, applications of advanced algorithms such as deep neural networks in process studies of environmental pollutants are still deficient. In addition, over 40% of ML applications go to air pollution, and the application range and acceptance of ML in other aspects of environmental science remain to be increased. Using ML to revolutionize problem-solving scenarios comes with its own challenges. Several issues should be taken into consideration, such as the tradeoff between model performance and interpretability, the prerequisites of the ML model, model selection, and data sharing.

Language: English

Citations: 300

Machine Learning in Environmental Research: Common Pitfalls and Best Practices
Jun‐Jie Zhu, Meiqi Yang, Zhiyong Jason Ren

et al.

Environmental Science & Technology, Journal Year: 2023, Volume and Issue: 57(46), P. 17671 - 17689

Published: June 29, 2023

Machine learning (ML) is increasingly used in environmental research to process large data sets and decipher complex relationships between system variables. However, due to a lack of familiarity and methodological rigor, inadequate ML studies may lead to spurious conclusions. In this study, we combined a literature analysis with our own experience and provided a tutorial-like compilation of common pitfalls, along with best-practice guidelines for environmental ML research. We identified more than 30 key items and evidence-based practices from 148 highly cited articles that exhibit misconceptions in terminology, proper sample size and feature size, feature enrichment and selection, randomness assessment, data leakage management, data splitting, method selection and comparison, model optimization and evaluation, and model explainability and causality. By analyzing good examples of supervised learning and reference modeling paradigms, we hope to help researchers adopt more rigorous data preprocessing and model development standards for more accurate, robust, and practicable ML uses in environmental applications.
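Two of the pitfalls listed here, data leakage and data splitting, can be made concrete with a short Python sketch that uses the standard library only. The function names are illustrative and do not come from the article. The split uses a fixed seed so that randomness is reproducible. The scaling statistics are computed from the training split alone and then reused unchanged on the test split, so no information from held-out samples reaches model fitting.

```python
import random
import statistics
from typing import List, Sequence, Tuple


def train_test_split(values: Sequence[float], test_fraction: float,
                     seed: int) -> Tuple[List[float], List[float]]:
    """Shuffle indices with a fixed seed, then hold out the last test_fraction."""
    idx = list(range(len(values)))
    random.Random(seed).shuffle(idx)
    n_test = max(1, round(len(values) * test_fraction))
    train = [values[i] for i in idx[:-n_test]]
    test = [values[i] for i in idx[-n_test:]]
    return train, test


def fit_standardizer(train: Sequence[float]) -> Tuple[float, float]:
    """Learn mean and standard deviation from the training split ONLY."""
    mean = statistics.fmean(train)
    std = statistics.pstdev(train) or 1.0  # avoid division by zero for constant data
    return mean, std


def transform(values: Sequence[float], mean: float, std: float) -> List[float]:
    """Apply the training-set statistics to any split (train, test, or new data)."""
    return [(v - mean) / std for v in values]


if __name__ == "__main__":
    data = [float(v) for v in range(1, 11)]
    train, test = train_test_split(data, test_fraction=0.2, seed=0)
    mean, std = fit_standardizer(train)       # never fit on train + test
    print(transform(test, mean, std))         # test scaled with training statistics
```

The leaky alternative would call `fit_standardizer(data)` before splitting, which lets the test samples influence the scaling that the model is trained on.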

Language: English

Citations: 254

Integrating low-cost sensor monitoring, satellite mapping, and geospatial artificial intelligence for intra-urban air pollution predictions
Lü Liang, Jacob Daniels, Colleen P. Bailey

et al.

Environmental Pollution, Journal Year: 2023, Volume and Issue: 331, P. 121832 - 121832

Published: May 18, 2023

There is a growing need to apply geospatial artificial intelligence analysis to disparate environmental datasets to find solutions that benefit frontline communities. One such critically needed solution is the prediction of health-relevant ambient ground-level air pollution concentrations. However, many challenges exist: the limited size and representativeness of ground reference stations for model development, reconciling multi-source data, and the interpretability of deep learning (DL) models. This research addresses these challenges by leveraging a strategically deployed, extensive low-cost sensor (LCS) network that was rigorously calibrated through an optimized neural network. A set of raster predictors with varying data quality and spatial scales was retrieved and processed, including gap-filled satellite aerosol optical depth products and airborne LiDAR-derived 3D urban form. We developed a multi-scale, attention-enhanced convolutional neural network model to reconcile the LCS measurements and raster predictors for estimating daily PM2.5 concentration at 30-m resolution. The model uses the geostatistical kriging method to generate a baseline pollution pattern and a multi-scale residual method to identify both regional patterns and localized events, retaining high-frequency features. We further used permutation tests to quantify feature importance, which has rarely been done in DL applications in environmental science. Finally, we demonstrated one application of the model by investigating air pollution inequality across and within various urbanization levels at the block group scale. Overall, this research demonstrates the potential of geospatial AI to provide actionable solutions for addressing critical environmental issues.
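The permutation tests mentioned in this abstract follow a simple idea. Shuffle one input at a time, then measure how much the prediction error grows. The Python sketch below shows that idea for any black-box predictor, using the standard library only. It is not the authors' CNN setup, which works on raster inputs. The choice of mean absolute error (MAE) as the metric, the `n_repeats` value, and the toy predictor in the usage lines are all assumptions made for illustration.

```python
import random
import statistics
from typing import Callable, List, Sequence

Row = Sequence[float]


def permutation_importance(predict: Callable[[Row], float], X: Sequence[Row],
                           y: Sequence[float], n_repeats: int = 5,
                           seed: int = 0) -> List[float]:
    """Mean increase in MAE when each feature column is shuffled (larger = more important)."""
    rng = random.Random(seed)

    def mae(rows: Sequence[Row]) -> float:
        return statistics.fmean(abs(predict(r) - t) for r, t in zip(rows, y))

    baseline = mae(X)
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the target
            permuted = [list(row[:j]) + [v] + list(row[j + 1:]) for row, v in zip(X, column)]
            increases.append(mae(permuted) - baseline)
        importances.append(statistics.fmean(increases))
    return importances


if __name__ == "__main__":
    # Hypothetical toy data: the target depends on feature 0 only.
    X = [[float(i), 0.0] for i in range(10)]
    y = [2.0 * i for i in range(10)]
    print(permutation_importance(lambda r: 2 * r[0], X, y))
```

Averaging over several shuffles reduces the chance that a single lucky permutation hides a feature's effect. Because the model is never retrained, the method applies to deep networks as readily as to simpler models.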

Language: English

Citations: 50

Multimodal Machine Learning Guides Low Carbon Aeration Strategies in Urban Wastewater Treatment
Hongcheng Wang, Yuqi Wang, Xu Wang

et al.

Engineering, Journal Year: 2024, Volume and Issue: 36, P. 51 - 62

Published: Feb. 9, 2024

The potential for reducing greenhouse gas (GHG) emissions and energy consumption in wastewater treatment can be realized through intelligent control, with machine learning (ML) and multimodality emerging as a promising solution. Here, we introduce an ML technique based on multimodal strategies, focusing specifically on aeration control in wastewater treatment plants (WWTPs). The generalization of the multimodal strategy is demonstrated on eight ML models. The results demonstrate that this strategy significantly enhances ML model indicators in environmental science and aeration efficiency, exhibiting exceptional performance and interpretability. Integrating random forest with visual models achieves the highest accuracy in forecasting aeration quantity among the multimodal models, with a mean absolute percentage error of 4.4% and a coefficient of determination of 0.948. Practical testing in a full-scale plant reveals that the multimodal models can reduce operation costs by 19.8% compared with traditional fuzzy control methods. The application of these strategies in critical water science domains is discussed. To foster accessibility and promote widespread adoption, the multimodal ML models are freely available on GitHub, thereby eliminating technical barriers and encouraging the application of artificial intelligence in urban wastewater treatment.
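The two accuracy figures reported here are a mean absolute percentage error (MAPE) of 4.4% and a coefficient of determination (R²) of 0.948. Their textbook definitions are short enough to write out in Python. The sketch uses the common definition R² = 1 − SS_res/SS_tot. It is an assumption that the paper computes R² this way rather than, for example, as a squared correlation. The input numbers are made up and do not come from the study.

```python
from typing import Sequence


def mape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent (all y_true must be non-zero)."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    observed = [100.0, 200.0]   # made-up aeration quantities
    forecast = [110.0, 190.0]
    print(mape(observed, forecast), r2(observed, forecast))
```

For these two points, the relative errors are 10% and 5%, so the MAPE is 7.5%. The residual sum of squares is 200 against a total of 5000, so R² is 0.96.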

Language: English

Citations: 31

Glyphosate Separating and Sensing for Precision Agriculture and Environmental Protection in the Era of Smart Materials
Jarosław Mazuryk, Katarzyna Klepacka, Włodzimierz Kutner

et al.

Environmental Science & Technology, Journal Year: 2023, Volume and Issue: 57(27), P. 9898 - 9924

Published: June 29, 2023

The present article critically and comprehensively reviews the most recent reports on smart sensors for determining glyphosate (GLP), the active agent of GLP-based herbicides (GBHs) traditionally used in agriculture over the past decades. Commercialized in 1974, GBHs have now reached 350 million hectares of crops in 140 countries, with an annual turnover of 11 billion USD worldwide. However, the rolling exploitation of GLP over the last decades has led to environmental pollution, animal intoxication, bacterial resistance, and sustained occupational exposure of farm and company workers to the herbicide. Intoxication with these herbicides dysregulates the microbiome-gut-brain axis, cholinergic neurotransmission, and the endocrine system, causing paralytic ileus, hyperkalemia, oliguria, pulmonary edema, and cardiogenic shock. Precision agriculture, i.e., an (information technology)-enhanced approach to crop management, including site-specific determination of agrochemicals, derives from the benefits of smart materials (SMs), data science, and nanosensors. These sensors typically feature fluorescent molecularly imprinted polymers or immunochemical aptamer artificial receptors integrated with electrochemical transducers. Fabricated as portable or wearable lab-on-chips, smartphones, and soft robotics connected to SM-based devices that provide machine learning algorithms and online databases, they integrate, process, analyze, and interpret massive amounts of spatiotemporal data in a user-friendly, decision-making manner. Exploited for the ultrasensitive detection of toxins, including GLP, they will become practical tools in farmlands and in point-of-care testing. Expectedly, they can be used for personalized diagnostics, real-time water, food, soil, and air quality monitoring, and control.

Language: English

Citations: 28

Machine learning strategy secures urban smart drinking water treatment plant through incremental advances
Yuqi Wang, Hongcheng Wang, Zijie Xiao

et al.

Water Research, Journal Year: 2025, Volume and Issue: unknown, P. 123541 - 123541

Published: March 1, 2025

Language: English

Citations: 2

Comprehensive characterization of per- and polyfluoroalkyl substances in wastewater by liquid chromatography-mass spectrometry and screening algorithms
Caiming Tang, Yutao Liang, Kai Wang

et al.

npj Clean Water, Journal Year: 2023, Volume and Issue: 6(1)

Published: Feb. 9, 2023

Per- and polyfluoroalkyl substances (PFASs) constitute a large category of synthetic environmental pollutants, many of which remain unknown and warrant comprehensive investigation. This study comprehensively characterized PFASs in fluorinated-industrial wastewater by nontarget, quasi-target, and target analyses using liquid chromatography-high-resolution mass spectrometry and data-processing algorithms. Algorithms based on characteristic in-source neutral losses and isotopologue distributions were applied to screen for and identify PFASs, while semiquantitative and quantitative analyses were used to determine their concentrations in the wastewater. In total, 175 PFAS formulae, including traditional and little-known species, were identified and further characterized in terms of their distributions. Total PFAS concentrations were 5.3–33.4 μg mL⁻¹, indicating serious PFAS pollution. This study not only provides an efficient approach for PFAS identification but also presents a practicable and simple way to depict PFAS signatures
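The neutral-loss screening idea in this abstract can be illustrated with a small Python sketch. The sketch looks for pairs of singly charged ions whose m/z difference matches a characteristic neutral loss, within a ppm tolerance. This is not the authors' algorithm, and it ignores isotopologue distributions entirely. The two losses (CO2 and HF, with monoisotopic masses of about 43.98983 and 20.00623 Da), the 5 ppm tolerance, and the function name are assumptions chosen for illustration.

```python
from typing import Dict, List, Sequence, Tuple

# Illustrative neutral losses (monoisotopic masses, Da); assumed, not from the paper.
NEUTRAL_LOSSES: Dict[str, float] = {
    "CO2": 43.98983,
    "HF": 20.00623,
}


def find_neutral_loss_pairs(mz_values: Sequence[float],
                            ppm_tol: float = 5.0) -> List[Tuple[float, float, str]]:
    """Return (precursor, fragment, loss) triples whose m/z difference matches a loss.

    Assumes singly charged ions, so an m/z difference equals the neutral mass lost.
    """
    pairs = []
    for precursor in mz_values:
        for fragment in mz_values:
            if fragment >= precursor:
                continue
            for name, loss in NEUTRAL_LOSSES.items():
                # Tolerance is expressed relative to the precursor m/z.
                if abs((precursor - fragment) - loss) <= precursor * ppm_tol * 1e-6:
                    pairs.append((precursor, fragment, name))
    return pairs


if __name__ == "__main__":
    # 412.9664 is the [M-H]- ion of PFOA; 368.9766 is 43.9898 lower (CO2 loss).
    print(find_neutral_loss_pairs([412.9664, 368.9766, 250.0]))
```

A real screening workflow would also check retention-time co-elution and isotope patterns before treating a matched pair as a PFAS candidate.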

Language: English

Citations: 23

Machine learning-assisted data filtering and QSAR models for prediction of chemical acute toxicity on rat and mouse
Tao Bo, Yaohui Lin, Jinglong Han

et al.

Journal of Hazardous Materials, Journal Year: 2023, Volume and Issue: 452, P. 131344 - 131344

Published: April 1, 2023

Language: English

Citations: 20

Machine learning-enhanced photocatalysis for environmental sustainability: Integration and applications
Augustine Jaison, Anandhu Mohan, Young‐Chul Lee

et al.

Materials Science and Engineering R Reports, Journal Year: 2024, Volume and Issue: 161, P. 100880 - 100880

Published: Nov. 14, 2024

Language: English

Citations: 9

Machine-learning-driven discovery of metal–organic framework adsorbents for hexavalent chromium removal from aqueous environments
Mingxing Jiang, Wei‐Wei Fu, Ying Wang

et al.

Journal of Colloid and Interface Science, Journal Year: 2024, Volume and Issue: 662, P. 836 - 845

Published: Feb. 14, 2024

Language: English

Citations: 7