Feature Subset Selection for High-Dimensional, Low Sampling Size Data Classification Using Ensemble Feature Selection With a Wrapper-Based Search
Ashis Kumar Mandal, Md Nadim, Hasi Saha

et al.

IEEE Access, 2024, Vol. 12, pp. 62341-62357

Published: Jan. 1, 2024

The identification of suitable feature subsets from High-Dimensional Low-Sample-Size (HDLSS) data is of paramount importance because such datasets often contain numerous redundant and irrelevant features, leading to poor classification performance. However, selecting an optimal subset from a vast feature space poses a significant computational challenge. In the domain of HDLSS data, conventional methods struggle to balance reducing the number of features against preserving high accuracy. To address these issues, this study introduces an effective framework that employs a filter and wrapper-based strategy specifically designed to address the challenges inherent in HDLSS data. The framework adopts a multi-step approach in which an ensemble integrates five feature-ranking approaches, Chi-square (χ²), Gini index (GI), F-score, Mutual Information (MI), and Symmetric Uncertainty (SU), to identify top-ranking features. In the subsequent stage, a wrapper-based search method is used, with the Differential Evolution (DE) metaheuristic algorithm as the search strategy. Fitness during the search is assessed as a weighted combination of the error rate of a Support Vector Machine (SVM) classifier and the cardinality of the feature subset. The datasets, now with reduced dimensionality, are subsequently used to build SVM, K-Nearest Neighbors (KNN), and Logistic Regression (LR) models. The proposed framework was evaluated on 13 datasets to assess its efficacy in selecting appropriate feature subsets and improving Classification Accuracy (ACC) and Area Under the Curve (AUC). The framework produces smaller feature subsets (ranging from 2 to 9 features across all datasets) while maintaining commendable average AUC and ACC (between 98% and 100%). Comparative results demonstrate that the framework outperforms approaches without feature selection in terms of ACC. Furthermore, when compared with several other state-of-the-art approaches, the framework exhibits…

Language: English
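The abstract above describes a two-stage pipeline. First, an ensemble of filter rankings picks the top-ranking features. Second, a wrapper search scores each candidate subset by a weighted combination of SVM error rate and subset cardinality.

The sketch below covers only the aggregation and fitness pieces, and it is an illustration under stated assumptions, not the authors' implementation:

- The five rankings are combined by mean rank. The abstract does not say how they are aggregated.
- The weight `alpha = 0.9` is a hypothetical value.
- The classifier error is passed in as a number rather than computed by an SVM.

The function names are also hypothetical.

```python
from typing import Dict, List, Sequence


def aggregate_rankings(score_lists: Sequence[Dict[str, float]]) -> List[str]:
    """Combine several filter scores (higher = more relevant) by mean rank.

    Assumption: mean-rank aggregation; the paper's ensemble rule may differ.
    """
    features = list(score_lists[0])
    total_rank = {f: 0.0 for f in features}
    for scores in score_lists:
        # Rank 1 goes to the highest-scoring feature under this filter.
        ordered = sorted(features, key=lambda f: scores[f], reverse=True)
        for rank, f in enumerate(ordered, start=1):
            total_rank[f] += rank
    n = len(score_lists)
    # Lowest mean rank first; ties broken by feature name for determinism.
    return sorted(features, key=lambda f: (total_rank[f] / n, f))


def wrapper_fitness(error_rate: float, n_selected: int, n_total: int,
                    alpha: float = 0.9) -> float:
    """Fitness to minimize: alpha * error + (1 - alpha) * selected fraction.

    `alpha` is a hypothetical weight; the abstract only states that the
    error rate and the subset cardinality are combined with weights.
    """
    if n_total <= 0 or not 0 <= n_selected <= n_total:
        raise ValueError("invalid subset size")
    return alpha * error_rate + (1.0 - alpha) * (n_selected / n_total)
```

Normalizing cardinality by the total feature count keeps both terms in [0, 1]. As a result, two subsets with equal error are separated only by size, and the smaller one receives the lower (better) fitness.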

Data-Driven Machine Learning in Environmental Pollution: Gains and Problems
Xian Liu, Dawei Lü, Aiqian Zhang

et al.

Environmental Science & Technology, 2022, Vol. 56(4), pp. 2124-2133

Published: Jan. 27, 2022

The complexity and dynamics of the environment make it extremely difficult to directly predict and trace temporal and spatial changes in pollution. In the past decade, the unprecedented accumulation of data, the development of high-performance computing power, and the rise of diverse machine learning (ML) methods have provided new opportunities for environmental pollution research. ML methodology has been used in satellite data processing to obtain ground-level concentrations of atmospheric pollutants, in source apportionment, and in distribution modeling of water pollutants. However, unlike the active practices in chemical toxicity prediction, advanced algorithms such as deep neural networks are still lacking in process studies of environmental pollutants. In addition, over 40% of ML applications address air pollution, and the application range and acceptance of ML in other aspects of environmental science remain to be increased. Using ML to revolutionize problem-solving scenarios brings its own challenges. Several issues should be taken into consideration, such as the tradeoff between model performance and interpretability, the prerequisites of ML models, model selection, and data sharing.

Language: English

Cited by: 300

Machine Learning in Environmental Research: Common Pitfalls and Best Practices
Jun‐Jie Zhu, Meiqi Yang, Zhiyong Jason Ren

et al.

Environmental Science & Technology, 2023, Vol. 57(46), pp. 17671-17689

Published: June 29, 2023

Machine learning (ML) is increasingly used in environmental research to process large data sets and decipher complex relationships between system variables. However, due to a lack of familiarity and methodological rigor, inadequate ML studies may lead to spurious conclusions. In this study, we combined a literature analysis with our own experience to provide a tutorial-like compilation of common pitfalls, along with best practice guidelines, for environmental ML research. Based on 148 highly cited articles, we identified more than 30 key items and evidence-based guidelines covering misconceptions of terminology, proper sample size and feature size, data enrichment and feature selection, randomness assessment, data leakage management, data splitting, method selection and comparison, model optimization and evaluation, and model explainability and causality. By analyzing good examples of supervised learning and reference modeling paradigms, we hope to help researchers adopt more rigorous data preprocessing and model development standards for accurate, robust, and practicable ML applications.

Language: English

Cited by: 254
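The abstract above lists data leakage management and data splitting among the common pitfalls. One frequent form of leakage is computing preprocessing statistics (for example, normalization parameters) on the full dataset before splitting. Doing so lets information from the test set influence training.

The sketch below is a generic illustration of the leakage-safe order, not code from the paper:

1. Split first.
2. Fit the z-score parameters on the training split only.
3. Reuse those parameters unchanged on the test split.

For brevity it handles a single numeric feature. The function names are hypothetical.

```python
import random
from statistics import mean, pstdev
from typing import List, Sequence, Tuple


def train_test_split(rows: Sequence[float], test_fraction: float = 0.25,
                     seed: int = 0) -> Tuple[List[float], List[float]]:
    """Shuffle once with a fixed seed, then split BEFORE any preprocessing."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def fit_scaler(train: Sequence[float]) -> Tuple[float, float]:
    """Estimate mean and standard deviation from the training split only."""
    sd = pstdev(train)
    # Guard against a constant feature (zero spread).
    return mean(train), (sd if sd > 0 else 1.0)


def transform(values: Sequence[float], params: Tuple[float, float]) -> List[float]:
    """Apply training-split parameters to any split, including the test set."""
    mu, sd = params
    return [(v - mu) / sd for v in values]
```

The test split is transformed with parameters it never contributed to. This mirrors how the model would see genuinely new data after deployment.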

Integrating low-cost sensor monitoring, satellite mapping, and geospatial artificial intelligence for intra-urban air pollution predictions
Lü Liang, Jacob Daniels, Colleen P. Bailey

et al.

Environmental Pollution, 2023, Vol. 331, p. 121832

Published: May 18, 2023

There is a growing need to apply geospatial artificial intelligence analysis to disparate environmental datasets to find solutions that benefit frontline communities. One such critically needed solution is the prediction of health-relevant ambient ground-level air pollution concentrations. However, many challenges exist, including the size and representativeness of the limited ground reference stations available for model development, the reconciliation of multi-source data, and the interpretability of deep learning (DL) models. This research addresses these challenges by leveraging a strategically deployed, extensive low-cost sensor (LCS) network that was rigorously calibrated through an optimized neural network. A set of raster predictors with varying data quality and spatial scales was retrieved and processed, including gap-filled satellite aerosol optical depth products and airborne LiDAR-derived 3D urban form. We developed a multi-scale, attention-enhanced convolutional neural network model to reconcile the LCS measurements with these predictors when estimating daily PM2.5 concentration at 30-m resolution. The model uses a geostatistical kriging method to generate a baseline pollution pattern and a multi-scale residual method to identify both regional patterns and localized events, retaining high-frequency features. We further used permutation tests to quantify feature importance, which has rarely been done in DL applications in environmental science. Finally, we demonstrated one application by investigating air pollution inequality across and within various urbanization levels at the block group scale. Overall, this research demonstrates the potential of geospatial AI to provide actionable solutions for addressing critical environmental issues.

Language: English

Cited by: 50
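The permutation tests mentioned above quantify how much a trained model relies on each predictor. The procedure is:

1. Shuffle one predictor across samples.
2. Re-score the model.
3. Record the increase in error.

The sketch below is a generic, model-agnostic illustration, not the paper's CNN pipeline. Three choices in it are assumptions: the toy `predict` callable, mean squared error as the score, and the default of 10 repeats.

```python
import random
from typing import Callable, List, Sequence

Row = Sequence[float]


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error between observations and predictions."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict: Callable[[Row], float],
                           X: Sequence[Row], y: Sequence[float],
                           n_repeats: int = 10, seed: int = 0) -> List[float]:
    """Mean increase in MSE when each feature column is shuffled on its own."""
    rng = random.Random(seed)
    baseline = mse(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild rows with only feature j permuted; others stay intact.
            X_perm = [list(row[:j]) + [v] + list(row[j + 1:])
                      for row, v in zip(X, column)]
            increases.append(mse(y, [predict(r) for r in X_perm]) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances
```

A feature the model ignores scores zero, because shuffling it leaves every prediction unchanged. Repeating the shuffle averages out the randomness of any single permutation.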

Multimodal Machine Learning Guides Low Carbon Aeration Strategies in Urban Wastewater Treatment
Hongcheng Wang, Yuqi Wang, Xu Wang

et al.

Engineering, 2024, Vol. 36, pp. 51-62

Published: Feb. 9, 2024

The potential for reducing greenhouse gas (GHG) emissions and energy consumption in wastewater treatment can be realized through intelligent control, with machine learning (ML) and multimodality emerging as a promising solution. Here, we introduce an ML technique based on multimodal strategies, focusing specifically on aeration control in wastewater treatment plants (WWTPs). The generalization of the multimodal strategy is demonstrated on eight ML models. The results demonstrate that this strategy significantly enhances the model indicators of ML in environmental science and the efficiency of aeration control, exhibiting exceptional performance and interpretability. Integrating random forest with visual models achieves the highest accuracy in forecasting aeration quantity among the multimodal models, with a mean absolute percentage error of 4.4% and a coefficient of determination of 0.948. Practical testing in a full-scale plant reveals that multimodal models can reduce operation costs by 19.8% compared with traditional fuzzy control methods. The application of these strategies in critical water domains is discussed. To foster accessibility and promote widespread adoption, the multimodal ML models are freely available on GitHub, thereby eliminating technical barriers and encouraging the application of artificial intelligence in urban wastewater treatment.

Language: English

Cited by: 31
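The abstract above reports two accuracy figures: a mean absolute percentage error (MAPE) of 4.4% and a coefficient of determination (R²) of 0.948. The sketch below implements the standard definitions of both metrics, MAPE = (100/n) Σ |(yᵢ − ŷᵢ) / yᵢ| and R² = 1 − SS_res / SS_tot.

It shows what the two numbers measure and is not the authors' evaluation code. Any values passed to it are placeholders, not data from the study.

```python
from typing import Sequence


def mape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(t == 0 for t in y_true):
        # Division by an observed zero makes MAPE undefined.
        raise ValueError("MAPE is undefined when an observed value is zero")
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when observations are constant")
    return 1.0 - ss_res / ss_tot
```

The two metrics are complementary. MAPE expresses error relative to each observation's magnitude, while R² compares residual variance with the variance of the observations around their mean.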

Glyphosate Separating and Sensing for Precision Agriculture and Environmental Protection in the Era of Smart Materials
Jarosław Mazuryk, Katarzyna Klepacka, Włodzimierz Kutner

et al.

Environmental Science & Technology, 2023, Vol. 57(27), pp. 9898-9924

Published: June 29, 2023

The present article critically and comprehensively reviews the most recent reports on smart sensors for determining glyphosate (GLP), an active agent of GLP-based herbicides (GBHs) traditionally used in agriculture over the past decades. Commercialized in 1974, GBHs have now reached 350 million hectares of crops in 140 countries, with an annual turnover of 11 billion USD worldwide. However, the rolling exploitation of GLP and GBHs in the last decades has led to environmental pollution, animal intoxication, bacterial resistance, and sustained occupational exposure of farm and company workers to the herbicide. Intoxication with these herbicides dysregulates the microbiome-gut-brain axis, cholinergic neurotransmission, and the endocrine system, causing paralytic ileus, hyperkalemia, oliguria, pulmonary edema, and cardiogenic shock. Precision agriculture, i.e., an (information technology)-enhanced approach to crop management that includes site-specific determination of agrochemicals, derives from the benefits of smart materials (SMs), data science, and nanosensors. These sensors typically feature fluorescent molecularly imprinted polymers or immunochemical aptamer artificial receptors integrated with electrochemical transducers. Fabricated as portable or wearable lab-on-chips, smartphones, and soft robotics connected to SM-based devices that provide machine learning algorithms and online databases, they integrate, process, analyze, and interpret massive amounts of spatiotemporal data in a user-friendly, decision-supporting manner. Exploited for the ultrasensitive determination of toxins, including GLP, they will become practical tools in farmlands and in point-of-care testing. Smart sensors are expected to serve personalized diagnostics and real-time water, food, soil, and air quality monitoring and control.

Language: English

Cited by: 28

Machine learning strategy secures urban smart drinking water treatment plant through incremental advances
Yuqi Wang, Hongcheng Wang, Zijie Xiao

et al.

Water Research, 2025, Vol. unknown, p. 123541

Published: Mar. 1, 2025

Language: English

Cited by: 2

Comprehensive characterization of per- and polyfluoroalkyl substances in wastewater by liquid chromatography-mass spectrometry and screening algorithms
Caiming Tang, Yutao Liang, Kai Wang

et al.

npj Clean Water, 2023, Vol. 6(1)

Published: Feb. 9, 2023

Per- and polyfluoroalkyl substances (PFASs) constitute a large category of synthetic environmental pollutants, many of which remain unknown and warrant comprehensive investigation. This study comprehensively characterized PFASs in fluorinated-industrial wastewater by nontarget, quasi-target, and target analyses using liquid chromatography-high-resolution mass spectrometry and data-processing algorithms. Algorithms based on characteristic in-source neutral losses and isotopologue distributions were applied to screen for and identify PFASs, while semiquantitative and quantitative analyses were used to determine their concentrations in the wastewater. In total, 175 PFAS formulae, including traditional and little-known species, were identified, and their distributions were further ascertained. The total PFAS concentrations ranged from 5.3 to 33.4 μg mL⁻¹, indicating serious PFAS pollution. This study not only provides an efficient approach for PFAS identification but also presents a practicable and simple way to depict PFAS signatures…

Language: English

Cited by: 23

Machine learning-assisted data filtering and QSAR models for prediction of chemical acute toxicity on rat and mouse
Tao Bo, Yaohui Lin, Jinglong Han

et al.

Journal of Hazardous Materials, 2023, Vol. 452, p. 131344

Published: Apr. 1, 2023

Language: English

Cited by: 20

Machine learning-enhanced photocatalysis for environmental sustainability: Integration and applications
Augustine Jaison, Anandhu Mohan, Young‐Chul Lee

et al.

Materials Science and Engineering R Reports, 2024, Vol. 161, p. 100880

Published: Nov. 14, 2024

Language: English

Cited by: 9

Machine-learning-driven discovery of metal–organic framework adsorbents for hexavalent chromium removal from aqueous environments
Mingxing Jiang, Wei‐Wei Fu, Ying Wang

et al.

Journal of Colloid and Interface Science, 2024, Vol. 662, pp. 836-845

Published: Feb. 14, 2024

Language: English

Cited by: 7