IEEE Access, 2024, vol. 12, pp. 62341-62357
Published: Jan. 1, 2024
The identification of suitable feature subsets from High-Dimensional Low-Sample-Size (HDLSS) data is of paramount importance because such data often contain numerous redundant and irrelevant features, leading to poor classification performance. However, the selection of an optimal feature subset from a vast feature space creates a significant computational challenge. In the domain of HDLSS data, conventional feature selection methods face challenges in achieving a balance between reducing the number of features and preserving high classification accuracy.
Addressing these issues, the study introduces an effective framework that employs a filter and wrapper-based strategy specifically designed to address the inherent challenges of HDLSS data. The framework adopts a multi-step approach where ensemble feature selection integrates five filter ranking approaches: Chi-square (χ²), Gini index (GI), F-score, Mutual Information (MI), and Symmetric uncertainty (SU), to identify the top-ranking features. In the subsequent stage, a wrapper-based search method is utilized, which employs the Differential Evolution (DE) metaheuristic algorithm as the search strategy. The fitness of the feature subsets during the search is assessed based on a weighted combination of the error rate of the Support Vector Machine (SVM) classifier and the cardinality of the feature subset.
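The datasets, now with reduced dimensionality, are subsequently employed to build classification models with SVM, K-Nearest Neighbors (KNN), and Logistic Regression (LR). The proposed framework was evaluated on 13 HDLSS datasets to assess its efficacy in selecting appropriate feature subsets and improving Classification Accuracy (ACC) along with Area Under the Curve (AUC). The proposed framework produces smaller feature subsets (ranging from 2 to 9 features for all datasets), while maintaining commendable average AUC and ACC (between 98% and 100%). The comparative results demonstrate that the proposed framework outperforms both non-feature selection approaches and feature selection approaches in terms of feature reduction and ACC. Furthermore, when compared with several other state-of-the-art feature selection approaches, the framework exhibits ...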
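The two stages described in this abstract (an ensemble of filter rankings, then a wrapper search scored by a weighted fitness) can be illustrated with a minimal sketch. The abstract gives neither the rank-aggregation rule nor the weights, so everything here is an assumption made for illustration: the function names `ensemble_top_k` and `subset_fitness`, the mean-rank aggregation, the weight `alpha`, and the normalization of subset size by the total feature count. The classifier error rate is taken as an input rather than computed.

```python
def ensemble_top_k(scores_by_filter, top_k):
    """Aggregate several filter rankings by summed rank (hypothetical rule).

    scores_by_filter maps a filter name (e.g. "chi2", "mi") to a list of
    per-feature scores, where a higher score means a more relevant feature.
    Returns the indices of the top_k features with the lowest summed rank.
    """
    n_features = len(next(iter(scores_by_filter.values())))
    rank_sum = [0] * n_features
    for scores in scores_by_filter.values():
        # Rank 0 is the best feature for this filter.
        order = sorted(range(n_features), key=lambda j: -scores[j])
        for rank, j in enumerate(order):
            rank_sum[j] += rank
    # Ties are broken by feature index to keep the result deterministic.
    return sorted(range(n_features), key=lambda j: (rank_sum[j], j))[:top_k]


def subset_fitness(error_rate, n_selected, n_total, alpha=0.9):
    """Weighted fitness to minimize (assumed form, not taken from the paper).

    Combines the classifier error rate with the relative subset size.
    alpha is an assumed weight; an empty subset is treated as infeasible.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if n_selected == 0:
        return float("inf")
    return alpha * error_rate + (1.0 - alpha) * (n_selected / n_total)
```

In a wrapper search such as DE, each candidate subset would be scored with `subset_fitness` after training and validating the classifier on the selected columns; a larger `alpha` favors accuracy over compactness.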
Environmental Science & Technology, 2022, vol. 56(4), pp. 2124-2133
Published: Jan. 27, 2022
The complexity and dynamics of the environment make it extremely difficult to directly predict and trace the temporal and spatial changes in pollution. In the past decade, the unprecedented accumulation of data, the development of high-performance computing power, and the rise of diverse machine learning (ML) methods provide new opportunities for environmental pollution research. The ML methodology has been used in satellite data processing to obtain ground-level concentrations of atmospheric pollutants, pollution source apportionment, and spatial distribution modeling of water pollutants.
However, unlike the active practices of ML in chemical toxicity prediction, advanced algorithms such as deep neural networks in environmental process studies of pollutants are still deficient. In addition, over 40% of the environmental applications of ML go to air pollution, and its application range and acceptance in other aspects of environmental science remain to be increased. The use of ML to revolutionize environmental problem-solving scenarios has its own challenges. Several issues should be taken into consideration, such as the tradeoff between model performance and interpretability, the prerequisites of the ML model, model selection, and data sharing.
Environmental Science & Technology, 2023, vol. 57(46), pp. 17671-17689
Published: June 29, 2023
Machine learning (ML) is increasingly used in environmental research to process large data sets and decipher complex relationships between system variables. However, due to the lack of familiarity and methodological rigor, inadequate ML studies may lead to spurious conclusions. In this study, we synthesized literature analysis with our own experience and provided a tutorial-like compilation of common pitfalls along with best practice guidelines for environmental ML research.
We identified more than 30 key items and provided evidence-based data analysis based on 148 highly cited research articles to exhibit the misconceptions of terminologies, proper sample size and feature size, data enrichment and feature selection, randomness assessment, data leakage management, data splitting, method selection and comparison, model optimization and evaluation, and model explainability and causality. By analyzing good examples on supervised learning and reference modeling paradigms, we hope to help researchers adopt more rigorous data preprocessing and model development standards for more accurate, robust, and practicable model uses in environmental research and applications.
Environmental Pollution, 2023, vol. 331, art. 121832
Published: May 18, 2023
There is a growing need to apply geospatial artificial intelligence analysis to disparate environmental datasets to find solutions that benefit frontline communities. One such critically needed solution is the prediction of health-relevant ambient ground-level air pollution concentrations. However, many challenges exist surrounding the size and representativeness of limited ground reference stations for model development, reconciling multi-source data, and the interpretability of deep learning models.
This research addresses these challenges by leveraging a strategically deployed, extensive low-cost sensor (LCS) network that was rigorously calibrated through an optimized neural network. A set of raster predictors with varying data quality and spatial scales was retrieved and processed, including gap-filled satellite aerosol optical depth products and airborne LiDAR-derived 3D urban form. We developed a multi-scale, attention-enhanced convolutional neural network model to reconcile LCS measurements and multi-source predictors for estimating daily PM2.5 concentration at 30-m resolution. This model employs an advanced approach, using the geostatistical kriging method to generate a baseline pollution pattern and a multi-scale residual method to identify both regional patterns and localized events for high-frequency feature retention.
We further used permutation tests to quantify the feature importance, which has rarely been done in DL applications in environmental science. Finally, we demonstrated one application of the model by investigating the air pollution inequality issue across and within various urbanization levels at the block group scale. Overall, this research demonstrates the potential of geospatial AI analysis to provide actionable solutions for addressing critical environmental issues.
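The permutation-based feature importance mentioned in this abstract can be sketched generically: shuffle one predictor at a time and record how much the model error grows. This is a model-agnostic illustration, not the authors' procedure. The function names, the use of mean squared error, the number of repeats, and the representation of `X` as a list of row lists are all assumptions.

```python
import random


def mean_squared_error(y_true, y_pred):
    """Plain mean squared error over paired observations."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean increase in MSE when each feature column is shuffled.

    predict: callable mapping one row (list of floats) to a prediction.
    X: list of rows; y: list of targets. Returns one value per feature.
    A value near zero suggests the model does not rely on that feature.
    """
    rng = random.Random(seed)
    baseline = mean_squared_error(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            # Shuffle column j only, leaving all other columns intact.
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            permuted = mean_squared_error(y, [predict(row) for row in X_perm])
            increases.append(permuted - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances
```

For a deep learning model on raster inputs, `predict` would wrap the trained network and the shuffling would typically be applied to a whole predictor layer at a time; that adaptation is not shown here.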
Engineering, 2024, vol. 36, pp. 51-62
Published: Feb. 9, 2024
The potential for reducing greenhouse gas (GHG) emissions and energy consumption in wastewater treatment can be realized through intelligent control, with machine learning (ML) and multimodality emerging as a promising solution. Here, we introduce an ML technique based on multimodal strategies, focusing specifically on intelligent aeration control in wastewater treatment plants (WWTPs). The generalization of the multimodal strategy is demonstrated on eight ML models. The results demonstrate that this multimodal strategy significantly enhances model indicators for ML in environmental science and the efficiency of aeration control, exhibiting exceptional performance and interpretability.
Integrating random forest with visual models achieves the highest accuracy in forecasting aeration quantity among the multimodal models, with a mean absolute percentage error of 4.4% and a coefficient of determination of 0.948. Practical testing in a full-scale plant reveals that the multimodal model can reduce operation costs by 19.8% compared to traditional fuzzy control methods. The potential application of these strategies in critical water science domains is discussed. To foster accessibility and promote widespread adoption, the multimodal ML models are freely available on GitHub, thereby eliminating technical barriers and encouraging the application of artificial intelligence in urban wastewater treatment.
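The two accuracy figures quoted in this abstract, mean absolute percentage error (MAPE) and the coefficient of determination (R²), follow standard definitions. The sketch below shows how such values are computed from observed and forecast series; the function names are illustrative and the code is not taken from the authors' repository.

```python
def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent.

    Assumes no observed value is zero, since each error is divided by it.
    """
    total = sum(abs((t - p) / t) for t, p in zip(y_true, y_pred))
    return 100.0 * total / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Assumes the observed values are not all identical (SS_tot > 0).
    """
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

A lower MAPE and an R² closer to 1 both indicate a closer fit, but MAPE weights errors relative to the observed magnitude while R² compares against a constant-mean baseline, so the two can rank models differently.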
Environmental Science & Technology, 2023, vol. 57(27), pp. 9898-9924
Published: June 29, 2023
The present article critically and comprehensively reviews the most recent reports on smart sensors for determining glyphosate (GLP), an active agent of GLP-based herbicides (GBHs) traditionally used in agriculture over the past decades. Commercialized in 1974, GBHs have now reached 350 million hectares of crops in over 140 countries, with an annual turnover of 11 billion USD worldwide. However, rolling exploitation of GLP and GBHs in the last decades has led to environmental pollution, animal intoxication, bacterial resistance, and sustained occupational exposure to the herbicide among farm and companies' workers. Intoxication with these herbicides dysregulates the microbiome-gut-brain axis, cholinergic neurotransmission, and the endocrine system, causing paralytic ileus, hyperkalemia, oliguria, pulmonary edema, and cardiogenic shock.
Precision agriculture, i.e., an (information technology)-enhanced approach to crop management, including a site-specific determination of agrochemicals, derives from the benefits of smart materials (SMs), data science, and nanosensors. Those typically feature fluorescent molecularly imprinted polymers or immunochemical aptamer artificial receptors integrated with electrochemical transducers. Fabricated as portable or wearable lab-on-chips, smartphones, and soft robotics, and connected with SM-based devices that provide machine learning algorithms and online databases, they integrate, process, analyze, and interpret massive amounts of spatiotemporal data in a user-friendly and decision-making manner. Exploited for the ultrasensitive determination of toxins, including GLP, they will become practical tools in farmlands and point-of-care testing. Expectedly, smart sensors can be used for personalized diagnostics, real-time water, food, soil, and air quality monitoring, site-specific herbicide management, and crop control.
Abstract
Per- and polyfluoroalkyl substances (PFASs) constitute a large category of synthetic environmental pollutants, many of which remain unknown and warrant comprehensive investigation. This study comprehensively characterized PFASs in fluorinated-industrial wastewater by nontarget, quasi-target, and target analyses using liquid chromatography-high-resolution mass spectrometry and data-processing algorithms.
The algorithms based on characteristic in-source neutral losses and isotopologue distributions were applied to screening and identifying PFASs, while semiquantitative and quantitative analyses were utilized to determine their concentrations in the wastewater. In total, 175 formulae of PFASs, including traditional, little-known, and unknown species, were identified and further ascertained in terms of isotopologue distributions. The total concentrations of PFASs in the wastewater were 5.3–33.4 μg mL⁻¹, indicating serious pollution by PFASs.
This study not only provides an efficient approach for the screening and identification of unknown PFASs but also presents a practicable and simple way to depict PFAS signatures ...