Identifying primary estrogen receptor (ER) agonists in municipal sewage is essential for ensuring the health of aquatic environments. Given the complex and variable chemical composition of sewage, the predominant ER agonists remain unclear. High-resolution mass spectrometry (HRMS)-based models have been developed to predict compound bioactivity in complex matrices, but further optimization is needed to effectively bridge HRMS features with ER agonists. To address this challenge, an FT-GNN (fragmentation tree-based graph neural network) model was proposed. Given limited data and class imbalance, data augmentation was performed using model predictions within the applicability domain (AD) and the oversampling technique (OTE). Model development results demonstrated that integrating data augmentation improved the balanced accuracy (bACC) value by 6%-31%. The FT-GNN model, with a high bACC to identify more true agonists, efficiently classified tens of thousands of unidentified features, reducing the postprocessing workload in nontargeted screening. Analysis of ER agonist transformation during sewage treatment revealed the anaerobic stage as key to both their removal and formation. Estrogenic effect balance analysis suggests that α-E2 and 9,11-didehydroestriol may be two previously overlooked ER agonists. Collectively, the development and application of the FT-GNN model are crucial advancements toward credible tracking and efficient control of estrogenic risks in water.
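The abstract above relies on two ideas that a small worked example may make more concrete: balanced accuracy (bACC) as a metric that is not inflated by the majority class, and oversampling as a remedy for class imbalance. The sketch below is illustrative only. The toy labels are invented, the function names `balanced_accuracy` and `random_oversample` are hypothetical, and plain random oversampling is used as a stand-in because the abstract does not say which oversampling technique the authors applied.

```python
import random
from collections import Counter


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall; for two classes this equals
    (sensitivity + specificity) / 2."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, y in enumerate(y_true) if y == c]
        correct = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(correct / len(idx))
    return sum(recalls) / len(recalls)


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen samples of each smaller class until
    every class is as large as the largest one.
    Assumption: a simple stand-in, not the paper's actual technique."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        pool = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(rng.choice(pool))
            y_out.append(label)
    return X_out, y_out


# Toy data (invented): 1 = ER agonist (rare), 0 = non-agonist (common).
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0] * 10  # a classifier that never predicts "agonist"

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
bacc = balanced_accuracy(y_true, y_pred)
print(accuracy, bacc)  # 0.8 0.5

# Oversampling the toy set so both classes have 8 samples.
X = [[float(i)] for i in range(10)]
X_bal, y_bal = random_oversample(X, y_true)
print(Counter(y_bal))
```

In this toy case plain accuracy looks acceptable at 0.8 even though no agonist is ever detected, while bACC falls to 0.5, which is why bACC is the more informative metric when agonists are rare.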
Environment International, 2025, p. 109404. Published: March 1, 2025
Emerging environmental contaminants (EECs) such as pharmaceuticals, pesticides, and industrial chemicals pose significant challenges for detection and identification due to their structural diversity and lack of analytical standards. Traditional targeted screening methods often fail to detect these compounds, making non-target analysis (NTA) using high-resolution mass spectrometry (HRMS) essential for identifying unknown or suspected contaminants. However, interpreting the vast datasets generated by HRMS is complex and requires advanced data processing techniques. Recent advancements in machine learning (ML) models offer great potential for enhancing NTA applications. As such, we reviewed key developments, including optimizing workflows using computational tools, improved chemical structure identification, quantification methods, and enhanced toxicity prediction capabilities. We also discuss future perspectives in the field, including refining ML tools for complex mixtures, improving inter-laboratory validation, and further integrating ML into risk assessment frameworks. By addressing these challenges, ML-assisted NTA can significantly enhance the detection, quantification, and evaluation of EECs, ultimately contributing to more effective environmental monitoring and public health protection.