Active learning of molecular data for task-specific objectives
The Journal of Chemical Physics, Journal Year: 2025, Volume and Issue: 162(1)
Published: Jan. 2, 2025
Active learning (AL) has shown promise to be a particularly data-efficient machine learning approach. Yet, its performance depends on the application, and it is not clear when AL practitioners can expect computational savings. Here, we carry out a systematic AL performance assessment for three diverse molecular datasets and two common scientific tasks: compiling compact, informative datasets and targeted molecular searches. We implemented AL with Gaussian processes (GP) and used the many-body tensor as molecular representation. For the first task, we tested different data acquisition strategies, batch sizes, and GP noise settings. AL was insensitive to the acquisition batch size, and we observed the best AL performance for the acquisition strategy that combines uncertainty reduction with clustering to promote diversity. However, for optimal GP noise settings, AL did not outperform the randomized selection of data points. Conversely, for targeted searches, AL outperformed random sampling and achieved savings of up to 64%. Our analysis provides insight into this task-specific performance difference in terms of target distributions and data collection strategies. We established that the performance of AL depends on the relative distribution of the target molecules in comparison to the total dataset distribution, with the largest savings achieved when their overlap is minimal.
Language: English
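The uncertainty-driven acquisition mentioned in the abstract above can be illustrated with a small sketch. This is not the authors' implementation: it assumes a squared-exponential kernel with fixed hyperparameters, uses made-up one-dimensional features in place of the many-body tensor representation, and leaves out the clustering step. The function names (`rbf`, `posterior_variance`, `select_by_uncertainty`) are hypothetical. It relies on the fact that the GP predictive variance depends only on the inputs already selected, so points can be ranked before any label is computed.

```python
import math

def rbf(a, b, length=1.0):
    """Squared-exponential kernel between two feature vectors."""
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-0.5 * d2 / length ** 2)

def cholesky(A):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def posterior_variance(x, train, noise=1e-3, length=1.0):
    """GP predictive variance at x given the inputs selected so far."""
    prior = rbf(x, x, length)
    if not train:
        return prior
    K = [[rbf(a, b, length) + (noise if i == j else 0.0)
          for j, b in enumerate(train)] for i, a in enumerate(train)]
    L = cholesky(K)
    k = [rbf(x, a, length) for a in train]
    # Forward substitution solves L v = k; the variance is prior - v.v
    v = []
    for i in range(len(k)):
        v.append((k[i] - sum(L[i][j] * v[j] for j in range(i))) / L[i][i])
    return prior - sum(t * t for t in v)

def select_by_uncertainty(pool, n_select, noise=1e-3, length=1.0):
    """Greedily pick the pool indices with the largest predictive variance."""
    chosen = []
    for _ in range(n_select):
        train = [pool[i] for i in chosen]
        best = max((i for i in range(len(pool)) if i not in chosen),
                   key=lambda i: posterior_variance(pool[i], train, noise, length))
        chosen.append(best)
    return chosen

# Toy pool (assumed data): three well-separated groups of points.
pool = [[0.0], [0.1], [0.2], [5.0], [5.1], [10.0]]
picked = select_by_uncertainty(pool, 3)
```

On this toy pool the three picks land in three different groups, because a point close to an already selected one has little remaining variance. A batch variant would select several points per round before retraining.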
Technical note: Towards atmospheric compound identification in chemical ionization mass spectrometry with pesticide standards and machine learning
Atmospheric Chemistry and Physics, Journal Year: 2025, Volume and Issue: 25(1), P. 685 - 704
Published: Jan. 17, 2025
Abstract. Chemical ionization mass spectrometry (CIMS) is widely used in atmospheric chemistry studies. However, due to the complex interactions between reagent ions and target compounds, chemical understanding remains limited and compound identification difficult. In this study, we apply machine learning to a reference dataset of pesticides in two standard solutions to build a model that can provide insights from CIMS analyses in atmospheric science. The CIMS measurements were performed with an Orbitrap mass spectrometer coupled to a thermal desorption multi-scheme chemical ionization inlet unit (TD-MION-MS) with both negative and positive ionization modes utilizing Br−, O2−, H3O+ and (CH3)2COH+ (AceH+) as reagent ions. We then trained two machine learning methods on these data: (1) random forest (RF) for classifying if a pesticide can be detected with CIMS and (2) kernel ridge regression (KRR) for predicting the expected CIMS signals. We compared their performance on five different representations of the molecular structure: the topological fingerprint (TopFP), the molecular access system keys (MACCS), a custom descriptor based on standard molecular properties (RDKitPROP), the Coulomb matrix (CM) and the many-body tensor representation (MBTR). The results indicate that MACCS outperforms the other descriptors. Our best classification model reaches a prediction accuracy of 0.85 ± 0.02 and a receiver operating characteristic curve area of 0.91 ± 0.01. Our best regression model reaches an accuracy of 0.44 ± 0.03 logarithmic units of the signal intensity. Subsequent feature importance analysis of the classifiers reveals that the most important sub-structures are NH and OH for the negative ionization schemes and nitrogen-containing groups for the positive ionization schemes.
Language: English
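As a rough illustration of the kernel ridge regression step described in the abstract above, the sketch below fits KRR on binary fingerprints and predicts a log10 signal intensity. Everything specific here is an assumption: the abstract does not state which kernel was used, so a Tanimoto kernel is chosen only because it is a common choice for bit-vector fingerprints; the four-bit fingerprints and target values are made up; and the helper names (`tanimoto`, `solve`, `krr_fit`, `krr_predict`) are hypothetical. Real MACCS keys would come from a cheminformatics toolkit, which is not used here.

```python
def tanimoto(a, b):
    """Tanimoto similarity between two equal-length 0/1 fingerprints."""
    both = sum(x & y for x, y in zip(a, b))
    either = sum(x | y for x, y in zip(a, b))
    return both / either if either else 1.0

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def krr_fit(X, y, lam=1e-3):
    """Return the KRR weights alpha = (K + lam * I)^-1 y."""
    K = [[tanimoto(a, b) + (lam if i == j else 0.0)
          for j, b in enumerate(X)] for i, a in enumerate(X)]
    return solve(K, y)

def krr_predict(X_train, alpha, x):
    """Predict as a kernel-weighted sum over the training fingerprints."""
    return sum(w * tanimoto(x, a) for w, a in zip(alpha, X_train))

# Made-up fingerprints and log10 signal intensities, for illustration only.
X = [[1, 0, 1, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 1, 0, 1]]
y = [2.0, 1.0, 3.0, 0.5]
alpha = krr_fit(X, y)
pred = krr_predict(X, alpha, X[0])
```

With a small regularization strength `lam`, the prediction at a training point stays close to its training target; on held-out molecules the error would be reported in the same logarithmic units as the abstract's 0.44 ± 0.03 figure.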
Accurate Modeling of the Potential Energy Surface of Molecular Clusters Boosted by Neural Networks
Environmental Science Advances, Journal Year: 2024, Volume and Issue: 3(10), P. 1438 - 1451
Published: Jan. 1, 2024
We present the application of machine learning methods to alleviate the computational cost of quantum chemistry calculations required for modeling atmospheric molecular clusters.
Language: English
Predicting Composition Evolution for a Sulfuric Acid-Dimethylamine System from Monomer to Nanoparticle Using Machine Learning
The Journal of Physical Chemistry A, Journal Year: 2024, Volume and Issue: unknown
Published: Dec. 25, 2024
Experimental and theoretical studies on the compositional changes of new particle formation in the nucleation and initial growth stages of acid–base systems (2–5 nm) are extremely challenging. This study proposes a machine learning method for predicting the composition change of the sulfuric acid-dimethylamine system in the transformation from monomer to nanoparticle by learning the structure information of small-sized sulfuric acid (SA)–dimethylamine (DMA) molecular clusters. Based on these components, we found that the composition change was mainly through the alternate adsorption of (SA)1(DMA)1, (SA)1(DMA)2, and (SA)1 clusters at the early stage of nucleation, which accounted for about 70, 20, and 10%, respectively. The method can explain the nature of possible cluster acidity during the nucleation of the system. The method can also predict the base-stabilization mechanism without relying on any experimental data, thereby yielding results consistent with those of previous measurement.
Language: English