An algorithmic framework for synthetic cost-aware decision making in molecular design
Jenna C. Fromer,
Connor W. Coley
Nature Computational Science,
Journal Year: 2024
Volume and Issue: 4(6), P. 440 - 450
Published: June 17, 2024
Language: English
Graph-Based Deep Learning Models for Thermodynamic Property Prediction: The Interplay between Target Definition, Data Distribution, Featurization, and Model Architecture
Journal of Chemical Information and Modeling,
Journal Year: 2025
Volume and Issue: unknown
Published: Jan. 9, 2025
In this contribution, we examine the interplay between target definition, data distribution, featurization approaches, and model architectures on graph-based deep learning models for thermodynamic property prediction. Through consideration of five curated data sets, exhibiting diversity in elemental composition, multiplicity, charge state, and size, we examine the impact of each of these factors on model accuracy. We observe that target definition, i.e., using formation instead of atomization energy/enthalpy, is a decisive factor, and so is a careful selection of the featurization approach. Our attempts at directly modifying the model architectures result in more modest, though not negligible, accuracy gains. Remarkably, molecule-level predictions tend to outperform atom-level increment predictions, in contrast to previous findings. Overall, this work paves the way toward the development of robust graph-based models with universal capabilities that can reach excellent accuracy across data sets and compound domains.
Language: English
Improving the Reliability of, and Confidence in, DFT Functional Benchmarking through Active Learning
Journal of Chemical Theory and Computation,
Journal Year: 2025
Volume and Issue: unknown
Published: Feb. 2, 2025
Validating the performance of exchange-correlation functionals is vital to ensure the reliability of density functional theory (DFT) calculations. Typically, these validations involve benchmarking data sets. Currently, such data sets are usually assembled in an unprincipled manner, suffering from uncontrolled chemical bias, and limiting the transferability of benchmarking results to a broader chemical space. In this work, a data-efficient solution based on active learning is explored to address this issue. Focusing, as a proof of principle, on pericyclic reactions, we start from the BH9 benchmarking data set and design a chemical reaction space around this initial data set by combinatorially combining reaction templates and substituents. Next, a surrogate model is trained to predict the standard deviation of the activation energies computed across a selection of 20 distinct DFT functionals. With this model, the designed chemical reaction space is explored, enabling the identification of challenging regions, i.e., regions with large functional divergence, for which representative reactions are subsequently acquired as additional training points. Remarkably, it turns out that the function mapping molecular structure to functional divergence is readily learnable; convergence is reached upon the acquisition of fewer than 100 reactions. With our final updated model, a more challenging (and arguably more representative) pericyclic benchmarking data set is curated, and we demonstrate that functional performance has changed significantly compared to the original BH9 subset.
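The acquisition criterion described in this abstract (standard deviation of activation energies across functionals, with the most divergent reactions acquired next) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the reaction labels, and the toy energies are all hypothetical, and the directly computed divergence stands in for the surrogate model's prediction.

```python
from statistics import stdev


def functional_divergence(energies_by_functional):
    """Sample standard deviation of one reaction's activation energies
    across the functionals used to compute them."""
    return stdev(energies_by_functional)


def select_most_divergent(predicted_divergence, k):
    """Return the k reaction ids with the largest (predicted) divergence."""
    ranked = sorted(predicted_divergence, key=predicted_divergence.get, reverse=True)
    return ranked[:k]


# Made-up activation energies (kcal/mol) for three hypothetical reactions,
# each evaluated with four hypothetical functionals.
toy_energies = {
    "rxn_a": [20.0, 20.0, 20.0, 20.0],
    "rxn_b": [18.0, 22.0, 18.0, 22.0],
    "rxn_c": [10.0, 20.0, 30.0, 40.0],
}

# Assumption: the true divergence is used here in place of a trained surrogate.
divergence = {rxn: functional_divergence(e) for rxn, e in toy_energies.items()}
picked = select_most_divergent(divergence, k=2)
```

In an active learning loop, the reactions in `picked` would be computed with all functionals, added to the training set, and the surrogate retrained before the next round.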
Language: English
Repurposing quantum chemical descriptor datasets for on-the-fly generation of informative reaction representations: application to hydrogen atom transfer reactions
Digital Discovery,
Journal Year: 2024
Volume and Issue: 3(5), P. 919 - 931
Published: Jan. 1, 2024
In this work, we explore how existing datasets of quantum chemical properties can be repurposed to build data-efficient downstream ML models, with a particular focus on predicting the activation energy of hydrogen atom transfer reactions.
Language: English
Uncertainty Qualification for Deep Learning-Based Elementary Reaction Property Prediction
Journal of Chemical Information and Modeling,
Journal Year: 2024
Volume and Issue: 64(21), P. 8131 - 8141
Published: Oct. 23, 2024
The prediction of the thermodynamic and kinetic properties of elementary reactions has shown rapid improvement due to the implementation of deep learning (DL) methods. While various studies have reported success in predicting reaction properties, the quantification of prediction uncertainty has seldom been investigated, thus compromising confidence in using these predicted properties in practical applications. Here, we integrated graph convolutional neural networks (GCNN) with three uncertainty quantification techniques, including ensemble, Monte Carlo (MC)-dropout, and evidential learning, to provide insights into uncertainty quantification and its utility. The ensemble model outperforms the others in accuracy and shows the highest reliability in estimating uncertainty across all reaction property data sets. We also verified that the ensemble model showed a satisfactory capability in recognizing epistemic and aleatoric uncertainties. Additionally, we adopted the Monte Carlo Tree Search method for extracting explainable reaction substructures, providing a chemical explanation for the DL-predicted properties and the corresponding uncertainties. Finally, to demonstrate the utility of uncertainty qualification in practical applications, we performed an uncertainty-guided calibration of the DL-constructed kinetic model, which achieved a 25% higher hit ratio in identifying dominant reaction pathways compared to calibration without uncertainty guidance.
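As a rough illustration of the ensemble technique named in this abstract, the sketch below takes the mean of the ensemble members' outputs as the prediction and their spread as an uncertainty estimate. It is a generic sketch, not the paper's GCNN setup: `ensemble_predict` and the three toy "models" are hypothetical stand-ins for independently trained networks.

```python
from statistics import mean, pstdev


def ensemble_predict(models, x):
    """Return (mean prediction, spread across members) for input x.

    The spread (population standard deviation of the member outputs)
    serves as a simple disagreement-based uncertainty estimate.
    """
    preds = [model(x) for model in models]
    return mean(preds), pstdev(preds)


# Hypothetical ensemble members: simple functions standing in for trained networks.
models = [
    lambda x: 2.0 * x,
    lambda x: 2.0 * x + 1.0,
    lambda x: 2.0 * x - 1.0,
]

prediction, spread = ensemble_predict(models, 3.0)
```

Inputs with a large `spread` are those on which the members disagree most, which is the kind of signal an uncertainty-guided workflow could use to decide which predictions to recompute or recalibrate.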
Language: English