Journal of Chemical Information and Modeling, 2022, 62(6), pp. 1376-1387
Published: March 10, 2022
There is significant interest and importance to develop robust machine learning models to assist organic chemistry synthesis. Typically, task-specific machine learning models for distinct reaction prediction tasks have been developed. In this work, we develop a unified deep learning model, T5Chem, for a variety of chemical reaction prediction tasks by adapting the "Text-to-Text Transfer Transformer" (T5) framework in natural language processing (NLP). On the basis of self-supervised pretraining with PubChem molecules, the T5Chem model can achieve state-of-the-art performances for four distinct types of task-specific reaction prediction tasks using four different open-source data sets, including reaction type classification on USPTO_TPL, forward reaction prediction on USPTO_MIT, single-step retrosynthesis on USPTO_50k, and reaction yield prediction on high-throughput C–N coupling reactions. Meanwhile, we introduced a new unified multitask reaction prediction data set USPTO_500_MT, which can be used to train and test five different types of reaction tasks, including the four above tasks as well as a new reagent suggestion task. Our results showed that models trained with multiple tasks are more robust and can benefit from mutual learning on related tasks. Furthermore, we demonstrated the use of SHAP (SHapley Additive exPlanations) to explain T5Chem predictions at the functional group level, which provides a way to demystify sequence-based deep learning models in chemistry. T5Chem is accessible through https://yzhang.hpc.nyu.edu/T5Chem.
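To make the text-to-text framing concrete, the sketch below casts several reaction tasks as plain string-to-string pairs, so that a single sequence model could in principle be trained on all of them. The prefix strings, the helper name `to_text_pair`, and the toy esterification example are assumptions chosen for illustration; they are not taken from the T5Chem code base.

```python
# Hypothetical task prefixes; the prefixes actually used by T5Chem may differ.
TASK_PREFIXES = {
    "forward": "Product:",            # reactants/reagents -> product
    "retrosynthesis": "Reactants:",   # product -> reactants
    "classification": "Classification:",  # reaction -> class label
}


def to_text_pair(task, source, target):
    """Return (input_text, target_text) for one training example."""
    if task not in TASK_PREFIXES:
        raise ValueError(f"unknown task: {task}")
    return f"{TASK_PREFIXES[task]}{source}", str(target)


# Toy example: acetic acid + ethanol -> ethyl acetate (SMILES strings).
examples = [
    to_text_pair("forward", "CC(=O)O.CCO>>", "CC(=O)OCC"),
    to_text_pair("retrosynthesis", "CC(=O)OCC", "CC(=O)O.CCO"),
    # The class label below is a made-up placeholder, not a USPTO_TPL label.
    to_text_pair("classification", "CC(=O)O.CCO>>CC(=O)OCC", "class_0"),
]

for input_text, target_text in examples:
    print(input_text, "->", target_text)
```

Because every task shares the same input and output type, switching tasks only changes the prefix, which is the property that makes a multitask data set such as USPTO_500_MT usable by one model.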
Chemical Science, 2022, 13(22), pp. 6655-6668
Published: Jan. 1, 2022
Transfer learning is combined with active learning to discover synthetic reaction conditions in a small-data regime. This strategy is tested on cross-coupling reactions from a high-throughput experimentation dataset and shows promising results.
Nature Chemistry, 2023, 16(2), pp. 239-248
Published: Nov. 23, 2023
Late-stage functionalization is an economical approach to optimize the properties of drug candidates. However, the chemical complexity of drug molecules often makes late-stage diversification challenging. To address this problem, a late-stage functionalization platform based on geometric deep learning and high-throughput reaction screening was developed. Considering borylation as a critical step in late-stage functionalization, the computational model predicted reaction yields for diverse reaction conditions with a mean absolute error margin of 4–5%, while the reactivity of novel reactions with known and unknown substrates was classified with a balanced accuracy of 92% and 67%, respectively. The regioselectivity of the major products was accurately captured with a classifier F-score of 67%. When applied to 23 diverse commercial drug molecules, the platform successfully identified numerous opportunities for structural diversification. The influence of steric and electronic information on model performance was quantified, and a comprehensive simple user-friendly reaction format was introduced that proved to be a key enabler for seamlessly integrating deep learning and high-throughput experimentation for late-stage functionalization.
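The three metric types quoted in this abstract (mean absolute error, balanced accuracy, F-score) can be illustrated with a minimal pure-Python sketch. It assumes a binary classifier and the F1 variant of the F-score; the paper may use a different variant or averaging scheme, and all numbers below are toy values, not data from the study.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference, e.g. between measured and predicted yields."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of the per-class recalls; robust to class imbalance."""
    recalls = []
    for cls in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == cls]
        recalls.append(sum(y_pred[i] == cls for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data (made up for illustration only).
mae = mean_absolute_error([10, 50, 80], [14, 45, 83])   # yields in %
labels = [1, 1, 1, 1, 0, 0]                              # 1 = reactive
preds = [1, 1, 1, 0, 0, 1]
bal_acc = balanced_accuracy(labels, preds)
f1 = f1_score(labels, preds)
print(mae, bal_acc, f1)
```

Balanced accuracy averages recall over classes, so a model that always predicts the majority class scores only 50% in the binary case; this makes it a common choice when reactive and unreactive examples are unevenly represented.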
ACS Central Science, 2023, 9(12), pp. 2196-2204
Published: Dec. 8, 2023
Models can codify our understanding of chemical reactivity and serve a useful purpose in the development of new synthetic processes via, for example, evaluating hypothetical reaction conditions or in silico substrate tolerance. Perhaps the most determining factor is the composition of the training data and whether it is sufficient to train a model that can make accurate predictions over the full domain of interest. Here, we discuss the design of reaction datasets in ways that are conducive to data-driven modeling, emphasizing the idea that training set diversity and model generalizability rely on the choice of molecular or reaction representation. We additionally discuss the experimental constraints associated with generating common types of chemistry datasets and how these considerations should influence dataset design and model building.
Molecular representation learning (MRL) is a key step to build the connection between machine learning and chemical science. In particular, it encodes molecules as numerical vectors preserving the molecular structures and features, on top of which the downstream tasks (e.g., property prediction) can be performed. Recently, MRL has achieved considerable progress, especially in methods based on deep molecular graph learning. In this survey, we systematically review these graph-based molecular representation techniques, especially the methods incorporating chemical domain knowledge. Specifically, we first introduce the features of 2D and 3D molecular graphs. Then we summarize and categorize MRL methods into three groups based on their input. Furthermore, we discuss some typical chemical applications supported by MRL. To facilitate studies in this fast-developing area, we also list the benchmarks and commonly used datasets in the paper. Finally, we share our thoughts on future research directions.
In the past decade, computational tools have become integral to catalyst design. They continue to offer significant support to experimental organic synthesis and catalysis researchers aiming for optimal reaction outcomes. More recently, data-driven approaches utilizing machine learning have garnered considerable attention for their expansive capabilities. This Perspective provides an overview of diverse initiatives in the realm of computational catalyst design and introduces our automated tools tailored for high-throughput in silico exploration of the chemical space. While valuable insights are gained through methods for computational analysis of chemical space, their degree of automation and modularity are key. We argue that the integration of data-driven, automated and modular workflows is key to enhancing homogeneous catalyst design on an unprecedented scale, contributing to the advancement of catalysis research.
Beilstein Journal of Organic Chemistry, 2024, 20, pp. 2476-2492
Published: Oct. 4, 2024
This review surveys the recent advances and challenges in predicting and optimizing reaction conditions using machine learning techniques. The paper emphasizes the importance of acquiring and processing large and diverse datasets of chemical reactions, and the use of both global and local models to guide the design of synthetic processes. Global models exploit the information from comprehensive databases to suggest general reaction conditions for new reactions, while local models fine-tune the specific parameters for a given reaction family to improve yield and selectivity. The paper also identifies the current limitations and opportunities in this field, such as the data quality and availability, and the integration of high-throughput experimentation. The paper demonstrates how the combination of chemical engineering, data science, and ML algorithms can enhance the efficiency and effectiveness of reaction conditions design, and enable novel discoveries in synthetic chemistry.
Journal of Chemical Information and Modeling, 2024, 64(8), pp. 2955-2970
Published: March 15, 2024
Chemical reactions serve as foundational building blocks for organic chemistry and drug design. In the era of large AI models, data-driven approaches have emerged to innovate the design of novel reactions, optimize existing ones for higher yields, and discover new pathways for synthesizing chemical structures comprehensively. To effectively address these challenges with machine learning models, it is imperative to derive robust and informative representations or engage in feature engineering using extensive data sets of reactions. This work aims to provide a comprehensive review of established reaction featurization approaches, offering insights into the selection of representations and the design of features for a wide array of tasks. The advantages and limitations of employing SMILES, molecular fingerprints, molecular graphs, and physics-based properties are meticulously elaborated. Solutions to bridge the gap between different representations will also be critically evaluated. Additionally, we introduce a new frontier in chemical reaction pretraining, holding promise as an innovative yet unexplored avenue.
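As a small illustration of the simplest featurization mentioned here, the sketch below splits a SMILES or reaction SMILES string into tokens that a sequence model could consume. The regular expression is a simplified assumption written for this example; it does not cover the full SMILES grammar and is not the tokenizer of any particular paper or library.

```python
import re

# Simplified token pattern (an assumption for illustration): bracket atoms,
# two-letter halogens, two-digit ring closures, single-letter atoms,
# ring-closure digits, then bond/branch/reaction symbols.
SMILES_TOKEN = re.compile(
    r"\[[^\]]+\]|Br|Cl|%\d{2}|[A-Za-z]|\d|[=#\-+()\\/.:~@?>*$]"
)


def tokenize(smiles):
    """Split a (reaction) SMILES string into tokens, checking nothing is lost."""
    tokens = SMILES_TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"untokenizable characters in: {smiles!r}")
    return tokens


print(tokenize("c1ccccc1Br"))
print(tokenize("CC(=O)O.CCO>>CC(=O)OCC"))
```

The round-trip check (`"".join(tokens) == smiles`) guards against silently dropping characters that the simplified pattern does not recognize.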
Journal of the American Chemical Society, 2025, 147(9), pp. 7476-7484
Published: Feb. 21, 2025
The development of machine learning models to predict the regioselectivity of C(sp3)-H functionalization reactions is reported. A data set for dioxirane oxidations was curated from the literature and used to generate a model to predict the regioselectivity of C-H oxidation. To assess whether smaller, intentionally designed data sets could provide accuracy on complex targets, a series of acquisition functions were developed to select the most informative molecules for the specific target. Active learning-based acquisition functions that leverage predicted reactivity and model uncertainty were found to outperform those based on molecular and site similarity alone. The use of acquisition functions for data set elaboration significantly reduced the number of data points needed to perform accurate prediction, and it was found that smaller, machine-designed data sets can give accurate predictions when larger, randomly selected data sets fail. Finally, the workflow was experimentally validated on five complex substrates and shown to be applicable to predicting the regioselectivity of arene C-H radical borylation. These studies provide a quantitative alternative to the intuitive extrapolation from "model substrates" that is frequently used to estimate reactivity on complex molecules.
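The idea of an acquisition function that combines predicted reactivity with model uncertainty can be sketched generically as follows. This is not the acquisition function from the paper: the scoring rule (ensemble standard deviation plus a weighted mean prediction), the `reactivity_weight` parameter, and the candidate names and values are all assumptions made up for illustration.

```python
from statistics import mean, pstdev


def acquisition_scores(ensemble_preds, reactivity_weight=0.5):
    """Score each candidate from the predictions of an ensemble of models.

    ensemble_preds maps a candidate name to a list of predicted reactivities,
    one per ensemble member. Disagreement between members (standard
    deviation) serves as the uncertainty estimate.
    """
    return {
        name: pstdev(preds) + reactivity_weight * mean(preds)
        for name, preds in ensemble_preds.items()
    }


def select_batch(ensemble_preds, k=2, reactivity_weight=0.5):
    """Return the k candidates with the highest acquisition score."""
    scores = acquisition_scores(ensemble_preds, reactivity_weight)
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy ensemble predictions (made up): three models, four candidate molecules.
candidates = {
    "mol_A": [0.10, 0.12, 0.11],   # confident, low reactivity
    "mol_B": [0.20, 0.80, 0.50],   # models disagree strongly
    "mol_C": [0.90, 0.88, 0.92],   # confident, high reactivity
    "mol_D": [0.40, 0.45, 0.35],   # moderate on both counts
}
picked = select_batch(candidates, k=2)
print(picked)
```

Setting `reactivity_weight=0` reduces the rule to pure uncertainty sampling, which makes it easy to compare the two selection strategies on the same candidate pool.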