The Journal of Physical Chemistry A, Journal year: 2023, Volume 127(40), pp. 8253-8271
Published: Sep. 28, 2023
Burgeoning developments in machine learning (ML) and its rapidly growing adaptations in chemistry are noteworthy. Motivated by the successful deployments of ML in the realm of molecular property prediction (MPP) and chemical reaction prediction (CRP), herein we highlight some of the most recent applications of ML in predictive chemistry. We present a nonmathematical and concise overview of the progression of ML implementations, ranging from an ensemble-based random forest model to advanced graph neural network algorithms. Similarly, the prospects of various feature engineering approaches that work in conjunction with ML models are described. Highly accurate predictions reported in MPP tasks (e.g., lipophilicity, solubility, distribution coefficient), using methods such as D-MPNN, MolCLR, SMILES-BERT, and MolBERT, offer promising avenues in molecular design and drug discovery. Whereas MPP pertains to a given molecule, ML applications in chemical reactions present a different level of challenge, primarily arising from the simultaneous involvement of multiple molecules and their diverse roles in a reaction setting. The RMSEs in MPP tasks range from 0.287 to 2.20, while those for yield predictions are well over 4.9 in the lower end, reaching thresholds of >10.0 in several examples. Our Review concludes with a set of persisting challenges in dealing with reaction data sets and an overall optimistic outlook on benefits of ML-driven workflows for various MPP as well as CRP tasks.
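For readers less familiar with the error metric quoted in this abstract, the following is a minimal sketch of how a root-mean-square error (RMSE) is computed. The function name and the sample values are assumptions made for illustration; they are not taken from the Review.

```python
import math

def rmse(y_true, y_pred):
    """Root-mean-square error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical measured vs. predicted yields (percent), for illustration only.
measured = [72.0, 15.0, 88.0, 40.0]
predicted = [65.0, 22.0, 80.0, 49.0]
print(round(rmse(measured, predicted), 2))
```

Because the RMSE carries the units of the predicted quantity, values for different tasks (for example a log-scale property versus a percent yield) are not directly comparable.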

Journal of Chemical Information and Modeling, Journal year: 2023, Volume 64(1), pp. 42-56
Published: Dec. 20, 2023
Machine Learning (ML) techniques face significant challenges when predicting advanced chemical properties, such as yield, feasibility of chemical synthesis, and optimal reaction conditions. These challenges stem from the high-dimensional nature of the prediction task and the myriad essential variables involved, ranging from reactants and reagents to catalysts, temperature, and purification processes. Successfully developing a reliable predictive model not only holds the potential for optimizing high-throughput experiments but can also elevate existing retrosynthetic predictive approaches and bolster a plethora of applications within the field. In this review, we systematically evaluate the efficacy of current ML methodologies in chemoinformatics, shedding light on their milestones and inherent limitations. Additionally, a detailed examination of a representative case study provides insights into the prevailing issues related to data availability and transferability in the discipline.

Journal of the American Chemical Society, Journal year: 2024, Volume 146(22), pp. 15070-15084
Published: May 20, 2024
Despite the increased use of computational tools to supplement medicinal chemists' expertise and intuition in drug design, predicting synthetic yields in medicinal chemistry endeavors remains an unsolved challenge. Existing design workflows could profoundly benefit from reaction yield prediction, as precious material waste could be reduced, and a greater number of relevant compounds could be delivered to advance the design, make, test, analyze (DMTA) cycle. In this work, we detail the evaluation of AbbVie's medicinal chemistry library data set to build machine learning models for the prediction of Suzuki coupling reaction yields. The combination of density functional theory (DFT)-derived features and Morgan fingerprints was identified to perform better than one-hot encoded baseline modeling, furnishing encouraging results. Overall, we observe modest generalization to unseen reactant structures within the 15-year retrospective library data set. Additionally, we compare predictions made by the model to those made by expert medicinal chemists, finding that the model can often predict both reaction success and reaction yields with greater accuracy. Finally, we demonstrate the application of this approach to suggest structurally and electronically similar building blocks to replace those predicted or observed to be unsuccessful prior to or after synthesis, respectively. The yield prediction model was used to select similar monomers predicted to have higher yields, resulting in greater synthesis efficiency of relevant drug-like molecules.
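The one-hot encoded baseline mentioned in this abstract can be pictured with a small sketch. It assumes each reaction is described only by categorical labels for its components; the field names, labels, and helper functions below are hypothetical and are not drawn from the AbbVie data set or the authors' code.

```python
def build_vocab(reactions, fields):
    """Map each field to the sorted list of labels observed for it."""
    return {f: sorted({r[f] for r in reactions}) for f in fields}

def one_hot(reaction, vocab):
    """Concatenate one 0/1 indicator vector per field, in vocab order."""
    vec = []
    for field, labels in vocab.items():
        vec.extend(1 if reaction[field] == label else 0 for label in labels)
    return vec

# Hypothetical Suzuki-type reaction records, for illustration only.
reactions = [
    {"boronate": "B1", "halide": "X1", "catalyst": "Pd-A"},
    {"boronate": "B2", "halide": "X1", "catalyst": "Pd-B"},
    {"boronate": "B1", "halide": "X2", "catalyst": "Pd-A"},
]
fields = ("boronate", "halide", "catalyst")
vocab = build_vocab(reactions, fields)
encoded = [one_hot(r, vocab) for r in reactions]
print(encoded[0])
```

An encoding of this kind identifies components but carries no structural or electronic information, so a label never seen in training maps to an all-zero block. That is one reason it serves as a baseline against descriptor-based features.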

Journal of Cheminformatics, Journal year: 2024, Volume 16(1)
Published: Feb. 25, 2024
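Developing machine learning models with high generalization capability for predicting chemical reaction yields is of significant interest and importance. The efficacy of such models depends heavily on the representation of chemical reactions, which has commonly been learned from SMILES or graphs of molecules using deep neural networks. However, the progression of chemical reactions is inherently determined by the molecular 3D geometric properties, which have recently been highlighted as crucial features in accurately predicting molecular properties and chemical reactions. Additionally, large-scale pre-training has been shown to be essential in enhancing the generalization capability of complex deep learning models. Based on these considerations, we propose the Reaction Multi-View Pre-training (ReaMVP) framework, which leverages self-supervised learning techniques and a two-stage pre-training strategy to predict chemical reaction yields. By incorporating multi-view learning with 3D geometric information, ReaMVP achieves state-of-the-art performance on two benchmark datasets. Notably, the experimental results indicate that ReaMVP has a significant advantage in predicting out-of-sample data, suggesting an enhanced generalization ability to predict new reactions. Scientific Contribution: This study presents the ReaMVP framework, which improves the generalization capability of machine learning models for predicting chemical reaction yields. By integrating sequential and geometric views and leveraging self-supervised learning techniques with a two-stage pre-training strategy, ReaMVP achieves state-of-the-art performance on two benchmark datasets. The framework demonstrates superior predictive ability for out-of-sample data and enhances the prediction of new reactions.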

Journal of Cheminformatics, Journal year: 2023, Volume 15(1)
Published: April 10, 2023
Artificial intelligence has deeply revolutionized the field of medicinal chemistry with many impressive applications, but the success of these applications requires a massive amount of training samples with high-quality annotations, which seriously limits the wide usage of data-driven methods. In this paper, we focus on the reaction yield prediction problem, which assists chemists in selecting high-yield reactions in a new chemical space with only a few experimental trials. To attack this challenge, we first put forth MetaRF, an attention-based random forest model specially designed for few-shot yield prediction, where the attention weight of a random forest is automatically optimized by the meta-learning framework and can be quickly adapted to predict the performance of new reagents when given a few additional samples. To improve the few-shot learning performance, we further introduce a dimension-reduction based sampling method to determine valuable samples to be experimentally tested and then learned. Our methodology is evaluated on three different datasets and acquires satisfactory performance on few-shot prediction. In high-throughput experimentation (HTE) datasets, the average yield of our methodology's top 10 high-yield reactions is relatively close to the results of ideal yield selection.
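The closing comparison in this abstract (average yield of the top-ranked reactions versus an ideal selection) can be illustrated with a short sketch. This is an assumed reading of that evaluation, not the authors' code; the function name, the choice of k, and the numbers are hypothetical.

```python
def top_k_mean_yield(true_yields, scores, k):
    """Mean true yield of the k reactions ranked highest by `scores`."""
    if not 1 <= k <= len(scores) or len(scores) != len(true_yields):
        raise ValueError("k must be in 1..n and inputs must match in length")
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    chosen = ranked[:k]
    return sum(true_yields[i] for i in chosen) / k

# Hypothetical measured and model-predicted yields (percent).
true_yields = [90.0, 10.0, 75.0, 60.0, 30.0]
predicted = [70.0, 20.0, 55.0, 40.0, 65.0]

model_pick = top_k_mean_yield(true_yields, predicted, k=2)
ideal_pick = top_k_mean_yield(true_yields, true_yields, k=2)
print(model_pick, ideal_pick)
```

Ranking by the true yields themselves gives the best achievable average for a given k, so the gap between the two numbers indicates how much yield the model's selection leaves on the table.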

ACS Catalysis, Journal year: 2023, Volume 13(21), pp. 14285-14299
Published: Oct. 26, 2023
The application of computational methods in enantioselective catalysis has evolved from the rationalization of the observed stereochemical outcome to their prediction and application to the design of chiral ligands. This Perspective provides an overview of the current methods used, ranging from atomistic modeling of the transition structures involved to correlation-based methods, with particular emphasis placed on the Q2MM/CatVS method. Using three enantioselective palladium-catalyzed reactions, namely, the conjugate addition of arylboronic acids to enones, the enantioselective redox relay Heck reaction, and the Tsuji–Trost allylic amination as case studies, we argue that computational methods have become truly equal partners to experimental studies in that, in some cases, they are able to correct published stereochemical assignments. Finally, the consequences of this approach to data-driven methods are discussed.

Chemical Science, Journal year: 2024, Volume 15(34), pp. 13618-13630
Published: Jan. 1, 2024
Enantioselective hydrogenation of olefins by Rh-based chiral catalysts has been extensively studied for more than 50 years. Naively, one would expect that everything about this transformation is known and that selecting a catalyst that induces the desired reactivity or selectivity is a trivial task. Nonetheless, ligand engineering or selection for any new prochiral olefin remains an empirical trial-error exercise. In this study, we investigated whether machine learning techniques could be used to accelerate the identification of the most efficient chiral ligand. For this purpose, we used high throughput experimentation to build a large dataset consisting of results for Rh-catalyzed asymmetric olefin hydrogenation, specially designed for applications in machine learning. We showcased its alignment with existing literature while addressing observed discrepancies. Additionally, a computational framework for the automated and reproducible quantum-chemistry based featurization of catalyst structures was created. Together with less computationally demanding representations, these descriptors were fed into our machine learning pipeline for both out-of-domain and in-domain prediction tasks of selectivity and reactivity. For out-of-domain purposes, our models provided limited efficacy. It was found that even the most expensive descriptors do not impart significant meaning to the model predictions. The in-domain application, while partly successful for predictions of conversion, emphasizes the need for evaluating the cost-benefit ratio of computationally intensive descriptors and for tailored descriptor design. Challenges persist in predicting enantioselectivity, calling for caution in interpreting results from small datasets. Our insights underscore the importance of dataset diversity with broad substrate inclusion and suggest that mechanistic considerations could improve the accuracy of statistical models.

Journal of Cheminformatics, Journal year: 2023, Volume 15(1)
Published: Feb. 11, 2023
Artificial Intelligence is revolutionizing many aspects of the pharmaceutical industry. Deep learning models are now routinely applied to guide drug discovery projects leading to faster and improved findings, but there are still many tasks with enormous unrealized potential. One such task is reaction yield prediction. Every year more than one fifth of all synthesis attempts result in product yields which are either zero or too low. This equates to chemical and human resources being spent on activities which ultimately do not progress the programs, leading to a triple loss when accounting for the cost of opportunity in time wasted. In this work we pre-train a BERT model on more than 16 million reactions from 4 different data sources, and fine tune it to achieve an uncertainty calibrated global yield prediction model. This model is an improvement upon the state of the art not just from the increase in pre-train data but also by introducing a new embedding layer which solves a few limitations of SMILES and enables integration of additional information such as equivalents and molecule role into the reaction encoding, called BERT Enriched Embedding (BEE). The model is benchmarked on an open-source dataset against a state-of-the-art model focused on yield prediction, showing a near 20-point improvement in r2 score. The model is fine-tuned and tested on an internal company data benchmark, and a prospective study shows that the application of the model can reduce the total number of negative reactions (yield under 5%) run at Janssen by at least 34%. Lastly, we corroborate the previous results through experimental validation, by directly deploying the model in an on-going drug discovery project and showing that it can be used successfully as a reagent recommender due to its fast inference speed and reliable confidence estimation, a critical feature for industry application.

Chemical Science, Journal year: 2023, Volume 14(38), pp. 10378-10384
Published: Jan. 1, 2023
The quest for generating novel chemistry knowledge is critical in scientific advancement, and machine learning (ML) has emerged as an asset in this pursuit.