Nature Communications,
Journal Year: 2025, Volume and Issue: 16(1)
Published: April 15, 2025
Abstract
Transition metal-catalyzed asymmetric reactions are of high contemporary importance in organic synthesis. Recently, machine learning (ML) has shown promise in accelerating the development of newer catalytic protocols. However, the need for a large amount of experimental data can present a bottleneck in implementing ML models. Here, we propose a meta-learning workflow that can harness literature-derived data to extract shared reaction features and requires only a few examples to predict the outcome of new reactions. Prototypical networks are used as a meta-learning method to predict the enantioselectivity of asymmetric hydrogenation of olefins. This meta-learning model consistently provides significant performance improvement over other popular ML methods such as random forests and graph neural networks. The performance of our meta-model is analyzed with varying sizes of training data to demonstrate its utility even with limited data. A good model performance on an out-of-sample test set further indicates the general applicability of our approach. We believe this work will provide a leap forward in identifying promising reactions in the early phases of reaction development when minimal data is available.
Journal of the American Chemical Society,
Journal Year: 2025, Volume and Issue: 147(9), P. 7476 - 7484
Published: Feb. 21, 2025
The development of machine learning models to predict the regioselectivity of C(sp3)-H functionalization reactions is reported. A data set for dioxirane oxidations was curated from the literature and used to generate a model to predict the regioselectivity of C-H oxidation. To assess whether smaller, intentionally designed data sets could provide accuracy on complex targets, a series of acquisition functions were developed to select the most informative molecules for the specific target. Active learning-based acquisition functions that leverage predicted reactivity and model uncertainty were found to outperform those based on molecular and site similarity alone. The use of acquisition functions for data set elaboration significantly reduced the number of data points needed to perform accurate prediction, and it was found that smaller, machine-designed data sets can give accurate predictions when larger, randomly selected data sets fail. Finally, the workflow was experimentally validated on five substrates and shown to be applicable to predicting the regioselectivity of arene radical borylation. These studies provide a quantitative alternative to the intuitive extrapolation from "model substrates" that is frequently used to estimate reactivity on complex molecules.
ACS Applied Materials & Interfaces,
Journal Year: 2022, Volume and Issue: 14(49), P. 55004 - 55016
Published: Dec. 1, 2022
Despite advances in machine learning for accurately predicting material properties, forecasting the performance of thermosetting polymers remains a challenge due to the sparsity of historical experimental data and their complicated crosslinked structures. We proposed a machine-learning-assisted materials genome approach (MGA) for rapidly designing novel epoxy thermosets with excellent mechanical properties (high tensile moduli, high tensile strength, and high toughness) through high-throughput screening in a vast chemical space. Machine-learning models were established by combining attention- and gate-augmented graph convolutional networks, multilayer perceptrons, classical gel theory, and transfer learning from small molecules to polymers. Proof-of-concept experiments were carried out, and the structures designed by the MGA were verified. Gene substructures affecting the modulus, strength, and toughness were also extracted, revealing the mechanisms behind the mechanical properties. The developed strategy can be employed to design other thermosetting polymers efficiently.
ACS Central Science,
Journal Year: 2023, Volume and Issue: 9(12), P. 2196 - 2204
Published: Dec. 8, 2023
Models can codify our understanding of chemical reactivity and serve a useful purpose in the development of new synthetic processes via, for example, evaluating hypothetical reaction conditions or in silico substrate tolerance. Perhaps the most determining factor is the composition of the training data and whether it is sufficient to train a model that can make accurate predictions over the full domain of interest. Here, we discuss the design of reaction datasets in ways that are conducive to data-driven modeling, emphasizing the idea that training set diversity and model generalizability rely on the choice of molecular representation. We additionally discuss the experimental constraints associated with generating common types of chemistry datasets and how these considerations should influence dataset design and model building.
Nature Communications,
Journal Year: 2023, Volume and Issue: 14(1)
Published: July 3, 2023
High-throughput experimentation (HTE) is an increasingly important tool in reaction discovery. While the hardware for running HTE in the chemical laboratory has evolved significantly in recent years, there remains a need for software solutions to navigate data-rich experiments. Here we have developed phactor™, a software that facilitates the performance and analysis of HTE in a chemical laboratory. phactor™ allows experimentalists to rapidly design arrays of chemical reactions or direct-to-biology experiments in 24, 96, 384, or 1,536 wellplates. Users can access online reagent data, such as a chemical inventory, to virtually populate wells with experiments and produce instructions to perform the reaction array manually, or with the assistance of a liquid handling robot. After completion of the reaction array, analytical results can be uploaded for facile evaluation, and to guide the next series of experiments. All chemical data, metadata, and results are stored in machine-readable formats that are readily translatable to various software. We also demonstrate the use of phactor™ in the discovery of several chemistries, including the identification of a low micromolar inhibitor of the SARS-CoV-2 main protease. Furthermore, phactor™ has been made available for free academic use in 24- and 96-well formats via an online interface.
Journal of Chemical Information and Modeling,
Journal Year: 2024, Volume and Issue: 64(8), P. 2955 - 2970
Published: March 15, 2024
Chemical reactions serve as foundational building blocks for organic chemistry and drug design. In the era of large AI models, data-driven approaches have emerged to innovate the design of novel reactions, optimize existing ones for higher yields, and discover new pathways for synthesizing chemical structures comprehensively. To effectively address these challenges with machine learning models, it is imperative to derive robust and informative representations or engage in feature engineering using extensive data sets of reactions. This work aims to provide a comprehensive review of established reaction featurization approaches, offering insights into the selection of representations and the design of features for a wide array of tasks. The advantages and limitations of employing SMILES, molecular fingerprints, molecular graphs, and physics-based properties are meticulously elaborated. Solutions to bridge the gap between different representations will also be critically evaluated. Additionally, we introduce a new frontier in chemical reaction pretraining, holding promise as an innovative yet unexplored avenue.
Chemical Science,
Journal Year: 2025, Volume and Issue: unknown
Published: Jan. 1, 2025
Label ranking is introduced as a conceptually new means for prioritizing experiments. Their simplicity, ease of application, and the use of aggregation facilitate their ability to make accurate predictions with small datasets.
Angewandte Chemie International Edition,
Journal Year: 2023, Volume and Issue: 62(48)
Published: Oct. 10, 2023
A novel and convenient approach that combines high-throughput experimentation (HTE) with machine learning (ML) technologies to achieve the first selective cross-dimerization of sulfoxonium ylides via iridium catalysis is presented. A variety of valuable amide-, ketone-, ester-, and N-heterocycle-substituted unsymmetrical E-alkenes are synthesized in good yields with high stereoselectivities. This mild method avoids the use of diazo compounds and is characterized by simple operation, high step-economy, and excellent chemoselectivity and functional group compatibility. The combined experimental and computational studies identify an amide-sulfoxonium ylide as a carbene precursor. Furthermore, a comprehensive exploration of the reaction space is also performed (600 reactions) and a machine learning model for reaction yield prediction has been constructed.
Journal of Cheminformatics,
Journal Year: 2023, Volume and Issue: 15(1)
Published: April 10, 2023
Artificial intelligence has deeply revolutionized the field of medicinal chemistry with many impressive applications, but the success of these applications requires a massive amount of training samples with high-quality annotations, which seriously limits the wide usage of data-driven methods. In this paper, we focus on the reaction yield prediction problem, which assists chemists in selecting high-yield reactions in a new chemical space with only a few experimental trials. To attack this challenge, we first put forth MetaRF, an attention-based random forest model specially designed for few-shot yield prediction, where the attention weight of a random forest is automatically optimized by the meta-learning framework and can be quickly adapted to predict the performance of new reagents while given a few additional samples. To improve the few-shot learning performance, we further introduce a dimension-reduction based sampling method to determine valuable samples to be experimentally tested and then learned. Our methodology is evaluated on three different datasets and acquires satisfactory performance on few-shot prediction. In high-throughput experimentation (HTE) datasets, the average yield of our methodology's top 10 high-yield reactions is relatively close to the results of ideal yield selection.
Journal of Chemical Information and Modeling,
Journal Year: 2023, Volume and Issue: 63(12), P. 3659 - 3668
Published: June 14, 2023
Machine learning models are increasingly being utilized to predict outcomes of organic chemical reactions. A large amount of reaction data is used to train these models, which is in stark contrast to how expert chemists discover and develop new reactions by leveraging information from a small number of relevant transformations. Transfer learning and active learning are two strategies that can operate in low-data situations, which may help fill this gap and promote the use of machine learning for tackling real-world challenges in organic synthesis. This Perspective introduces active and transfer learning and connects these to potential opportunities and directions for further research, especially in the area of prospective development of chemical transformations.