Chemical Science, Journal Year: 2025, Volume and Issue: unknown, Published: Jan. 1, 2025

A general image foundation model was used as the basis for molecular representation learning, showcasing its benefits in chemical property prediction through a stratified pretraining workflow.

Science Advances, Journal Year: 2025, Volume and Issue: 11(1), Published: Jan. 1, 2025

The application of statistical modeling in organic chemistry is emerging as a standard practice for probing structure-activity relationships and as a predictive tool for many optimization objectives. This review is aimed as a tutorial for those entering the area of statistical modeling in chemistry. We provide case studies to highlight considerations and approaches that can be used to successfully analyze datasets in low data regimes, a common situation encountered given the experimental demands of organic chemistry. Statistical modeling hinges on the data (what is being modeled), the descriptors (how the data are represented), and the algorithms (how the data are modeled). Herein, we focus on how various reaction outputs (e.g., yield, rate, selectivity, solubility, stability, and turnover number) and data structures (e.g., binned, heavily skewed, and distributed) influence the choice of algorithm for constructing predictive and chemically insightful models.
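
Because the abstract ties the choice of algorithm to whether an output is binned, heavily skewed, or distributed, a first practical step is simply to look at the shape of the output. The sketch below is a hypothetical triage helper, not the review's procedure: the cutoffs (`max_distinct`, `skew_cutoff`) are illustrative assumptions, and the function names are invented for this example.

```python
import statistics


def sample_skewness(values):
    """Adjusted Fisher-Pearson sample skewness of a list of numbers."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least three values")
    mean = statistics.fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    if m2 == 0:
        return 0.0  # all values identical: no asymmetry to measure
    g1 = m3 / m2 ** 1.5
    return g1 * (n * (n - 1)) ** 0.5 / (n - 2)


def describe_output(values, max_distinct=5, skew_cutoff=1.0):
    """Hypothetical triage of a reaction output (e.g., yield) into one of the
    three data structures named above. Cutoffs are illustrative assumptions."""
    if len(set(values)) <= max_distinct:
        return "binned"  # few discrete levels: a classifier is a natural fit
    if abs(sample_skewness(values)) > skew_cutoff:
        return "heavily skewed"  # consider thresholding or transforming first
    return "distributed"  # spread-out values: regression is reasonable


def to_classes(values, threshold):
    """Binarize an output at a chosen threshold (1 = at or above threshold)."""
    return [1 if v >= threshold else 0 for v in values]
```

For example, a yield set dominated by near-zero values with a few high hits would be flagged as heavily skewed, which is one situation where thresholding into success/failure classes with `to_classes` can be more informative than regressing on the raw yields.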

Journal of the American Chemical Society, Journal Year: 2024, Volume and Issue: 146(12), P. 8536 - 8546, Published: March 13, 2024

Methods to access chiral sulfur(VI) pharmacophores are of interest in medicinal and synthetic chemistry. We report the desymmetrization of unprotected sulfonimidamides via asymmetric acylation with a cinchona-phosphinate catalyst. The desired products were formed in excellent yield and enantioselectivity with no observed bis-acylation. A data-science-driven approach to substrate scope evaluation was coupled with high-throughput experimentation (HTE) to facilitate statistical modeling in order to inform mechanistic studies. Reaction kinetics, catalyst structural studies, and density functional theory (DFT) transition state analysis elucidated the turnover-limiting step to be the collapse of the tetrahedral intermediate and provided key insights into the catalyst-substrate structure–activity relationships responsible for the origin of the enantioselectivity. This study offers a reliable method for accessing enantioenriched sulfonimidamides to propel their application as pharmacophores and serves as an example of the insight that can be gleaned from integrating data science with traditional physical organic techniques.
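
Statistical models of enantioselectivity are commonly built on the free-energy difference between the competing transition states rather than on ee directly, since ΔΔG‡ = RT ln(er) is linear in energy. The abstract does not say which output was modeled in this study, so the helper below is only a sketch of that common convention; it assumes a single enantiodetermining step under kinetic control, and the function names are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def ee_from_er(major, minor):
    """Enantiomeric excess (%) from an enantiomeric ratio major:minor."""
    return 100.0 * (major - minor) / (major + minor)


def ddg_from_er(major, minor, temp_k=298.15):
    """ddG‡ in kcal/mol from er = major/minor, via ddG‡ = RT ln(er).

    Assumes the er reflects a single enantiodetermining step under
    kinetic control at temperature temp_k (in kelvin)."""
    return R_KCAL * temp_k * math.log(major / minor)
```

A 95:5 er, for instance, corresponds to 90% ee and roughly 1.7 kcal/mol at room temperature; on the energy scale, gains near the top of the ee range are no longer compressed, which is why it is often the preferred modeling target.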

Beilstein Journal of Organic Chemistry, Journal Year: 2024, Volume and Issue: 20, P. 2476 - 2492, Published: Oct. 4, 2024

This review surveys recent advances and challenges in predicting and optimizing reaction conditions using machine learning techniques. The paper emphasizes the importance of acquiring and processing large and diverse datasets of chemical reactions and of using both global and local models to guide the design of synthetic processes. Global models exploit information from comprehensive databases to suggest general reaction conditions for new reactions, while local models fine-tune specific parameters for a given reaction family to improve yield and selectivity. The paper also identifies current limitations and opportunities in this field, such as data quality and availability and the integration of high-throughput experimentation. It demonstrates how the combination of chemical engineering, data science, and ML algorithms can enhance the efficiency and effectiveness of reaction design and enable novel discoveries in chemistry.
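
One simple way to picture a "global" model is as similarity search over a reaction database: find the recorded reactions most like the query and propose the conditions they used. The sketch below assumes reactions are already encoded as sets of fingerprint bits and uses k-nearest-neighbour voting with Tanimoto similarity; this is a swapped-in illustrative technique, not a method taken from the review, and the condition labels are hypothetical placeholders.

```python
from collections import Counter


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of fingerprint bits."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def suggest_conditions(query_bits, database, k=3):
    """Hypothetical 'global model' stand-in: return the most common condition
    label among the k database reactions most similar to the query.

    database is a list of (fingerprint_bits, condition_label) pairs."""
    ranked = sorted(database, key=lambda rec: tanimoto(query_bits, rec[0]),
                    reverse=True)
    votes = Counter(condition for _, condition in ranked[:k])
    return votes.most_common(1)[0][0]
```

A local model would then take over inside the suggested reaction family, tuning continuous parameters such as temperature or catalyst loading on a much smaller, dedicated dataset; that step is not shown here.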

Journal of Chemical Information and Modeling, Journal Year: 2024, Volume and Issue: 64(8), P. 2955 - 2970, Published: March 15, 2024

Chemical reactions serve as foundational building blocks for organic chemistry and drug design. In the era of large AI models, data-driven approaches have emerged to innovate the design of novel reactions, optimize existing ones for higher yields, and discover new pathways for synthesizing chemical structures comprehensively. To effectively address these challenges with machine learning, it is imperative to derive robust and informative representations or engage in feature engineering using extensive data sets of reactions. This work aims to provide a comprehensive review of established reaction featurization approaches, offering insights into the selection of features for a wide array of tasks. The advantages and limitations of employing SMILES, molecular fingerprints, molecular graphs, and physics-based properties are meticulously elaborated. Solutions to bridge the gap between different representations will also be critically evaluated. Additionally, we introduce the frontier of pretraining, holding promise as an innovative yet unexplored avenue.
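
To make the step from molecular fingerprints to a reaction representation concrete, a well-known option is the difference fingerprint: sum the product fingerprints and subtract the reactant fingerprints, so that only the features changed by the reaction remain. In practice one would compute Morgan fingerprints with a cheminformatics toolkit such as RDKit; to keep this sketch dependency-free, `toy_fingerprint` is a hypothetical stand-in that hashes SMILES character n-grams and carries no real chemical meaning.

```python
import zlib


def toy_fingerprint(smiles, n=3, n_bits=64):
    """Hypothetical stand-in for a count fingerprint: hashed character n-grams
    of a SMILES string (crc32 keeps the hashing deterministic across runs)."""
    counts = [0] * n_bits
    for i in range(len(smiles) - n + 1):
        gram = smiles[i:i + n]
        counts[zlib.crc32(gram.encode()) % n_bits] += 1
    return counts


def difference_fingerprint(reactants, products, n_bits=64):
    """Reaction features = sum of product fingerprints - sum of reactant ones."""
    diff = [0] * n_bits
    for smi in products:
        for i, c in enumerate(toy_fingerprint(smi, n_bits=n_bits)):
            diff[i] += c
    for smi in reactants:
        for i, c in enumerate(toy_fingerprint(smi, n_bits=n_bits)):
            diff[i] -= c
    return diff
```

Usage: `difference_fingerprint(["CCO", "CC(=O)O"], ["CC(=O)OCC"])` returns a fixed-length vector that can be fed to any standard regressor or classifier; a trade-off of this representation is that information about unchanged parts of the molecules cancels out.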

Journal of the American Chemical Society, Journal Year: 2025, Volume and Issue: 147(9), P. 7476 - 7484, Published: Feb. 21, 2025

The development of machine learning models to predict the regioselectivity of C(sp3)-H functionalization reactions is reported. A data set for dioxirane oxidations was curated from the literature and used to generate a model for C-H oxidation. To assess whether smaller, intentionally designed data sets could provide accuracy on complex targets, a series of acquisition functions were developed to select the most informative molecules for a specific target. Active learning-based acquisition functions that leverage predicted reactivity and uncertainty were found to outperform those based on molecular and site similarity alone. The use of active learning-based data set elaboration significantly reduced the number of data points needed to perform accurate prediction, and it was found that machine-designed data sets can give accurate predictions when larger, randomly selected data sets fail. Finally, the workflow was experimentally validated on five substrates and shown to be applicable to predicting arene radical borylation. These studies provide a quantitative alternative to the intuitive extrapolation from "model substrates" frequently used to estimate reactivity on complex molecules.
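
An acquisition function that leverages both predicted reactivity and uncertainty can be sketched with an upper-confidence-bound (UCB) style score over an ensemble of models. This is a generic active-learning pattern assumed for illustration; the abstract does not give the authors' exact functions, and the names `acquisition_scores`, `select_batch`, and `kappa` are hypothetical.

```python
import statistics


def acquisition_scores(ensemble_preds, kappa=1.0):
    """Hypothetical UCB-style score for each candidate molecule.

    ensemble_preds maps a candidate ID to the predictions of several models
    (e.g., predicted reactivity of its most relevant site). The score is the
    ensemble mean plus kappa times the ensemble spread (uncertainty)."""
    scores = {}
    for mol, preds in ensemble_preds.items():
        mean = statistics.fmean(preds)
        spread = statistics.pstdev(preds)
        scores[mol] = mean + kappa * spread
    return scores


def select_batch(ensemble_preds, batch_size=2, kappa=1.0):
    """Return the top-scoring candidates to label (run or curate) next."""
    scores = acquisition_scores(ensemble_preds, kappa)
    return sorted(scores, key=scores.get, reverse=True)[:batch_size]
```

With `kappa=0` the selection is purely greedy on predicted reactivity; increasing `kappa` shifts it toward molecules the models disagree on, which is what lets a small, machine-designed set cover the regions where a randomly chosen set would be uninformative.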

ACS Central Science, Journal Year: 2024, Volume and Issue: unknown, Published: April 8, 2024

With over 10,000 new reaction protocols arising every year, only a handful of these procedures transition from academia to application. A major reason for this gap stems from the lack of comprehensive knowledge about a reaction's scope, i.e., to which substrates a protocol can or cannot be applied. Even though chemists invest substantial effort to assess the scope of new protocols, the resulting scope tables involve significant biases, reducing their expressiveness. Herein we report a standardized substrate selection strategy designed to mitigate these biases and evaluate the applicability, as well as the limits, of any chemical reaction. Unsupervised learning is utilized to map the chemical space of industrially relevant molecules. Subsequently, potential substrate candidates are projected onto this universal map, enabling the selection of a structurally diverse set of substrates with optimal relevance and coverage. By testing our methodology on different chemical reactions, we were able to demonstrate its effectiveness in finding general reactivity trends by using a few highly representative examples. The developed methodology empowers chemists to showcase the unbiased applicability of novel methodologies, facilitating their practical applications. We hope that this work will trigger interdisciplinary discussions about biases in synthetic chemistry, leading to improved data quality.
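
Selecting "a structurally diverse set of substrates" from a candidate pool can be illustrated with greedy max-min (farthest-point) selection: each new pick is the candidate whose nearest already-picked neighbour is as far away as possible. This is a swapped-in, widely used diversity-picking technique, not the map-and-project procedure described in the abstract; candidates are assumed to be encoded as sets of fingerprint bits, and all names are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto similarity between two sets of fingerprint bits."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def maxmin_select(candidates, n_pick, seed_id):
    """Greedy max-min diversity selection.

    candidates maps an ID to a set of fingerprint bits; seed_id is the first
    pick. Each round adds the candidate whose Tanimoto distance to its nearest
    already-picked neighbour is largest."""
    picked = [seed_id]
    while len(picked) < min(n_pick, len(candidates)):
        best_id, best_dist = None, -1.0
        for cid, bits in candidates.items():
            if cid in picked:
                continue
            nearest = min(1.0 - tanimoto(bits, candidates[p]) for p in picked)
            if nearest > best_dist:
                best_id, best_dist = cid, nearest
        picked.append(best_id)
    return picked
```

Pure max-min picking tends to favour outliers, so in a relevance-weighted scheme like the one described above, diversity would be balanced against how representative each candidate is of the industrially relevant chemical space.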

Journal of the American Chemical Society, Journal Year: 2024, Volume and Issue: 146(22), P. 15070 - 15084, Published: May 20, 2024

Despite the increased use of computational tools to supplement medicinal chemists' expertise and intuition in drug design, predicting synthetic yields in medicinal chemistry endeavors remains an unsolved challenge. Existing design workflows could profoundly benefit from reaction yield prediction, as precious material waste could be reduced and a greater number of relevant compounds could be delivered to advance the design, make, test, analyze (DMTA) cycle. In this work, we detail the evaluation of AbbVie's medicinal chemistry library data set to build machine learning models for the prediction of Suzuki coupling yields. The combination of density functional theory (DFT)-derived features and Morgan fingerprints was identified to perform better than one-hot encoded baseline modeling, furnishing encouraging results. Overall, we observe modest generalization to unseen reactant structures within the 15-year retrospective library data set. Additionally, we compare predictions made by the model with those made by expert chemists, finding that the model can often predict both yield and reaction success with greater accuracy. Finally, we demonstrate the application of this approach to suggest structurally and electronically similar building blocks to replace those predicted or observed to be unsuccessful prior to or after synthesis, respectively. The model was used to select monomers predicted to have higher yields, resulting in greater synthesis efficiency of relevant drug-like molecules.
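
The difference between the one-hot baseline and the structure-aware features can be made concrete by building both feature vectors side by side. The sketch below is hypothetical: the building-block IDs, the two DFT descriptor values, and the bit indices are placeholders, and it does not reproduce the paper's actual descriptor set or fingerprint settings.

```python
def one_hot(value, vocabulary):
    """One-hot vector for value; raises ValueError if value is unseen."""
    vec = [0.0] * len(vocabulary)
    vec[vocabulary.index(value)] = 1.0
    return vec


def one_hot_reaction(halide_id, boronate_id, halides, boronates):
    """Baseline: identity-only encoding of the two coupling partners."""
    return one_hot(halide_id, halides) + one_hot(boronate_id, boronates)


def combined_features(dft_descriptors, fp_bits, n_bits=16):
    """Concatenate continuous DFT-derived descriptors (hypothetical values)
    with a folded bit-vector fingerprint built from on-bit indices."""
    bits = [0.0] * n_bits
    for b in fp_bits:
        bits[b % n_bits] = 1.0  # fold bit indices into a fixed length
    return list(dft_descriptors) + bits
```

The design point is generalization: `one_hot` raises `ValueError` for a building block it has never seen and, even when extended, gives every new reactant an unrelated vector, whereas descriptors and fingerprints place an unseen reactant near structurally and electronically similar ones. That is also what makes it possible to suggest similar replacement building blocks.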

JACS Au, Journal Year: 2024, Volume and Issue: 4(7), P. 2492 - 2502, Published: July 3, 2024

Illuminating synthetic pathways is essential for producing valuable chemicals, such as bioactive molecules. Chemical and biological syntheses are crucial, and their integration often leads to more efficient and sustainable pathways. Despite the rapid development of retrosynthesis models, few of them consider both chemical and biological syntheses, hindering pathway design for high-value chemicals. Here, we propose BioNavi, which incorporates multitask learning and reaction templates into a deep learning-driven model for hybrid synthesis pathway design in a more interpretable manner. BioNavi outperforms existing approaches on different data sets, achieving a 75% hit rate in replicating reported biosynthetic pathways and displaying a superior ability in designing hybrid pathways. Additional case studies further illustrate the potential application of BioNavi in de novo pathway design. The enhanced web server (http://biopathnavi.qmclab.com/bionavi/) simplifies input operations and implements step-by-step pathway exploration according to user experience. We show that BioNavi is a handy navigator for various ...
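
Hybrid pathway design ultimately requires a search that expands a target into precursors, where each step may be chemical or enzymatic, until every leaf is an available building block. The toy below shows that AND-OR structure with a plain depth-limited search over a hand-written, hypothetical disconnection table. It is not BioNavi's algorithm, which predicts disconnections with deep learning, and all molecule names are placeholders.

```python
def find_route(target, disconnections, building_blocks, max_depth=4, depth=0):
    """Depth-limited AND-OR search over a hypothetical disconnection table.

    disconnections maps a product to a list of (step_kind, precursors)
    options, where step_kind is e.g. "chemical" or "enzymatic". Returns a
    list of (product, step_kind, precursors) steps, or None if no route."""
    if target in building_blocks:
        return []  # available starting material: nothing to make
    if depth >= max_depth:
        return None  # give up on this branch (also guards against cycles)
    for kind, precursors in disconnections.get(target, []):
        route = [(target, kind, tuple(precursors))]
        for precursor in precursors:  # AND: every precursor must be solvable
            sub_route = find_route(precursor, disconnections, building_blocks,
                                   max_depth, depth + 1)
            if sub_route is None:
                break
            route.extend(sub_route)
        else:
            return route  # OR: first fully solved option is returned
    return None
```

Usage: with a table in which target "T" can be made chemically from "A" and "B", and "B" enzymatically from "C", `find_route("T", table, {"A", "C"})` returns both steps, giving a mixed chemical/enzymatic route. A learned model would replace the fixed table and rank options instead of taking the first one that works.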

Journal of the American Chemical Society, Journal Year: 2024, Volume and Issue: 146(15), P. 10581 - 10590, Published: April 5, 2024

Positron emission tomography is a widely used imaging platform for studying physiological processes. Despite the proliferation of modern synthetic methodologies for radiolabeling, the optimization of these reactions still primarily relies on inefficient one-factor-at-a-time approaches. High-throughput experimentation (HTE) has proven to be a powerful approach for optimizing reactions in many areas of chemical synthesis. However, to date, HTE has rarely been applied to radiochemistry. This is largely because of the short lifetime of common radioisotopes, which presents major challenges for efficient parallel reaction setup and analysis using standard equipment and workflows. Herein, we demonstrate an effective HTE workflow and apply it to the copper-mediated radiofluorination of pharmaceutically relevant boronate ester substrates. The workflow utilizes commercial equipment and allows rapid optimization of reactions, exploring the space of aryl boronates for radiofluorinations and constructing large radiochemistry data sets.
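
The timing problem behind "the short lifetime of common radioisotopes" is easy to quantify with first-order decay, A(t) = A0·exp(−ln2·t/t½). For fluorine-18 (half-life about 109.8 min), wells read a few minutes apart already differ measurably in absolute activity, so absolute activities are usually decay-corrected to a common reference time before comparison. The helper names below are hypothetical and this is a generic calculation, not the workflow from the paper.

```python
import math

F18_HALF_LIFE_MIN = 109.77  # fluorine-18 half-life in minutes


def remaining_fraction(elapsed_min, half_life_min=F18_HALF_LIFE_MIN):
    """Fraction of the initial activity left after elapsed_min minutes."""
    return math.exp(-math.log(2) * elapsed_min / half_life_min)


def decay_correct(measured_activity, elapsed_min,
                  half_life_min=F18_HALF_LIFE_MIN):
    """Back-correct a measured activity to the reference time (t = 0)."""
    return measured_activity / remaining_fraction(elapsed_min, half_life_min)
```

For example, if a 96-well plate is read one well per minute, the last well is measured about 95 minutes after the first and retains only about 55% of its starting activity. Ratio-based readouts such as radiochemical conversion, taken within a single sample at a single time, cancel this factor, but absolute activities do not.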