Designing Target-specific Data Sets for Regioselectivity Predictions on Complex Substrates
Journal of the American Chemical Society,
Journal Year:
2025,
Volume and Issue:
147(9), P. 7476 - 7484
Published: Feb. 21, 2025
The development of machine learning models to predict the regioselectivity of C(sp3)-H functionalization reactions is reported. A data set for dioxirane oxidations was curated from the literature and used to generate a model to predict the regioselectivity of C-H oxidation. To assess whether smaller, intentionally designed data sets could provide accuracy on complex targets, a series of acquisition functions were developed to select the most informative molecules for the specific target. Active learning-based acquisition functions that leverage predicted reactivity and model uncertainty were found to outperform those based on molecular and site similarity alone. The use of acquisition functions for data set elaboration significantly reduced the number of data points needed to perform accurate prediction, and it was found that smaller, machine-designed data sets can give accurate predictions when larger, randomly selected data sets fail. Finally, the workflow was experimentally validated on five complex substrates and shown to be applicable to predicting the regioselectivity of arene C-H radical borylation. These studies provide a quantitative alternative to the intuitive extrapolation from "model substrates" that is frequently used to estimate reactivity on complex molecules.
Language: English
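The abstract above contrasts active learning-based acquisition functions with ones based on molecular and site similarity alone. As a rough illustration of the similarity-only baseline, the sketch below ranks candidate molecules by Tanimoto similarity to a target. It is not the authors' implementation: the function names, the set-of-bits fingerprint representation, and the example molecules are all assumptions made for illustration.

```python
from typing import Dict, List, Set, Tuple


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto (Jaccard) similarity between two sets of 'on' fingerprint bits."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_by_similarity(
    target: Set[int], candidates: Dict[str, Set[int]], k: int
) -> List[Tuple[str, float]]:
    """Return the k candidates most similar to the target, best first."""
    scored = [(name, tanimoto(target, bits)) for name, bits in candidates.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Hypothetical fingerprints: each set holds the indices of the bits that are on.
target_bits = {1, 2, 3, 4}
pool = {
    "mol_a": {1, 2, 3, 4, 5},  # shares 4 of 5 distinct bits with the target
    "mol_b": {1, 2},           # shares 2 of 4 distinct bits with the target
    "mol_c": {7, 8, 9},        # shares nothing with the target
}
top_two = rank_by_similarity(target_bits, pool, k=2)
```

A similarity-only ranking of this kind ignores what the model already predicts well, which is consistent with the abstract's finding that acquisition functions using predicted reactivity and model uncertainty performed better.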
Data Science-Driven Discovery of Optimal Conditions and a Condition-Selection Model for the Chan–Lam Coupling of Primary Sulfonamides
ACS Catalysis,
Journal Year:
2025,
Volume and Issue:
unknown, P. 2292 - 2304
Published: Jan. 24, 2025
Language: English
Dedenser: A Python Package for Clustering and Downsampling Chemical Libraries
Journal of Chemical Information and Modeling,
Journal Year:
2025,
Volume and Issue:
unknown
Published: Jan. 30, 2025
The screening of chemical libraries is an essential starting point in the drug discovery process. While some researchers desire a more thorough screening of drug targets against a narrower scope of molecules, it is not uncommon for diverse screening sets to be favored during the early stages of drug discovery. However, a cost burden is associated with screening, with potential drawbacks if particular areas of chemical space are needlessly overrepresented. To facilitate triaged sampling of chemical libraries and other collections of molecules, we have developed Dedenser, a tool for the downsampling of chemical clusters. Dedenser functions by reducing the membership of clusters within chemical point clouds while maintaining the initial topology or distribution in chemical space. Dedenser is a Python package that utilizes Hierarchical Density-Based Spatial Clustering of Applications with Noise to first identify clusters present in 3D chemical point clouds and then downsamples the clusters by applying Poisson disk sampling based on either their volume or density in chemical space. A command line interface and a graphic user interface are available, which allow for the generation of chemical point clouds, using Mordred for QSAR descriptor calculations and uniform manifold approximation and projection for 3D embedding, as well as for visualization. We hope that Dedenser will serve the community by enabling quick access to reduced sets of molecules that are representative of larger sets, selecting even distributions of molecules within clusters rather than a single representative from each cluster. All code is open source at https://github.com/MSDLLCpapers/dedenser.
Language: English
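The Dedenser abstract describes thinning each cluster with Poisson disk sampling so that the retained points stay evenly spread. The sketch below illustrates only that thinning idea on a single, already-identified cluster, using greedy dart throwing over the existing points. It does not use or reproduce Dedenser's API: the function name `poisson_disk_thin`, the `min_dist` parameter, and the toy grid of points are assumptions, and the HDBSCAN clustering step is left out entirely.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def poisson_disk_thin(
    points: Sequence[Point], min_dist: float, seed: int = 0
) -> List[Point]:
    """Greedy dart throwing: keep a point only if it lies at least
    min_dist away from every point that has already been kept."""
    rng = random.Random(seed)
    order = list(points)
    rng.shuffle(order)  # visit the points in a reproducible random order
    kept: List[Point] = []
    for p in order:
        if all(math.dist(p, q) >= min_dist for q in kept):
            kept.append(p)
    return kept


# Hypothetical dense cluster: a 10 x 10 grid with 0.1 spacing in the z = 0 plane.
cluster = [(x * 0.1, y * 0.1, 0.0) for x in range(10) for y in range(10)]
thinned = poisson_disk_thin(cluster, min_dist=0.25)
```

Because every rejected point lies within `min_dist` of some kept point, the thinned set still covers the whole cluster, which is the sense in which the overall distribution is preserved while membership is reduced.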
Probability Guided Chemical Reaction Scopes
Published: Jan. 1, 2025
Language: English
Revealing the Relationship between Publication Bias and Chemical Reactivity with Contrastive Learning
Journal of the American Chemical Society,
Journal Year:
2025,
Volume and Issue:
unknown
Published: March 2, 2025
A synthetic method's substrate tolerance and generality are often showcased in a "substrate scope" table. However, substrate selection exhibits a frequently discussed publication bias: unsuccessful experiments or low-yielding results are rarely reported. In this work, we explore more deeply the relationship between such publication bias and chemical reactivity beyond the simple analysis of yield distributions using a novel neural network training strategy, substrate scope contrastive learning. By treating reported substrates as positive samples and nonreported substrates as negative samples, our contrastive learning strategy teaches a model to group molecules within a numerical embedding space, based on historical trends in published substrate scope tables. Training on 20,798 aryl halides in the CAS Content Collection™, spanning thousands of publications from 2010 to 2015, we demonstrate that the learned embeddings exhibit a correlation with physical organic reactivity descriptors through both intuitive visualizations and quantitative regression analyses. Additionally, these embeddings are applicable to various reaction modeling tasks like yield prediction and regioselectivity prediction, underscoring the potential to use historical reaction data as a pretraining task. This work not only presents a chemistry-specific machine learning training strategy to learn from literature data in a new way but also represents a unique approach to uncover trends in chemical reactivity reflected by trends in substrate selection in publications.
Language: English
The Implementation and Impact of Chemical High-Throughput Experimentation at AstraZeneca
ACS Catalysis,
Journal Year:
2025,
Volume and Issue:
unknown, P. 5229 - 5256
Published: March 13, 2025
Language: English
Applying Active Learning toward Building a Generalizable Model for Ni-Photoredox Cross-Electrophile Coupling of Aryl and Alkyl Bromides
Journal of the American Chemical Society,
Journal Year:
2025,
Volume and Issue:
unknown
Published: May 22, 2025
When developing machine learning models for yield prediction, the two main challenges are effectively exploring condition space and substrate space. In this article, we disclose an approach for mapping the substrate space for Ni/photoredox-catalyzed cross-electrophile coupling of alkyl bromides and aryl bromides in a high-throughput experimentation (HTE) context. This model employs active learning (in particular, uncertainty querying) as a strategy to rapidly construct a yield model. Given the vastness of substrate space, we focused on an approach that builds an initial model and then uses a minimal data set to expand into new chemical spaces. In particular, we built a model for a virtual space of 22,240 compounds using fewer than 400 data points. We demonstrated that the model can be expanded to 33,312 compounds by adding information around 24 new building blocks (<100 additional reactions). Comparing the active learning-based model to one constructed on randomly selected data showed that the active learning model was significantly better at predicting which reactions will be successful. A combination of density functional theory (DFT) features and difference Morgan fingerprints was employed to construct the random forest model. Feature importance analysis indicates that key DFT features related to the reaction mechanism (e.g., alkyl radical LUMO energy) were crucial for model performance and predictions outside of the training set. We anticipate that combining DFT featurization and uncertainty-based querying will help the synthetic organic community build predictive models in a data-efficient manner for other reactions that feature large and diverse scopes.
Language: English
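The abstract above names uncertainty querying as its active learning strategy with a random forest model. One common way to estimate uncertainty from such an ensemble, assumed here rather than taken from the paper, is the spread of the individual trees' predictions. The sketch below shows only the query step under that assumption: the function name, the reaction labels, and the per-tree predictions are hypothetical.

```python
from statistics import pvariance
from typing import Dict, List, Sequence


def query_most_uncertain(
    ensemble_predictions: Dict[str, Sequence[float]], k: int
) -> List[str]:
    """Pick the k candidates whose ensemble members disagree the most."""
    spread = {
        name: pvariance(preds) for name, preds in ensemble_predictions.items()
    }
    return sorted(spread, key=lambda name: spread[name], reverse=True)[:k]


# Hypothetical per-tree yield predictions (%) for three untested reactions.
predictions = {
    "rxn_1": [50.0, 52.0, 51.0, 49.0],  # trees agree: low uncertainty
    "rxn_2": [10.0, 90.0, 40.0, 70.0],  # trees disagree: high uncertainty
    "rxn_3": [30.0, 45.0, 35.0, 40.0],  # moderate disagreement
}
next_batch = query_most_uncertain(predictions, k=2)
```

In an active learning loop, the reactions in `next_batch` would be run experimentally, added to the training set, and the model retrained before the next query.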
Catalysing (organo-)catalysis: Trends in the application of machine learning to enantioselective organocatalysis
Beilstein Journal of Organic Chemistry,
Journal Year:
2024,
Volume and Issue:
20, P. 2280 - 2304
Published: Sept. 10, 2024
Organocatalysis has established itself as a third pillar of homogeneous catalysis, besides transition metal catalysis and biocatalysis, as its use for enantioselective reactions has gathered significant interest over the last decades. Concurrent to this development, machine learning (ML) has been increasingly applied in the chemical domain to efficiently uncover hidden patterns in data and accelerate scientific discovery. While the uptake of ML in organocatalysis has been comparably slow, the last two decades have shown an increased interest from the community. This review gives an overview of the work in the field of ML in organocatalysis. The review starts by giving a short primer on ML for experimental chemists, before discussing its application for predicting the selectivity of organocatalytic transformations. Subsequently, we review ML employed for privileged catalysts, before focusing on its application for catalyst and reaction design. Concluding, we give our view on current challenges and future directions for this field, drawing inspiration from the application of ML to other scientific domains.
Language: English
Data-Driven Insights into the Transition-Metal-Catalyzed Asymmetric Hydrogenation of Olefins
The Journal of Organic Chemistry,
Journal Year:
2024,
Volume and Issue:
89(17), P. 12467 - 12478
Published: Aug. 16, 2024
The transition-metal-catalyzed asymmetric hydrogenation of olefins is one of the key transformations with great utility in various industrial applications. The field has been dominated by the use of noble metal catalysts, such as iridium and rhodium. The reactions with the earth-abundant metal cobalt have increased only in recent years. In this work, we analyze the large amount of literature data available on iridium- and rhodium-catalyzed asymmetric hydrogenation. The limited data on reactions using Co catalysts are then examined in the context of Ir and Rh to obtain a better understanding of the reactivity pattern. A detailed data-driven study of the types of olefins, ligands, and reaction conditions such as solvent, temperature, and pressure is carried out. Our analysis provides an understanding of the literature trends and demonstrates that only a few olefin–ligand combinations or reaction conditions are frequently used. The knowledge of this bias in the literature data toward a certain group of substrates or reaction conditions can be useful for practitioners to design new reaction data sets that are suitable to obtain meaningful predictions from machine-learning models.
Language: English