Journal of Chemical Information and Modeling
Year: 2024
Volume (issue): 64(23), pp. 8824–8837
Published: Nov. 25, 2024

The discovery of small organic compounds for inducing stem cell differentiation is a time- and resource-intensive process. While data science could, in principle, streamline the discovery of these compounds, novel approaches are required because it is difficult to acquire training data from large numbers of example compounds. In this paper, we present the design of a new compound for inducing cardiomyocyte differentiation using simple regression models trained with a data set containing only 80 examples. We introduce decorated shape descriptors, an information-rich molecular feature representation that integrates both shape and hydrophilicity information. Models built on these descriptors demonstrate improved performance compared with ones using standard descriptors based on shape alone. Model overtraining is diagnosed with a sensitivity analysis. Our new compound was designed using a conservative strategy, and its effectiveness was confirmed through the expression profiles of cardiomyocyte-related marker genes in real-time polymerase chain reaction experiments on human iPS cell lines. This work demonstrates a viable data-driven strategy for designing differentiation protocols that will be useful in situations where data are limited.
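
The abstract does not specify the regression models or the sensitivity analysis. As a hypothetical illustration of why overtraining is a concern with only 80 examples, the sketch below fits a ridge regression (an assumed model choice) in closed form and compares training error with leave-one-out (LOO) error; a large gap between the two is a simple overtraining signal. The function names and the penalty `lam` are illustrative assumptions, and this is not the paper's sensitivity analysis.

```python
# Hypothetical sketch: ridge regression on a small data set, with
# leave-one-out (LOO) error as a simple overtraining check.
# Not the sensitivity analysis used in the paper.
from typing import List


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_ridge(X: List[List[float]], y: List[float], lam: float) -> List[float]:
    """Closed-form ridge: w = (X^T X + lam*I)^-1 X^T y, bias left unpenalized."""
    Xb = [row + [1.0] for row in X]  # append a constant column for the bias
    d = len(Xb[0])
    xtx = [[sum(r[i] * r[j] for r in Xb) for j in range(d)] for i in range(d)]
    for i in range(d - 1):  # penalize the weights, not the bias term
        xtx[i][i] += lam
    xty = [sum(r[i] * t for r, t in zip(Xb, y)) for i in range(d)]
    return solve(xtx, xty)


def predict(w: List[float], x: List[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x + [1.0]))


def mse(w: List[float], X: List[List[float]], y: List[float]) -> float:
    return sum((predict(w, x) - t) ** 2 for x, t in zip(X, y)) / len(y)


def loo_mse(X: List[List[float]], y: List[float], lam: float) -> float:
    """Refit without each sample in turn and score the held-out point."""
    errs = []
    for i in range(len(y)):
        w = fit_ridge(X[:i] + X[i + 1:], y[:i] + y[i + 1:], lam)
        errs.append((predict(w, X[i]) - y[i]) ** 2)
    return sum(errs) / len(errs)


if __name__ == "__main__":
    # Toy descriptors and targets, not data from the paper.
    X = [[float(i), float((i * 7) % 5)] for i in range(10)]
    y = [2 * a - b + 0.5 + (0.3 if k % 2 else -0.3) for k, (a, b) in enumerate(X)]
    w = fit_ridge(X, y, lam=0.1)
    print("train MSE:", mse(w, X, y), "LOO MSE:", loo_mse(X, y, lam=0.1))
```

For a linear smoother such as ridge, each LOO residual is the training residual divided by a factor between 0 and 1, so LOO error is never below training error; the size of the gap is what matters.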

Beilstein Journal of Organic Chemistry
Year: 2024
Volume (issue): 20, pp. 2476–2492
Published: Oct. 4, 2024

This review surveys the recent advances and challenges in predicting and optimizing reaction conditions using machine learning techniques. The paper emphasizes the importance of acquiring and processing large and diverse datasets of chemical reactions, and the use of both global and local models to guide the design of synthetic processes. Global models exploit the information from comprehensive databases to suggest general reaction conditions for new reactions, while local models fine-tune the specific parameters for a given reaction family to improve yield and selectivity. The paper also identifies the current limitations and opportunities in this field, such as data quality and availability, and the integration of high-throughput experimentation. It demonstrates how the combination of chemical engineering, data science, and ML algorithms can enhance the efficiency and effectiveness of reaction conditions design and enable novel discoveries in synthetic chemistry.
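
As a hedged illustration of the "global model" idea (suggesting general conditions for a new reaction from a large database), the sketch below ranks database reactions by Tanimoto similarity of their fingerprints to a query and lets the k nearest neighbors vote on conditions. Nearest-neighbor voting is an assumed stand-in, not a method attributed to the review; fingerprints are represented as sets of integer bit indices, and the condition labels are hypothetical.

```python
# Hypothetical sketch of a "global" condition recommender: k-nearest-neighbor
# voting over a reaction database, using Tanimoto similarity on fingerprint bits.
from collections import Counter
from typing import FrozenSet, List, Sequence, Tuple

Record = Tuple[FrozenSet[int], str]  # (fingerprint "on" bits, condition label)


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """|A & B| / |A | B| for sets of on-bits; 0.0 if both sets are empty."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def suggest_conditions(
    query: FrozenSet[int], database: Sequence[Record], k: int = 5
) -> List[Tuple[str, int]]:
    """Return condition labels ranked by votes among the k most similar reactions."""
    neighbors = sorted(database, key=lambda rec: tanimoto(query, rec[0]), reverse=True)[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common()


if __name__ == "__main__":
    db = [
        (frozenset({1, 2, 3}), "conditions_A"),
        (frozenset({1, 2, 4}), "conditions_A"),
        (frozenset({7, 8, 9}), "conditions_B"),
    ]
    print(suggest_conditions(frozenset({1, 2, 3, 4}), db, k=2))
```

In the global/local split the review describes, a suggestion like this would only be a starting point; a local model trained on one reaction family would then tune the parameters for yield and selectivity.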

Journal of Chemical Information and Modeling
Year: 2025
Volume (issue): 65(1), pp. 312–325
Published: Jan. 2, 2025

Despite the remarkable advancements in the organic synthesis field facilitated by the use of machine learning (ML) techniques, the prediction of reaction outcomes, including yield estimation, catalyst optimization, and mechanism identification, continues to pose a significant challenge. This challenge arises primarily from the lack of appropriate descriptors capable of retaining the molecular information crucial for accurate prediction while also ensuring computational efficiency. This study presents the successful application of ML in predicting the performance of Ir-catalyzed allylic substitution reactions. We introduce SubA, an innovative substrate-aware descriptor inspired by the fact that specific atoms or motifs in the reactants drive reaction outcomes. By employing graph matching algorithms for backbone identification and incorporating atomic properties derived from density functional theory calculations, SubA extracts essential information at both the molecular and the atomic level. Compared with four mainstream descriptors, SubA achieves reduced dimensionality and enhanced accuracy, with a mean absolute error reduction of over 2% under both random and scaffold splitting evaluations. It also demonstrates better generalization when confronted with previously unreported substrate combinations in extended experiments. Furthermore, interpretability analysis shows that the predictor focuses on key features, offering insights into reaction mechanisms.
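
The evaluation above reports mean absolute error (MAE) under random and scaffold splitting. The sketch below shows both splitting schemes in a minimal form, assuming scaffold strings are precomputed (for example, Bemis–Murcko scaffolds from a cheminformatics toolkit). The greedy "largest groups to training" rule is one common choice and is not necessarily the paper's procedure; all names are illustrative.

```python
# Hypothetical sketch: random vs scaffold splits and mean absolute error (MAE).
# Scaffold strings are assumed to be precomputed (e.g. Bemis-Murcko scaffolds
# from a cheminformatics toolkit); the greedy rule below is one common choice.
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def random_split(n: int, test_frac: float, seed: int = 0) -> Tuple[List[int], List[int]]:
    """Shuffle indices and hold out a fraction as the test set."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = round(n * test_frac)
    return idx[n_test:], idx[:n_test]


def scaffold_split(scaffolds: Sequence[str], test_frac: float) -> Tuple[List[int], List[int]]:
    """Assign whole scaffold groups to one side so no test scaffold appears in training."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for i, s in enumerate(scaffolds):
        groups[s].append(i)
    n_train_target = len(scaffolds) - round(len(scaffolds) * test_frac)
    train: List[int] = []
    test: List[int] = []
    # Largest groups go to training first; groups that no longer fit go to test.
    for g in sorted(groups.values(), key=len, reverse=True):
        (train if len(train) + len(g) <= n_train_target else test).extend(g)
    return train, test


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)
```

Because whole scaffolds are held out, a scaffold split estimates performance on structurally new substrates, which is why it is usually a harder test than a random split.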

Journal of Chemical Information and Modeling
Year: 2025
Volume (issue): unknown
Published: Jan. 25, 2025

Accurately predicting activation energies is crucial for understanding chemical reactions and modeling complex reaction systems. However, the high computational cost of quantum chemistry methods often limits the feasibility of large-scale studies, leading to a scarcity of high-quality activation energy data. In this work, we explore and compare three innovative approaches (transfer learning, delta learning, and feature engineering) to enhance the accuracy of activation energy predictions using graph neural networks, specifically focusing on methods that incorporate low-cost, low-level computational data. Using the Chemprop model, we systematically evaluated how these methods leverage data from semiempirical quantum mechanics (SQM) calculations to improve predictions. Delta learning, which adjusts SQM activation energies to align with high-level CCSD(T)-F12a targets, emerged as the most effective method, achieving high accuracy with substantially reduced data requirements. Notably, delta learning trained on just 20–30% of the data matched or exceeded the performance of other methods trained on full data sets, making it advantageous in data-scarce scenarios. However, its reliance on transition state searches imposes significant computational demands during model application. Transfer learning, which pretrains models on large sets of low-level data, provided mixed results, particularly when there was a mismatch in distributions between the training and target sets. Feature engineering, which involves adding computed molecular properties as input features, showed modest gains, particularly with thermodynamic properties. Our study highlights the trade-offs between accuracy and computational demand in selecting the best approach for enhancing activation energy predictions. These insights provide valuable guidelines for researchers aiming to apply machine learning to activation energy prediction, helping them balance accuracy against resource constraints.
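
Delta learning, as described above, trains a model on the difference between the high-level target and the low-level SQM value and adds the learned correction back at prediction time. The sketch below shows that structure with a one-feature linear regression standing in for the Chemprop graph neural network; the descriptor and all numbers are hypothetical. Note that `predict` needs the SQM activation energy of each new reaction as input, which mirrors the point that a transition state search is still required when the model is applied.

```python
# Hypothetical sketch of delta learning: learn E_high - E_low, then add it back.
# A one-feature linear fit stands in for the graph neural network used in the paper.
from typing import List, Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for y ~ a * x + b."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    a = sxy / sxx
    return a, my - a * mx


class DeltaModel:
    """Predicts a high-level energy as the low-level energy plus a learned correction."""

    def fit(self, feats: Sequence[float], e_low: Sequence[float],
            e_high: Sequence[float]) -> "DeltaModel":
        delta = [hi - lo for hi, lo in zip(e_high, e_low)]  # the learning target
        self.a, self.b = fit_line(feats, delta)
        return self

    def predict(self, feats: Sequence[float], e_low: Sequence[float]) -> List[float]:
        # The low-level (SQM) energy must be computed for every new reaction.
        return [lo + self.a * f + self.b for f, lo in zip(feats, e_low)]


if __name__ == "__main__":
    feats = [1.0, 2.0, 3.0, 4.0]        # hypothetical descriptor values
    e_sqm = [10.0, 20.0, 30.0, 40.0]    # toy low-level barriers
    e_cc = [12.5, 23.0, 33.5, 44.0]     # toy high-level barriers
    model = DeltaModel().fit(feats, e_sqm, e_cc)
    print(model.predict([5.0], [50.0]))  # [54.5]
```

Because the model only has to learn the (usually smaller and smoother) correction rather than the full barrier, fewer high-level examples can suffice, which is consistent with the reduced data requirements reported above.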

Chemical Science
Year: 2025
Volume (issue): unknown
Published: Jan. 1, 2025

This article reviews computational tools for the prediction of regio- and site-selectivity of organic reactions. It spans from quantum chemical procedures to deep learning models and showcases the application of the presented tools.

Journal of Chemical Information and Modeling
Year: 2025
Volume (issue): unknown
Published: Apr. 15, 2025

Accurate solubility prediction in supercritical carbon dioxide (scCO2) is crucial for optimizing experimental design by eliminating unnecessary and costly trials at an early stage, thereby streamlining the workflow. A comprehensive database containing 31,975 records has been compiled, providing a foundation for developing predictive models applicable to a diverse class of chemical compounds, with particular focus on drug-like substances. In this study, we propose a domain-aware machine learning approach that incorporates the thermodynamic properties governing phase transitions into solubility predictions in scCO2. Predictive models were developed using the CatBoost algorithm and a graph-based architecture employing directed message passing to identify the most effective approach. Furthermore, auxiliary properties of the solute, including melting point, critical parameters, enthalpy of vaporization, and Gibbs free energy of solvation, were predicted as part of this work. The findings underscore the efficacy of incorporating domain-specific features to enhance the accuracy of scCO2 solubility modeling. Interpretation and applicability domain assessment have confirmed the qualitative selection of the employed descriptors, demonstrating their ability to generalize to unique compounds that fall outside the defined domain.
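
The abstract does not say which applicability domain (AD) method was used. One widely used family is distance based: a query compound counts as inside the domain if its mean distance to its k nearest training compounds in descriptor space does not exceed a threshold derived from the training set itself. The sketch below assumes Euclidean distance, a threshold of mean + z·std of the training compounds' own kNN distances, and hypothetical descriptor vectors; `z` is an assumed tuning constant.

```python
# Hypothetical sketch of a distance-based applicability domain (AD) check.
import math
import statistics
from typing import Sequence

Vector = Sequence[float]


def knn_mean_distance(x: Vector, train: Sequence[Vector], k: int) -> float:
    """Mean Euclidean distance from x to its k nearest training points."""
    dists = sorted(math.dist(x, t) for t in train)
    return sum(dists[:k]) / k


def ad_threshold(train: Sequence[Vector], k: int, z: float = 0.5) -> float:
    """Mean + z * std of each training point's kNN distance to the other points."""
    own = [
        knn_mean_distance(t, list(train[:i]) + list(train[i + 1:]), k)
        for i, t in enumerate(train)
    ]
    return statistics.mean(own) + z * statistics.pstdev(own)


def in_domain(x: Vector, train: Sequence[Vector], k: int, threshold: float) -> bool:
    """True if x is no farther from its neighbors than the threshold allows."""
    return knn_mean_distance(x, train, k) <= threshold
```

An out-of-domain flag signals reduced confidence rather than a guaranteed failure; the abstract itself reports that the descriptors generalize to some compounds outside the defined domain.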

Journal of Chemical Theory and Computation
Year: 2025
Volume (issue): unknown
Published: May 13, 2025

Understanding complex, multistep chemical reactions at the molecular level is a major challenge whose solution would greatly benefit the design and optimization of numerous chemical processes. The separation of rare-earth (4f) and actinide (5f) elements is an example where improving our understanding is important for designing and optimizing new chemistries, even with a limited number of observations. In this work, we leverage data-driven artificial intelligence and machine-learning approaches to develop kinetic reaction networks that describe the liquid-liquid extraction mechanism of uranium using N,N-di-2-ethylhexyl-isobutyramide (DEHiBA). Specifically, we compare and contrast the properties of two classes of models: (1) purely data-driven models, which are regularized, chemistry-agnostic L1 regression models, and (2) chemistry-informed models, which use relative energies provided by quantum mechanical calculations. We observe that the data-driven models are unbiased, simple, and accurate in their predictions of experimental measurements when sufficient data are available, but they are difficult to fully constrain and interpret. In contrast, the chemistry-informed models exhibit significantly improved interpretability and consistency, providing a detailed description of the extraction process while achieving high accuracy through ensemble averaging. Overall, the dominant species predicted to be extracted into the organic phase is UO2(NO3)2(DEHiBA)2, in agreement with slope analysis, thermodynamic modeling, EXAFS, and crystal structures. This work demonstrates that leveraging the fundamental structure of the problem can lead to efficient learning schemes that provide insight at low computational cost.
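
The abstract names regularized, chemistry-agnostic L1 regression as the purely data-driven model class but gives no further details. The sketch below is a generic lasso solved by cyclic coordinate descent; it illustrates why such models are sparse (the soft-threshold step sets weak coefficients to exactly zero) rather than reproducing the paper's kinetic-network fit. The columns of `X` stand for hypothetical candidate terms, and `lam` and `n_sweeps` are assumed settings.

```python
# Hypothetical sketch of L1-regularized (lasso) regression by coordinate descent.
# Minimizes (1 / (2n)) * ||y - X w||^2 + lam * ||w||_1 (no intercept).
import math
from typing import List, Sequence


def soft_threshold(z: float, gamma: float) -> float:
    """Shrink z toward zero by gamma; values within gamma become exactly 0."""
    return math.copysign(max(abs(z) - gamma, 0.0), z)


def lasso_cd(X: Sequence[Sequence[float]], y: Sequence[float],
             lam: float, n_sweeps: int = 200) -> List[float]:
    n, d = len(X), len(X[0])
    w = [0.0] * d
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(d)]
    resid = list(y)  # residual y - X w for the current w (equals y while w is zero)
    for _ in range(n_sweeps):
        for j in range(d):
            if col_sq[j] == 0.0:
                continue  # an all-zero column carries no information
            # Correlation of column j with the residual that excludes feature j.
            rho = sum(X[i][j] * (resid[i] + X[i][j] * w[j]) for i in range(n)) / n
            new = soft_threshold(rho, lam) / col_sq[j]
            if new != w[j]:
                for i in range(n):
                    resid[i] -= X[i][j] * (new - w[j])
                w[j] = new
    return w
```

Larger values of `lam` zero out more coefficients; with `lam = 0` the update reduces to ordinary least squares solved one coordinate at a time.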

Accounts of Chemical Research
Year: 2025
Volume (issue): unknown
Published: May 21, 2025

Conspectus
The advancement of machine learning and the availability of large-scale reaction datasets have accelerated the development of data-driven models for computer-aided synthesis planning (CASP) in the past decade. In this Account, we describe the range of methods that have been incorporated into the newest version of ASKCOS, an open-source software suite we have been developing since 2016. This ongoing effort has been driven by the importance of bridging the gap between research and development, making research advances available through a freely available, practical tool. ASKCOS integrates modules for retrosynthetic planning, complementary capabilities for condition prediction and product prediction, and several supplementary utilities with various roles in synthesis planning. For retrosynthetic planning, we developed the Interactive Path Planner (IPP) for user-guided search, as well as the Tree Builder for automatic planning with two well-known tree search algorithms, Monte Carlo Tree Search (MCTS) and Retro*. Four one-step retrosynthesis models, covering template-based and template-free strategies, form the basis of retrosynthetic predictions and can be used simultaneously to combine their advantages and propose diverse suggestions. Strategies for assessing the feasibility of proposed steps and evaluating full pathways are built on top of pioneering efforts made in the subtasks of condition recommendation, pathway scoring and clustering, and the prediction of reaction outcomes, including the major product, impurities, site selectivity, and regioselectivity. In addition, we have also developed auxiliary modules based on our work on solubility prediction and quantum mechanical descriptor prediction, which provide more insight into the suitability of solvents or the hypothetical selectivity of desired transformations. For each of these capabilities, we highlight its relevance in the context of CASP and present a comprehensive overview of how it is built not only on our own research but also on other recent advancements in the field. We detail how chemists can easily interact with ASKCOS via user-friendly interfaces. ASKCOS has assisted hundreds of medicinal, synthetic, and process chemists in their day-to-day tasks, complementing expert decision making in route ideation. It is our belief that CASP tools are an important part of modern chemistry and will offer ever-increasing utility and accessibility.
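
The Account states that several one-step retrosynthesis models can be run together to combine their strengths and diversify suggestions, without describing how their outputs are merged. The sketch below shows one hypothetical merging rule: precursor sets are deduplicated regardless of order, each keeps its best score, and the result records which models proposed it. It does not use the ASKCOS API; the model names, SMILES strings, and scores are illustrative, and SMILES are assumed to be canonicalized upstream.

```python
# Hypothetical sketch: merging ranked suggestions from several one-step
# retrosynthesis models into one deduplicated list (not the ASKCOS API).
from typing import Dict, Iterable, List, Tuple

Suggestion = Tuple[Tuple[str, ...], float]  # (precursor SMILES, model score)


def merge_suggestions(
    per_model: Dict[str, Iterable[Suggestion]], top_k: int = 10
) -> List[Tuple[Tuple[str, ...], float, List[str]]]:
    best: Dict[Tuple[str, ...], float] = {}
    sources: Dict[Tuple[str, ...], List[str]] = {}
    for model_name, suggestions in per_model.items():
        for precursors, score in suggestions:
            key = tuple(sorted(precursors))  # same precursor set in any order
            if score > best.get(key, float("-inf")):
                best[key] = score
            sources.setdefault(key, []).append(model_name)
    # Rank by best score, then by how many models agree on the disconnection.
    ranked = sorted(best, key=lambda k: (best[k], len(sources[k])), reverse=True)
    return [(k, best[k], sources[k]) for k in ranked[:top_k]]


if __name__ == "__main__":
    outputs = {
        "template_based": [(("CC(=O)O", "NCC"), 0.9), (("CC(=O)Cl", "NCC"), 0.6)],
        "template_free": [(("NCC", "CC(=O)O"), 0.7), (("CC(=O)OC", "NCC"), 0.5)],
    }
    for precursors, score, models in merge_suggestions(outputs):
        print(precursors, score, models)
```

Scores from different models are not necessarily on a comparable scale, so a production system would likely need per-model calibration or rank-based fusion before taking the maximum.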

Accurate prediction of toluene/water partition coefficients of neutral species is crucial in drug discovery and separation processes; however, data-driven modeling of these coefficients remains challenging due to limited available experimental data. To address this limitation, we apply multi-fidelity learning approaches leveraging a quantum chemical dataset (low fidelity) of approximately 9000 entries generated by COSMO-RS and an experimental dataset (high fidelity) of about 250 entries collected from the literature. We explore transfer learning, feature-augmented learning, and multi-target learning in combination with graph neural networks, validating them on two external datasets: one with molecules similar to the training data (EXT-Zamora) and one with more challenging molecules (EXT-SAMPL9). Our results show that multi-target learning significantly improves predictive accuracy, achieving a root-mean-square error (RMSE) of 0.44 logP units for EXT-Zamora, compared with an RMSE of 0.63 logP units for single-task models. For the EXT-SAMPL9 dataset, multi-target learning achieves an RMSE of 1.02 logP units, indicating reasonable performance even for complex molecular structures. These findings highlight the potential of multi-fidelity learning to leverage quantum chemical data to improve partition coefficient predictions and to address the challenges posed by limited experimental data. We expect the methods used to be applicable beyond partition coefficients.
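
Feature augmentation, one of the multi-fidelity strategies listed above, feeds the low-fidelity prediction (here, a COSMO-RS logP) into the high-fidelity model as an extra input. The sketch below illustrates the idea with ordinary least squares in place of the paper's graph neural networks; the structural descriptor, the toy numbers, and the function names are hypothetical. It compares a model trained on the descriptor alone with one that also receives the low-fidelity value, scoring both by RMSE.

```python
# Hypothetical sketch of feature augmentation for multi-fidelity learning:
# the low-fidelity value is appended to the inputs of a high-fidelity regressor.
# Ordinary least squares stands in for the graph neural networks in the paper.
import math
from typing import List, Sequence


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        p = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[p] = m[p], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ols(X: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Least squares with an intercept via the normal equations."""
    Xb = [list(row) + [1.0] for row in X]
    d = len(Xb[0])
    xtx = [[sum(r[i] * r[j] for r in Xb) for j in range(d)] for i in range(d)]
    xty = [sum(r[i] * t for r, t in zip(Xb, y)) for i in range(d)]
    return solve(xtx, xty)


def predict(w: Sequence[float], X: Sequence[Sequence[float]]) -> List[float]:
    return [sum(wi * xi for wi, xi in zip(w, list(row) + [1.0])) for row in X]


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))


if __name__ == "__main__":
    desc = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]       # hypothetical descriptor
    logp_low = [0.5, 1.9, 1.2, 3.1, 2.2, 4.0]   # toy low-fidelity logP
    logp_exp = [0.2 * d + 0.9 * q + 0.1 for d, q in zip(desc, logp_low)]  # toy targets
    base = [[d] for d in desc]
    aug = [[d, q] for d, q in zip(desc, logp_low)]
    print("descriptor only:", rmse(logp_exp, predict(fit_ols(base, logp_exp), base)))
    print("augmented:", rmse(logp_exp, predict(fit_ols(aug, logp_exp), aug)))
```

On training data, adding a feature can never raise the least-squares error, so a meaningful comparison, like the external EXT-Zamora and EXT-SAMPL9 validation described above, has to be made on held-out molecules.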