bioRxiv (Cold Spring Harbor Laboratory)
Journal Year: 2024
Volume and Issue: unknown
Published: Oct. 25, 2024
Abstract
This paper presents the Smart Distributed Data Factory (SDDF), an AI-driven distributed computing platform designed to address challenges in drug discovery by creating comprehensive datasets of molecular conformations and their properties. SDDF uses volunteer computing, leveraging the processing power of personal computers worldwide to accelerate quantum chemistry (DFT) calculations. To tackle the vast chemical space and limited high-quality data, SDDF employs ensemble machine learning models to predict molecular properties and selectively choose the most challenging data points for further DFT calculations. The platform also generates new molecular conformations using molecular dynamics with forces derived from these models. SDDF makes several contributions: the volunteer computing platform for DFT calculations; an active learning framework for constructing a dataset of molecular conformations; a large public dataset of diverse ENAMINE molecules with calculated energies; and an ensemble of state-of-the-art ML models for accurate energy prediction. The energy dataset was generated to validate the SDDF approach of reducing the need for extensive calculations. With its strict scaffold split, the dataset can be used for training and benchmarking energy models. By combining active learning, distributed computing, and quantum chemistry, SDDF offers a scalable, cost-effective solution for developing accurate molecular models and ultimately accelerating drug discovery.
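The selection step described in this abstract, in which an ensemble of models flags the most challenging points for further DFT calculations, can be illustrated with a minimal sketch. It assumes that "most challenging" is measured by the spread of the ensemble's energy predictions (a query-by-committee style criterion); the abstract does not state the actual criterion, and the function name and toy numbers below are hypothetical.

```python
import statistics


def select_most_uncertain(predictions, k):
    """Return the indices of the k candidates with the largest ensemble spread.

    predictions[m][i] is the energy predicted by model m for candidate i.
    The spread is the population standard deviation across models
    (an assumed disagreement measure, not one taken from the paper).
    """
    n_candidates = len(predictions[0])
    spread = [
        statistics.pstdev([model[i] for model in predictions])
        for i in range(n_candidates)
    ]
    ranked = sorted(range(n_candidates), key=lambda i: spread[i], reverse=True)
    return ranked[:k], spread


# Toy predictions from three hypothetical models for four conformations.
ensemble_predictions = [
    [-1.00, -2.10, -0.50, -3.00],
    [-1.02, -1.60, -0.51, -3.01],
    [-0.99, -2.60, -0.49, -2.99],
]

# Conformation 1 has by far the largest disagreement, so it would be
# the one sent on for a reference calculation.
selected, spread = select_most_uncertain(ensemble_predictions, k=1)
print(selected)
```

In a full active learning loop, the selected conformations would be labeled with DFT, added to the training set, and the ensemble retrained before the next round.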
The Journal of Chemical Physics
Journal Year: 2025
Volume and Issue: 162(8)
Published: Feb. 28, 2025
In this work, we present MOLPIPx, a versatile library designed to seamlessly integrate permutationally invariant polynomials with modern machine learning frameworks, enabling the efficient development of linear models, neural networks, and Gaussian process models. These methodologies are widely employed for parameterizing potential energy surfaces across diverse molecular systems. MOLPIPx leverages two powerful automatic differentiation engines, JAX and EnzymeAD-Rust, to facilitate the efficient computation of energy gradients and higher-order derivatives, which are essential for tasks such as force field development and dynamic simulations. MOLPIPx is available at https://github.com/ChemAI-Lab/molpipx.
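To make the idea of a permutationally invariant polynomial concrete, the following is a minimal standard-library sketch for an A2B molecule. It does not use the MOLPIPx API: the function names, the Morse-type variables exp(-r/a), the choice of basis terms, and the coefficients are all illustrative assumptions. The sketch only shows that a linear model built on symmetrized polynomials gives the same energy when the two identical atoms are swapped.

```python
import math


def morse_variables(coords, length_scale=1.0):
    """Map the three interatomic distances of an A2B molecule to exp(-r/a)."""
    a1, a2, b = coords
    y_aa = math.exp(-math.dist(a1, a2) / length_scale)
    y_ab1 = math.exp(-math.dist(a1, b) / length_scale)
    y_ab2 = math.exp(-math.dist(a2, b) / length_scale)
    return y_aa, y_ab1, y_ab2


def pip_basis(coords):
    """Polynomials that are unchanged when the two A atoms are exchanged."""
    y_aa, y_ab1, y_ab2 = morse_variables(coords)
    return [
        y_aa,
        y_ab1 + y_ab2,
        y_ab1 * y_ab2,
        y_aa * (y_ab1 + y_ab2),
        y_ab1**2 + y_ab2**2,
    ]


def linear_pip_energy(coords, coefficients):
    """Linear model: energy as a weighted sum of the invariant polynomials."""
    return sum(c * p for c, p in zip(coefficients, pip_basis(coords)))


# Asymmetric toy geometry (A1, A2, B), so the swap test is not trivial.
molecule = [(0.76, 0.59, 0.0), (-0.90, 0.70, 0.0), (0.0, 0.0, 0.0)]
swapped = [molecule[1], molecule[0], molecule[2]]
coefficients = [0.3, -1.2, 0.8, 0.1, -0.4]  # arbitrary illustrative values

e_original = linear_pip_energy(molecule, coefficients)
e_swapped = linear_pip_energy(swapped, coefficients)
print(math.isclose(e_original, e_swapped))
```

In an automatic differentiation framework such as JAX, the forces would follow as the negative gradient of such an energy function with respect to the coordinates (for example via `jax.grad`), which is the kind of derivative computation the abstract refers to.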
Molecules
Journal Year: 2024
Volume and Issue: 29(19), P. 4626 - 4626
Published: Sept. 29, 2024
The field of computational protein engineering has been transformed by recent advancements in machine learning, artificial intelligence, and molecular modeling, enabling the design of proteins with unprecedented precision and functionality. Computational methods now play a crucial role in enhancing the stability, activity, and specificity of proteins for diverse applications in biotechnology and medicine. Techniques such as deep learning, reinforcement learning, and transfer learning have dramatically improved protein structure prediction, optimization of binding affinities, and enzyme design. These innovations have streamlined the process of protein engineering by allowing the rapid generation of targeted libraries, reducing experimental sampling, and enabling the rational design of proteins with tailored properties. Furthermore, the integration of computational approaches with high-throughput experimental techniques has facilitated the development of multifunctional proteins and novel therapeutics. However, challenges remain in bridging the gap between computational predictions and experimental validation and in addressing ethical concerns related to AI-driven protein design. This review provides a comprehensive overview of the current state and future directions of computational methods in protein engineering, emphasizing their transformative potential in creating next-generation biologics and advancing synthetic biology.
Journal of Chemical Theory and Computation
Journal Year: 2025
Volume and Issue: unknown
Published: Jan. 14, 2025
Training accurate machine learning potentials requires electronic structure data comprehensively covering the configurational space of the system of interest. As the construction of this data is computationally demanding, many schemes for identifying the most important structures have been proposed. Here, we compare the performance of high-dimensional neural network potentials (HDNNPs) for quantum liquid water at ambient conditions trained to data sets constructed using random sampling as well as various flavors of active learning based on query by committee. Contrary to the common understanding of active learning, we find that for a given data set size, random sampling leads to smaller test errors for structures not included in the training process. In our analysis, we show that this can be related to small energy offsets caused by a bias in the structures added in active learning, which can be overcome by using instead energy correlations as an error measure that is invariant to such shifts. Still, all HDNNPs yield very similar and accurate structural properties of quantum liquid water, which demonstrates the robustness of the training procedure with respect to the training set construction algorithm even when trained to as few as 200 structures. However, we find that for active learning based on preliminary potentials, a reasonable initial data set is important to avoid an unnecessary extension of the covered configuration space to less relevant regions.
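The remark about energy offsets and a shift-invariant error measure can be illustrated numerically. The sketch below assumes the Pearson correlation coefficient as the correlation measure, which the abstract does not specify, and uses made-up energies. It shows that a constant offset inflates the RMSE while leaving the correlation unchanged.

```python
import math


def rmse(pred, ref):
    """Root-mean-square error; sensitive to a constant energy offset."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(ref))


def pearson(pred, ref):
    """Pearson correlation; unchanged when a constant is added to pred."""
    mean_p = sum(pred) / len(pred)
    mean_r = sum(ref) / len(ref)
    cov = sum((p - mean_p) * (r - mean_r) for p, r in zip(pred, ref))
    var_p = sum((p - mean_p) ** 2 for p in pred)
    var_r = sum((r - mean_r) ** 2 for r in ref)
    return cov / math.sqrt(var_p * var_r)


# Made-up reference and predicted energies (arbitrary units).
reference = [-10.0, -9.5, -9.8, -10.4, -9.1]
predicted = [-10.1, -9.4, -9.8, -10.5, -9.0]
shifted = [e + 0.5 for e in predicted]  # same predictions plus a constant offset

print(rmse(predicted, reference), rmse(shifted, reference))
print(pearson(predicted, reference), pearson(shifted, reference))
```

Because relative energies, not absolute ones, determine forces and structural properties, a model with a constant offset can still reproduce structure well even when its RMSE looks worse.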
APL Materials
Journal Year: 2025
Volume and Issue: 13(2)
Published: Feb. 1, 2025
The rise of artificial intelligence (AI) as a powerful research tool in materials science has been extensively acknowledged. Particularly, exploring zeolites with target properties is of vital significance for industrial applications, and integrating AI technologies into zeolite design undoubtedly brings immense promise for the advancements in this field. Here, we provide a comprehensive review of the AI-empowered digital design of zeolites. It showcases the state-of-the-art progress in predicting zeolite-related properties, employing machine learning potentials for zeolite simulations, using generative models for inverse design, and aiding the experimental synthesis of zeolites. The challenges and perspectives are also discussed, emphasizing the new opportunities at the intersection of AI technologies and zeolites. This review is expected to offer crucial guidance for advancing innovations in materials science through AI in the future.
Scientific Reports
Journal Year: 2025
Volume and Issue: 15(1)
Published: Feb. 28, 2025
Abstract
This paper presents the smart distributed data factory (SDDF), an AI-driven distributed computing platform designed to address challenges in drug discovery by creating comprehensive datasets of molecular conformations and their properties. SDDF uses volunteer computing, leveraging the processing power of personal computers worldwide to accelerate quantum chemistry (DFT) calculations. To tackle the vast chemical space and limited high-quality data, SDDF employs ensemble machine learning (ML) models to predict molecular properties and selectively choose the most challenging data points for further DFT calculations. The platform also generates new molecular conformations using molecular dynamics with forces derived from these models. SDDF makes several contributions: the volunteer computing platform for DFT calculations; an active learning framework for constructing a dataset of molecular conformations; a large public dataset of diverse ENAMINE molecules with calculated energies; and an ensemble of ML models for accurate energy prediction. The energy dataset was generated to validate the SDDF approach of reducing the need for extensive calculations. With its strict scaffold split, the dataset can be used for training and benchmarking energy models. By combining active learning, distributed computing, and quantum chemistry, SDDF offers a scalable, cost-effective solution for developing accurate molecular models and ultimately accelerating drug discovery.
Journal of Chemical Information and Modeling
Journal Year: 2025
Volume and Issue: unknown
Published: May 5, 2025
Machine learning-based interatomic potentials (MLIPs) have transformed the prediction of potential energy surfaces (PESs), achieving accuracy comparable to ab initio calculations. However, atomic energy predictions, often assumed to lack physical meaning, remain underexplored. In this study, we demonstrate that inaccuracies in atomic energy predictions reduce the robustness and transferability of Neural Network Potentials (NNPs) and that the error can be masked in total energy predictions due to error cancellation. We validate this finding using challenging configurations involving deformation and failure under tensile loading. By pretraining NNPs with empirical potentials and applying transfer learning with density functional theory (DFT) data, we achieve notable improvements in energy, forces, and stress predictions. Furthermore, this approach enhances the robustness and transferability of NNPs, emphasizing the importance of developing high-quality atomic energy predictions for reliable MLIPs.
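The error-cancellation effect mentioned in this abstract can be seen in a small arithmetic sketch. The per-atom reference values are hypothetical (for instance, as an empirical potential with a well-defined per-atom decomposition might provide), and all numbers are invented for illustration.

```python
# Hypothetical per-atom energies for a four-atom configuration (arbitrary units).
reference_atomic = [-3.0, -3.2, -2.9, -3.1]
predicted_atomic = [-2.8, -3.4, -3.0, -3.0]

# Error in the total energy: the per-atom errors (+0.2, -0.2, -0.1, +0.1) cancel.
total_error = abs(sum(predicted_atomic) - sum(reference_atomic))

# Mean absolute error of the per-atom energies: no cancellation here.
mean_atomic_error = sum(
    abs(p - r) for p, r in zip(predicted_atomic, reference_atomic)
) / len(reference_atomic)

print(total_error, mean_atomic_error)
```

A model evaluated only on total energies would look essentially exact on this configuration, even though every individual atomic contribution is off.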

The efficacy of neural network potentials (NNPs) critically depends on the quality of the configurational datasets used for training. Prior research using empirical potentials has shown that well-selected liquid-solid transitional configurations of a metallic system can be translated to other metallic systems. This study demonstrates that such validated configurations can be relabeled using density functional theory (DFT) calculations, thereby enhancing the development of high-fidelity NNPs. Training strategies and sampling approaches are efficiently assessed using empirical potentials and subsequently relabeled via DFT in a highly parallelized fashion for high-fidelity NNP training. Our results reveal that relying solely on energy and force for NNP training is inadequate to prevent overfitting, highlighting the necessity of incorporating stress terms into the loss functions. To optimize NNP training involving force and stress terms, we propose employing transfer learning to fine-tune the weights, ensuring that the potential surface is smooth for these quantities composed of energy derivatives. This approach markedly improves the accuracy of elastic constants derived from simulations in both the empirical potential-based NNP and the DFT-based NNP. Overall, this study offers significant insights into leveraging empirical potentials to expedite the development of reliable and robust NNPs at the DFT level.
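The point about adding stress terms to the loss can be sketched as a weighted sum of mean-squared errors. The weights, the dictionary layout, and the numbers below are assumptions for illustration, not the loss used in the study. The example shows a prediction that matches energy and forces exactly yet has a wrong stress, which only a loss with a nonzero stress weight penalizes.

```python
def mse(pred, ref):
    """Mean-squared error between two equally long sequences."""
    return sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(ref)


def combined_loss(pred, ref, w_energy=1.0, w_force=1.0, w_stress=0.0):
    """Weighted sum of energy, force, and stress errors (weights are hypothetical)."""
    return (
        w_energy * mse([pred["energy"]], [ref["energy"]])
        + w_force * mse(pred["forces"], ref["forces"])
        + w_stress * mse(pred["stress"], ref["stress"])
    )


# Toy data: energy and forces agree, but two stress components are off by 0.6.
ref = {
    "energy": -3.50,
    "forces": [0.10, -0.20, 0.05],
    "stress": [1.0, 1.0, 1.0, 0.0, 0.0, 0.0],  # six independent components
}
pred = {
    "energy": -3.50,
    "forces": [0.10, -0.20, 0.05],
    "stress": [1.6, 0.4, 1.0, 0.0, 0.0, 0.0],
}

loss_without_stress = combined_loss(pred, ref, w_stress=0.0)
loss_with_stress = combined_loss(pred, ref, w_stress=0.1)
print(loss_without_stress, loss_with_stress)
```

With the stress weight set to zero the loss vanishes, so training would have no signal to correct the stress error; a nonzero weight exposes it.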