The Journal of Chemical Physics, Journal Year: 2023, Volume and Issue: 158(16)
Published: April 27, 2023
Deep learning has emerged as a promising paradigm to give access to highly accurate predictions of molecular and material properties. A common shortcoming shared by current approaches, however, is that neural networks only give point estimates of their predictions and do not come with predictive uncertainties associated with these estimates. Existing uncertainty quantification efforts have primarily leveraged the standard deviation of predictions across an ensemble of independently trained neural networks. This incurs a large computational overhead in both training and prediction, resulting in order-of-magnitude more expensive predictions. Here, we propose a method to estimate the predictive uncertainty based on a single neural network without the need for an ensemble. This allows us to obtain uncertainty estimates with virtually no additional computational overhead over standard training and inference. We demonstrate that the quality of the uncertainty estimates matches those obtained from deep ensembles. We further examine the uncertainty estimates of our methods and deep ensembles across the configuration space of our test system and compare the uncertainties to the potential energy surface. Finally, we study the efficacy of the method in an active learning setting and find the results to match an ensemble-based strategy at order-of-magnitude reduced computational cost.
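The ensemble baseline that this abstract compares against can be illustrated with a short sketch. It is not the paper's single-network method and not its code: the function name `ensemble_uncertainty` and the toy callables standing in for trained networks are assumptions made purely for illustration.

```python
import numpy as np


def ensemble_uncertainty(models, x):
    """Return the ensemble mean and standard deviation at inputs x.

    `models` is any sequence of callables mapping an input array to a
    prediction array of the same shape (hypothetical stand-ins for
    independently trained networks).
    """
    preds = np.stack([model(x) for model in models])  # (n_models, n_points)
    mean = preds.mean(axis=0)  # point estimate
    std = preds.std(axis=0)    # spread across members = uncertainty proxy
    return mean, std


# Toy "ensemble": two models that agree at x = 0 and disagree at x = 1.
toy_models = [lambda x: 1.0 * x, lambda x: 2.0 * x]
mean, std = ensemble_uncertainty(toy_models, np.array([0.0, 1.0]))
```

Every member must be evaluated for each prediction, which is the source of the cost overhead the abstract describes.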
Nature Communications, Journal Year: 2023, Volume and Issue: 14(1)
Published: Jan. 7, 2023
Generative chemical language models (CLMs) can be used for de novo molecular structure generation by learning from a textual representation of molecules. Here, we show that hybrid CLMs can additionally leverage the bioactivity information available for the training compounds. To computationally design ligands of phosphoinositide 3-kinase gamma (PI3Kγ), a collection of virtual molecules was created with a generative CLM. This virtual compound library was refined using a CLM-based classifier for bioactivity prediction. This second hybrid CLM was pretrained with patented molecular structures and fine-tuned with known PI3Kγ ligands. Several of the computer-generated molecular designs were commercially available, enabling fast prescreening and preliminary experimental validation. A new PI3Kγ ligand with sub-micromolar activity was identified, highlighting the method's scaffold-hopping potential. Chemical synthesis and biochemical testing of two of the top-ranked de novo designed molecules and their derivatives corroborated the model's ability to generate PI3Kγ ligands with medium to low nanomolar activity for hit-to-lead expansion. The most potent compounds led to pronounced inhibition of PI3K-dependent Akt phosphorylation in a medulloblastoma cell model, demonstrating efficacy of PI3Kγ ligands in PI3K/Akt pathway repression in human tumor cells. The results positively advocate hybrid CLMs for virtual compound screening and activity-focused molecular design.
Journal of Chemical Information and Modeling, Journal Year: 2023, Volume and Issue: 63(13), P. 4012 - 4029
Published: June 20, 2023
Characterizing uncertainty in machine learning models has recently gained interest in the context of machine learning reliability, robustness, safety, and active learning. Here, we separate the total uncertainty into contributions from noise in the data (aleatoric) and shortcomings of the model (epistemic), further dividing epistemic uncertainty into model bias and variance contributions. We systematically address the influence of noise, model bias, and model variance in the context of chemical property predictions, where the diverse nature of target properties and the vast chemical space give rise to many different distinct sources of prediction error. We demonstrate that different sources of error can each be significant in different contexts and must be individually addressed during model development. Through controlled experiments on data sets of molecular properties, we show important trends in model performance associated with the level of noise in the data set, size of the data set, model architecture, molecule representation, ensemble size, and data set splitting. In particular, we show that 1) noise in the test set can limit a model's observed performance when the actual performance is much better, 2) using size-extensive model aggregation structures is crucial for extensive property prediction, and 3) ensembling is a reliable tool for uncertainty quantification and improvement specifically for the contribution of model variance. We develop general guidelines on how to improve an underperforming model when falling into different uncertainty contexts.
Digital Discovery, Journal Year: 2024, Volume and Issue: 3(3), P. 467 - 481
Published: Jan. 1, 2024
Pareto optimization is suited to multi-objective problems when the relative importance of objectives is not known a priori. We report an open source tool to accelerate docking-based virtual screening with strong empirical performance.
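As a minimal illustration of the Pareto idea named in this abstract, the sketch below finds the non-dominated candidates in a small table of objective values. It is not taken from the reported tool: the function name `pareto_front` and the convention that every objective is maximized are assumptions for this example.

```python
import numpy as np


def pareto_front(points):
    """Return indices of non-dominated rows (all objectives maximized).

    A row is dominated if some other row is at least as good in every
    objective and strictly better in at least one.
    """
    points = np.asarray(points, dtype=float)
    keep = []
    for i, p in enumerate(points):
        at_least_as_good = np.all(points >= p, axis=1)
        strictly_better = np.any(points > p, axis=1)
        if not np.any(at_least_as_good & strictly_better):
            keep.append(i)
    return keep


# Five hypothetical candidates scored on two objectives.
scores = [[1, 5], [2, 4], [1, 4], [3, 1], [2, 2]]
front = pareto_front(scores)
```

No weighting of the two objectives is needed to identify the front, which is why the approach suits cases where their relative importance is unknown.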
Chemical Science, Journal Year: 2021, Volume and Issue: 12(22), P. 7866 - 7881
Published: Jan. 1, 2021
Structure-based virtual screening is an important tool in early stage drug discovery that scores the interactions between a target protein and candidate ligands. As virtual libraries continue to grow (in excess of 10^8 molecules), so too do the resources necessary to conduct exhaustive virtual screening campaigns on these libraries. However, Bayesian optimization techniques, previously employed in other scientific discovery problems, can aid in their exploration: a surrogate structure-property relationship model trained on the predicted affinities of a subset of the library can be applied to the remaining library members, allowing the least promising compounds to be excluded from evaluation. In this study, we explore the application of these techniques to computational docking datasets and assess the impact of surrogate model architecture, acquisition function, and acquisition batch size on optimization performance. We observe significant reductions in computational costs; for example, using a directed-message passing neural network we can identify 94.8% or 89.3% of the top-50 000 ligands in a 100M member library after testing only 2.4% of candidate ligands using an upper confidence bound or greedy acquisition strategy, respectively. Such model-guided searches mitigate the increasing computational costs of screening increasingly large virtual libraries and can accelerate high-throughput virtual screening campaigns with applications beyond docking.
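The two acquisition strategies named in this abstract, greedy and upper confidence bound (UCB), can be sketched in a few lines. This is an illustrative sketch, not the authors' implementation: the function names, the weight `beta = 2.0`, and the convention that higher scores are better are all assumptions.

```python
import numpy as np


def greedy_scores(mean, std):
    """Greedy acquisition: rank candidates by predicted value only."""
    return np.asarray(mean, dtype=float)


def ucb_scores(mean, std, beta=2.0):
    """Upper confidence bound: predicted value plus beta times uncertainty.

    beta is a hypothetical exploration weight chosen for illustration.
    """
    return np.asarray(mean, dtype=float) + beta * np.asarray(std, dtype=float)


def select_batch(scores, batch_size):
    """Return indices of the batch_size highest-scoring candidates."""
    return np.argsort(scores)[::-1][:batch_size]


# Hypothetical surrogate-model outputs for three untested ligands.
pred_mean = [1.0, 0.9, 0.2]
pred_std = [0.0, 0.2, 0.1]
greedy_pick = select_batch(greedy_scores(pred_mean, pred_std), 1)
ucb_pick = select_batch(ucb_scores(pred_mean, pred_std), 1)
```

In this toy case greedy selects the candidate with the best predicted value, while UCB prefers the slightly worse but more uncertain one.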
Briefings in Bioinformatics, Journal Year: 2021, Volume and Issue: 22(6)
Published: Aug. 9, 2021
Application of machine and deep learning methods in drug discovery and cancer research has gained a considerable amount of attention in the past years. As the field grows, it becomes crucial to systematically evaluate the performance of novel computational solutions in relation to established techniques. To this end, we compare rule-based and data-driven molecular representations in prediction of drug combination sensitivity and drug synergy scores using standardized results of 14 high-throughput screening studies, comprising 64 200 unique combinations of 4153 molecules tested in 112 cancer cell lines. We evaluate the clustering performance of molecular representations and quantify their similarity by adapting the Centered Kernel Alignment metric. Our work demonstrates that to identify an optimal molecular representation type, it is necessary to supplement quantitative benchmark results with qualitative considerations, such as model interpretability and robustness, which may vary between and throughout preclinical drug development projects.
Expert Opinion on Drug Discovery, Journal Year: 2021, Volume and Issue: 16(9), P. 937 - 947
Published: April 19, 2021
Introduction: Artificial Intelligence (AI) has become a component of our everyday lives, with applications ranging from recommendations on what to buy to the analysis of radiology images. Many of the techniques originally developed for other fields such as language translation and computer vision are now being applied in drug discovery. AI has enabled multiple aspects of drug discovery including the analysis of high content screening data, and the design and synthesis of new molecules.
Areas covered: This perspective provides an overview of the application of AI in several areas relevant to drug discovery including property prediction, molecule generation, image analysis, and organic synthesis planning.
Expert opinion: While a variety of machine learning methods are now being routinely used to predict biological activity and ADME properties, methods of representing molecules continue to evolve. Molecule generation methods are relatively unproven but hold the potential to access new, unexplored areas of chemical space. The application of AI in drug discovery will continue to benefit from dedicated research, as well as developments in other fields. With this pairing of algorithmic advancements and high-quality data, the impact of AI in drug discovery will continue to grow in the coming years.
Journal of Cheminformatics, Journal Year: 2022, Volume and Issue: 14(1)
Published: Jan. 10, 2022
In this paper, we present a data-driven method for the uncertainty-aware prediction of chemical reaction yields. The reactants and products in a chemical reaction are represented as a set of molecular graphs. The predictive distribution of the yield is modeled as a graph neural network that directly processes a set of graphs with permutation invariance. Uncertainty-aware learning and inference are applied to the model to make accurate predictions and to evaluate their uncertainty. We demonstrate the effectiveness of the proposed method on benchmark datasets with various settings. Compared to the existing methods, the proposed method improves the prediction and uncertainty quantification performance in most settings.