Journal of Cheminformatics, Journal Year: 2023, Volume and Issue: 15(1)
Published: July 25, 2023
Abstract
Explainable machine learning is increasingly used in drug discovery to help rationalize compound property predictions. Feature attribution techniques are popular choices to identify which molecular substructures are responsible for a predicted property change. However, established feature attribution methods have so far displayed low performance for popular deep learning algorithms such as graph neural networks (GNNs), especially when compared with simpler modeling alternatives such as random forests coupled with atom masking. To mitigate this problem, a modification of the regression objective for GNNs is proposed to specifically account for common core structures between pairs of molecules. The presented approach shows higher accuracy on a recently-proposed explainability benchmark. This methodology has the potential to assist with model explainability in drug discovery pipelines, particularly in lead optimization efforts where specific chemical series are investigated.
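The atom masking scheme named in the abstract above can be illustrated with a minimal sketch. Everything here is an assumption for illustration: `toy_predict` and its per-element weights are hypothetical stand-ins for a trained model, and deleting an atom from the input list stands in for masking it in a real featurization. The attribution of each atom is taken as the drop in prediction when that atom is masked.

```python
from typing import Callable, List, Sequence

# Hypothetical per-element contributions for a toy additive "model".
TOY_WEIGHTS = {"C": 0.5, "N": -1.0, "O": 2.0}


def toy_predict(atoms: Sequence[str]) -> float:
    """Toy property model (an assumption): sum of per-element weights."""
    return sum(TOY_WEIGHTS.get(atom, 0.0) for atom in atoms)


def atom_masking_attributions(
    atoms: Sequence[str],
    predict: Callable[[Sequence[str]], float],
) -> List[float]:
    """Attribution of atom i = prediction on full input - prediction with atom i removed."""
    full_prediction = predict(atoms)
    attributions = []
    for i in range(len(atoms)):
        # Remove atom i; a real pipeline would mask its features instead.
        masked = list(atoms[:i]) + list(atoms[i + 1:])
        attributions.append(full_prediction - predict(masked))
    return attributions


if __name__ == "__main__":
    print(atom_masking_attributions(["C", "N", "O"], toy_predict))
```

Because the toy model is additive, each attribution simply recovers the weight of the masked atom; with a non-additive model the values would also reflect interactions between atoms.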
ACS Catalysis, Journal Year: 2023, Volume and Issue: 13(21), P. 13863 - 13895
Published: Oct. 13, 2023
Recent progress in engineering highly promising biocatalysts has increasingly involved machine learning methods. These methods leverage existing experimental and simulation data to aid in the discovery and annotation of promising enzymes, as well as in suggesting beneficial mutations for improving known targets. The field of machine learning for protein engineering is gathering steam, driven by recent success stories and notable progress in other areas. It already encompasses ambitious tasks such as understanding and predicting protein structure and function, catalytic efficiency, enantioselectivity, protein dynamics, stability, solubility, aggregation, and more. Nonetheless, the field is still evolving, with many challenges to overcome and questions to address. In this Perspective, we provide an overview of ongoing trends in this domain, highlight recent case studies, and examine the current limitations of machine learning-based methods. We emphasize the crucial importance of thorough experimental validation of emerging models before their use for rational protein design. We present our opinions on the fundamental problems and outline the potential directions for future research.
Journal of Cheminformatics, Journal Year: 2024, Volume and Issue: 16(1)
Published: March 14, 2024
Abstract
In materials science, accurately computing properties like viscosity, melting point, and glass transition temperatures solely through physics-based models is challenging. Data-driven machine learning (ML) also poses challenges in constructing ML models, especially in the material science domain where data is limited. To address this, we integrate physics-informed descriptors from molecular dynamics (MD) simulations to enhance the accuracy and interpretability of ML models. Our current study focuses on accurately predicting viscosity in liquid systems using MD descriptors. In this work, we curated a comprehensive dataset of over 4000 small organic molecules’ viscosities from scientific literature, publications, and online databases. This dataset enabled us to develop quantitative structure–property relationships (QSPR) consisting of descriptor-based and graph neural network models to predict temperature-dependent viscosities for a wide range of viscosities. The QSPR models reveal that including MD descriptors improves the prediction of experimental viscosities, particularly at the small data set scale of fewer than a thousand data points. Furthermore, feature importance tools reveal that intermolecular interactions captured by MD descriptors are most important for viscosity predictions. Finally, the QSPR models can accurately capture the inverse relationship between viscosity and temperature for six battery-relevant solvents, some of which were not included in the original data set. Our research highlights the effectiveness of incorporating MD descriptors into QSPR models, which leads to improved accuracy for properties that are difficult to predict when using physics-based models alone or when limited data is available.
Science Advances, Journal Year: 2024, Volume and Issue: 10(1)
Published: Jan. 5, 2024
Phase-separated biomolecular condensates exhibit a wide range of dynamic properties, which depend on the sequences of the constituent proteins and RNAs. However, it is unclear to what extent condensate dynamics can be tuned without also changing the thermodynamic properties that govern phase separation. Using coarse-grained simulations of intrinsically disordered proteins, we show that the dynamics and thermodynamics of homopolymer condensates are strongly correlated, with increased condensate stability being coincident with low mobilities and high viscosities. We then apply an “active learning” strategy to identify heteropolymer sequences that break this correlation. This data-driven approach and accompanying analysis reveal how heterogeneous amino acid compositions and nonuniform sequence patterning map to a range of independently tunable dynamic and thermodynamic properties of biomolecular condensates. Our results highlight key molecular determinants governing the physical properties of biomolecular condensates and establish design rules for the development of stimuli-responsive biomaterials.
The year 2024 marks the 50th anniversary of the discovery of surface-enhanced Raman spectroscopy (SERS). Over recent years, SERS has experienced rapid development and become a critical tool in biomedicine with its unparalleled sensitivity and molecular specificity. This review summarizes the advancements and challenges in SERS substrates, nanotags, instrumentation, and spectral analysis for biomedical applications. We highlight key developments in colloidal and solid SERS substrates, with an emphasis on surface chemistry, hotspot design, and 3D hydrogel plasmonic architectures. Additionally, we introduce recent innovations in SERS nanotags, including those with interior gaps, orthogonal Raman reporters, and near-infrared-II-responsive properties, along with biomimetic coatings. Emerging technologies such as optical tweezers, nanopores, and wearable sensors have expanded SERS capabilities for single-cell and single-molecule analysis. Advances in spectral analysis, including signal digitalization, denoising, and deep learning algorithms, have improved the quantification of complex biological data. Finally, this review discusses SERS biomedical applications in nucleic acid detection, protein characterization, metabolite monitoring, and in vivo spectroscopy, emphasizing its potential for liquid biopsy, metabolic phenotyping, and extracellular vesicle diagnostics. The review concludes with a perspective on the clinical translation of SERS, addressing commercialization potentials and the challenges in tissue sensing and imaging.
Chemical Science, Journal Year: 2024, Volume and Issue: unknown
Published: Dec. 9, 2024
Large language models (LLMs) have emerged as powerful tools in chemistry, significantly impacting molecule design, property prediction, and synthesis optimization. This review highlights LLM capabilities in these domains and their potential to accelerate scientific discovery through automation. We also review LLM-based autonomous agents: LLMs with a broader set of tools to interact with their surrounding environment. These agents perform diverse tasks such as paper scraping, interfacing with automated laboratories, and synthesis planning. As agents are an emerging topic, we extend the scope of our review beyond chemistry and discuss agents across any scientific domains. This review covers the recent history, current capabilities, and design of LLMs and autonomous agents, addressing specific challenges, opportunities, and future directions in chemistry. Key challenges include data quality and integration, model interpretability, and the need for standard benchmarks, while future directions point towards more sophisticated multi-modal agents and enhanced collaboration between agents and experimental methods. Due to the quick pace of this field, a repository has been built to keep track of the latest studies: https://github.com/ur-whitelab/LLMs-in-science.
Molecular Informatics, Journal Year: 2025, Volume and Issue: 44(3)
Published: March 1, 2025
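Feature attribution methods from explainable artificial intelligence (XAI) provide explanations of machine learning models by quantifying feature importance for predictions of test instances. While features determining individual predictions have frequently been identified in machine learning applications, the consistency of feature importance-based explanations of machine learning models using different methods has not been thoroughly investigated. We have systematically compared model explanations in molecular machine learning. Therefore, a test system of highly accurate compound activity predictions for different targets was generated. For these predictions, explanations were computed using methodological variants of the Shapley value formalism, a popular feature attribution approach in machine learning adapted from game theory. Predictions of each model were assessed using a model-agnostic and a model-specific Shapley value-based method. The resulting feature importance distributions were characterized and compared by a global statistical analysis using diverse measures. Unexpectedly, methodological variants for Shapley value calculations yielded distinct feature importance distributions for highly accurate predictions. There was only little agreement between alternative model explanations. Our findings suggest that feature importance-based explanations of machine learning predictions should include an assessment of consistency using alternative methods.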
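For readers unfamiliar with the Shapley value formalism referred to above, the following is a minimal sketch of the textbook game-theoretic definition, computed exactly by enumerating feature subsets. It is not one of the methodological variants evaluated in the study, and `toy_value` is a hypothetical value function chosen only to make the arithmetic easy to follow.

```python
from itertools import combinations
from math import factorial
from typing import Callable, FrozenSet, List


def exact_shapley(n: int, value: Callable[[FrozenSet[int]], float]) -> List[float]:
    """Exact Shapley values for n features by enumerating all subsets.

    phi_i = sum over S not containing i of
            |S|! (n - |S| - 1)! / n! * (value(S + {i}) - value(S))
    """
    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                phi += weight * (value(s | {i}) - value(s))
        phis.append(phi)
    return phis


def toy_value(present: FrozenSet[int]) -> float:
    """Hypothetical model output given which features are present (an assumption)."""
    v = 0.0
    if 0 in present:
        v += 1.0  # feature 0 contributes on its own
    if 1 in present and 2 in present:
        v += 2.0  # features 1 and 2 contribute only jointly
    return v


if __name__ == "__main__":
    print(exact_shapley(3, toy_value))
```

The enumeration scales exponentially with the number of features, which is why practical tools rely on approximations or model-specific shortcuts; differences between such variants are the kind of inconsistency the abstract reports.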
ACS Chemical Neuroscience, Journal Year: 2024, Volume and Issue: 15(11), P. 2144 - 2159
Published: May 9, 2024
The local interpretable model-agnostic explanation (LIME) method was used to interpret two machine learning models of compounds penetrating the blood–brain barrier. The classification models, Random Forest, ExtraTrees, and Deep Residual Network, were trained and validated using the blood–brain barrier penetration dataset, which shows the penetrability of compounds in the blood–brain barrier. LIME was able to create explanations for such penetrability, highlighting the most important substructures of molecules that affect drug penetration in the barrier. The simple and intuitive outputs prove the applicability of this explainable model to interpreting the permeability of compounds across the blood–brain barrier in terms of molecular features. LIME explanations were filtered with a weight equal to or greater than 0.1 to obtain only the most relevant explanations. The results showed that several structures are important for blood–brain barrier penetration. In general, it was found that some compounds with nitrogenous substructures are more likely to permeate the blood–brain barrier. The application of these structural explanations may help the pharmaceutical industry and potential drug synthesis research groups to synthesize active molecules more rationally.
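The weight filter described above (keep explanations with a weight of at least 0.1) can be sketched as follows. The feature names and weights are hypothetical, and the comparison is applied to the signed weight exactly as the abstract words it; whether the study filtered on absolute values instead is not stated, so that remains an assumption.

```python
from typing import List, Tuple

Explanation = Tuple[str, float]  # (substructure label, explanation weight)


def filter_explanations(
    weighted: List[Explanation], threshold: float = 0.1
) -> List[Explanation]:
    """Keep only explanations whose weight is equal to or greater than the threshold."""
    return [(name, weight) for name, weight in weighted if weight >= threshold]


if __name__ == "__main__":
    # Hypothetical explanation output for one molecule.
    sample = [
        ("aromatic nitrogen", 0.25),
        ("carboxylic acid", -0.30),
        ("methyl group", 0.04),
        ("tertiary amine", 0.1),
    ]
    print(filter_explanations(sample))
```

Under this literal reading, strongly negative weights are discarded along with weak ones; filtering on `abs(weight)` would retain features that argue against penetration as well.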
Journal of the American Chemical Society, Journal Year: 2023, Volume and Issue: 145(41), P. 22584 - 22598
Published: Oct. 9, 2023
The use of sophisticated machine learning (ML) models, such as graph neural networks (GNNs), to predict complex molecular properties or all kinds of spectra has grown rapidly. However, ensuring the interpretability of these models' predictions remains a challenge. For example, a rigorous understanding of the predicted X-ray absorption spectrum (XAS) generated by such ML models requires an in-depth investigation of the respective black-box ML model used. Here, this is done for different GNNs based on a comprehensive, custom-generated XAS data set for small organic molecules. We show that a thorough analysis of the different ML models with respect to the local and global environments considered in each ML model is essential for the selection of an appropriate ML model that allows a robust XAS prediction. Moreover, we employ feature attribution to determine the respective contributions of various atoms in the molecules to the peaks observed in the XAS spectrum. By comparing this peak assignment to the core and virtual orbitals from the quantum chemical calculations underlying our data set, we demonstrate that it is possible to relate the atomic contributions via