Scientific Reports, Journal year: 2025, Issue: 15(1)
Published: Jan. 18, 2025
Abstract
The problem of protein structure determination is usually solved by X-ray crystallography. Several in silico deep learning methods have been developed to overcome the high attrition rate, high cost of experiments, and extensive trial-and-error settings by predicting the crystallization propensities of proteins based on their sequences. In this work, we benchmark the power of open protein language models (PLMs) through the TRILL platform, a bespoke framework democratizing the usage of PLMs, for the task of predicting crystallization propensities of proteins. By comparing LightGBM / XGBoost classifiers built on the average embedding representations of proteins learned by different PLMs, such as ESM2, Ankh, ProtT5-XL, ProstT5, xTrimoPGLM, and SaProt, with the performance of state-of-the-art sequence-based methods like DeepCrystal, ATTCrys, and CLPred, we identify the most effective methods for predicting crystallization outcomes. The LightGBM classifiers utilizing embeddings from the ESM2 models with 30 and 36 transformer layers and 150 and 3000 million parameters, respectively, have performance gains of 3-5% over all compared models on various evaluation metrics, including AUPR (Area Under the Precision-Recall Curve), AUC (Area Under the Receiver Operating Characteristic Curve), and F1, on independent test sets. Furthermore, we fine-tune the ProtGPT2 model available via TRILL to generate crystallizable proteins. Starting from the generated proteins and applying a step-wise filtration process including consensus of PLM-based classifiers, sequence identity through CD-HIT, secondary structure compatibility, aggregation screening, homology search, and foldability evaluation, we identified a set of 5 novel proteins as potentially crystallizable.
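As an illustration of the "average embedding representations" and the classifier-consensus filter mentioned in this abstract, here is a minimal sketch in plain Python. It is not the authors' code: the function names `mean_pool` and `consensus_keep`, the toy vectors, and the 0.5 threshold are assumptions made for the example. In practice the per-residue vectors would come from a PLM and the probabilities from trained classifiers.

```python
def mean_pool(residue_embeddings):
    """Average per-residue vectors (L x D) into one D-dimensional vector.

    Hypothetical helper: a PLM yields one vector per residue; averaging
    over the sequence length gives a fixed-size input for a classifier.
    """
    if not residue_embeddings:
        raise ValueError("sequence has no residues")
    length = len(residue_embeddings)
    dim = len(residue_embeddings[0])
    return [sum(row[j] for row in residue_embeddings) / length for j in range(dim)]


def consensus_keep(probabilities, threshold=0.5):
    """Keep a sequence only if every classifier scores it at or above threshold.

    The threshold is an assumption for illustration, not a value from the paper.
    """
    return all(p >= threshold for p in probabilities)


# Toy data: a 3-residue "protein" with 2-dimensional embeddings.
toy_embeddings = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
pooled = mean_pool(toy_embeddings)

# Toy classifier outputs for two generated sequences.
kept = consensus_keep([0.9, 0.7, 0.8])
dropped = consensus_keep([0.9, 0.4, 0.8])
```

Mean pooling discards residue order but lets sequences of any length share one feature space, which is what makes gradient-boosted classifiers applicable.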
ACS Central Science, Journal year: 2024, Issue: 10(2), pp. 226-241
Published: Feb. 5, 2024
Enzymes can be engineered at the level of their amino acid sequences to optimize key properties such as expression, stability, substrate range, and catalytic efficiency, or even to unlock new catalytic activities not found in nature. Because the search space of possible proteins is vast, enzyme engineering usually involves discovering an enzyme starting point that has some level of the desired activity, followed by directed evolution to improve its "fitness" for a desired application. Recently, machine learning (ML) has emerged as a powerful tool to complement this empirical process. ML models can contribute to (1) starting point discovery by functional annotation of known protein sequences or generating novel protein sequences with desired functions and (2) navigating protein fitness landscapes for fitness optimization by learning mappings between protein sequences and their associated fitness values. In this Outlook, we explain how ML complements enzyme engineering and discuss its future potential to unlock improved engineering outcomes.
bioRxiv (Cold Spring Harbor Laboratory), Journal year: 2023, Issue: unknown
Published: Dec. 23, 2023
Abstract
Protein sequence design in the context of small molecules, nucleotides, and metals is critical to enzyme and small molecule binder and sensor design, but current state-of-the-art deep learning-based sequence design methods are unable to model non-protein atoms and molecules. Here, we describe a deep learning-based protein sequence design method called LigandMPNN that explicitly models all non-protein components of biomolecular systems. LigandMPNN significantly outperforms Rosetta and ProteinMPNN on native backbone sequence recovery for residues interacting with small molecules (63.3% vs. 50.4% & 50.5%), nucleotides (50.5% vs. 35.2% & 34.0%), and metals (77.5% vs. 36.0% & 40.6%). LigandMPNN generates not only sequences but also sidechain conformations to allow detailed evaluation of binding interactions. Experimental characterization demonstrates that LigandMPNN can generate small molecule and DNA-binding proteins with high affinity and specificity.
One-sentence summary: We present a deep learning-based protein sequence design method that allows explicit modeling of small molecule, nucleotide, metal, and other atomic contexts.
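The sequence recovery figures quoted in this abstract are restricted to residues that interact with the ligand. A minimal sketch of such a metric is given below. The function name `sequence_recovery`, the toy sequences, and the choice of interface positions are assumptions for illustration, not the evaluation code used in the paper.

```python
def sequence_recovery(native, designed, positions=None):
    """Fraction of positions where the designed residue equals the native one.

    If `positions` is given (0-based indices), only those residues are
    scored, e.g. residues in contact with a ligand. How contacts are
    defined is left to the caller; it is not specified here.
    """
    if len(native) != len(designed):
        raise ValueError("sequences must have equal length")
    indices = list(range(len(native))) if positions is None else list(positions)
    if not indices:
        raise ValueError("no positions to score")
    matches = sum(native[i] == designed[i] for i in indices)
    return matches / len(indices)


# Toy example: two mismatches (indices 2 and 5) in an 8-residue sequence.
native_seq = "MKTAYIAK"
designed_seq = "MKSAYLAK"
overall = sequence_recovery(native_seq, designed_seq)
interface = sequence_recovery(native_seq, designed_seq, positions=[2, 3, 5, 6])
```

Scoring only the interface positions gives a lower value here than the whole-sequence score, which shows why the two numbers should not be compared directly.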
Science, Journal year: 2024, Issue: 384(6691), pp. 106-112
Published: April 4, 2024
The de novo design of small molecule-binding proteins has seen exciting recent progress; however, high-affinity binding and tunable specificity typically require laborious screening and optimization after computational design. We developed a computational procedure to design a protein that recognizes a common pharmacophore in a series of poly(ADP-ribose) polymerase-1 inhibitors. One of three designed proteins bound different inhibitors with affinities ranging from <5 nM to low micromolar. X-ray crystal structures confirmed the accuracy of the designed protein-drug interactions. Molecular dynamics simulations informed the role of water in binding. Binding free energy calculations performed directly on the designed models were in excellent agreement with the experimentally measured affinities. We conclude that de novo design of high-affinity small molecule-binding proteins with tuned interaction energies is feasible entirely from computation.
Current Opinion in Structural Biology, Journal year: 2024, Issue: 87, p. 102829
Published: June 6, 2024
Structure-based virtual screening aims to find molecules forming favorable interactions with a biological macromolecule using computational models of complexes. The recent surge of commercially available chemical space provides the opportunity to search for ligands of therapeutic targets among billions of compounds. This review offers a compact overview of structure-based virtual screens of vast chemical spaces, highlighting successful applications in early drug discovery for therapeutically important targets such as G protein-coupled receptors and viral enzymes. Emphasis is placed on strategies to explore ultra-large chemical libraries and synergies with emerging machine learning techniques. The current opportunities and future challenges of virtual screening are discussed, indicating that this approach will play an important role in the next-generation drug discovery pipeline.
Accounts of Chemical Research, Journal year: 2024, Issue: 57(10), pp. 1500-1509
Published: April 5, 2024
Conspectus
Molecular docking, also termed ligand docking (LD), is a pivotal element of structure-based virtual screening (SBVS) used to predict the binding conformations and affinities of protein–ligand complexes. Traditional LD methodologies rely on a search and scoring framework, utilizing heuristic algorithms to explore binding conformations and scoring functions to evaluate binding strengths. However, to meet the efficiency demands of SBVS, these scoring functions are often simplified, prioritizing speed over accuracy.
The emergence of deep learning (DL) has exerted a profound impact on diverse fields, ranging from natural language processing to computer vision and drug discovery. DeepMind's AlphaFold2 has impressively exhibited its ability to accurately predict protein structures solely from amino acid sequences, highlighting the remarkable potential of DL in conformation prediction. This groundbreaking advancement circumvents the traditional search-scoring frameworks in LD, enhancing both accuracy and processing speed and thereby catalyzing the broader adoption of DL algorithms in binding pose prediction. Nevertheless, a consensus on certain aspects remains elusive.
In this Account, we delineate the current status of employing DL to augment LD within the VS paradigm, highlighting our contributions to this domain. Furthermore, we discuss the challenges and future prospects, drawing insights from our scholarly investigations. Initially, we present an overview of VS and LD, followed by an introduction to DL paradigms, which deviate significantly from traditional search and scoring frameworks. Subsequently, we delve into the challenges associated with the development of DL-based LD (DLLD), encompassing evaluation metrics, application scenarios, and physical plausibility of the predicted conformations. In the evaluation of LD algorithms, it is essential to recognize the multifaceted nature of the metrics. While the accuracy of binding pose prediction, often measured by the success rate, is a pivotal aspect, the scoring/screening power and computational speed of these algorithms are equally important given the pivotal role of LD tools in VS. Regarding application scenarios, early methods focused on blind docking, where the binding site is unknown. However, recent studies suggest a shift in focus toward identifying binding sites rather than predicting binding poses within these models. In contrast, LD with a known pocket in VS has been shown to be more practical. Physical plausibility poses another significant challenge. Although DLLD models often achieve higher success rates compared to traditional methods, they may generate poses with implausible local structures, such as incorrect bond angles or lengths, which are disadvantageous for postprocessing tasks like visualization. Finally, we discuss the future perspectives for DLLD, emphasizing the need to improve generalization ability, strike a balance between speed and accuracy, account for protein conformation flexibility, and enhance physical plausibility. Additionally, we delve into the comparison between generative and regression algorithms in this context, exploring their respective strengths and potential.
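The "success rate" this Account refers to for binding pose prediction can be sketched as follows. This is an illustrative example only: the 2.0 Å cutoff is a commonly used convention assumed here rather than a value stated in the abstract, and the helper names `pose_rmsd` and `success_rate` are hypothetical.

```python
import math


def pose_rmsd(predicted, reference):
    """Heavy-atom RMSD between two poses given as lists of (x, y, z) tuples.

    No superposition is applied: both poses are assumed to be in the
    receptor frame with atoms listed in the same order.
    """
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("poses must be non-empty and have matching atom counts")
    squared = sum(
        (px - rx) ** 2 + (py - ry) ** 2 + (pz - rz) ** 2
        for (px, py, pz), (rx, ry, rz) in zip(predicted, reference)
    )
    return math.sqrt(squared / len(predicted))


def success_rate(rmsds, cutoff=2.0):
    """Fraction of complexes whose top-ranked pose RMSD is within the cutoff (in Å)."""
    if not rmsds:
        raise ValueError("no complexes to evaluate")
    return sum(r <= cutoff for r in rmsds) / len(rmsds)


# Toy pose shifted by 1 Å along z relative to the reference.
reference_pose = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
predicted_pose = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
shift_rmsd = pose_rmsd(predicted_pose, reference_pose)

# Toy per-complex RMSDs for a small benchmark.
rate = success_rate([0.8, 1.9, 3.5, 6.0])
```

A pose can pass this geometric check and still contain distorted bond lengths or angles, which is the physical plausibility problem the Account raises.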
Wiley Interdisciplinary Reviews Computational Molecular Science, Journal year: 2024, Issue: 14(2)
Published: March 1, 2024
Abstract
Generative AI is rapidly transforming the frontier of research in computational structural biology. Indeed, recent successes have substantially advanced protein design and drug discovery. One of the key methodologies underlying these advances is diffusion models (DM). Diffusion models originated in computer vision, rapidly taking over image generation and offering superior quality and performance. These models were subsequently extended and modified for uses in other areas including computational structural biology. DMs are well equipped to model high dimensional, geometric data while exploiting key strengths of deep learning. In structural biology, for example, they have achieved state-of-the-art results on protein 3D structure generation and small molecule docking. This review covers the basics of diffusion models, associated modeling choices regarding molecular representations, generation capabilities, prevailing heuristics, as well as key limitations and forthcoming refinements. We also provide best practices around evaluation procedures to help establish rigorous benchmarking and evaluation. The review is intended to provide a fresh view into the state-of-the-art as well as highlight its potentials and current challenges of recent generative techniques in computational structural biology.
This article is categorized under: Data Science > Artificial Intelligence/Machine Learning; Structure and Mechanism > Molecular Structures; Software > Molecular Modeling
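For readers unfamiliar with the basics this review covers, the forward (noising) step of a standard denoising diffusion model can be sketched as below. This follows the widely used closed form x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 − ᾱ_t)·ε and is not taken from the review itself. The linear schedule, its endpoints, and the function names are assumptions chosen for illustration.

```python
import math
import random


def linear_betas(steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances; the endpoints are illustrative defaults."""
    if steps < 1:
        raise ValueError("steps must be positive")
    if steps == 1:
        return [beta_start]
    return [beta_start + (beta_end - beta_start) * t / (steps - 1) for t in range(steps)]


def alpha_bars(betas):
    """Cumulative products of (1 - beta_t): the fraction of signal variance kept at each step."""
    out, product = [], 1.0
    for beta in betas:
        product *= 1.0 - beta
        out.append(product)
    return out


def noise_sample(x0, alpha_bar, rng):
    """Draw x_t from x_0 in one shot: scale the signal and add Gaussian noise."""
    scale = math.sqrt(alpha_bar)
    sigma = math.sqrt(1.0 - alpha_bar)
    return [scale * x + sigma * rng.gauss(0.0, 1.0) for x in x0]


schedule = alpha_bars(linear_betas(1000))
rng = random.Random(0)
# Toy "coordinates": one atom position noised at an intermediate step.
noisy = noise_sample([1.0, -2.0, 0.5], schedule[500], rng)
```

A generative model is then trained to reverse this process. For molecular data the noise is usually applied to coordinates, torsions, or rigid-body frames rather than to pixels.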