bioRxiv (Cold Spring Harbor Laboratory)
Journal Year: 2025
Volume and Issue: unknown
Published: Jan. 8, 2025
A variety of deep generative models have been adopted to perform de novo functional protein generation. Compared to 3D protein design, sequence-based generation methods, which aim to generate amino acid sequences with desired functions, remain a major approach for functional protein generation due to the abundance and quality of protein sequence data, as well as the relatively low modeling complexity for training. Although these models are typically trained to match protein sequences from the training data, exact matching of every amino acid is not always essential. Certain amino acid changes (e.g., mismatches, insertions, and deletions) may not necessarily lead to functional changes. This suggests that maximizing the training data likelihood beyond the amino acid sequence space could yield better generative models. Pre-trained protein large language models (PLMs) like ESM2 can encode protein sequences into a latent space, potentially serving as functional validators. We propose training functional protein sequence generative models by simultaneously optimizing the likelihood of the training data in both the amino acid sequence space and the latent space derived from a PLM. This training scheme can also be viewed as a knowledge distillation approach that dynamically re-weights samples during training. We applied our method to train GPT-like models (i.e., autoregressive transformers) for antimicrobial peptide (AMP) and malate dehydrogenase (MDH) generation tasks. Computational experiments confirmed that our method outperformed various deep generative models (e.g., generative adversarial net, variational autoencoder, and GPT model without the proposed training strategy) on these tasks, demonstrating the effectiveness of our multi-likelihood optimization strategy.

Nucleic Acids Research
Journal Year: 2023
Volume and Issue: 51(W1), P. W432 - W437
Published: May 11, 2023
Abstract
Accurate and fast structure prediction of peptides of less than 40 amino acids in aqueous solution has many biological applications, but their conformations are pH- and salt concentration-dependent. In this work, we present PEP-FOLD4, which goes one step beyond many machine-learning approaches, such as AlphaFold2, TrRosetta and RaptorX. Adding the Debye-Hueckel formalism for charged-charged side chain interactions to a Mie formalism for all intramolecular (backbone and side chain) interactions, PEP-FOLD4, based on a coarse-grained representation of the peptides, performs as well as machine-learning methods on well-structured peptides, but displays significant improvements for poly-charged peptides. PEP-FOLD4 is available at http://bioserv.rpbs.univ-paris-diderot.fr/services/PEP-FOLD4. This server is free and there is no login requirement.
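The Debye-Hueckel formalism named in the abstract above treats charge-charge interactions as a Coulomb term damped by salt screening. The following Python sketch shows the textbook form only; it is not PEP-FOLD4's actual parameterization. The function names, the dielectric constant of 78.5, and the 25 °C monovalent-salt approximation for the Debye length are assumptions made for illustration.

```python
import math

# Coulomb constant in kcal*Å/(mol*e^2), a standard unit-conversion factor.
COULOMB_KCAL = 332.06


def debye_length_angstrom(ionic_strength_molar):
    """Approximate Debye length (Å) in water at 25 °C for a 1:1 salt (assumption)."""
    return 3.04 / math.sqrt(ionic_strength_molar)


def debye_hueckel_energy(q1, q2, r, ionic_strength_molar, dielectric=78.5):
    """Screened Coulomb energy (kcal/mol) between charges q1 and q2 (in e) at distance r (Å)."""
    screening = math.exp(-r / debye_length_angstrom(ionic_strength_molar))
    return COULOMB_KCAL * q1 * q2 / (dielectric * r) * screening
```

Raising the ionic strength shortens the Debye length, so the same pair of charges interacts more weakly, which is one way a model can reflect salt concentration dependence.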

bioRxiv (Cold Spring Harbor Laboratory)
Journal Year: 2023
Volume and Issue: unknown
Published: Feb. 26, 2023
Abstract
Deep learning networks offer considerable opportunities for accurate structure prediction and design of biomolecules. While cyclic peptides have gained significant traction as a therapeutic modality, developing deep learning methods for designing such peptides has been slow, mostly due to the small number of available structures for molecules in this size range. Here, we report approaches to modify the AlphaFold network for accurate structure prediction and design of cyclic peptides. Our results show this approach can accurately predict the structures of native cyclic peptides from a single sequence, with 36 out of 49 cases predicted with high confidence (pLDDT > 0.85) matching the native structure with root mean squared deviation (RMSD) less than 1.5 Å. Further extending our approach, we describe computational methods for designing sequences of peptide backbones generated by other backbone sampling methods and for de novo design of new macrocyclic peptides. We extensively sampled the structural diversity of cyclic peptides between 7–13 amino acids, and identified around 10,000 unique design candidates predicted to fold into the designed structures with high confidence. X-ray crystal structures for seven sequences with diverse sizes and structures designed by our approach match very closely with the design models (root mean squared deviation < 1.0 Å), highlighting the atomic level accuracy in our approach. The scaffolds developed here provide the basis for custom-designing peptides for targeted therapeutic applications.
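The abstract above reports accuracy through two cutoffs, pLDDT > 0.85 and RMSD < 1.5 Å. As a rough illustration of how such a check could be computed, here is a Python sketch. It assumes the two coordinate sets are already superimposed and atom-matched, which the authors' pipeline may handle differently, and the function names are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD (same units as input) between two equal-length lists of (x, y, z) tuples.

    No superposition is performed; the inputs are assumed to be pre-aligned.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


def passes_cutoffs(plddt, rmsd_value, plddt_min=0.85, rmsd_max=1.5):
    """True when a prediction is both confident and close to the reference."""
    return plddt > plddt_min and rmsd_value < rmsd_max
```

The default thresholds mirror the values quoted in the abstract; a real comparison would first apply an optimal superposition (for example, the Kabsch algorithm) before measuring RMSD.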

Viruses
Journal Year: 2023
Volume and Issue: 15(4), P. 820 - 820
Published: March 23, 2023
Viruses with rapid replication and easy mutation can become resistant to antiviral drug treatment. With novel viral infections emerging, such as the recent COVID-19 pandemic, novel antiviral therapies are urgently needed. Antiviral proteins, such as interferon, have been used for treating chronic hepatitis C infections for decades. Natural-origin antimicrobial peptides, such as defensins, have also been identified as possessing antiviral activities, including direct antiviral effects and the ability to induce indirect immune responses to viruses. To promote the development of antiviral drugs, we constructed a data repository of antiviral peptides and proteins (DRAVP). The database provides general information, antiviral activity, structure information, physicochemical information, and literature information for peptides and proteins. Because most of the proteins and peptides lack experimentally determined structures, AlphaFold was used to predict each antiviral peptide’s structure. A free website for users (http://dravp.cpu-bioinfor.org/, accessed on 30 August 2022) was constructed to facilitate data retrieval and sequence analysis. Additionally, all the data can be accessed from the web interface. The DRAVP database aims to be a useful resource for developing antiviral drugs.

bioRxiv (Cold Spring Harbor Laboratory)
Journal Year: 2024
Volume and Issue: unknown
Published: March 6, 2024
Abstract
Denoising Diffusion Probabilistic Models (DDPMs) have emerged as a potent class of generative models, demonstrating exemplary performance across diverse AI domains such as computer vision and natural language processing. In the realm of protein design, while there have been advances in structure-based, graph-based, and discrete sequence-based diffusion, the exploration of continuous latent space diffusion within protein language models (pLMs) remains nascent. In this work, we introduce AMP-Diffusion, a latent space diffusion model tailored for antimicrobial peptide (AMP) design, harnessing the capabilities of the state-of-the-art pLM, ESM-2, to de novo generate functional AMPs for downstream experimental application. Our evaluations reveal that peptides generated by AMP-Diffusion align closely in both pseudo-perplexity and amino acid diversity when benchmarked against experimentally-validated AMPs, and further exhibit relevant physicochemical properties similar to these naturally-occurring sequences. Overall, these findings underscore the biological plausibility of our generated sequences and pave the way for their empirical validation. In total, our framework motivates future exploration of pLM-based diffusion models for peptide and protein design.
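Pseudo-perplexity, one of the evaluation metrics named in the abstract above, is commonly defined for masked language models as the exponential of the negative mean log-probability assigned to each true residue when that position is masked. The Python sketch below covers only the final aggregation step. It assumes the per-position natural-log probabilities have already been obtained from a masked model such as ESM-2, and the function name is hypothetical rather than taken from the AMP-Diffusion code.

```python
import math


def pseudo_perplexity(masked_log_probs):
    """Pseudo-perplexity from per-position natural-log probabilities.

    Each entry is log p(true residue at position i | all other residues),
    as produced by masking position i in a masked language model.
    """
    if not masked_log_probs:
        raise ValueError("need at least one position")
    mean_neg_log_prob = -sum(masked_log_probs) / len(masked_log_probs)
    return math.exp(mean_neg_log_prob)
```

Lower values mean the language model finds the sequence more plausible; a sequence whose residues are each predicted with probability 0.5 scores a pseudo-perplexity of 2.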

Proceedings of the National Academy of Sciences
Journal Year: 2024
Volume and Issue: 121(44)
Published: Oct. 24, 2024
Interactions mediated by intrinsically disordered protein regions (IDRs) pose formidable challenges in structural characterization. IDRs are highly versatile, capable of adopting diverse structures and engagement modes. Motivated by recent strides in protein structure prediction, we embarked on exploring the extent to which AlphaFold-Multimer can faithfully reproduce the intricacies of interactions involving IDRs. To this end, we gathered multiple datasets covering the versatile spectrum of IDR binding modes and used them to probe AlphaFold-Multimer’s prediction of IDR interactions and their dynamics. Our analyses revealed that AlphaFold-Multimer is not only capable of predicting various types of bound IDR structures with high success rate, but that distinguishing true interactions from decoys, and unreliable predictions from accurate ones, is achievable by appropriate use of AlphaFold-Multimer’s intrinsic scores. We found that the quality of predictions drops for more heterogeneous, fuzzy interaction types, most likely due to lower interface hydrophobicity and higher coil content. Notably though, certain AlphaFold-Multimer scores, such as the Predicted Aligned Error and residue-ipTM, are highly correlated with the structural heterogeneity of the bound IDR, enabling clear distinctions between predictions of fuzzy and more homogeneous binding modes. Finally, our benchmarking revealed that predictions of IDR interactions can also be successful when using full-length proteins, but not as accurate as with cognate IDRs. To facilitate identification of the cognate IDR of a given partner, we established “minD,” which pinpoints potential interaction sites in a full-length protein. Our study demonstrates that AlphaFold-Multimer can correctly identify interacting IDRs and predict their mode of engagement with a given partner.