bioRxiv (Cold Spring Harbor Laboratory),
Journal Year:
2025,
Volume and Issue:
unknown
Published: Jan. 8, 2025
A variety of deep generative models have been adopted to perform de novo functional protein generation. Compared with 3D structure-based design, sequence-based generation methods, which aim to generate amino acid sequences with desired functions, remain a major approach, due to the abundance and quality of sequence data as well as the relatively low modeling complexity in training. Although these models are typically trained to match sequences from the training data, exact matching of every amino acid is not always essential. Certain sequence changes (e.g., mismatches, insertions, deletions) may not necessarily lead to functional changes. This suggests that maximizing the data likelihood beyond the sequence space could yield better generative models. Pre-trained protein language models (PLMs) like ESM2 can encode protein sequences into a latent space, potentially serving as functional validators. We propose to train sequence generative models by simultaneously optimizing the likelihood in both the sequence space and the latent space derived from a PLM. This training scheme can also be viewed as a form of knowledge distillation that dynamically re-weights samples during training. We applied our method to train GPT-like models (i.e., autoregressive transformers) for antimicrobial peptide (AMP) and malate dehydrogenase (MDH) generation tasks. Computational experiments confirmed that our models outperformed various baselines (e.g., generative adversarial network, variational autoencoder, and a GPT model without the proposed strategy) on both tasks, demonstrating the effectiveness of the multi-likelihood optimization strategy.
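The multi-likelihood idea described in the abstract can be sketched as a combined objective: a standard autoregressive cross-entropy in sequence space plus a latent-space term comparing embeddings of generated and reference sequences. This is a minimal illustrative sketch, not the paper's implementation: the vocabulary size, dimensions, weighting `lam`, and the random-projection "encoder" standing in for a real PLM (such as ESM2) are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

VOCAB = 21           # 20 amino acids + a padding token (illustrative)
EMB_DIM = 8          # toy latent dimension standing in for a PLM embedding

# Frozen "PLM encoder" stand-in: a fixed random projection from pooled
# one-hot token features to a latent vector (a real system would use ESM2).
W_plm = rng.normal(size=(VOCAB, EMB_DIM))

def encode_latent(seq):
    """Map a token-id sequence to a mean-pooled latent vector."""
    onehot = np.eye(VOCAB)[seq]            # (L, VOCAB)
    return onehot.mean(axis=0) @ W_plm     # (EMB_DIM,)

def sequence_nll(logits, target):
    """Standard autoregressive cross-entropy in sequence space."""
    logp = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))
    return -logp[np.arange(len(target)), target].mean()

def latent_loss(gen_seq, ref_seq):
    """Latent-space surrogate: squared distance between embeddings."""
    d = encode_latent(gen_seq) - encode_latent(ref_seq)
    return float(d @ d)

def multi_likelihood_loss(logits, target, gen_seq, lam=0.5):
    """Combined objective: sequence-space NLL + weighted latent-space term."""
    return sequence_nll(logits, target) + lam * latent_loss(gen_seq, target)

# Toy example: a length-5 reference sequence, random model logits, and a
# sampled sequence with one mismatch relative to the reference.
target = rng.integers(0, VOCAB, size=5)
logits = rng.normal(size=(5, VOCAB))
sampled = target.copy()
sampled[2] = (sampled[2] + 1) % VOCAB

loss = multi_likelihood_loss(logits, target, sampled)
print(loss > 0)
```

Note that a mismatched sample is penalized only to the extent that it moves the latent embedding, which is the intuition behind tolerating functionally neutral sequence changes.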
Protein Science,
Journal Year:
2024,
Volume and Issue:
33(7)
Published: June 22, 2024
Antimicrobial resistance is a critical public health concern, necessitating the exploration of alternative treatments. While antimicrobial peptides (AMPs) show promise, assessing their toxicity using traditional wet-lab methods is both time-consuming and costly. We introduce tAMPer, a novel multi-modal deep learning model designed to predict peptide toxicity by integrating the underlying amino acid sequence composition and the three-dimensional structure of peptides. tAMPer adopts a graph-based representation for peptides, encoding ColabFold-predicted structures, where nodes represent amino acids and edges represent spatial interactions. Structural features are extracted by graph neural networks, while recurrent neural networks capture sequential dependencies. tAMPer's performance was assessed on a publicly available protein toxicity benchmark and on AMP hemolysis data we generated. On the latter, tAMPer achieves an F1 score of 68.7%, outperforming the second-best method by 23.4%. On the protein toxicity benchmark, it exhibited an improvement of over 3.0% in the F1 score compared to current state-of-the-art methods. We anticipate that tAMPer will accelerate AMP discovery and development by reducing the reliance on laborious screening experiments.
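The two-branch design described above, a graph branch over spatial contacts and a recurrent branch over the residue sequence, fused for a toxicity probability, can be sketched as follows. This is a minimal illustrative sketch under stated assumptions: the toy dimensions, the fabricated contact map, and the single-layer graph convolution and RNN are placeholders, not tAMPer's actual architecture or features.

```python
import numpy as np

rng = np.random.default_rng(1)

L, F, H = 6, 4, 5   # peptide length, node-feature dim, hidden dim (toy sizes)

# Node features: one row per residue (a real model would derive these from
# amino-acid identity and a ColabFold-predicted structure).
X = rng.normal(size=(L, F))

# Contact graph: backbone neighbours plus one fabricated long-range contact
# standing in for residues that are spatially close in the folded peptide.
A = np.zeros((L, L))
for i in range(L - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0
A[0, 4] = A[4, 0] = 1.0

W_gnn = rng.normal(size=(F, H))

def gnn_layer(X, A):
    """One graph-convolution step: mean over self + neighbours, project, ReLU."""
    deg = A.sum(axis=1, keepdims=True) + 1.0
    agg = (A @ X + X) / deg
    return np.maximum(agg @ W_gnn, 0.0)

W_xh = rng.normal(size=(F, H))
W_hh = rng.normal(size=(H, H))

def rnn_encode(X):
    """Minimal recurrent pass capturing sequential dependencies."""
    h = np.zeros(H)
    for x in X:
        h = np.tanh(x @ W_xh + h @ W_hh)
    return h

w_out = rng.normal(size=(2 * H,))

def predict_toxicity(X, A):
    """Fuse pooled structural and sequential features into a probability."""
    g = gnn_layer(X, A).mean(axis=0)        # structure branch (pooled)
    s = rnn_encode(X)                       # sequence branch
    z = np.concatenate([g, s]) @ w_out
    return 1.0 / (1.0 + np.exp(-z))         # sigmoid

p = predict_toxicity(X, A)
print(0.0 < p < 1.0)
```

The design choice worth noting is the late fusion: each branch summarizes one modality independently before concatenation, so either representation can be swapped out without touching the other.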
Scientific Data,
Journal Year:
2024,
Volume and Issue:
11(1)
Published: May 25, 2024
Abstract With the discovery of the therapeutic activity of peptides, they have emerged as a promising class of anti-cancer agents due to their specific targeting, low toxicity, and potential for high selectivity. In particular, as peptide-drug conjugates enter clinical use, coupling targeting peptides with traditional chemotherapy drugs or cytotoxic agents will become a new direction in cancer treatment. To facilitate drug development for cancer therapy, we constructed DCTPep, a novel, open, and comprehensive database of cancer-therapy-related peptides. In addition to anticancer peptides (ACPs), the peptide library also includes other peptides related to cancer therapy. These data were collected manually from published research articles, patents, and other protein databases. Data on peptide drugs include clinically investigated and/or approved peptides for cancer therapy, which mainly come from the portal websites of regulatory authorities and organisations in different countries and regions. DCTPep has a total of 6214 entries, and we believe it will contribute to the design and screening of future anti-cancer peptides.