Proteins: Structure, Function, and Bioinformatics
Journal Year: 2023
Volume and Issue: 91(12), pp. 1539-1549
Published: Nov. 2, 2023
Abstract
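Computing protein structure from amino acid sequence information has been a long-standing grand challenge. Critical assessment of structure prediction (CASP) conducts community experiments aimed at advancing solutions to this and related problems. Experiments are conducted every 2 years. The 2020 experiment (CASP14) saw major progress, with the second generation of deep learning methods delivering accuracy comparable with experiment for many single proteins. There is an expectation that these methods will have much wider application in computational structural biology. Here we summarize results from the most recent experiment, CASP15, in 2022, with an emphasis on new deep learning-driven progress. Other papers in this special issue of Proteins provide more detailed analysis. For single protein structures, the AlphaFold2 deep learning method is still superior to other approaches, but there are two points of note. First, although AlphaFold2 was the core of all the most successful methods, there was a wide variety of implementation and combination with other methods. Second, using the standard AlphaFold2 protocol and default parameters only produces the highest quality result for about two thirds of the targets, and more extensive sampling is required for the others. The major advance in this CASP is the enormous increase in the accuracy of computed protein complexes, achieved by the use of deep learning methods, although overall these do not fully match the performance for single proteins. Here too, AlphaFold2-based methods perform best, and again more extensive sampling than the defaults is often required. Also of note are the encouraging early results on the use of deep learning to compute ensembles of macromolecular structures. Critically for the usability of computed structures, for both single proteins and protein complexes, deep learning derived estimates of both local and global accuracy are of high quality; however, the estimates in interface regions are slightly less reliable. CASP15 also included computation of RNA structures for the first time. Here, the classical approaches produced better agreement with experiment than the new deep learning ones, and accuracy is limited. Also, for the first time, CASP included the computation of protein–ligand complexes, an area of special interest for drug design. Here too, classical methods were still superior to deep learning ones. Many new approaches were discussed at the CASP conference, and it is clear methods will continue to advance.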
bioRxiv (Cold Spring Harbor Laboratory)
Journal Year: 2023
Volume and Issue: unknown
Published: Jan. 18, 2023
Abstract
As opposed to scaling up protein language models (PLMs), we seek to improve performance via protein-specific optimization. Although the proportionality between the language model size and the richness of its learned representations is validated, we prioritize accessibility and pursue a path of data-efficient, cost-reduced, and knowledge-guided optimization. Through over twenty experiments ranging from masking and architecture to pre-training data, we derive insights from protein-specific experimentation into building a model that interprets the language of life, optimally. We present Ankh, the first general-purpose PLM trained on Google’s TPU-v4 surpassing the state-of-the-art performance with fewer parameters (<10% for pre-training, <7% for inference, and <30% for the embedding dimension). We provide a representative range of structure and function benchmarks where Ankh excels. We further provide a protein variant generation analysis on High-N and One-N input data scales, where Ankh succeeds in learning protein evolutionary conservation-mutation trends and introducing functional diversity while retaining key structural-functional characteristics. We dedicate our work to promoting accessibility to research innovation via attainable resources.
Communications Biology
Journal Year: 2023
Volume and Issue: 6(1)
Published: Feb. 8, 2023
Abstract
Deep-learning (DL) methods like DeepMind’s AlphaFold2 (AF2) have led to substantial improvements in protein structure prediction. We analyse confident AF2 models from 21 model organisms using a new classification protocol (CATH-Assign) which exploits novel DL methods for structural comparison and classification. Of ~370,000 confident models, 92% can be assigned to 3253 superfamilies in our CATH domain superfamily classification. The remaining models cluster into 2367 putative novel superfamilies. Detailed manual analysis on 618 of these, having at least one human relative, reveals extremely remote homologies and further unusual features. Only 25 novel superfamilies could be confirmed. Although most models map to existing superfamilies, AF2 domains expand CATH by 67% and increase the number of unique ‘global’ folds by 36%, and will provide valuable insights on structure-function relationships. CATH-Assign will harness the huge expansion in structural data provided by DeepMind to rationalise evolutionary changes driving functional divergence.
bioRxiv (Cold Spring Harbor Laboratory)
Journal Year: 2023
Volume and Issue: unknown
Published: July 25, 2023
Abstract
Adapting large language models (LLMs) to protein sequences spawned the development of powerful protein language models (pLMs). Concurrently, AlphaFold2 broke through in protein structure prediction. Now we can systematically and comprehensively explore the dual nature of proteins that act and exist as three-dimensional (3D) machines and evolve as linear strings of one-dimensional (1D) sequences. Here, we leverage pLMs to simultaneously model both modalities by combining 1D sequences with 3D structure in a single model. We encode protein structures as token sequences using the 3Di-alphabet introduced by the 3D-alignment method Foldseek. This new foundation pLM extracts the features and patterns of the resulting “structure-sequence” representation. Toward this end, we built a non-redundant dataset from AlphaFoldDB and fine-tuned an existing pLM (ProtT5) to translate between 3Di and amino acid sequences. As a proof-of-concept for our novel approach, dubbed Protein structure-sequence T5 (ProstT5), we showed improved performance for subsequent prediction tasks, and for “inverse folding”, namely the generation of novel protein sequences adopting a given structural scaffold (“fold”). Our work showcased the potential of pLMs to tap into the information-rich protein structure revolution fueled by AlphaFold2. ProstT5 paves the way to develop new tools integrating the vast resource of 3D predictions, and opens new research avenues in the post-AlphaFold2 era. Our model is freely available for all at https://github.com/mheinzinger/ProstT5
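The translation between amino acid and 3Di sequences described in this abstract relies on telling the model which direction to translate. The sketch below shows how inputs might be prepared, assuming the convention described in the linked repository's README: residues separated by spaces, amino acids in upper case, 3Di states in lower case, rare residues mapped to X, and a direction prefix of `<AA2fold>` or `<fold2AA>`. The helper name `format_for_prostt5` is hypothetical, and the convention should be checked against the repository before use. No model is loaded here.

```python
import re


def format_for_prostt5(seq: str, kind: str) -> str:
    """Format one sequence as a ProstT5-style input string (assumed convention).

    kind="aa"  : amino acid sequence, to be translated into 3Di.
    kind="3di" : 3Di structure string, to be translated into amino acids.
    """
    if kind == "aa":
        # Assumption: rare/ambiguous residues U, Z, O, B are replaced with X.
        cleaned = re.sub(r"[UZOB]", "X", seq.upper())
        prefix = "<AA2fold>"
    elif kind == "3di":
        # Assumption: 3Di states are written in lower case.
        cleaned = seq.lower()
        prefix = "<fold2AA>"
    else:
        raise ValueError("kind must be 'aa' or '3di'")
    # Residues are separated by single spaces so each becomes its own token.
    return prefix + " " + " ".join(cleaned)


if __name__ == "__main__":
    print(format_for_prostt5("MKU", "aa"))
    print(format_for_prostt5("DVQ", "3di"))
```

The resulting strings would then be passed to the tokenizer that ships with the model.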
Science
Journal Year: 2025
Volume and Issue: unknown
Published: Jan. 16, 2025
More than three billion years of evolution have produced an image of biology encoded into the space of natural proteins. Here we show that language models trained at scale on evolutionary data can generate functional proteins that are far away from known proteins. We present ESM3, a frontier multimodal generative language model that reasons over the sequence, structure, and function of proteins. ESM3 can follow complex prompts combining its modalities and is highly responsive to alignment to improve its fidelity. We have prompted ESM3 to generate fluorescent proteins. Among the generations that we synthesized, we found a bright fluorescent protein at a far distance (58% sequence identity) from known fluorescent proteins, which we estimate is equivalent to simulating five hundred million years of evolution.
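The 58% sequence identity quoted in this abstract measures how many aligned positions two sequences share. The sketch below shows one common way to compute such a figure from a pairwise alignment. The function name `percent_identity` and the toy sequences are illustrative assumptions, not taken from the paper, and other definitions (for example, dividing by the length of the shorter sequence) give different numbers.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap.

    Both inputs must come from the same pairwise alignment, with '-' as the
    gap character, so they have equal length.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns in which both sequences carry a residue.
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b)
             if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    matches = sum(a == b for a, b in pairs)
    return 100.0 * matches / len(pairs)


if __name__ == "__main__":
    # Toy alignment: 8 gap-free columns, 6 of them identical.
    print(percent_identity("MKT-AYIAK", "MKTWAYLSK"))  # 75.0
```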