Proteins Structure Function and Bioinformatics,
Journal year: 2023,
Number: 91(12), pp. 1539 - 1549
Published: Nov. 2, 2023
Abstract
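Computing protein structure from amino acid sequence information has been a long-standing grand challenge. Critical assessment of structure prediction (CASP) conducts community experiments aimed at advancing solutions to this and related problems. Experiments are conducted every 2 years. The 2020 experiment (CASP14) saw major progress, with the second generation of deep learning methods delivering accuracy comparable with experiment for many single proteins. There is an expectation that these methods will have much wider application in computational structural biology. Here we summarize results from the most recent experiment, CASP15, in 2022, with an emphasis on new deep learning-driven progress. Other papers in this special issue of Proteins provide more detailed analysis.

For single protein structures, the AlphaFold2 deep learning method is still superior to other approaches, but there are two points of note. First, although AlphaFold2 was the core of all the most successful methods, there was a wide variety of implementation and combination with other methods. Second, using the standard AlphaFold2 protocol and default parameters only produces the highest quality result for about two thirds of the targets, and more extensive sampling is required for the others.

The major advance in this CASP is the enormous increase in the accuracy of computed protein complexes, achieved by the use of deep learning methods, although overall these do not fully match the performance for single proteins. Here too, AlphaFold2-based methods perform best, and again more extensive sampling than the defaults is often required. Also of note are the encouraging early results on the use of deep learning to compute ensembles of macromolecular structures. Critically for the usability of computed structures, for both single proteins and protein complexes, deep learning derived estimates of both local and global accuracy are of high quality; however, the estimates in interface regions are slightly less reliable.

CASP15 also included computation of RNA structures for the first time. Here, the classical approaches produced better agreement with experiment than the new deep learning ones, and accuracy is limited. Also, for the first time, CASP included the computation of protein–ligand complexes, an area of special interest for drug design. Here too, classical methods were still superior to deep learning ones. Many new approaches were discussed at the CASP conference, and it is clear methods will continue to advance.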
bioRxiv (Cold Spring Harbor Laboratory),
Journal year: 2022,
Number: unknown
Published: July 21, 2022
Abstract
Artificial intelligence has the potential to open insight into the structure of proteins at the scale of evolution. It has only recently been possible to extend protein structure prediction to two hundred million cataloged proteins. Characterizing the structures of the exponentially growing billions of protein sequences revealed by large scale gene sequencing experiments would necessitate a breakthrough in the speed of folding. Here we show that direct inference of structure from primary sequence using a large language model enables an order of magnitude speed-up in high resolution structure prediction. Leveraging the insight that language models learn evolutionary patterns across millions of sequences, we train models up to 15B parameters, the largest language model of proteins to date. As the language models are scaled they learn information that enables prediction of the three-dimensional structure of a protein at the resolution of individual atoms. This results in prediction that is up to 60x faster than state-of-the-art while maintaining resolution and accuracy. Building on this, we present the ESM Metagenomic Atlas. This is the first large-scale structural characterization of metagenomic proteins, with more than 617 million structures. The atlas reveals more than 225 million high confidence predictions, including millions whose structures are novel in comparison with experimentally determined structures, giving an unprecedented view into the vast breadth and diversity of the structures of some of the least understood proteins on earth.
Nature Methods,
Journal year: 2023,
Number: 21(1), pp. 110 - 116
Published: Nov. 30, 2023
Abstract
Artificial intelligence-based protein structure prediction methods such as AlphaFold have revolutionized structural biology. The accuracies of these predictions vary, however, and they do not take into account ligands, covalent modifications or other environmental factors. Here, we evaluate how well AlphaFold predictions can be expected to describe the structure of a protein by comparing predictions directly with experimental crystallographic maps. In many cases, AlphaFold predictions matched experimental maps remarkably closely. In other cases, even very high-confidence predictions differed from experimental maps on a global scale through distortion and domain orientation, and on a local scale in backbone and side-chain conformation. We suggest considering AlphaFold predictions as exceptionally useful hypotheses. We further suggest that it is important to consider the confidence in prediction when interpreting AlphaFold predictions and to carry out experimental structure determination to verify structural details, particularly those that involve interactions not included in the prediction.
PLoS Computational Biology,
Journal year: 2022,
Number: 18(8), pp. e1010483 - e1010483
Published: Aug. 22, 2022
The unprecedented performance of Deepmind's Alphafold2 in predicting protein structure in CASP XIV and the creation of a database of structures for multiple proteomes and protein sequence repositories is reshaping structural biology. However, because this database returns a single structure, it brought into question Alphafold's ability to capture the intrinsic conformational flexibility of proteins. Here we present a general approach to drive Alphafold2 to model alternate protein conformations through simple manipulation of the multiple sequence alignment via in silico mutagenesis. The approach is grounded in the hypothesis that the multiple sequence alignment must also encode for protein structural heterogeneity, thus its rational manipulation will enable Alphafold2 to sample alternate conformations. A systematic modeling pipeline is benchmarked against canonical examples of protein conformational flexibility and applied to interrogate the conformational landscape of membrane proteins. This work broadens the applicability of Alphafold2 by generating multiple protein conformations to be tested biologically, biochemically, biophysically, and for use in structure-based drug design.
Bioinformatics,
Journal year: 2022,
Number: 38(7), pp. 1877 - 1880
Published: Jan. 27, 2022
Abstract
Motivation
Antibodies are a key component of the immune system and have been extensively used as biotherapeutics. Accurate knowledge of their structure is central to understanding their antigen-binding function. The key area for antigen binding and the main area of structural variation in antibodies are concentrated in the six complementarity determining regions (CDRs), with the most important for binding and most variable being the CDR-H3 loop. The sequence and structural variability of CDR-H3 make it particularly challenging to model. Recently deep learning methods have offered a step change in our ability to predict protein structures.

Results
In this work, we present ABlooper, an end-to-end equivariant deep learning-based CDR loop structure prediction tool. ABlooper rapidly predicts the structure of CDR loops with high accuracy and provides a confidence estimate for each of its predictions. On the models of the Rosetta Antibody Benchmark, ABlooper makes predictions with an average CDR-H3 RMSD of 2.49 Å, which drops to 2.05 Å when considering only its 75% most confident predictions.

Availability and implementation
https://github.com/oxpig/ABlooper.

Supplementary information
Supplementary data are available at Bioinformatics online.
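
The ABlooper abstract reports an average RMSD over all predictions and a lower average over the 75% most confident ones. The sketch below illustrates that kind of calculation on toy numbers only. It is not ABlooper's code: the function names, the convention that a higher score means more confidence, and the absence of any structural superposition step are all assumptions made for illustration.

```python
import math


def rmsd(pred, ref):
    """RMSD between two equal-length lists of (x, y, z) coordinates.

    Assumption: the two coordinate sets are already in the same frame;
    no superposition is performed here.
    """
    if len(pred) != len(ref) or not pred:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum((p[i] - r[i]) ** 2 for p, r in zip(pred, ref) for i in range(3))
    return math.sqrt(squared / len(pred))


def mean_rmsd_most_confident(rmsds, confidences, keep_fraction=0.75):
    """Mean RMSD over the keep_fraction most confident predictions.

    Assumption (hypothetical convention): a higher confidence value
    means a more confident prediction.
    """
    if len(rmsds) != len(confidences) or not rmsds:
        raise ValueError("rmsds and confidences must be non-empty and equal in length")
    # Rank predictions from most to least confident.
    ranked = sorted(zip(confidences, rmsds), key=lambda pair: pair[0], reverse=True)
    n_keep = max(1, math.ceil(keep_fraction * len(ranked)))
    kept = [value for _, value in ranked[:n_keep]]
    return sum(kept) / len(kept)


# Toy data, invented for illustration (not benchmark values).
toy_rmsds = [1.0, 2.0, 3.0, 4.0]
toy_confidences = [0.9, 0.8, 0.7, 0.1]

mean_all = sum(toy_rmsds) / len(toy_rmsds)
mean_top = mean_rmsd_most_confident(toy_rmsds, toy_confidences, keep_fraction=0.75)
```

With these toy values the least confident prediction is also the least accurate one, so dropping it lowers the mean. That is the pattern the abstract describes, but it only appears when confidence actually tracks accuracy.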
Abstract
Intrinsically disordered regions (IDRs) defying the traditional protein structure–function paradigm have been difficult to analyze. The availability of accurate structure predictions on a large scale in AlphaFoldDB offers a fresh perspective on IDR prediction. Here, we establish three baselines for IDR prediction from AlphaFoldDB models based on the recent CAID dataset. Surprisingly, AlphaFoldDB is highly competitive for predicting both IDRs and conditionally folded binding regions, demonstrating the plasticity of the disorder to structure continuum.
bioRxiv (Cold Spring Harbor Laboratory),
Journal year: 2023,
Number: unknown
Published: Dec. 8, 2023
Predicting the effects of mutations in proteins is critical to many applications, from understanding genetic disease to designing novel proteins that can address our most pressing challenges in climate, agriculture and healthcare. Despite a surge in machine learning-based protein models to tackle these questions, an assessment of their respective benefits is challenging due to the use of distinct, often contrived, experimental datasets, and the variable performance of models across different protein families. Addressing these challenges requires scale. To that end we introduce ProteinGym, a large-scale and holistic set of benchmarks specifically designed for protein fitness prediction and design. It encompasses both a broad collection of over 250 standardized deep mutational scanning assays, spanning millions of mutated sequences, as well as curated clinical datasets providing high-quality expert annotations about mutation effects. We devise a robust evaluation framework that combines metrics for both fitness prediction and design, factors in known limitations of the underlying experimental methods, and covers both zero-shot and supervised settings. We report the performance of a diverse set of over 70 high-performing models from various subfields (e.g., alignment-based, inverse folding) into a unified benchmark suite. We open source the corresponding codebase, datasets, MSAs, structures, model predictions and develop a user-friendly website that facilitates data access and analysis.
ACS Catalysis,
Journal year: 2023,
Number: 13(21), pp. 13863 - 13895
Published: Oct. 13, 2023
Recent progress in engineering highly promising biocatalysts has increasingly involved machine learning methods. These methods leverage existing experimental and simulation data to aid in the discovery and annotation of promising enzymes, as well as in suggesting beneficial mutations for improving known targets. The field of machine learning for protein engineering is gathering steam, driven by recent success stories and notable progress in other areas. It already encompasses ambitious tasks such as understanding and predicting protein structure and function, catalytic efficiency, enantioselectivity, protein dynamics, stability, solubility, aggregation, and more. Nonetheless, the field is still evolving, with many challenges to overcome and questions to address. In this Perspective, we provide an overview of ongoing trends in this domain, highlight recent case studies, and examine the current limitations of machine learning-based methods. We emphasize the crucial importance of thorough experimental validation of emerging models before their use for rational protein design. We present our opinions on the fundamental problems and outline the potential directions for future research.
Nature Methods,
Journal year: 2022,
Number: 19(11), pp. 1376 - 1382
Published: Oct. 20, 2022
Abstract
Machine-learning prediction algorithms such as AlphaFold and RoseTTAFold can create remarkably accurate protein models, but these models usually have some regions that are predicted with low confidence or poor accuracy. We hypothesized that by implicitly including new experimental information such as a density map, a greater portion of a model could be predicted accurately, and that this might synergistically improve parts of the model that were not fully addressed by either machine learning or experiment alone. An iterative procedure was developed in which AlphaFold models are automatically rebuilt on the basis of experimental density maps and the rebuilt models are used as templates in new AlphaFold predictions. We show that including experimental information improves prediction beyond the improvement obtained with simple rebuilding guided by the experimental data. This procedure for AlphaFold modeling with density has been incorporated into an automated procedure for interpretation of crystallographic and electron cryo-microscopy maps.
bioRxiv (Cold Spring Harbor Laboratory),
Journal year: 2024,
Number: unknown
Published: July 2, 2024
Abstract
More than three billion years of evolution have produced an image of biology encoded into the space of natural proteins. Here we show that language models trained on tokens generated by evolution can act as evolutionary simulators to generate functional proteins that are far away from known proteins. We present ESM3, a frontier multimodal generative language model that reasons over the sequence, structure, and function of proteins. ESM3 can follow complex prompts combining its modalities and is highly responsive to biological alignment. We have prompted ESM3 to generate fluorescent proteins with a chain of thought. Among the generations that we synthesized, we found a bright fluorescent protein at far distance (58% identity) from known fluorescent proteins. Similarly distant natural fluorescent proteins are separated by over five hundred million years of evolution.