Signal Transduction and Targeted Therapy, Journal Year: 2023, Volume and Issue: 8(1)
Published: March 14, 2023
Abstract
AlphaFold2 (AF2) is an artificial intelligence (AI) system developed by DeepMind that can predict three-dimensional (3D) structures of proteins from amino acid sequences with atomic-level accuracy. Protein structure prediction is one of the most challenging problems in computational biology and chemistry, and has puzzled scientists for 50 years. The advent of AF2 represents unprecedented progress in protein structure prediction and has attracted much attention. The subsequent release of structures of more than 200 million proteins predicted by AF2 further aroused great enthusiasm in the science community, especially in the fields of biology and medicine. AF2 is thought to have a significant impact on structural biology and on research areas that need protein structure information, such as drug discovery, protein design, and prediction of protein function. Although AF2 was developed only recently, there are already quite a few application studies of AF2 in the fields of biology and medicine, many of which have preliminarily proved the potential of AF2. To better understand AF2 and promote its applications, in this article we summarize the principle and system architecture of AF2 as well as the recipe of its success, and particularly focus on reviewing its applications in the fields of biology and medicine. Limitations of current AF2 prediction are also discussed.
bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2021, Volume and Issue: unknown
Published: Aug. 15, 2021
ColabFold offers accelerated protein structure and complex predictions by combining the fast homology search of MMseqs2 with AlphaFold2 or RoseTTAFold. ColabFold's 40-60× faster search and optimized model use allow predicting close to a thousand structures per day on a server with one GPU. Coupled with Google Colaboratory, ColabFold becomes a free and accessible platform for protein folding. ColabFold is open-source software available at github.com/sokrypton/ColabFold. Its novel environmental databases are available at colabfold.mmseqs.com.
BMC Bioinformatics, Journal Year: 2019, Volume and Issue: 20(1)
Published: Dec. 1, 2019
Predicting protein function and structure from sequence is an important challenge for computational biology. For 26 years, most state-of-the-art approaches combined machine learning and evolutionary information. However, for some applications retrieving related proteins is becoming too time-consuming. Additionally, evolutionary information is less powerful for small families, e.g. for proteins from the Dark Proteome. Both these problems are addressed by the new methodology introduced here.

We introduced a novel way to represent protein sequences as continuous vectors (embeddings) by using the language model ELMo taken from natural language processing. By modeling protein sequences, ELMo effectively captured the biophysical properties of the language of life from unlabeled big data (UniRef50). We refer to these new embeddings as SeqVec (Sequence-to-Vector) and demonstrate their effectiveness by training simple neural networks for two different tasks. At the per-residue level, secondary structure (Q3 = 79% ± 1, Q8 = 68% ± 1) and regions with intrinsic disorder (MCC = 0.59 ± 0.03) were predicted significantly better than through one-hot encoding or through Word2vec-like approaches. At the per-protein level, subcellular localization was predicted in ten classes (Q10 = 68% ± 1) and membrane-bound proteins were distinguished from water-soluble proteins (Q2 = 87% ± 1). Although SeqVec embeddings generated the best predictions from single sequences, no solution improved over the best existing method using evolutionary information. Nevertheless, our approach improved over some popular methods using evolutionary information, and for some proteins even beat the best. Thus, the embeddings prove to condense the underlying principles of protein sequences. Overall, the important novelty is speed: where the lightning-fast HHblits needed on average about two minutes to generate the evolutionary information for a target protein, SeqVec created embeddings on average in 0.03 s. As this speed-up is independent of the size of growing sequence databases, SeqVec provides a highly scalable approach for the analysis of big data in proteomics, i.e. microbiome or metaproteome analysis.

Transfer-learning succeeded in extracting information from unlabeled sequence databases relevant for various protein prediction tasks. SeqVec modeled the language of life, namely the principles underlying protein sequences, better than any features suggested by textbooks and prediction methods. The exception is evolutionary information; however, that information is not available on the level of a single sequence.
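The per-residue scores quoted above (Q3, Q8, MCC) can be illustrated with a minimal sketch. Q3 and Q8 are read here as the fraction of residues assigned the correct state out of three or eight secondary-structure classes, and MCC as the Matthews correlation coefficient for a binary disordered/ordered label. The function names and the toy sequences are assumptions made for illustration; they are not taken from the SeqVec code or data.

```python
from math import sqrt


def q_accuracy(predicted, observed):
    """Fraction of residues whose predicted state equals the observed state.

    With a 3-letter alphabet (e.g. H, E, C) this is Q3; with 8 states, Q8.
    """
    if len(predicted) != len(observed) or len(observed) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


def mcc(predicted, observed):
    """Matthews correlation coefficient for binary labels (1 = disordered)."""
    if len(predicted) != len(observed):
        raise ValueError("label lists must have equal length")
    pairs = list(zip(predicted, observed))
    tp = sum(p == 1 and o == 1 for p, o in pairs)
    tn = sum(p == 0 and o == 0 for p, o in pairs)
    fp = sum(p == 1 and o == 0 for p, o in pairs)
    fn = sum(p == 0 and o == 1 for p, o in pairs)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention assumed here: return 0.0 when the coefficient is undefined.
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Toy data (hypothetical): 8 residues, one secondary-structure error.
print(q_accuracy("HHHEECCC", "HHHEEECC"))
# Toy data (hypothetical): 6 residues with binary disorder labels.
print(mcc([1, 1, 0, 0, 0, 1], [1, 0, 0, 0, 1, 1]))
```

Published benchmarks typically average such scores over proteins or test sets and report an error estimate, which this sketch does not attempt.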
Cell Reports Methods, Journal Year: 2021, Volume and Issue: 1(3), P. 100014 - 100014
Published: June 21, 2021
Structure prediction for proteins lacking homologous templates in the Protein Data Bank (PDB) remains a significant unsolved problem. We developed a protocol, C-I-TASSER, to integrate interresidue contact maps from deep neural-network learning with the cutting-edge I-TASSER fragment assembly simulations. Large-scale benchmark tests showed that C-I-TASSER can fold more than twice as many non-homologous proteins as I-TASSER, which does not use contacts. When applied to a folding experiment on 8,266 unsolved Pfam families, C-I-TASSER successfully folded 4,162 domain families, including 504 folds that are not found in the PDB. Furthermore, it created correct folds for 85% of proteins in the SARS-CoV-2 genome, despite the quick mutation rate of the virus and sparse sequence profiles. The results demonstrated the critical importance of coupling whole-genome and metagenome-based evolutionary information with optimal structure assembly simulations for solving the problem of non-homologous protein structure prediction.
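To make the notion of an interresidue contact map concrete, the sketch below derives contacts from known 3D coordinates, which is the kind of target a contact predictor is trained to reproduce. The 8 Å distance cutoff and the minimum sequence separation are common conventions assumed here for illustration; the abstract does not state the definition used by C-I-TASSER, and the function name and toy coordinates are hypothetical.

```python
from math import dist


def contact_pairs(coords, cutoff=8.0, min_separation=6):
    """Return residue index pairs (i, j), i < j, that are in contact.

    coords: one (x, y, z) tuple per residue, e.g. C-beta atom positions.
    cutoff: distance threshold in angstroms (assumed convention: 8.0).
    min_separation: minimum |i - j| along the sequence, so that trivially
        close sequence neighbours are not counted (assumed convention).
    """
    pairs = []
    n = len(coords)
    for i in range(n):
        for j in range(i + min_separation, n):
            if dist(coords[i], coords[j]) < cutoff:
                pairs.append((i, j))
    return pairs


# Toy hairpin (hypothetical): two antiparallel strands 3.8 angstroms apart.
hairpin = [
    (0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0),
    (7.6, 3.8, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.0),
]
print(contact_pairs(hairpin, min_separation=3))
```

In a contact-guided folding protocol the predicted pairs would act as restraints during assembly; how C-I-TASSER weights them is not described in the abstract.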
Protein Science, Journal Year: 2022, Volume and Issue: 32(1)
Published: Nov. 24, 2022
Structural comparison reveals remote homology that often fails to be detected by sequence comparison. The DALI web server (http://ekhidna2.biocenter.helsinki.fi/dali) is a platform for structural analysis that provides database searches and interactive visualization, including structural alignments annotated with secondary structure, protein families and sequence logos, and 3D structure superimposition supported by color-coded sequence and structure conservation. Here, we use DALI to mine the AlphaFold Database version 1, which increased the structural coverage of protein families by 20%. We found 100 remote homologous relationships hitherto unreported in the current reference database for protein domains, Pfam 35.0. In particular, we linked 35 domains of unknown function (DUFs) to previously characterized families, generating functional hypotheses that can be explored downstream in structural biology studies. Other findings include gene fusions, tandem duplications, and adjustments to domain boundaries. The evidence for homology can be browsed interactively through live examples on DALI's website.
Proteins Structure Function and Bioinformatics, Journal Year: 2021, Volume and Issue: 89(12), P. 1711 - 1721
Published: Oct. 4, 2021
We describe the operation and improvement of AlphaFold, the system that was entered by the team AlphaFold2 to the "human" category in the 14th Critical Assessment of Protein Structure Prediction (CASP14). The AlphaFold system entered in CASP14 is entirely different from the one entered in CASP13. It used a novel end-to-end deep neural network trained to produce protein structures from amino acid sequence, multiple sequence alignments, and homologous proteins. In the assessors' ranking by summed z scores (>2.0), AlphaFold scored 244.0 compared to 90.8 by the next best group. The predictions made by AlphaFold had a median domain GDT_TS of 92.4; this is the first time that this level of average accuracy has been achieved during CASP, especially on the more difficult Free Modeling targets, and represents a significant improvement in the state of the art in protein structure prediction. We reported how AlphaFold was run as a human team during CASP14 and improved such that it now achieves an equivalent level of performance without intervention, opening the door to highly accurate large-scale structure prediction.
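For readers unfamiliar with the GDT_TS score quoted above, the following minimal sketch shows the arithmetic: the percentage of residues within 1, 2, 4 and 8 Å of their reference positions, averaged over the four cutoffs. It assumes the per-residue distances come from a single fixed superposition; the full CASP procedure searches over many superpositions for each cutoff, which this sketch does not do. The function name and the example distances are hypothetical.

```python
def gdt_ts(distances, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT_TS on a 0-100 scale.

    distances: per-residue distances (angstroms) between model and
        reference positions, assumed to come from one fixed superposition.
    cutoffs: distance thresholds whose coverage fractions are averaged.
    """
    if not distances:
        raise ValueError("at least one distance is required")
    n = len(distances)
    fractions = [sum(d <= c for d in distances) / n for c in cutoffs]
    return 100.0 * sum(fractions) / len(cutoffs)


# Hypothetical distances for an 8-residue domain.
example = [0.5, 0.9, 1.5, 3.0, 5.0, 9.0, 0.2, 1.1]
print(gdt_ts(example))  # 65.625
```

A perfect model scores 100, so a median domain score of 92.4 means that nearly all residues fall within the tighter cutoffs for a typical target.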
Science, Journal Year: 2024, Volume and Issue: 384(6693)
Published: March 7, 2024
Deep-learning methods have revolutionized protein structure prediction and design but are presently limited to protein-only systems. We describe RoseTTAFold All-Atom (RFAA), which combines a residue-based representation of amino acids and DNA bases with an atomic representation of all other groups to model assemblies that contain proteins, nucleic acids, small molecules, metals, and covalent modifications, given their sequences and chemical structures. By fine-tuning on structure denoising tasks, we developed RFdiffusion All-Atom (RFdiffusionAA), which builds protein structures around small molecules. Starting from random distributions of amino acid residues surrounding target small molecules, we designed and experimentally validated, through crystallography and binding measurements, proteins that bind the cardiac disease therapeutic digoxigenin, the enzymatic cofactor heme, and the light-harvesting molecule bilin.
NAR Genomics and Bioinformatics, Journal Year: 2021, Volume and Issue: 3(3)
Published: June 23, 2021
Abstract
Viruses are abundant, diverse and ancestral biological entities. Their diversity is high, both in terms of the number of different protein families encountered and in the sequence heterogeneity of each protein family. The recent increase in sequenced viral genomes constitutes a great opportunity to gain new insights into this diversity and consequently urges the development of annotation resources to help functional and comparative analysis. Here, we introduce PHROG (Prokaryotic Virus Remote Homologous Groups), a library of viral protein families generated using a new clustering approach based on remote homology detection by HMM profile-profile comparisons. Considering 17,473 reference (pro)viruses of prokaryotes, 868,340 of the total 938,864 proteins were grouped into 38,880 clusters that proved to be a 2-fold deeper clustering than using a classical strategy based on BLAST-like similarity searches, and yet remain homogeneous. Manual inspection of similarities to various reference sequence databases led to the annotation of 5,108 clusters (containing 50.6% of the total protein dataset) with 705 different annotation terms, included in 9 functional categories, specifically designed for viruses. Hopefully, PHROG will be a useful tool to better annotate future prokaryotic viral sequences, thus helping the scientific community to better understand the evolution and ecology of these entities.