bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown
Published: Dec. 23, 2024

Abstract

Structural divergence varies among protein residues. Unlike the classic problem of substitution rate variation, this structural variation has been largely ignored. Here we show that in enzymes structural divergence increases with both residue flexibility and distance from the active site. Although these factors are correlated, we demonstrate through modelling that this pattern arises from two independent types of constraints, non-functional and functional. Their relative importance varies across enzyme families: as functional constraints increase from 4% to 85%, non-functional constraints decrease from 96% to 15%, reshaping the structural divergence pattern. This analysis overturns two accepted views of enzyme evolution: First, evolutionary divergence has been thought to mirror protein dynamics generally, but this similarity exists only when non-functional constraints dominate. Second, active site conservation has been attributed to functional constraints alone, but it stems from their location in rigid regions where non-functional constraints are high.

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2023, Volume and Issue: unknown
Published: July 25, 2023

Abstract

Adapting large language models (LLMs) to protein sequences spawned the development of powerful protein language models (pLMs). Concurrently, AlphaFold2 broke through in protein structure prediction. Now we can systematically and comprehensively explore the dual nature of proteins that act and exist as three-dimensional (3D) machines and evolve as linear strings of one-dimensional (1D) sequences. Here, we leverage pLMs to simultaneously model both modalities by combining 1D sequences with 3D structure in a single model. We encode protein structures as token sequences using the 3Di-alphabet introduced by the 3D-alignment method Foldseek. This new foundation pLM extracts the features and patterns of the resulting “structure-sequence” representation. Toward this end, we built a non-redundant dataset from AlphaFoldDB and fine-tuned an existing pLM (ProtT5) to translate between 3Di and amino acid sequences. As a proof-of-concept for our novel approach, dubbed Protein structure-sequence T5 (ProstT5), we showed improved performance for subsequent prediction tasks, and for “inverse folding”, namely the generation of novel protein sequences adopting a given structural scaffold (“fold”). Our work showcased the potential of pLMs to tap into the information-rich protein structure revolution fueled by AlphaFold2. ProstT5 paves the way to develop new tools integrating the vast resource of 3D predictions, and opens new research avenues in the post-AlphaFold2 era. Our model is freely available for all at https://github.com/mheinzinger/ProstT5
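
To make the idea of translating between amino acid and 3Di token sequences concrete, the following is a minimal sketch of how an input string might be prepared for such a model. It loads no model and only formats text. The direction prefixes `<AA2fold>` and `<fold2AA>`, the upper-case/lower-case convention, and the replacement of rare residues by `X` are assumptions taken from the usage notes in the linked repository, not from the abstract; the function name `format_for_prostt5` is hypothetical.

```python
import re

# Assumed direction prefixes (from the repository's usage notes, not the abstract).
AA2FOLD = "<AA2fold>"  # amino acid sequence -> 3Di structure tokens
FOLD2AA = "<fold2AA>"  # 3Di structure tokens -> amino acid sequence


def format_for_prostt5(sequence: str) -> str:
    """Return a prefixed, whitespace-separated token string.

    Assumption: amino acid sequences are given in upper case and
    3Di sequences in lower case, so the case selects the direction.
    """
    seq = sequence.strip()
    if not seq:
        raise ValueError("empty sequence")
    if seq.isupper():
        # Map rare or ambiguous residues to the unknown token X.
        seq = re.sub(r"[UZOB]", "X", seq)
        prefix = AA2FOLD
    elif seq.islower():
        prefix = FOLD2AA
    else:
        raise ValueError("mixed-case input: cannot tell amino acids from 3Di")
    # One token per residue or 3Di state, separated by single spaces.
    return prefix + " " + " ".join(seq)


if __name__ == "__main__":
    print(format_for_prostt5("PRTEINO"))
    print(format_for_prostt5("dvvlvd"))
```

Keeping both modalities as plain strings with a direction prefix lets a single sequence-to-sequence model serve both translation directions, which is the design the abstract describes at a high level.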

NAR Genomics and Bioinformatics, Journal Year: 2024, Volume and Issue: 6(4)
Published: Sept. 28, 2024

Adapting language models to protein sequences spawned the development of powerful protein language models (pLMs). Concurrently, AlphaFold2 broke through in protein structure prediction. Now we can systematically and comprehensively explore the dual nature of proteins that act and exist as three-dimensional (3D) machines and evolve as linear strings of one-dimensional (1D) sequences. Here, we leverage pLMs to simultaneously model both modalities in a single model. We encode protein structures as token sequences using the 3Di-alphabet introduced by the 3D-alignment method

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown
Published: Jan. 21, 2024

Here, we describe the "Obelisks," a previously unrecognised class of viroid-like elements that we first identified in human gut metatranscriptomic data. "Obelisks" share several properties: (i) apparently circular RNA ~1kb genome assemblies, (ii) predicted rod-like secondary structures encompassing the entire genome, and (iii) open reading frames coding for a novel protein superfamily, which we call the "Oblins". We find that Obelisks form their own distinct phylogenetic group with no detectable sequence or structural similarity to known biological agents. Further, Obelisks are prevalent in tested human microbiome metatranscriptomes, with representatives detected in ~7% of analysed stool metatranscriptomes (29/440) and in ~50% of analysed oral metatranscriptomes (17/32). Obelisk compositions appear to differ between the anatomic sites and are capable of persisting in individuals, with continued presence over >300 days observed in one case. Large scale searches identified 29,959 Obelisks (clustered at 90% nucleotide identity), with examples from all seven continents and in diverse ecological niches. From this search, a subset of Obelisks are identified to code for Obelisk-specific variants of the hammerhead type-III self-cleaving ribozyme. Lastly, we identified one case of a bacterial species (Streptococcus sanguinis) in which a subset of defined laboratory strains harboured a specific Obelisk RNA population. As such, Obelisks comprise a class of diverse RNAs that have colonised, and gone unnoticed in, human, and global microbiomes.
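
The rounded prevalence figures quoted above (~7% and ~50%) can be checked against the raw counts given in the abstract. The short sketch below does only that arithmetic; the helper name `prevalence` is hypothetical.

```python
def prevalence(detected: int, analysed: int) -> float:
    """Return the percentage of analysed samples in which Obelisks were detected."""
    if analysed <= 0:
        raise ValueError("analysed must be positive")
    return 100.0 * detected / analysed


# Counts as reported in the abstract.
stool = prevalence(29, 440)  # stool metatranscriptomes
oral = prevalence(17, 32)    # oral metatranscriptomes

if __name__ == "__main__":
    print(f"stool: {stool:.1f}%")
    print(f"oral:  {oral:.1f}%")
```

The stool value rounds to the reported ~7%, while the oral value is slightly above the reported ~50%, which is consistent with the small oral sample (32 metatranscriptomes).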

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown
Published: Aug. 1, 2024

Abstract

Protein structure is conserved beyond sequence, making multiple structural alignment (MSTA) essential for analyzing distantly related proteins. Computational prediction methods have vastly extended our repository of available protein structures, requiring fast and accurate MSTA methods. Here, we introduce FoldMason, a progressive MSTA method that leverages the structural alphabet from Foldseek, a pairwise structural aligner, for multiple alignment of hundreds of thousands of protein structures, exceeding the alignment quality of state-of-the-art methods, while being two orders of magnitude faster than other MSTA methods. FoldMason computes confidence scores, offers interactive visualizations, and provides essential speed and accuracy for large-scale protein structure analysis in the era of accurate structure prediction. Using Flaviviridae glycoproteins, we demonstrate how FoldMason’s MSTAs support phylogenetic analysis below the twilight zone. FoldMason is free open-source software: foldmason.foldseek.com and webserver: search.foldseek.com/foldmason.

PLoS Pathogens, Journal Year: 2024, Volume and Issue: 20(5), P. e1012176 - e1012176
Published: May 6, 2024

Magnaporthe AVRs and ToxB-like (MAX) effectors constitute a family of secreted virulence proteins in the fungus Pyricularia oryzae (syn. Magnaporthe oryzae), which causes blast disease on numerous cereals and grasses. In spite of high sequence divergence, MAX effectors share a common fold characterized by a ß-sandwich core stabilized by a conserved disulfide bond. In this study, we investigated the structural landscape and diversity within the MAX effector repertoire of P. oryzae. Combining experimental protein structure determination and in silico structure modeling, we validated the presence of the conserved MAX effector core domain in 77 out of 94 groups of orthologs (OG) identified in a previous population genomic study. Four novel MAX effector structures determined by NMR were in remarkably good agreement with AlphaFold2 (AF2) predictions. Based on the comparison of the AF2-generated 3D models, we propose a classification of the MAX effector superfamily in 20 structural groups that vary in the canonical MAX fold, disulfide bond patterns, and additional secondary structures in N- and C-terminal extensions. About one-third of the MAX family members remain singletons, without strong structural relationship to other MAX effectors. Analysis of the surface properties of the AF2 MAX models also highlights the high variability within the MAX family at the structural level, potentially reflecting the wide diversity of their virulence functions and host targets.

Nature Communications, Journal Year: 2025, Volume and Issue: 16(1)
Published: March 6, 2025

In structural bioinformatics, the efficiency of predicting protein similarity, function, and evolutionary relationships is crucial. Our approach proposed herein leverages protein energy profiles derived from a knowledge-based potential, deviating from traditional methods relying on structural alignment or atomic distances. This method assigns unique energy profiles to individual proteins, facilitating rapid comparative analysis for both structural similarities and evolutionary relationships across various hierarchical levels. Our study demonstrates that energy profiles contain substantial information about protein structure at class, fold, superfamily, and family levels. Notably, these profiles accurately distinguish proteins across species, illustrated by the classification of coronavirus spike glycoproteins and bacteriocin proteins. Introducing a separation measure based on energy profile similarity, our approach shows a significant correlation with a network-based approach, emphasizing the potential of energy profiles as efficient predictors for drug combinations with faster computational requirements. Our key insight is that the sequence-based energy profile strongly correlates with structure-derived energy, enabling rapid and efficient protein comparisons based solely on sequences.

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown
Published: Dec. 20, 2024

Abstract

Viruses are intracellular parasites of organisms from all domains of life. They infect and cause disease in humans, animals and plants but also play crucial roles in the ecology of microbial communities. Tolerance to genetic change, high-mutation rates, adaptations to hosts and immune escape have driven high divergence of viral genes, hampering their functional annotation and phylogenetic inference. The protein structure is more conserved than sequence and can be used for searches of distant homologs and evolutionary analysis of divergent proteins. Structures of viral proteins are traditionally underrepresented in public databases, but recent advances in protein structure prediction allow us to address this issue. Combining two state-of-the-art approaches, AlphaFold2-ColabFold and ESMFold, we predicted models for 85,000 proteins from 4,400 human and animal viruses, expanding the structural coverage for viral proteins by 30 times compared to experimental structures. We performed structural and network analyses of the models to demonstrate their utility for functional annotation and inference of deep evolutionary relationships. Taking this approach, we examined the deep evolutionary history of viral class-I fusion glycoproteins, gaining insights on the origins of coronavirus spike protein. To enable further discoveries, we have created Viro3D (https://viro3d.cvr.gla.ac.uk/), a virus species-centred protein structure database. It allows users to search, browse and download protein models from a virus of interest and explore similar structures present in other virus species. This resource will facilitate fundamental molecular virology, investigation of virus evolution, and may enable structure-informed design of therapies and vaccines.

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown
Published: July 23, 2024

Abstract

The rotation of the bacterial flagellum is powered by the MotAB stator complex, which converts ion flux into torque. The origin and evolution of this remarkable complex is understudied. Here, we perform the first phylogenetic and structural characterisation and classification of MotAB and nonflagellar relatives. Using 193 genomes sampled across 27 bacterial phyla, we estimated phylogenies and ancestral sequences, and generated AlphaFold predictions for all extant and reconstructed proteins. We then mapped them onto the phylogeny to determine the patterns of diversity and distribution of structural innovations. We identify two discrete groups: the Flagellar Ion Transporters (FIT) and the Generic Ion Transporters (GIT). FIT proteins are structurally conserved and have a square fold domain and a torque-generating interface (TGI). FIT proteins are divided into two clades, termed TGI4 and TGI5, referring to whether there are 4 or 5 short helices in the TGI. TGI5 motors are predominantly found in Proteobacteria and include the well-studied E. coli K12 system, while TGI4 motors are found in diverse phyla and include the Na+-powered polar motors of Vibrio (PomAB). GIT proteins, on the other hand, lack these attributes. The interaction between the A and B subunits is jointly necessary for function, with the genes typically adjacent within an operon. Motility assays show that elements unique to FIT play an important role in flagellar motility. Our results indicate that the flagellar motor has a single origin and shares motility-related traits.

Significance Statement

Flagellar motility is a key feature of bacterial pathogenicity and survival. It allows bacteria to propel themselves and direct movement according to environmental conditions. We investigated the molecular motors that provide the motive force to power flagellar rotation. This study integrates phylogenetics, 3D protein structure modeling, and ancestral state reconstruction (ASR) to provide insights into the mechanisms of the flagellar motor.