bioRxiv (Cold Spring Harbor Laboratory),
Journal year: 2024, Issue: unknown
Published: Aug. 17, 2024
Abstract
Recent breakthroughs in protein structure prediction have led to an unprecedented surge in high-quality 3D models, highlighting the need for efficient computational solutions to manage and analyze this wealth of structural data. In our work, we comprehensively examine the structural clusters obtained from the AlphaFold Protein Structure Database (AFDB), a high-quality subset of ESMAtlas, and the Microbiome Immunity Project (MIP). We create a single cohesive low-dimensional representation of the resulting space. Our results show that, while each database occupies distinct regions within this space, they collectively exhibit significant overlap in their functional profiles. High-level biological functions tend to cluster in particular regions, revealing a shared functional landscape despite the diverse sources of the data. By creating a single, cohesive space integrating these data sources, localizing annotations, and providing an open-access web-server for exploration, this work offers insights for future research concerning sequence-structure-function relationships, enabling various questions to be asked about taxonomic assignments, environmental factors, or functional specificity. This approach is generalizable to other datasets, enabling further discovery beyond the findings presented here.
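The abstract does not say how the shared low-dimensional space was built. As a minimal sketch, assuming each structure has already been reduced to a fixed-length numeric feature vector (the vectors below are hypothetical), the idea of pooling several databases into one space can be illustrated with principal component analysis computed by power iteration. PCA is a swapped-in stand-in here, not necessarily the authors' method; the point being illustrated is that one projection is fitted to the pooled data, so structures from every source land in comparable coordinates.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

# Hypothetical input: per-structure feature vectors grouped by source database.
Vector = Sequence[float]


def mean_center(rows: List[Vector]) -> List[List[float]]:
    """Subtract the column means so the pooled data are centered at the origin."""
    dim = len(rows[0])
    means = [sum(r[j] for r in rows) / len(rows) for j in range(dim)]
    return [[r[j] - means[j] for j in range(dim)] for r in rows]


def covariance(rows: List[List[float]]) -> List[List[float]]:
    """Sample covariance matrix of already-centered rows."""
    n, dim = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(dim)]
            for i in range(dim)]


def top_eigenvectors(cov: List[List[float]], k: int, iters: int = 500, seed: int = 0):
    """Leading k eigenvectors of a symmetric matrix via power iteration with deflation."""
    rng = random.Random(seed)
    dim = len(cov)
    mat = [row[:] for row in cov]
    vectors = []
    for _ in range(k):
        v = [rng.random() for _ in range(dim)]
        for _ in range(iters):
            w = [sum(mat[i][j] * v[j] for j in range(dim)) for i in range(dim)]
            norm = math.sqrt(sum(x * x for x in w)) or 1.0
            v = [x / norm for x in w]
        # Rayleigh quotient: eigenvalue belonging to the converged direction.
        lam = sum(v[i] * mat[i][j] * v[j] for i in range(dim) for j in range(dim))
        # Deflate so the next pass converges to the next-largest component.
        mat = [[mat[i][j] - lam * v[i] * v[j] for j in range(dim)] for i in range(dim)]
        vectors.append(v)
    return vectors


def shared_embedding(sources: Dict[str, List[Vector]], k: int = 2) -> List[Tuple[str, List[float]]]:
    """Pool every source, fit ONE projection, and return (source, coordinates) pairs."""
    labels: List[str] = []
    rows: List[Vector] = []
    for name, vecs in sources.items():
        labels.extend([name] * len(vecs))
        rows.extend(vecs)
    centered = mean_center(rows)
    axes = top_eigenvectors(covariance(centered), k)
    coords = [[sum(a * b for a, b in zip(r, axis)) for axis in axes] for r in centered]
    return list(zip(labels, coords))


if __name__ == "__main__":
    toy = {  # hypothetical 3-D feature vectors, not real data
        "AFDB": [[1.0, 0.0, 0.2], [1.1, 0.1, 0.1]],
        "ESMAtlas": [[0.0, 1.0, 0.3], [0.1, 1.2, 0.2]],
        "MIP": [[0.5, 0.5, 1.0]],
    }
    for label, (x, y) in shared_embedding(toy):
        print(f"{label:9s} {x:+.2f} {y:+.2f}")
```

Because a single projection is fitted to the pooled vectors, distances between, say, an AFDB point and a MIP point are directly comparable; fitting a separate projection per database would not give that guarantee.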
Nucleic Acids Research,
Journal year: 2024, Issue: 53(D1), pp. D1–D9
Published: Dec. 10, 2024
The 2025 Nucleic Acids Research database issue contains 185 papers spanning biology and related areas. Seventy-three new databases are covered, while resources previously described in the issue account for 101 update articles. Databases most recently published elsewhere account for a further 11 papers. Nucleic acid databases include EXPRESSO for multi-omics of 3D genome structure (this issue's chosen Breakthrough Resource Article) and NAIRDB for Fourier transform infrared data. New protein databases include structure predictions of human isoforms at ASpdb and of viral proteins at BFVD. UniProt, Pfam and InterPro have all provided updates: metabolism and signalling are covered by descriptions of STRING, KEGG and CAZy, and updated microbe-oriented databases include Enterobase, VFDB and PHI-base. Biomedical research is supported, among others, by ClinVar, PubChem and DrugMAP. Genomics-related resources include Ensembl, the UCSC Genome Browser and dbSNP. New plant databases cover the Solanaceae (SolR) and Asteraceae (AMIR) families, and an update from NCBI Taxonomy also features. The Database Issue is freely available on the Nucleic Acids Research website (https://academic.oup.com/nar). At the NAR online Molecular Biology Collection (http://www.oxfordjournals.org/nar/database/c/), 932 entries have been reviewed in the last year, 74 new resources added and 226 discontinued URLs eliminated, bringing the current total to 2236 databases.
BBA Advances,
Journal year: 2025, Issue: unknown, p. 100154
Published: Mar. 1, 2025
Lectins are ubiquitous proteins that interact with glycans in a variety of molecular processes and, as such, also play a role in diseases, whether infectious, chronic or cancer-related. The systematic study of lectins is therefore essential, in particular for understanding cell-cell communication. Three-dimensional protein structural data accumulated over the past decades boosted the advance of AI-based structure prediction and opened up new options to characterise known lectins, which can often be multimeric and multivalent. This article reviews the methods used to obtain structures of lectins, the currently available lectin 3D structures and their interactions, and how this knowledge is used to classify these proteins. It shows that the combination of an array of bioinformatics tools should make the prediction of lectin binding specificity possible in the near future.
PLoS Computational Biology,
Journal year: 2025, Issue: 21(3), p. e1012503
Published: Mar. 28, 2025
Understanding the biological functions of Puccinia striiformis f. sp. tritici (Pst) effectors is fundamental for uncovering the mechanisms of pathogenicity and variability, thereby paving the way for developing durable and effective control strategies for stripe rust. However, due to the lack of an efficient genetic transformation system in Pst, progress in effector function studies has been slow. Here, we modeled the structures of 15,201 effectors from twelve Pst races or isolates, a … isolate, and one … hordei isolate using AlphaFold2. Of these, 8,102 folds were successfully predicted, and we performed sequence- and structure-based annotations of these effectors. These effectors were classified into 410 structure clusters and 1,005 sequence clusters. Sequence lengths varied widely, with a concentration between 101 and 250 amino acids, and motif analysis revealed that 47% and 5.81% of the effectors were predicted to contain the known motifs [Y/F/W]xC and RxLR, respectively, highlighting structural conservation across a substantial portion of the effectors. Subcellular localization predictions indicated a predominant cytoplasmic localization, with notable chloroplast and nuclear presence. Structure-guided annotation significantly enhances function prediction efficiency, as demonstrated by the 75% of effectors among these that have an annotation. The clustering and annotation, both based on structural and sequence homologies, allowed us to determine the adopted folds and fold families. A common feature observed was the formation of the same fold by different sequences. In our study, comparative analyses revealed a new family with a core of four helices, including Pst27791, PstGSRE4, and PstSIE1, which target key wheat immune pathway proteins, impacting host functions. Further analyses showed similarities with effectors from other pathogens, such as AvrSr35, AvrSr50, Zt-KP4-1, and MoHrip2, indicating the possibility of convergent evolutionary strategies, which has yet to be supported by further data encompassing some evolutionarily distant species. Currently, initial sequence, structure, and function relationships have been established for most effectors, providing a novel foundation to advance future understanding of effector evolution.
Abstract
Identifying structural relationships between proteins is crucial for understanding their functions and evolutionary histories. We present ISS_ProtSci, a Python package designed for structural similarity searches within the AlphaFold Database v2 (AFDB2). ISS_ProtSci incorporates DaliLite to identify geometrically similar structures and uses a transitive closure algorithm to iteratively explore neighboring shells of proteins. The precomputed all‐against‐all structural comparisons are generated by Foldseek, chosen for its speed, and are validated with DaliLite for precision. Search results are annotated with metadata from UniProtKB and with Pfam protein family classifications, using hmmsearch to assign Pfam domains. Outputs, including Dali pairwise alignment data, are provided in TSV format for easy filtering and analysis. Our method offers a significant improvement in recall over existing tools like Foldseek, especially in detecting more distantly related proteins. This is particularly valuable for structurally diverse families, where traditional sequence‐based or fast structural methods struggle. ISS_ProtSci delivers practical runtimes and flexibility, allowing users to input a PDB file, define the minimum size of the common core, and evaluate results against Pfam clans. In evaluating our method across 12 test cases based on Pfam clans, we achieved 99% recall of relevant proteins, even in challenging cases where Foldseek's recall dropped below 50%. ISS_ProtSci not only identifies closely related proteins but also uncovers previously unrecognized relationships, contributing to more accurate structural classifications. The software can be downloaded from http://ekhidna2.biocenter.helsinki.fi/ISS_ProtSci/.
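The iterative shell expansion described above can be sketched as a breadth-first transitive closure. This is a minimal illustration under stated assumptions, not the ISS_ProtSci API: `neighbors` is a hypothetical precomputed hit list standing in for the Foldseek all-against-all comparisons, and `verify` is a hypothetical callback standing in for a DaliLite check (for example, a minimum common-core size).

```python
from typing import Callable, Dict, Iterable, List, Set


def transitive_shells(
    query: str,
    neighbors: Dict[str, Iterable[str]],
    verify: Callable[[str, str], bool],
    max_rounds: int = 10,
) -> List[Set[str]]:
    """Return successive shells of accepted hits around ``query``.

    neighbors: hypothetical stand-in for precomputed Foldseek hits.
    verify:    hypothetical stand-in for a DaliLite check of a candidate
               against the query (e.g. a minimum common-core size).
    """
    accepted: Set[str] = {query}
    frontier: Set[str] = {query}
    shells: List[Set[str]] = []
    for _ in range(max_rounds):
        next_shell: Set[str] = set()
        for member in frontier:
            for cand in neighbors.get(member, ()):
                # Only unseen candidates that pass the check join the next shell.
                if cand not in accepted and verify(query, cand):
                    next_shell.add(cand)
        if not next_shell:
            break  # closure reached: this round added no new proteins
        accepted |= next_shell
        shells.append(next_shell)
        frontier = next_shell  # expand only from the newest shell
    return shells


if __name__ == "__main__":
    # Toy graph: "X" is a candidate hit that the (hypothetical) check rejects.
    graph = {"Q": ["A", "B"], "A": ["C"], "B": ["C", "X"], "C": ["D"]}
    shells = transitive_shells("Q", graph, lambda q, c: c != "X")
    print([sorted(s) for s in shells])  # [['A', 'B'], ['C'], ['D']]
```

Each candidate is verified against the original query rather than against the shell member that proposed it. That is an assumption made here so the closure cannot drift step by step into unrelated folds; the real package may handle this differently.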
International Journal of Molecular Sciences,
Journal year: 2025, Issue: 26(11), p. 5074
Published: May 24, 2025
Fire blight, caused by Erwinia amylovora, is a devastating bacterial disease threatening apple, pear, and other Rosaceae species. In our prior study, transcriptome analysis identified a fire blight-resistant variety, Duli (Pyrus betulifolia Bunge), and highlighted the PR1 gene as a key resistance factor. Using Duli’s genomic data, we systematically characterized the Pb-PR-1 gene family through bioinformatics analysis. A total of 31 genes were found, encoding proteins of 123–341 amino acids. Phylogenetic analysis grouped these into four subfamilies, with 27 genes distributed across seven chromosomes, and all contain a conserved CAP superfamily domain. Their promoter regions are enriched in hormone- and stress-responsive elements. After inoculation with E. amylovora, the susceptible variety showed lesion development by day 2, with rapid progression, while resistant plants exhibited slower advancement and smaller lesions. Enzyme activity assays revealed that, in … plants, PPO (polyphenol oxidase) and CAT (catalase) activities peaked on day 6, showing 2.4-fold and 3.81-fold increases compared to Duli. At the same time, MDA (malondialdehyde) content decreased by 16.6%. The SOD (superoxide dismutase) and PAL (phenylalanine ammonia-lyase) activities peaked on day 4, with increments of 34.32% and 47.1% over Duli. qRT-PCR revealed significant differences in expression between the resistant and susceptible plants post-inoculation. Notably, the expression of Pb-PR-1-11, Pb-PR-1-21, and Pb-PR-1-26 increased with infection duration, aligning with the enzyme activity trends. Other genes showed high expression early but declined by day 6. Pb-PR-1-3, Pb-PR-1-6, Pb-PR-1-8, Pb-PR-1-16, and Pb-PR-1-30 were upregulated 13.17-fold on average at day 2. In summary, Pb-PR-1 genes were elevated during infection and enhanced defense-related enzyme activities, improving plant resistance. This study provides a foundation for understanding the PR-1 family’s role and for advancing fire blight resistance in Pyrus.
The release of the AlphaFold database, which contains 214 million predicted protein structures, represents a major leap forward for proteomics and its applications. However, the lack of comprehensive annotation limits its accessibility and usability. Here, we present DPCstruct, an unsupervised clustering algorithm designed to provide domain-level classification of protein structures. Using structural predictions from AlphaFold2 and all-against-all local alignments from Foldseek, DPCstruct identifies and groups recurrent structural motifs into domain clusters. When applied to the Foldseek Cluster representative set of proteins from the AlphaFoldDB, DPCstruct successfully recovers the majority of folds catalogued in established databases such as SCOP and CATH. Out of the 28 246 clusters identified by DPCstruct, 24% have no structural or sequence similarity to known families. Supported by a modular and efficient implementation, classifying 15 million entries in less than 48 h, DPCstruct is well suited for large-scale metagenomics applications. It also facilitates rapid incorporation of updates from the latest structure prediction tools, ensuring that the classification remains up-to-date. The pipeline and associated database are freely available in a dedicated repository, enhancing navigation of the AlphaFoldDB through domain annotations and enabling applications to other datasets.
Published by the American Physical Society 2025
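The abstract does not describe the clustering step itself. The name suggests density peak clustering, so the sketch below assumes that technique (in the form introduced by Rodriguez and Laio) applied to a precomputed distance matrix between aligned segments. The matrix, the `cutoff`, `min_density` and `min_delta` parameters, and the toy data are all hypothetical, and this is not DPCstruct's published procedure.

```python
from typing import List, Sequence


def density_peak_clusters(
    dist: Sequence[Sequence[float]],
    cutoff: float,
    min_density: int,
    min_delta: float,
) -> List[int]:
    """Density peak clustering on a symmetric distance matrix.

    Returns one label per point; -1 means the point was not assigned.
    """
    n = len(dist)
    # Local density: number of other points closer than the cutoff.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < cutoff) for i in range(n)]
    # Strict order by density (ties broken by index) so "denser than" is well defined.
    order = sorted(range(n), key=lambda i: (-rho[i], i))
    delta = [0.0] * n
    parent = [-1] * n
    for rank, i in enumerate(order):
        if rank == 0:
            delta[i] = max(dist[i])  # densest point: use its largest distance
            continue
        # Nearest point that comes earlier (is denser) in the ordering.
        parent[i] = min(order[:rank], key=lambda j: dist[i][j])
        delta[i] = dist[i][parent[i]]
    labels = [-1] * n
    next_label = 0
    for i in order:  # denser points first, so a parent is labelled before its children
        if rho[i] >= min_density and delta[i] >= min_delta:
            labels[i] = next_label  # dense and far from denser points: a cluster center
            next_label += 1
        elif parent[i] != -1:
            labels[i] = labels[parent[i]]  # inherit from nearest denser point
    return labels


if __name__ == "__main__":
    # Toy 1-D positions: two tight groups far apart (hypothetical data).
    pos = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
    dist = [[abs(a - b) for b in pos] for a in pos]
    print(density_peak_clusters(dist, cutoff=0.5, min_density=1, min_delta=1.0))
    # [0, 0, 0, 1, 1, 1]
```

A point inherits the label of its nearest denser neighbor, so if that chain never reaches an accepted center the point keeps the label -1, one simple way of leaving low-density material unassigned.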
Nucleic Acids Research,
Journal year: 2024, Issue: 53(D1), pp. D348–D355
Published: Nov. 20, 2024
Abstract
CATH (https://www.cathdb.info) is a structural classification database that assigns domains to the structures in the Protein Data Bank (PDB) and the AlphaFold Protein Structure Database (AFDB), and adds layers of biological information, including homology and functional annotation. This article covers developments since 2021. We report a significant expansion of structural information (180-fold) for CATH superfamilies through the classification of PDB domains and predicted domain structures from The Encyclopedia of Domains (TED) resource. TED provides domain information on the AFDB. CATH v4.4 represents an expansion of ∼64 844 experimentally determined domains from the PDB. We also present a mapping of ∼90 million TED domains to CATH superfamilies. The new data increase the number of superfamilies from 5841 to 6573, folds from 1349 to 2078 and architectures from 41 to 77. TED comprises predicted structures, so these new folds and architectures remain hypothetical until experimentally confirmed. CATH also classifies domains into functional families (FunFams) within each superfamily. We have updated the sequences in FunFams by scanning FunFam-HMMs against UniProt release 2024_02, giving a 276% increase in coverage. The mapping of TED has resulted in a 4-fold increase in FunFams with structural information.