Briefings in Bioinformatics, Journal year: 2024, Issue: 26(1)
Published: Nov. 22, 2024
Abstract
Computational prediction of nucleic acid-binding residues in protein sequences is an active field of research, with over 80 methods that were released in the past 2 decades. We identify and discuss 87 sequence-based predictors that include dozens of recently published methods that are surveyed for the first time. We overview the historical progress and examine multiple practical issues that include availability and impact of predictors, key features of their predictive models, and important aspects related to their training and assessment. We observe that the past decade has brought increased use of deep neural networks and protein language models, which contributed to substantial gains in the predictive performance. We also highlight advancements in vital and challenging issues that include cross-predictions between deoxyribonucleic acid (DNA)-binding and ribonucleic acid (RNA)-binding residues and targeting the two distinct sources of binding annotations, structure-based versus intrinsic disorder-based. The methods trained on the structure-annotated interactions tend to perform poorly on the disorder-annotated binding and vice versa, with only a few methods that target and perform well across both annotation types. The cross-predictions are a significant problem, with some predictors of DNA-binding or RNA-binding residues indiscriminately predicting interactions with both nucleic acid types. Moreover, we show that web servers are cited substantially more than tools without implementation or with no longer working implementations, motivating the development and long-term maintenance of the web servers. We close by discussing future research directions that aim to drive further progress in this area.
bioRxiv (Cold Spring Harbor Laboratory), Journal year: 2024, Issue: unknown
Published: July 19, 2024
ABSTRACT
Proteins with internal repeats (PIRs) are the second most abundant class of fungal cell wall resident proteins. In yeasts, PIRs preserve cell wall stability under stressful conditions. They are characterized by conserved N-terminal amino acid sequences repeated in tandem (PIR domains), and a Cys-rich C-terminal domain. Although PIRs have been inferred in several filamentous fungi genomes, they have not been studied beyond yeasts. In this work, PIRs diversity, evolution and biological role, focused on a new PIRs class, were addressed. Bioinformatic inference indicated PIRs were an innovation in Ascomycota. Predicted PIRs clustered in two main groups: classical yeasts PIRs (N-terminal PIR domains; C-terminal Cys-rich domain), and PIRs from filamentous fungi with an inverted architecture (N-terminal Cys-rich domain; C-terminal PIR domains), which could harbor additional GPI-signals. As representatives of the second group, Neurospora crassa (Nc) PIR-1 (NCU04033) and PIR-2 (NCU07569) were studied. Confocal microscopy of eGFP-labeled PIR-1 and PIR-2 revealed they accumulate in apical plugs; additionally, [...] requires the Kex2 processing site for correct maturation, and its predicted GPI modification signal proved functional. Moreover, Nc Δ pir-1 and Δ pir-2 single mutants showed a growth rate similar to that of Nc WT, but the double mutant Δ pir-1 /Δ pir-2 grew significantly slower. Similarly, single mutants were mildly sensitive to calcofluor white, although the double mutant was severely impaired. Nc PIR-1 and PIR-2 are cell wall stabilizers, as are yeast PIRs.
Nucleic Acids Research, Journal year: 2024, Issue: 52(17), pp. 10464-10489
Published: Aug. 27, 2024
Abstract
Tandem repeat proteins (TRPs) are widely distributed and bind to a wide variety of ligands. DNA-binding TRPs such as zinc finger (ZNF) and transcription activator-like effector (TALE) play important roles in biology and biotechnology. In this study, we first conducted an extensive analysis of TRPs in public databases, and found that the enormous diversity of TRPs is largely unexplored. We then focused our efforts on identifying novel TRPs possessing DNA-binding capabilities. We established a protein language model for DNA-binding protein prediction (PLM-DBPPred), and predicted a large number of DNA-binding TRPs. A subset was selected for experimental screening, leading to the identification of 11 novel DNA-binding TRPs, with six showing sequence specificity. Notably, members of the STAR (Short TALE-like Repeat proteins) family can be programmed to target specific 9 bp DNA sequences with high affinity. Leveraging this property, we generated artificial transcription factors using reprogrammed STAR proteins and achieved targeted activation of endogenous gene sets. Furthermore, the members of novel families such as MOON (Marine Organism-Originated DNA binding protein) and pTERF (prokaryotic mTERF-like protein) exhibit unique features and distinct DNA-binding characteristics, revealing interesting biological clues. Our study expands the diversity of DNA-binding TRPs, and demonstrates that a systematic approach greatly enhances the discovery of new biological insights and tools.
Cell Reports Methods, Journal year: 2024, Issue: 4(11), p. 100896
Published: Nov. 1, 2024
Motivation
Exploring protein diversity is key to understanding protein function and advancing protein engineering. Environmental DNA contains vast protein sequence space, going beyond current databases. Harnessing these sequences requires approaches for the targeted annotation of specific protein functions. Here, we present DeepMetagenome, a deep learning-based procedure, which not only facilitates the identification of typical protein family sequences but also enables discovery within under-annotated families in existing databases.
Highlights
• DeepMetagenome is a deep learning-based method for annotating protein sequences from (meta)genomes
• DeepMetagenome outperformed alignment-based and machine learning methods
• Predicted metallothionein genes were experimentally verified for their function
• DeepMetagenome can be easily repurposed for mining other proteins
Summary
Natural protein diversity offers a vast sequence space for engineering, and deep learning enables its detection in metagenomes/proteomes without prior assumptions. DeepMetagenome, a Python-based method, explores protein diversity through modules for training and analyzing sequence datasets. The deep learning model includes Embedding, Conv1D, LSTM, and Dense layers, with feature analysis and data cleaning. Applied to metallothioneins in a database of over 146 million coding features, DeepMetagenome identified 500 high-confidence sequences, outperforming DIAMOND and CNN-based models. It showed stable performance compared to Transformer-based models over 25 epochs. Among 23 synthesized sequences, 20 exhibited metal resistance. The tool also successfully explored three additional protein families and is freely available on GitHub with detailed instructions.
Nature Communications, Journal year: 2024, Issue: 15(1)
Published: Dec. 30, 2024
Short tandem repeats (STRs) have emerged as important and hypermutable sites where genetic variation correlates with gene expression in plant and animal systems. Recently, it has been shown that a broad range of transcription factors (TFs) are affected by STRs near or in the DNA target binding site. Despite this, the distribution of STR motif repetitiveness in eukaryote genomes is still largely unknown. Here, we identify monomer and dimer STR motif repetitiveness in 5.1 billion 10-bp windows upstream of translation starts and downstream of translation stops in 25 million genes spanning 1270 species across the eukaryotic Tree of Life. We report that all surveyed genomes have gene-proximal shifts in motif repetitiveness. Within genomes, repetitiveness landscapes correlated to the function of genes; genes with housekeeping functions were depleted in repetitiveness. Furthermore, the repetitiveness landscapes correlated with TF binding sites, indicating that gene function has evolved in conjunction with cis-regulatory STRs and TFs that recognize repetitive sites. These results suggest that hypermutability inherent to STRs is canalized along the genome sequence and contributes to regulatory and eco-evolutionary dynamics in eukaryotes.