bioRxiv (Cold Spring Harbor Laboratory),
Journal year: 2024, Issue: unknown
Published: May 4, 2024
Abstract
The explosion of sequence data has allowed the rapid growth of protein language models (pLMs). pLMs have now been employed in many frameworks, including variant-effect and peptide-specificity prediction. Traditionally, for protein-protein or peptide-protein interactions (PPIs), the corresponding sequences are either co-embedded followed by post-hoc integration, or concatenated prior to embedding. Interestingly, no method utilizes a language representation of the interaction itself. We developed an interaction LM (iLM), which uses a novel language to represent interactions between protein/peptide sequences. Sliding Window Interaction Grammar (SWING) leverages differences in amino acid properties to generate an interaction vocabulary. This vocabulary is the input into an LM, followed by a supervised prediction step where the LM’s representations are used as features. SWING was first applied to predicting peptide:MHC (pMHC) interactions. SWING was not only successful at generating Class I and Class II models that have comparable prediction to state-of-the-art approaches, but the unique Mixed Class model was also successful at jointly predicting both classes. Further, the SWING model trained only on Class I alleles was predictive for Class II, a complex prediction task not attempted by any existing approach. For de novo data, using only Class I or Class II data, SWING also accurately predicted Class II pMHC interactions in murine models of SLE (MRL/lpr model) and T1D (NOD model), which were validated experimentally. To further evaluate SWING’s generalizability, we tested its ability to predict the disruption of specific protein-protein interactions by missense mutations. Although modern methods like AlphaMissense and ESM1b can predict interfaces and variant effects/pathogenicity per mutation, they are unable to predict interaction-specific disruptions. SWING was successful at accurately predicting the impact of both Mendelian mutations and population variants on PPIs. This is the first generalizable approach that can accurately predict interaction-specific disruptions by missense mutations with only sequence information. Overall, SWING is a first-in-class generalizable zero-shot iLM that learns the language of PPIs.
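As a rough illustration of the sliding-window idea described in the abstract above, the sketch below slides a peptide along a protein sequence, converts the difference in one amino acid property at each paired position into a letter, and splits the resulting string into k-mers that could serve as a vocabulary. It is not the authors' implementation: the use of the Kyte-Doolittle hydropathy scale, the equal-width binning, the letter alphabet, the k-mer size, and the function names `encode_interaction` and `kmers` are all assumptions made for illustration.

```python
# Kyte-Doolittle hydropathy values (an illustrative property choice;
# the abstract only says "differences in amino acid properties").
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

MAX_DIFF = 9.0  # largest possible gap on this scale: 4.5 - (-4.5)


def encode_interaction(peptide, protein, n_bins=5):
    """Slide the peptide along the protein and encode the absolute
    property difference of each residue pair as a letter (hypothetical scheme)."""
    letters = "abcdefghij"[:n_bins]
    encoded = []
    for start in range(len(protein) - len(peptide) + 1):
        window = protein[start:start + len(peptide)]
        for pep_res, prot_res in zip(peptide, window):
            diff = abs(HYDROPATHY[pep_res] - HYDROPATHY[prot_res])
            # Equal-width bins over [0, MAX_DIFF]; clamp the top edge.
            bin_index = min(int(diff / MAX_DIFF * n_bins), n_bins - 1)
            encoded.append(letters[bin_index])
    return "".join(encoded)


def kmers(text, k=3):
    """Split an encoded interaction string into overlapping k-mers ("words")."""
    return [text[i:i + k] for i in range(len(text) - k + 1)]


if __name__ == "__main__":
    sentence = encode_interaction("AI", "AIR")
    print(sentence, kmers(sentence))
```

In a pipeline of the kind the abstract outlines, k-mers like these would be fed to a language model whose learned representations then act as features for a supervised classifier; that downstream step is not shown here.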
iScience,
Journal year: 2020, Issue: 23(3), p. 100939
Published: Feb. 27, 2020
Missense mutations may affect proteostasis by destabilizing or over-stabilizing protein complexes and changing the pathway flux. Predicting the effects of stabilizing mutations on protein-protein interactions is notoriously difficult because existing experimental sets are skewed toward mutations reducing protein-protein binding affinity, and many computational methods fail to correctly evaluate their effects. To address this issue, we developed a method, MutaBind2, which estimates the impacts of single as well as multiple mutations on protein-protein interactions. MutaBind2 employs only seven features, and the most important of them describe interactions of proteins with the solvent, evolutionary conservation of the site, and thermodynamic stability of the complex and each monomer. This approach shows a distinct improvement, especially in evaluating the effects of mutations increasing binding affinity. MutaBind2 can be used for finding disease driver mutations, designing stable protein complexes, and discovering new protein-protein interaction inhibitors.
International Journal of Molecular Sciences,
Journal year: 2020, Issue: 21(7), p. 2563
Published: April 7, 2020
Maintaining wild type protein–protein interactions is essential for the normal function of the cell, and any mutation that alters their characteristics can cause disease. Therefore, the ability to correctly and quickly predict the effect of amino acid mutations is crucial for understanding disease effects and for carrying out genome-wide studies. Here, we report a new development of the SAAMBE method, SAAMBE-3D, which is a machine learning-based approach, resulting in accurate predictions, and is extremely fast. It achieves a Pearson correlation coefficient ranging from 0.78 to 0.82, depending on the training protocol, in a benchmarking five-fold validation test against the SKEMPI v2.0 database, and outperforms currently existing algorithms on various blind-tests. Furthermore, optimized and tested via five-fold cross-validation on the Cornell University dataset, SAAMBE-3D achieves an AUC of 1.0 and 0.96 on homo- and hetero-dimer test datasets. Another important feature of SAAMBE-3D is that it is very fast: it takes less than a fraction of a second to complete a prediction. SAAMBE-3D is available as a web server as well as a stand-alone code, the last one being another important feature, allowing other researchers to directly download the code and run it on their local computer. Combined all together, SAAMBE-3D is an accurate and fast software applicable for genome-wide studies to assess the effect of amino acid mutations on protein–protein interactions. The webserver and the stand-alone codes (SAAMBE-3D for predicting the change of binding free energy and SAAMBE-3D-DN for predicting if the mutation is disruptive or non-disruptive) are available.
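The Pearson correlation coefficient quoted in the abstract above measures how linearly predicted values track experimental ones. The sketch below shows how such a coefficient is computed in general; it is not SAAMBE-3D code, the function name `pearson_r` is chosen here for illustration, and the numbers in the usage example are made-up placeholders rather than data from SKEMPI v2.0 or any other source.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variances, left unnormalized because the 1/n factors cancel.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical binding free energy changes (kcal/mol), for illustration only.
    experimental = [0.5, 1.2, -0.3, 2.1, 0.0]
    predicted = [0.7, 1.0, 0.1, 1.8, -0.2]
    print(round(pearson_r(experimental, predicted), 3))
```

A value near 1 indicates that predictions rise and fall with the measurements; in a five-fold validation of the kind the abstract mentions, the coefficient would be computed on held-out folds rather than on the training data.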
FEBS Letters,
Journal year: 2021, Issue: 595(8), pp. 1132 - 1158
Published: March 3, 2021
Mitochondrial disorders are monogenic disorders characterized by a defect in oxidative phosphorylation and caused by pathogenic variants in one of over 340 different genes. The implementation of whole-exome sequencing has led to a revolution in their diagnosis, doubled the number of associated disease genes, and significantly increased the diagnosed fraction. However, the genetic etiology of a substantial fraction of patients exhibiting mitochondrial disorders remains unknown, highlighting limitations in variant detection and interpretation, which calls for improved computational and DNA sequencing methods, as well as the addition of OMICS tools. More intriguingly, this also suggests that some pathogenic variants lie outside of the protein-coding genes and that mechanisms beyond Mendelian inheritance and the mtDNA are of relevance. This review covers the current status of the genetic basis of mitochondrial diseases, discusses current challenges and perspectives, and explores the contribution of factors beyond the protein-coding regions and monogenic inheritance in the expansion of the genetic spectrum of disease.
Genetic variations (including substitutions, insertions, and deletions) exert a profound influence on DNA sequences. These variations are systematically classified as synonymous, nonsynonymous, and nonsense, each manifesting distinct effects on proteins. The implementation of high-throughput sequencing has significantly augmented our comprehension of the intricate interplay between gene variations, protein structure, and function, as well as their ramifications in the context of diseases. Frameshift variations, particularly small insertions and deletions (indels), disrupt protein coding and are instrumental in disease pathogenesis. This review presents a succinct review of computational methods, databases, current challenges, and future directions in predicting the consequences of coding frameshift small indels variations. We analyzed the predictive efficacy, reliability, and utilization of computational methods, and the variant account, reliability, and utilization of each database. Besides, we also compared the prediction methodologies on GOF/LOF pathogenic variation data. Addressing the challenges pertaining to prediction accuracy and cross-species generalizability, nascent technologies such as AI and deep learning harbor immense potential to enhance predictive capabilities. The importance of interdisciplinary research and collaboration cannot be overstated for devising effective diagnosis, treatment, and prevention strategies concerning diseases associated with