bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2023, Volume and Issue: unknown. Published: Dec. 4, 2023
Abstract
Understanding protein function is vital for drug discovery, disease diagnosis, and protein engineering. While Protein Language Models (PLMs) pre-trained on vast sequence datasets have achieved remarkable success, equivalent Protein Structure Models (PSMs) remain underrepresented. We attribute this to the relative lack of high-confidence structural data and suitable pre-training objectives. In this context, we introduce BioCLIP, a contrastive learning framework that pre-trains PSMs by leveraging PLMs, generating meaningful per-residue and per-chain representations. When evaluated on tasks such as protein-protein interaction, Gene Ontology annotation, and Enzyme Commission number prediction, BioCLIP-trained PSMs consistently outperform models trained from scratch and further enhance performance when merged with sequence embeddings. Notably, BioCLIP approaches, or exceeds, specialized methods across all benchmarks using its singular design. Our work addresses the challenges of obtaining quality structural data and designing self-supervised objectives, setting the stage for more comprehensive models of protein function. Source code is publicly available.
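The abstract above describes contrastive pre-training of a structure encoder against a frozen protein language model. As a rough illustration of that general idea only (not the authors' actual objective, architecture, or code), a CLIP-style symmetric InfoNCE loss over paired per-chain embeddings could look like the sketch below; the embedding dimensions, temperature, and the random tensors standing in for the two encoders are assumptions.

```python
# Minimal sketch (not the authors' code): CLIP-style contrastive alignment of a
# structure encoder with embeddings from a frozen protein language model.
import torch
import torch.nn.functional as F

def clip_style_loss(struct_emb: torch.Tensor,
                    plm_emb: torch.Tensor,
                    temperature: float = 0.07) -> torch.Tensor:
    """InfoNCE loss that pulls together structure/sequence embeddings of the
    same chain and pushes apart embeddings of different chains in the batch."""
    s = F.normalize(struct_emb, dim=-1)          # (batch, dim)
    p = F.normalize(plm_emb, dim=-1)             # (batch, dim)
    logits = s @ p.t() / temperature             # pairwise similarities
    targets = torch.arange(s.size(0), device=s.device)
    # Symmetric cross-entropy: structure->sequence and sequence->structure.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

# Example usage with random stand-ins for a structure encoder and a frozen PLM.
batch, dim = 8, 128
struct_emb = torch.randn(batch, dim, requires_grad=True)   # from the PSM being trained
plm_emb = torch.randn(batch, dim)                          # precomputed, frozen PLM output
loss = clip_style_loss(struct_emb, plm_emb)
loss.backward()
```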
bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2022, Volume and Issue: unknown. Published: July 22, 2022
Summary
Effective pandemic preparedness relies on anticipating viral mutations that are able to evade host immune responses in order to facilitate vaccine and therapeutic design. However, current strategies for viral evolution prediction are not available early in a pandemic – experimental approaches require polyclonal antibodies to test against, and existing computational methods draw heavily from current strain prevalence to make reliable predictions of variants of concern. To address this, we developed EVEscape, a generalizable, modular framework that combines fitness predictions from a deep learning model of historical sequences with biophysical and structural information. EVEscape quantifies the viral escape potential of mutations at scale and has the advantage of being applicable before surveillance sequencing, experimental scans, or 3D structures of antibody complexes are available. We demonstrate that EVEscape, trained on sequences available prior to 2020, is as accurate as high-throughput experimental scans at anticipating pandemic variation for SARS-CoV-2 and is generalizable to other viruses, including Influenza, HIV, and understudied viruses such as Lassa and Nipah. We provide continually updated escape scores for all current strains of SARS-CoV-2, predict likely additional mutations, and forecast emerging variants as a tool for ongoing vaccine development (evescape.org).
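The summary above says EVEscape combines fitness estimates from a deep learning model of historical sequences with biophysical and structural information. Purely as a hypothetical sketch of how per-mutation component scores might be combined into a single escape score (not the published EVEscape formulation), one could standardize each component and multiply them in probability space, as below; the component names, sigmoid transform, and log-scale output are all illustrative assumptions.

```python
# Toy sketch of combining per-mutation component scores into one escape score.
# This is NOT the published EVEscape formula; the components and the logistic
# combination are illustrative assumptions only.
import numpy as np

def combined_escape_score(fitness: np.ndarray,
                          accessibility: np.ndarray,
                          dissimilarity: np.ndarray) -> np.ndarray:
    """Combine standardized component scores multiplicatively in probability
    space: a mutation must be tolerable (fitness), reachable by antibodies
    (accessibility), and chemically novel (dissimilarity) to score highly."""
    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def zscore(x):
        return (x - x.mean()) / (x.std() + 1e-8)

    p = sigmoid(zscore(fitness)) * sigmoid(zscore(accessibility)) * sigmoid(zscore(dissimilarity))
    return np.log(p + 1e-12)   # log scale keeps scores comparable across proteins

# Example with random per-mutation scores for 100 candidate mutations.
rng = np.random.default_rng(0)
scores = combined_escape_score(rng.normal(size=100),
                               rng.normal(size=100),
                               rng.normal(size=100))
```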
mSystems, Journal Year: 2022, Volume and Issue: 7(2). Published: March 21, 2022
Next-generation sequencing has been essential to the global response to the COVID-19 pandemic. As of January 2022, nearly 7 million severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) sequences are available to researchers in public databases. Sequence databases are an abundant resource from which to extract biologically relevant and clinically actionable information. As the pandemic has gone on, SARS-CoV-2 has rapidly evolved, involving complex genomic changes that challenge current approaches to classifying variants. Deep sequence learning could be a potentially powerful way to build sequence-to-phenotype models. Unfortunately, while they can be predictive, deep learning typically produces "black box" models that cannot directly provide biological or clinical insight. Researchers should therefore consider implementing emerging methods for visualizing and interpreting deep learning models. Finally, we address important data limitations, including (i) sampling disparities, (ii) insufficient metadata, and (iii) screening artifacts due to poor quality control.
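As one concrete, purely illustrative example of the interpretation methods recommended above, gradient-times-input saliency can highlight which positions of a one-hot encoded sequence most influence a classifier's output. The toy convolutional model below is a hypothetical stand-in, not any specific published SARS-CoV-2 model.

```python
# Minimal sketch of one common interpretation technique (gradient x input
# saliency) on a one-hot encoded sequence classifier.
import torch
import torch.nn as nn

ALPHABET = 5          # e.g., A, C, G, T, N
SEQ_LEN = 300

model = nn.Sequential(                 # toy sequence-to-phenotype classifier
    nn.Conv1d(ALPHABET, 32, kernel_size=9, padding=4),
    nn.ReLU(),
    nn.AdaptiveMaxPool1d(1),
    nn.Flatten(),
    nn.Linear(32, 2),
)

# Random one-hot sequence standing in for a real genome fragment.
one_hot = torch.zeros(1, ALPHABET, SEQ_LEN)
one_hot[0, torch.randint(0, ALPHABET, (SEQ_LEN,)), torch.arange(SEQ_LEN)] = 1.0
one_hot.requires_grad_(True)

logits = model(one_hot)
logits[0, 1].backward()                # gradient of the "class 1" score

# Per-position attribution: gradient x input, summed over the alphabet axis.
saliency = (one_hot.grad * one_hot).sum(dim=1).squeeze(0)
top_positions = saliency.abs().topk(10).indices   # candidate influential sites
```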
Biology, Journal Year: 2022, Volume and Issue: 11(12), P. 1786 - 1786. Published: Dec. 8, 2022
Through the COVID-19 pandemic, SARS-CoV-2 has gained and lost multiple mutations in novel or unexpected combinations. Predicting how complex combinations of mutations affect disease severity is critical for planning public health responses as the virus continues to evolve. This paper presents a computational framework to complement conventional lineage classification and applies it to predict the severe disease potential of viral genetic variation. The transformer-based neural network model architecture includes additional layers that provide sample embeddings and sequence-wide attention for interpretation and visualization. First, training on viral taxonomy validates the architecture's interpretability. Second, an interpretable predictive model of disease severity is trained on spike protein sequence and patient metadata from GISAID. Confounding effects of changing demographics, increasing vaccination rates, and improving treatment over time are addressed by including demographics and case date as independent inputs to the model. The resulting model can be interpreted to identify potentially significant mutations and proves to be a robust predictive tool. Although trained on data obtained entirely before the availability of empirical data on Omicron, the model predicts Omicron's reduced risk of severe disease, in accord with epidemiological and experimental data.
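The abstract above describes a transformer architecture whose added layers expose sample embeddings and sequence-wide attention for interpretation. As a minimal sketch of that pattern only (the dimensions, pooling, and single attention layer are assumptions, not the paper's architecture), a small encoder can return its attention weights alongside the classification logits:

```python
# Sketch of exposing attention weights from a small transformer-style encoder
# so a sequence classifier can be visualized; all sizes are assumptions.
import torch
import torch.nn as nn

class InterpretableEncoder(nn.Module):
    def __init__(self, vocab=25, dim=64, heads=4, n_classes=2, max_len=1300):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.pos = nn.Parameter(torch.zeros(1, max_len, dim))
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        self.head = nn.Linear(dim, n_classes)

    def forward(self, tokens):
        x = self.embed(tokens) + self.pos[:, : tokens.size(1)]
        # need_weights=True returns attention averaged over heads: (batch, seq, seq)
        attended, attn_weights = self.attn(x, x, x, need_weights=True)
        pooled = self.norm(attended).mean(dim=1)      # sample embedding
        return self.head(pooled), pooled, attn_weights

model = InterpretableEncoder()
tokens = torch.randint(0, 25, (2, 1273))              # e.g., spike-length sequences
logits, sample_embedding, attention = model(tokens)   # attention: (2, 1273, 1273)
```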
Proteins Structure Function and Bioinformatics, Journal Year: 2024, Volume and Issue: 92(6), P. 705 - 719. Published: Jan. 5, 2024
Abstract
The omicron variant of severe acute respiratory syndrome coronavirus 2 (SARS‐CoV‐2), characterized by 30 mutations in its spike protein, has rapidly spread worldwide since November 2021, significantly exacerbating the ongoing COVID‐19 pandemic. In order to investigate the relationship between these mutations and the variant's high transmissibility, we conducted a systematic analysis of the mutational effect on spike–angiotensin‐converting enzyme‐2 (ACE2) interactions and explored the structural/energy correlation of key mutations, utilizing a reliable coarse‐grained model. Our study extended beyond the receptor‐binding domain (RBD) through comprehensive modeling of the full‐length spike trimer rather than just the RBD. The free‐energy calculation revealed that the enhanced binding affinity of the spike protein for the ACE2 receptor is correlated with the increased structural stability of the isolated spike protein, thus explaining the variant's heightened transmissibility. This conclusion was supported by our experimental analyses involving expression and purification of the spike trimer. Furthermore, the energy decomposition established that electrostatic interactions make major contributions to this effect. We categorized the mutations into four groups and established an analytical framework that can be employed in studying future mutations. Additionally, our calculations rationalized the reduced sensitivity of omicron towards most available therapeutic neutralizing antibodies, when compared with the wild type. By providing concrete data and offering a solid explanation, this study contributes to a better understanding of the relationship between theories and observations and lays the foundation for future investigations.
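The abstract above reports that an energy decomposition attributes most of the enhanced binding to electrostatic interactions. As a rough, generic illustration of such a decomposition (not the coarse-grained model or parameters used in the paper), pairwise interface energy can be split into Coulomb and Lennard-Jones terms over bead coordinates and charges; all parameter values below are placeholder assumptions.

```python
# Rough illustration of decomposing pairwise interface energy into electrostatic
# (Coulomb) and van der Waals (Lennard-Jones) terms over coarse-grained beads.
import numpy as np

def decompose_pair_energy(coords_a, charges_a, coords_b, charges_b,
                          epsilon_r=10.0, lj_eps=0.2, lj_sigma=5.0):
    """Return (electrostatic, van der Waals) energy sums over all bead pairs.
    Units are arbitrary; a simple uniform dielectric is assumed."""
    # Pairwise distances between the two groups of beads (e.g., two chains).
    d = np.linalg.norm(coords_a[:, None, :] - coords_b[None, :, :], axis=-1)
    d = np.maximum(d, 1e-3)                       # avoid division by zero
    coulomb = 332.0 * np.outer(charges_a, charges_b) / (epsilon_r * d)
    lj = 4.0 * lj_eps * ((lj_sigma / d) ** 12 - (lj_sigma / d) ** 6)
    return coulomb.sum(), lj.sum()

# Example with random bead coordinates/charges standing in for spike and ACE2.
rng = np.random.default_rng(1)
e_elec, e_vdw = decompose_pair_energy(rng.uniform(0, 50, (200, 3)),
                                      rng.normal(0, 0.3, 200),
                                      rng.uniform(40, 90, (150, 3)),
                                      rng.normal(0, 0.3, 150))
```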