bioRxiv (Cold Spring Harbor Laboratory)
Journal Year: 2023
Volume and Issue: unknown
Published: Dec. 4, 2023
Abstract
Understanding protein function is vital for drug discovery, disease diagnosis, and protein engineering. While Protein Language Models (PLMs) pre-trained on vast protein sequence datasets have achieved remarkable success, equivalent Protein Structure Models (PSMs) remain underrepresented. We attribute this to the relative lack of high-confidence structural data and suitable pre-training objectives. In this context, we introduce BioCLIP, a contrastive learning framework that pre-trains PSMs by leveraging PLMs, generating meaningful per-residue and per-chain representations. When evaluated on tasks such as protein-protein interaction, Gene Ontology annotation, and Enzyme Commission number prediction, BioCLIP-trained PSMs consistently outperform models trained from scratch and further enhance performance when merged with sequence embeddings. Notably, BioCLIP approaches, or exceeds, specialized methods across all benchmarks using its singular pre-trained design. Our work addresses the challenges of obtaining quality structural data and designing self-supervised objectives, setting the stage for more comprehensive models of protein function. Source code is publicly available [2].
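The abstract above says only that BioCLIP is a contrastive learning framework that pre-trains structure models by leveraging language models. As a rough illustration of what such an objective can look like, here is a minimal sketch of a symmetric CLIP-style (InfoNCE) loss between paired structure and sequence embeddings. The function name `clip_style_loss`, the temperature value, and the choice of a symmetric loss are assumptions made for illustration and are not taken from the paper.

```python
import math


def _normalize(v):
    """Scale a vector to unit length (assumes the vector is non-zero)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def _cross_entropy_diag(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[i]
    return total / len(logits)


def clip_style_loss(struct_emb, seq_emb, temperature=0.07):
    """Symmetric contrastive loss over a batch of paired embeddings.

    struct_emb[i] and seq_emb[i] are assumed to describe the same protein;
    every other pairing in the batch is treated as a negative.
    """
    s = [_normalize(v) for v in struct_emb]
    q = [_normalize(v) for v in seq_emb]
    # Cosine similarity of every structure/sequence pair, scaled by temperature.
    logits = [
        [sum(a * b for a, b in zip(si, qj)) / temperature for qj in q]
        for si in s
    ]
    logits_t = [list(col) for col in zip(*logits)]
    # Average the structure-to-sequence and sequence-to-structure directions.
    return 0.5 * (_cross_entropy_diag(logits) + _cross_entropy_diag(logits_t))
```

With this sketch, a batch whose structure and sequence embeddings are correctly paired yields a loss near zero, while a batch with the pairing swapped yields a large loss, which is the signal a contrastive pre-training step would minimize.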
The emergence of several new variants of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) in recent months has raised concerns around the potential impact on ongoing vaccination programs. Data from clinical trials and real-world evidence suggest that current vaccines remain highly effective against the alpha variant (B.1.1.7), while some vaccines have reduced efficacy and effectiveness against symptomatic disease caused by the beta variant (B.1.351) and the delta variant (B.1.617.2); however, effectiveness against hospitalization remains high. Although data on the primary regimen against omicron (B.1.1.529) are limited, booster programs using mRNA vaccines have been shown to restore protection against infection (regardless of the vaccine used for the primary regimen) and maintain high effectiveness against hospitalization. However, effectiveness wanes with time after the booster dose. Studies have demonstrated reductions of varying magnitude in neutralizing activity of vaccine-elicited antibodies against a range of SARS-CoV-2 variants, with omicron in particular exhibiting partial immune escape. Evidence suggests that T-cell responses are preserved across vaccine platforms, regardless of variant of concern. Nevertheless, various mitigation strategies are under investigation to address current or future variants, including modification of vaccines for certain variants (including omicron), multivalent vaccine formulations, and different delivery mechanisms.
Cold Spring Harbor Perspectives in Medicine
Journal Year: 2022
Volume and Issue: 12(5), P. a041390 - a041390
Published: April 20, 2022
Our understanding of the still unfolding severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) pandemic would have been extremely limited without the study of the genetics and evolution of this new human coronavirus. Large-scale genome-sequencing efforts have provided close to real-time tracking of the global spread and diversification of SARS-CoV-2 since its entry into the human population in late 2019. These data have underpinned analysis of its origins, epidemiology, and adaptations to the human population: principally immune evasion and increasing transmissibility. SARS-CoV-2, despite being a new human pathogen, was highly capable of human-to-human transmission. During its rapid spread in humans, SARS-CoV-2 has evolved independent new forms, the so-called "variants of concern," that are better optimized for human-to-human transmission. The most important adaptation of the bat coronavirus progenitor of both SARS-CoV-1 and SARS-CoV-2 for human infection (and other mammals) is the use of the angiotensin-converting enzyme 2 (ACE2) receptor. Relaxed structural constraints provide plasticity to the SARS-related coronavirus spike protein, permitting it to accommodate significant amino acid replacements of antigenic consequence without compromising the ability to bind to ACE2. Although the bulk of research has justifiably concentrated on the viral spike protein as the main determinant of antigenic evolution and changes in transmissibility, there is accumulating evidence for the contribution of other regions of the viral proteome to virus-host interaction. Whereas levels of community transmission of recombinants of genetically distinct variants are at present low, when divergent variants cocirculate, recombination between SARS-CoV-2 clades is being detected, increasing the risk that viruses with new properties emerge. Applying computational and machine learning methods to genome sequence data sets to generate experimentally verifiable predictions will serve as an early warning system for novel variant surveillance and will be important in future vaccine planning. Omicron, the latest SARS-CoV-2 variant of concern, has focused attention on step change antigenic events, "shift," as opposed to incremental "drift" changes in antigenicity. Both an increase in transmissibility and antigenic shift in Omicron led to it readily causing infections in the fully vaccinated and/or previously infected. Omicron's virulence, while reduced relative to the variant of concern it replaced, Delta, is very much premised on the past immune exposure of individuals, with a clear signal that boosted vaccination protects from severe disease. Currently, SARS-CoV-2 has proven itself to be a dangerous pathogen with an unpredictable evolutionary capacity, leading to a risk too great not to ensure that all of the world is screened by genome sequencing, protected through available and affordable vaccines, and has non-punitive strategies in place for detecting and responding to variants of concern.
Nature
Journal Year: 2023
Volume and Issue: 622(7984), P. 818 - 825
Published: Oct. 11, 2023
Abstract
Effective pandemic preparedness relies on anticipating viral mutations that are able to evade host immune responses to facilitate vaccine and therapeutic design. However, current strategies for viral evolution prediction are not available early in a pandemic: experimental approaches require host polyclonal antibodies to test against (refs. 1–16), and existing computational methods draw heavily from current strain prevalence to make reliable predictions of variants of concern (refs. 17–19). To address this, we developed EVEscape, a generalizable modular framework that combines fitness predictions from a deep learning model of historical sequences with biophysical and structural information. EVEscape quantifies the viral escape potential of mutations at scale and has the advantage of being applicable before surveillance sequencing, experimental scans or three-dimensional structures of antibody complexes are available. We demonstrate that EVEscape, trained on sequences available before 2020, is as accurate as high-throughput experimental scans at anticipating pandemic variation for SARS-CoV-2 and is generalizable to other viruses including influenza, HIV and understudied viruses with pandemic potential such as Lassa and Nipah. We provide continually revised escape scores for all current strains of SARS-CoV-2 and predict probable further mutations to forecast emerging strains as a tool for continuing vaccine development (evescape.org).
Nature Communications
Journal Year: 2025
Volume and Issue: 16(1)
Published: Jan. 7, 2025
Abstract
New and more transmissible variants of SARS-CoV-2 have arisen multiple times over the course of the pandemic. Rapidly identifying mutations that affect transmission could improve our understanding of viral biology and highlight new variants that warrant further study. Here we develop a generic, analytical epidemiological model to infer the transmission effects of mutations from genomic surveillance data. Applying our model to SARS-CoV-2 data across many regions, we find multiple mutations that substantially affect the transmission rate, both within and outside the Spike protein. The mutations with the largest effects on transmission are strongly supported by experimental evidence from prior studies. Importantly, our model detects lineages with increased transmission even at low frequencies. As an example, we infer significant transmission advantages for the Alpha, Delta, and Omicron variants shortly after their appearances in regional data, when they comprised only around 1-2% of sample sequences. Our model thus facilitates the rapid identification of variants and mutations that affect transmission.
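The abstract above does not spell out its epidemiological model. To give a feel for how a transmission advantage can be read off surveillance frequencies, the sketch below substitutes a much simpler, standard technique: under logistic growth, the logit of a variant's frequency rises linearly in time, and the slope estimates its per-day growth advantage. This is not the authors' model, and the function name `growth_advantage` and its inputs are hypothetical.

```python
import math


def logit(p):
    """Log-odds of a frequency strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def growth_advantage(days, freqs):
    """Least-squares slope of logit(frequency) against time.

    days  : sampling times (e.g. days since first detection)
    freqs : variant frequency among sequenced samples at each time
    Returns the estimated per-day growth advantage under a logistic model.
    """
    ys = [logit(f) for f in freqs]
    n = len(days)
    mean_x = sum(days) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, ys))
    return sxy / sxx
```

Because the fit is done on the logit scale, a variant that starts near 1% of samples still produces a clear slope, which is one intuition for why advantages can be detected at low frequencies. Real surveillance data would also need sampling noise and uneven sequencing effort to be handled, which this sketch ignores.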
Nature Communications
Journal Year: 2023
Volume and Issue: 14(1)
Published: June 13, 2023
The relentless evolution of SARS-CoV-2 poses a significant threat to public health, as it adapts to immune pressure from vaccines and natural infections. Gaining insights into potential antigenic changes is critical but challenging due to the vast sequence space. Here, we introduce the Machine Learning-guided Antigenic Evolution Prediction (MLAEP), which combines structure modeling, multi-task learning, and genetic algorithms to predict the viral fitness landscape and explore antigenic evolution via in silico directed evolution. By analyzing existing SARS-CoV-2 variants, MLAEP accurately infers variant order along antigenic evolutionary trajectories, correlating with corresponding sampling time. Our approach identified novel mutations in immunocompromised COVID-19 patients and emerging variants like XBB1.5. Additionally, MLAEP predictions were validated through in vitro neutralizing antibody binding assays, demonstrating that the predicted variants exhibited enhanced immune evasion. By profiling existing variants and predicting potential antigenic changes, MLAEP aids in vaccine development and enhances preparedness against future SARS-CoV-2 variants.
The International Journal of High Performance Computing Applications
Journal Year: 2023
Volume and Issue: 37(6), P. 683 - 705
Published: Oct. 27, 2023
We seek to transform how new and emergent variants of pandemic-causing viruses, specifically SARS-CoV-2, are identified and classified. By adapting large language models (LLMs) for genomic data, we build genome-scale language models (GenSLMs) which can learn the evolutionary landscape of SARS-CoV-2 genomes. By pre-training on over 110 million prokaryotic gene sequences and fine-tuning a SARS-CoV-2-specific model on 1.5 million genomes, we show that GenSLMs can accurately and rapidly identify variants of concern. Thus, to our knowledge, GenSLMs represents one of the first whole-genome scale foundation models which can generalize to other prediction tasks. We demonstrate scaling of GenSLMs on GPU-based supercomputers and AI-hardware accelerators utilizing 1.63 Zettaflops in training runs with a sustained performance of 121 PFLOPS in mixed precision and peak of 850 PFLOPS. We present initial scientific insights from examining GenSLMs in tracking evolutionary dynamics of SARS-CoV-2, paving the path to realizing this on large biological data.
Cell Reports
Journal Year: 2023
Volume and Issue: 42(8), P. 112888 - 112888
Published: July 31, 2023
Evolution of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) Omicron variant has led to the emergence of sublineages with different patterns of neutralizing antibody evasion. We report that Omicron BA.4/BA.5 breakthrough infection of individuals immunized with SARS-CoV-2 wild-type-strain-based mRNA vaccines results in a boost of Omicron BA.4.6, BF.7, BQ.1.1, and BA.2.75 neutralization but does not efficiently boost BA.2.75.2, XBB, or XBB.1.5 neutralization. In silico analyses showed that the Omicron spike glycoprotein lost most of the neutralizing B cell epitopes, especially in XBB.1.5. In contrast, T cell epitopes are conserved across variants including XBB.1.5. T cell responses of mRNA-vaccinated, SARS-CoV-2-naive individuals against the wild-type strain, BA.1, and BA.4/BA.5 were comparable, suggesting that T cell immunity against recent sublineages may remain largely unaffected. While some Omicron sublineages effectively evade B cell immunity, spike-protein-specific T cell immunity, due to the nature of polymorphic cell-mediated immune responses, may continue to contribute to prevention/limitation of severe COVID-19 manifestation.
Nucleic Acids Research
Journal Year: 2025
Volume and Issue: 53(4)
Published: Feb. 8, 2025
In infected individuals, viruses are present as a population consisting of dominant and minor variant genomes. Most databases contain information on the dominant genome sequence. Since the emergence of SARS-CoV-2 in late 2019, variants have been selected that are more transmissible and capable of partial immune escape. Currently, models for projecting the evolution of SARS-CoV-2 are based on using dominant genome sequences to forecast whether a known mutation will be prevalent in the future. However, novel variants of SARS-CoV-2 (and other viruses) are driven by evolutionary pressure acting on minor variant genomes, which then become dominant and form a potential next wave of infection. In this study, sequencing data from 96 209 patients, sampled over a 3-year period, were used to analyse patterns of minor variant genomes. These data were used to develop unsupervised machine learning clusters to identify amino acids that had a greater potential for mutation than others in the Spike protein. Being able to identify amino acids that may be present in future variants would better inform the design of longer-lived medical countermeasures and allow a risk-based evaluation of viral properties, including assessment of transmissibility and immune escape, thus providing candidates with early warning signals for when a new variant emerges.