bioRxiv (Cold Spring Harbor Laboratory),
Journal year: 2024,
Issue: unknown
Published: Dec. 11, 2024
Abstract
B cell selection and evolution play crucial roles in dictating successful immune responses. Recent advancements in sequencing technologies and deep-learning strategies have paved the way for generating and exploiting an ever-growing wealth of antibody repertoire data. The self-supervised nature of protein language models (PLMs) has demonstrated the ability to learn complex representations of antibody sequences, and PLMs have been leveraged for a wide range of applications, including diagnostics, structural modeling, and antigen-specificity predictions. PLM-derived likelihoods have been used to improve antibody affinities in vitro, raising the question of whether PLMs can capture and predict features of B cell selection in vivo. Here, we explore how general and antibody-specific PLM-generated sequence pseudolikelihoods (SPs) relate to in vivo features such as expansion, isotype usage, and somatic hypermutation (SHM) at single-cell resolution. Our results demonstrate that the type of PLM and the region of the input significantly affect the generated SP. Contrary to previous in vitro reports, we observe a negative correlation between SPs and binding affinity, whereas SHM and antigen specificity were strongly correlated with SPs. By constructing evolutionary lineage trees of B cell clones from human and mouse repertoires, we observe that SHMs are routinely among the most likely mutations suggested by PLMs, and that mutating residues have lower absolute likelihoods than conserved residues. Our findings highlight the potential of PLMs and further suggest their ability to assist in antibody discovery and engineering.
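Key points
- In contrast to previous work (Hie et al., 2024), we observe a negative correlation between sequence pseudolikelihood (SP) and binding affinity. This can be explained by the inherent germline bias posed by the training data and by the difference between in vitro and in vivo settings.
- We also reveal considerable SP differences across V-gene family, isotype, and the amount of somatic hypermutation (SHM).
- Moreover, SP correlates with labeled antigen binding and is consistent with reconstructed evolutionary trajectories, in which we detected predictable SHM using PLMs.
- We show that the region of input (CDR3 or full V(D)J) provided to the model, as well as the type of PLM used, influences the resulting SP.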
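A sequence pseudolikelihood is commonly computed by masking each position in turn. The model's log-probability of the true residue at that masked position is recorded, and these values are summed (or averaged) over the sequence: PLL(s) = sum over i of log p(s_i | s without i). The sketch below makes several assumptions. It uses a hypothetical callable `masked_log_probs(seq, i)` as a stand-in for a real PLM forward pass with position `i` masked. `toy_model` is invented purely for illustration. The study above may normalise or aggregate SPs differently.

```python
import math
from typing import Callable, Dict

# Hypothetical interface (assumption): given a sequence and a position i, return
# log-probabilities over amino acids at i when that position is masked.
MaskedLogProbs = Callable[[str, int], Dict[str, float]]


def pseudo_log_likelihood(seq: str, masked_log_probs: MaskedLogProbs,
                          normalize: bool = True) -> float:
    """Sum (or length-normalised mean) of log p(s_i | s without i) over all positions."""
    total = 0.0
    for i, residue in enumerate(seq):
        log_probs = masked_log_probs(seq, i)  # one masked "forward pass" per position
        total += log_probs[residue]
    return total / len(seq) if normalize else total


def toy_model(seq: str, i: int) -> Dict[str, float]:
    """Invented stand-in for a PLM: favours repeating the previous residue (or 'A' at i=0)."""
    alphabet = "ACDEFGHIKLMNPQRSTVWY"
    favoured = seq[i - 1] if i > 0 else "A"
    weights = {aa: (5.0 if aa == favoured else 1.0) for aa in alphabet}
    z = sum(weights.values())
    return {aa: math.log(w / z) for aa, w in weights.items()}


print(pseudo_log_likelihood("AAAA", toy_model))
print(pseudo_log_likelihood("ACDE", toy_model))
```

With a real PLM this costs one forward pass per residue. The per-position terms in the loop are also where "mutating versus conserved residue" likelihoods would be read off.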
bioRxiv (Cold Spring Harbor Laboratory),
Journal year: 2024,
Issue: unknown
Published: Feb. 7, 2024
Abstract
The versatile binding properties of antibodies have made them an extremely important class of biotherapeutics. However, therapeutic antibody development is a complex, expensive and time-consuming task, with the final antibody needing to not only have strong and specific binding, but also be minimally impacted by any developability issues. The success of transformer-based language models in protein sequence space, and the availability of vast amounts of antibody sequences, has led to the development of many antibody-specific language models to help guide antibody discovery and design. Antibody diversity primarily arises from V(D)J recombination, mutations within the CDRs, and/or a small number of mutations away from the germline outside the CDRs. Consequently, a significant portion of the variable domain of all natural antibody sequences remains germline. This affects the pre-training of antibody-specific language models, where this facet of the sequence data introduces a prevailing bias towards germline residues. This poses a challenge, as mutations away from the germline are often vital for generating specific and potent binding to a target, meaning that language models need to be able to suggest key mutations away from the germline. In this study, we explore the implications of the germline bias, examining its impact on both general-protein and antibody-specific language models. We develop and train a series of new antibody-specific language models optimised for predicting non-germline residues. We then compare our final model, AbLang-2, with current models and show how it suggests a diverse set of valid mutations with high cumulative probability. AbLang-2 is trained on both unpaired and paired data, and is freely available (https://github.com/oxpig/AbLang2.git).
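A concrete case makes the germline-bias argument clearer. A model that simply reproduces germline residues can score well on overall accuracy while never recovering a single non-germline mutation. The sketch below reports accuracy separately for germline and non-germline positions. It assumes pre-aligned, equal-length sequence/germline pairs and a hypothetical `predict` callable, and the example sequences are made up. It illustrates the argument and is not AbLang-2's evaluation protocol.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical predictor (assumption): returns one predicted residue per position,
# e.g. the argmax of a masked language model.
Predictor = Callable[[str], str]


def non_germline_positions(seq: str, germline: str) -> List[int]:
    """Positions where an aligned, equal-length sequence differs from its germline."""
    if len(seq) != len(germline):
        raise ValueError("sequence and germline must be aligned to the same length")
    return [i for i, (a, g) in enumerate(zip(seq, germline)) if a != g]


def split_accuracy(seqs: Sequence[str], germlines: Sequence[str],
                   predict: Predictor) -> Dict[str, float]:
    """Accuracy of `predict` on germline vs non-germline positions."""
    hits = {"germline": 0, "non_germline": 0}
    counts = {"germline": 0, "non_germline": 0}
    for seq, germ in zip(seqs, germlines):
        predicted = predict(seq)
        mutated = set(non_germline_positions(seq, germ))
        for i, residue in enumerate(seq):
            key = "non_germline" if i in mutated else "germline"
            counts[key] += 1
            hits[key] += int(predicted[i] == residue)
    return {k: hits[k] / counts[k] if counts[k] else float("nan") for k in counts}


# Made-up example: two positions (2 and 8) are mutated away from germline.
germ = "EVQLVESGG"
seq = "EVKLVESGA"


def copy_germline(s: str) -> str:
    """A maximally germline-biased 'model' that always outputs the germline."""
    return germ


print(split_accuracy([seq], [germ], copy_germline))
```

Here the germline-copying predictor is right at 7 of 9 positions overall. Yet it scores zero on exactly the positions that matter for affinity maturation.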
PLoS Computational Biology,
Journal year: 2024,
Issue: 20(12), pp. e1012646 - e1012646
Published: Dec. 6, 2024
Antibodies are proteins produced by the immune system that can identify and neutralise a wide variety of antigens with high specificity and affinity, and they constitute the most successful class of biotherapeutics. With the advent of next-generation sequencing, billions of antibody sequences have been collected in recent years, though their application in the design of better therapeutics has been constrained by the sheer volume and complexity of the data. To address this challenge, we present IgBert and IgT5, the best performing antibody-specific language models developed to date, which can consistently handle both paired and unpaired variable region sequences as input. These models are trained comprehensively using the more than two billion unpaired sequences and two million paired sequences of light and heavy chains present in the Observed Antibody Space dataset. We show that our models outperform existing antibody and protein language models on a diverse range of design and regression tasks relevant to antibody engineering. This advancement marks a significant leap forward in leveraging machine learning, large scale data sets and high-performance computing for enhancing antibody design for therapeutic development.
bioRxiv (Cold Spring Harbor Laboratory),
Journal year: 2024,
Issue: unknown
Published: Jan. 4, 2024
Abstract
The central tenet of molecular biology is that a protein’s amino acid sequence determines its three-dimensional structure, and thus its function. However, proteins with similar sequences do not always fold into the same shape, and vice-versa, dissimilar sequences can adopt similar folds. In this work, we explore antibodies, a class of proteins in the immune system whose local shapes are highly unpredictable, even with small variations in their sequence. Inspired by the CLIP method [1], we propose a multimodal contrastive learning approach, contrastive sequence-structure pre-training (CSSP), which amalgamates the representations of antibody sequences and structures in a mutual latent space. Integrating structural information leads both antibody and protein language models to show better correspondence between structural and sequence similarity, and improves accuracy and data efficiency in downstream binding prediction tasks. We provide an optimised CSSP-trained model, AntiBERTa2-CSSP, for non-commercial use at https://huggingface.co/alchemab.
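CSSP is described as CLIP-inspired. CLIP trains two encoders with a symmetric cross-entropy (InfoNCE) over a batch's cosine-similarity matrix: the i-th sequence should match the i-th structure, and every other pair in the batch acts as a negative. The sketch below shows only that loss, in plain Python. It assumes the sequence and structure embeddings are already computed and fixes the temperature at 0.07, whereas CLIP learns it. It is not the CSSP implementation.

```python
import math
from typing import List

Vector = List[float]


def _normalize(v: Vector) -> Vector:
    """Scale a vector to unit length (assumes a non-zero vector)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def _cross_entropy(logits: List[float], target: int) -> float:
    """-log softmax(logits)[target], using the log-sum-exp trick for stability."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def clip_loss(seq_emb: List[Vector], struct_emb: List[Vector],
              temperature: float = 0.07) -> float:
    """Symmetric InfoNCE: pair i is the positive, all other pairs in the batch are negatives."""
    s = [_normalize(v) for v in seq_emb]
    t = [_normalize(v) for v in struct_emb]
    n = len(s)
    # logits[i][j] = cosine(sequence_i, structure_j) / temperature
    logits = [[sum(a * b for a, b in zip(s[i], t[j])) / temperature for j in range(n)]
              for i in range(n)]
    seq_to_struct = sum(_cross_entropy(logits[i], i) for i in range(n)) / n
    struct_to_seq = sum(_cross_entropy([logits[i][j] for i in range(n)], j)
                        for j in range(n)) / n
    return (seq_to_struct + struct_to_seq) / 2


aligned = clip_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = clip_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned, swapped)
```

Minimising this loss pulls each sequence embedding toward its own structure embedding. That is the mechanism by which structural similarity can be reflected in the sequence model's latent space.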
Communications Biology,
Journal year: 2024,
Issue: 7(1)
Published: July 31, 2024
Designing effective monoclonal antibody (mAb) therapeutics faces a multi-parameter optimization challenge known as "developability", which reflects an antibody's ability to progress through development stages based on its physicochemical properties. While natural antibodies may provide valuable guidance for mAb selection, we lack a comprehensive understanding of developability parameter (DP) plasticity (redundancy, predictability, sensitivity) and of how the DP landscapes of human-engineered and natural antibodies relate to one another. These gaps hinder fundamental developability profile cartography. To chart natural and engineered DP landscapes, we computed 40 sequence- and 46 structure-based DPs over two million native and human-engineered single-chain antibody sequences. We find lower redundancy among structure-based DPs compared to sequence-based DPs. Sequence DP sensitivity to single amino acid substitutions varied by antibody region and DP, and structure DP values varied across the conformational ensemble of antibody structures. We show that sequence DPs are more predictable than structure-based ones across different machine-learning tasks and embeddings, indicating a constrained sequence-based design space. Human-engineered antibodies localize within the developability and sequence landscapes of natural antibodies, suggesting that human-engineered antibodies explore mere subspaces of the natural one. Our work quantifies the plasticity of antibody developability, providing a fundamental resource for multi-parameter therapeutic mAb design.
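Analysis of 2 million native and human-engineered antibody sequences reveals that human-engineered antibodies form subspaces of the natural antibody landscape. This large-scale analysis allows quantification of developability plasticity, accelerating drug development.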
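The sensitivity of a sequence-based DP to single amino acid substitutions can be estimated by exhaustive in silico mutagenesis. Every position is mutated to each of the 19 alternative residues, the DP is recomputed, and the resulting change is recorded. The sketch below makes GRAVY (mean Kyte-Doolittle hydropathy) the stand-in DP. That choice is an assumption for illustration only; the study's 40 sequence-based DPs are not listed here.

```python
from typing import Dict, List

# Kyte-Doolittle hydropathy scale.
KYTE_DOOLITTLE: Dict[str, float] = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq: str) -> float:
    """Grand average of hydropathy: mean Kyte-Doolittle value over the sequence."""
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def per_position_sensitivity(seq: str) -> List[float]:
    """For each position, the largest |change in GRAVY| over all 19 single substitutions."""
    base = gravy(seq)
    sensitivities = []
    for i, wt in enumerate(seq):
        deltas = [
            abs(gravy(seq[:i] + mut + seq[i + 1:]) - base)
            for mut in KYTE_DOOLITTLE
            if mut != wt
        ]
        sensitivities.append(max(deltas))
    return sensitivities


print(per_position_sensitivity("AAAA"))
```

For an additive DP such as GRAVY, the change is simply (h(mut) - h(wt)) / L, so the loop is more than necessary. The same loop works unchanged for non-additive DPs, where no closed form exists. Summarising the per-position values by antibody region (e.g. framework versus CDR) would give a region-wise sensitivity profile.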
PLoS Computational Biology,
Journal year: 2025,
Issue: 21(3), pp. e1012153 - e1012153
Published: Mar. 31, 2025
Antibodies play a crucial role in the adaptive immune response, with their specificity to antigens being a fundamental determinant of their function. Accurate prediction of antibody-antigen specificity is vital for understanding immune responses, guiding vaccine design, and developing antibody-based therapeutics. In this study, we present a method of supervised fine-tuning for antibody language models, which improves on pre-trained antibody language model embeddings for predicting binding specificity to the SARS-CoV-2 spike protein and influenza hemagglutinin. We perform supervised fine-tuning on four pre-trained antibody language models to predict specificity to these antigens, and demonstrate that fine-tuned language model classifiers exhibit enhanced predictive accuracy compared to classifiers trained on pre-trained model embeddings. Additionally, we investigate the change in model attention activations after supervised fine-tuning to gain insights into the molecular basis of antigen recognition by antibodies. Furthermore, we apply the supervised fine-tuned models to BCR repertoire data related to vaccination, demonstrating their ability to capture changes in the repertoire following vaccination. Overall, our study highlights the effect of supervised fine-tuning on pre-trained antibody language models as valuable tools to improve antigen specificity prediction.
Frontiers in Molecular Biosciences,
Journal year: 2024,
Issue: 11
Published: Mar. 28, 2024
Antibodies are proteins produced by our immune system that have been harnessed as biotherapeutics. The discovery of antibody-based therapeutics relies on analyzing large volumes of diverse sequences coming from phage display or animal immunizations. Identification of suitable therapeutic candidates is achieved by grouping the sequences by their similarity and subsequently selecting a diverse set of antibodies for further tests. Such groupings are typically created using sequence-similarity measures alone. Maximizing diversity among the selected candidates is crucial to reducing the number of tests on molecules with near-identical properties. With advances in structural modeling and machine learning, antibodies can now be grouped across other dimensions, such as predicted paratopes or three-dimensional structures. Here we benchmarked antibody grouping methods using clonotype, sequence, paratope prediction, structure prediction, and embedding information. The results were benchmarked on two tasks: binder detection and epitope mapping. We demonstrate that in binder detection no method appears to outperform the others, while in epitope mapping, clonotype, paratope, and embedding clusterings are top performers. Most importantly, all methods propose orthogonal groupings, offering more diverse candidate pools when using multiple methods than any single method. To facilitate exploring the different methods, we have created an online tool, CLAP, available at (clap.naturalantibody.com), that allows users to group, contrast, and visualize antibodies using the different grouping methods.
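One common sequence-based baseline groups antibodies into clonotypes, but clonotype definitions vary between studies. The sketch below assumes one typical rule: antibodies are linked if they share a V gene, have CDR3s of the same length, and show at least 80% CDR3 amino acid identity, with linked antibodies merged by single linkage using union-find. The threshold, the `Antibody` record, and the example CDR3 sequences are hypothetical. This is not CLAP's exact definition.

```python
from itertools import combinations
from typing import Dict, List, NamedTuple


class Antibody(NamedTuple):
    """Hypothetical minimal record: identifier, assigned V gene, CDR3 amino acids."""
    name: str
    v_gene: str
    cdr3: str


def cdr3_identity(a: str, b: str) -> float:
    """Fraction of identical positions between two equal-length CDR3s."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def clonotype_clusters(antibodies: List[Antibody],
                       min_identity: float = 0.8) -> List[List[str]]:
    """Single-linkage clonotype grouping (same V gene, same CDR3 length, identity >= threshold)."""
    parent = list(range(len(antibodies)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for i, j in combinations(range(len(antibodies)), 2):
        a, b = antibodies[i], antibodies[j]
        if (a.v_gene == b.v_gene and len(a.cdr3) == len(b.cdr3)
                and cdr3_identity(a.cdr3, b.cdr3) >= min_identity):
            parent[find(i)] = find(j)  # union the two groups

    groups: Dict[int, List[str]] = {}
    for i, ab in enumerate(antibodies):
        groups.setdefault(find(i), []).append(ab.name)
    return list(groups.values())


# Made-up example data.
panel = [
    Antibody("mAb1", "IGHV1-69", "ARDGYSSGWYFDY"),
    Antibody("mAb2", "IGHV1-69", "ARDGYSSGWYFDV"),  # 12/13 identical to mAb1
    Antibody("mAb3", "IGHV3-23", "ARDGYSSGWYFDY"),  # same CDR3, different V gene
    Antibody("mAb4", "IGHV1-69", "AKLLRGFDP"),      # different CDR3 length
]
print(clonotype_clusters(panel))
```

Picking one representative per group is the simple diversity-maximizing selection described above. Swapping `cdr3_identity` for a paratope, structure, or embedding similarity would give the orthogonal groupings the benchmark compares.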
Briefings in Bioinformatics,
Journal year: 2024,
Issue: 25(4)
Published: May 23, 2024
Abstract
In recent decades, antibodies have emerged as indispensable therapeutics for combating diseases, particularly viral infections. However, their development has been hindered by limited structural information and labor-intensive engineering processes. Fortunately, significant advancements in deep learning methods have facilitated the precise prediction of protein structure and function by leveraging co-evolution information from homologous proteins. Despite these advances, predicting antibody conformation remains challenging due to antibodies' unique evolution and the high flexibility of their antigen-binding regions. Here, to address this challenge, we present the Bio-inspired Antibody Language Model (BALM). This model is trained on a vast dataset comprising 336 million 40% nonredundant unlabeled antibody sequences, capturing both unique and conserved properties specific to antibodies. Notably, BALM showcases exceptional performance across four antigen-binding prediction tasks. Moreover, we introduce BALMFold, an end-to-end method derived from BALM, capable of swiftly predicting full atomic antibody structures from individual sequences. Remarkably, BALMFold outperforms well-established methods such as AlphaFold2, IgFold, ESMFold and OmegaFold on the antibody benchmark, demonstrating significant potential to advance innovative engineering and streamline therapeutic antibody development by reducing the need for unnecessary trials. The BALMFold server is freely available at https://beamlab-sh.com/models/BALMFold.
Nucleic Acids Research,
Journal year: 2023,
Issue: 52(2), pp. 548 - 557
Published: Dec. 18, 2023
High throughput sequencing of B cell receptors (BCRs) is increasingly applied to study the immense diversity of antibodies. Learning biologically meaningful embeddings of BCR sequences is beneficial for predictive modeling. Several embedding methods have been developed for BCRs, but no direct performance benchmarking exists. Moreover, the impact of input sequence length and paired-chain information on prediction remains to be explored. We evaluated multiple embedding models to predict BCR sequence properties and receptor specificity. Despite differences in model architectures, most embeddings effectively capture BCR sequence properties and specificity. BCR-specific embeddings slightly outperform general protein language models in predicting specificity. In addition, incorporating full-length heavy chains and paired light chain information improves the prediction performance of all embeddings. This study provides insights into improving embeddings for downstream applications in antibody analysis and discovery.
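Language models emit one vector per residue, so benchmarks like this one first reduce those outputs to a single fixed-length vector per BCR. The sketch below uses two assumptions: mean pooling for that reduction, and concatenation of the heavy- and light-chain vectors as the way to add paired-chain information. The benchmarked models may pool or combine chains differently. The per-residue embeddings here are hypothetical inputs rather than outputs of any named model.

```python
from typing import List, Optional

Matrix = List[List[float]]  # hypothetical input: one embedding row per residue


def mean_pool(residue_embeddings: Matrix) -> List[float]:
    """Average per-residue vectors into one length-independent sequence embedding."""
    n = len(residue_embeddings)
    dim = len(residue_embeddings[0])
    return [sum(row[d] for row in residue_embeddings) / n for d in range(dim)]


def bcr_embedding(heavy: Matrix, light: Optional[Matrix] = None) -> List[float]:
    """Heavy-chain embedding, optionally concatenated with the paired light-chain embedding."""
    vec = mean_pool(heavy)
    if light is not None:
        vec = vec + mean_pool(light)  # list concatenation doubles the feature dimension
    return vec


heavy = [[1.0, 2.0], [3.0, 4.0]]
light = [[0.0, 1.0]]
print(bcr_embedding(heavy))
print(bcr_embedding(heavy, light))
```

Training the same downstream classifier on the heavy-only and paired vectors is one simple way to measure how much the light chain contributes.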