bioRxiv (Cold Spring Harbor Laboratory)
Journal Year: 2024
Volume and Issue: unknown
Published: June 19, 2024
Abstract
Supervised machine learning models rely on training datasets with positive (target class) and negative examples. Therefore, the composition of the training dataset has a direct influence on model performance. Specifically, negative sample selection bias, concerning samples not representing the target class, presents challenges across a range of domains such as text classification and protein-protein interaction prediction. Machine-learning-based immunotherapeutics design is an increasingly important area of research, focusing on designing antibodies or T-cell receptors (TCRs) that can bind to their target molecules with high specificity and affinity. Given the biomedical importance of immunotherapeutics, there is a need to address the unresolved question of how negative training set composition impacts model generalization and biological rule discovery to enable rational and safe drug design. We set out to study this question in the context of the antibody-antigen binding prediction problem by varying the negative class, encompassing a binding affinity gradient. We based our investigation on large synthetic datasets that provide ground truth structure-based antibody-antigen binding data, allowing access to residue-wise binding energy on the binding interface. We found that both out-of-distribution generalization and binding rule discovery depended on the type of negative dataset used. Importantly, we discovered that a model’s capacity to learn the binding rules of the positive dataset is not a trivial correlate of its classification accuracy. We confirmed our findings with real-world relevant experimental data. Our work highlights the importance of considering training dataset composition for achieving optimal out-of-distribution performance in machine-learning-based research.
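Significance Statement
The effectiveness of supervised machine learning models hinges on the composition of their training datasets, particularly the inclusion of negative examples. This bias in negative sample selection can greatly impact model performance. As the development of immunotherapeutic agents using machine learning is becoming increasingly crucial in biomedicine, understanding the impact of negative training set composition is imperative. Our study, focused on the antibody-antigen binding prediction problem, reveals that the choice of negative dataset significantly affects both out-of-distribution generalization and binding rule discovery. These findings underscore the necessity of carefully considering training dataset composition in machine-learning-driven research for optimal performance, robustness and meaningful biological rule acquisition.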
Despite recent advances in transgenic animal models and display technologies, humanization of mouse sequences remains one of the main routes for therapeutic antibody development. Traditionally, humanization is manual, laborious, and requires expert knowledge. Although automation efforts are advancing, existing methods are either demonstrated on a small scale or are entirely proprietary. To predict the immunogenicity risk, the human-likeness of sequences can be evaluated using existing humanness scores, but these lack diversity, granularity or interpretability. Meanwhile, immune repertoire sequencing has generated rich antibody libraries such as the Observed Antibody Space (OAS) that offer augmented diversity not yet exploited for antibody engineering. Here we present BioPhi, an open-source platform featuring novel methods for humanization (Sapiens) and humanness evaluation (OASis). Sapiens is a deep learning humanization method trained on the OAS using language modeling. Based
Trends in Pharmacological Sciences
Journal Year: 2023
Volume and Issue: 44(3), P. 175 - 189
Published: Jan. 18, 2023
Due to their high target specificity and binding affinity, therapeutic antibodies are currently the largest class of biotherapeutics. The traditional largely empirical antibody development process is, while mature and robust, cumbersome and has significant limitations. Substantial recent advances in computational and artificial intelligence (AI) technologies are now starting to overcome many of these limitations and are increasingly integrated into development pipelines. Here, we provide an overview of AI methods relevant for antibody development, including databases, computational predictors of antibody properties and structure, and computational antibody design methods with an emphasis on machine learning (ML) models, and the design of complementarity-determining region (CDR) loops, structural components critical for binding.
Cell
Journal Year: 2022
Volume and Issue: 185(21), P. 4008 - 4022.e14
Published: Aug. 31, 2022
The continual evolution of SARS-CoV-2 and the emergence of variants that show resistance to vaccines and neutralizing antibodies threaten to prolong the COVID-19 pandemic. Selection and emergence of SARS-CoV-2 variants are driven in part by mutations within the viral spike protein and in particular the ACE2 receptor-binding domain (RBD), a primary target site for neutralizing antibodies. Here, we develop deep mutational learning (DML), a machine-learning-guided protein engineering technology, which is used to investigate a massive sequence space of combinatorial mutations, representing billions of RBD variants, by accurately predicting their impact on ACE2 binding and antibody escape. A highly diverse landscape of possible SARS-CoV-2 variants is identified that could emerge from a multitude of evolutionary trajectories. DML may be used for predictive profiling on current and prospective variants, including highly mutated variants such as Omicron, thus guiding the development of therapeutic antibody treatments and vaccines for COVID-19.
Although the therapeutic efficacy and commercial success of monoclonal antibodies (mAbs) are tremendous, the design and discovery of new candidates remain a time and cost-intensive endeavor. In this regard, progress in the generation of data describing antigen binding and developability, computational methodology, and artificial intelligence may pave the way for a new era of in silico on-demand immunotherapeutics design and discovery. Here, we argue that the main necessary machine learning (ML) components for an in silico mAb sequence generator are: understanding of the rules of mAb-antigen binding, capacity to modularly combine mAb design parameters, and algorithms for unconstrained parameter-driven in silico mAb sequence synthesis. We review the current progress toward the realization of these necessary components and discuss the challenges that must be overcome to allow the on-demand ML-based discovery and design of fit-for-purpose mAb candidates.
Beyond potency, a good developability profile is a key attribute of a biological drug. Selecting and screening for such attributes early in the drug development process can save resources and avoid costly late-stage failures. Here, we review some of the most important developability properties that can be assessed early on for biologics. These include the influence of the source of the biologic, its biophysical and pharmacokinetic properties, and how well it can be expressed recombinantly. We furthermore present in silico, in vitro, and in vivo methods and techniques that can be exploited at different stages of the discovery process to identify molecules with liabilities and thereby facilitate the selection of the most optimal drug leads. Finally, we reflect on the most relevant developability parameters for injectable versus orally delivered biologics and provide an outlook toward what general trends are expected to rise
npj Vaccines
Journal Year: 2024
Volume and Issue: 9(1)
Published: Jan. 20, 2024
Computer-aided discovery of vaccine targets has become a cornerstone of rational vaccine design. In this article, I discuss how Machine Learning (ML) can inform and guide key computational steps in rational vaccine design concerned with the identification of B and T cell epitopes and correlates of protection. I provide examples of ML models, as well as types of data and predictions for which they are built. I argue that interpretable ML has the potential to improve the identification of immunogens also as a tool for scientific discovery, by helping elucidate the molecular processes underlying vaccine-induced immune responses. I outline the limitations and challenges in terms of data availability and method development that need to be addressed to bridge the gap between advances in ML and their translational application
Computers & Chemical Engineering
Journal Year: 2024
Volume and Issue: 182, P. 108585 - 108585
Published: Jan. 11, 2024
While machine learning (ML) has made significant contributions to the biopharmaceutical field, its applications are still in the early stages in terms of providing direct support for quality-by-design based development and manufacturing of biologics, hindering the enormous potential for bioprocesses automation from their development to manufacturing. However, the adoption of ML-based models instead of conventional multivariate data analysis methods is significantly increasing due to the accumulation of large-scale production data. This trend is primarily driven by the real-time monitoring of process variables and quality attributes of biopharmaceutical products through the implementation of advanced process analytical technologies. Given the complexity and multidimensionality of a bioproduct design, bioprocess development, and product manufacturing data, ML-based approaches are increasingly being employed to achieve accurate, flexible, and high-performing predictive models to address the problems of analytics, monitoring, and control within the biopharma field. This paper aims to provide a comprehensive review of the current applications of ML solutions in the design, monitoring, control, and optimisation of upstream, downstream, and product formulation processes of monoclonal antibodies. Finally, this paper thoroughly discusses the main challenges related to the bioprocesses themselves, process data, and the use of machine learning models in monoclonal antibody process development and manufacturing. Moreover, it offers further insights into the adoption of innovative machine learning methods and novel trends in the development of new digital biopharma solutions.
Briefings in Bioinformatics
Journal Year: 2022
Volume and Issue: 23(4)
Published: July 13, 2022
Antibodies are versatile molecular binders with an established and growing role as therapeutics. Computational approaches to developing and designing these molecules are being increasingly used to complement traditional lab-based processes. Nowadays, in silico methods fill multiple elements of the discovery stage, such as characterizing antibody-antigen interactions and identifying developability liabilities. Recently, computational methods tackling such problems have begun to follow machine learning paradigms, in many cases deep learning specifically. This paradigm shift offers improvements in established areas such as structure or binding prediction and opens up new possibilities such as language-based modeling of antibody repertoires or machine-learning-based generation of novel sequences. In this review, we critically examine the recent developments in (deep) machine learning approaches to therapeutic antibody design with implications for fully computational antibody design.
Cell Reports Methods
Journal Year: 2023
Volume and Issue: 3(1), P. 100374 - 100374
Published: Jan. 1, 2023
Antibodies are multimeric proteins capable of highly specific molecular recognition. The complementarity determining region 3 of the antibody variable heavy chain (CDRH3) often dominates antigen-binding specificity. Hence, it is a priority to design optimal antigen-specific CDRH3 to develop therapeutic antibodies. The combinatorial structure of CDRH3 sequences makes it impossible to query binding-affinity oracles exhaustively. Moreover, antibodies are expected to have high target specificity and developability. Here, we present AntBO, a combinatorial Bayesian optimization framework utilizing a CDRH3 trust region for an in silico design of antibodies with favorable developability scores. The in silico experiments on 159 antigens demonstrate that AntBO is a step toward practically viable in vitro antibody design. In under 200 calls to the oracle, AntBO suggests antibodies outperforming the best binding sequence from 6.9 million experimentally obtained CDRH3s. Additionally, AntBO finds very-high-affinity CDRH3 in only 38 protein designs while requiring no domain knowledge.