bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown
Published: Oct. 29, 2024
Abstract
Understanding T-cell receptor (TCR) and epitope interactions is critical for advancing our knowledge of the human immune system. Traditional approaches that use sequence similarity or structure data often struggle to scale and generalize across diverse TCR/epitope interactions. To address these limitations, we introduce ImmuneCLIP, a contrastive fine-tuning method that leverages pre-trained protein language models to align TCR and epitope embeddings in a shared latent space. ImmuneCLIP is evaluated on epitope ranking and binding prediction tasks, where it consistently outperforms sequence-similarity based methods and existing deep learning models. Furthermore, ImmuneCLIP shows strong generalization capabilities even with limited training data, highlighting its potential for studying TCR–epitope interactions and for uncovering patterns that improve our understanding of immune recognition systems.
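The CLIP-style contrastive alignment this abstract describes can be sketched roughly as follows. This is an illustrative toy, not the authors' implementation; the embedding dimensions, temperature, and function names are all assumptions, and random vectors stand in for real protein-language-model outputs:

```python
import numpy as np

def clip_style_loss(tcr_emb, epi_emb, temperature=0.07):
    """Symmetric contrastive (InfoNCE) loss aligning paired TCR and
    epitope embeddings in a shared latent space; row i of each array
    is assumed to be a true binding pair."""
    tcr = tcr_emb / np.linalg.norm(tcr_emb, axis=1, keepdims=True)
    epi = epi_emb / np.linalg.norm(epi_emb, axis=1, keepdims=True)
    logits = tcr @ epi.T / temperature           # (B, B) cosine-similarity matrix

    def xent(l):
        # Cross-entropy with the matching pairs on the diagonal
        l = l - l.max(axis=1, keepdims=True)     # numerical stability
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(logp))

    # Pull matched pairs together in both directions: TCR→epitope and back
    return 0.5 * (xent(logits) + xent(logits.T))

rng = np.random.default_rng(0)
loss = clip_style_loss(rng.normal(size=(8, 64)), rng.normal(size=(8, 64)))
```

Minimizing this loss drives each TCR embedding toward its paired epitope embedding and away from the other epitopes in the batch, which is the mechanism behind the shared latent space the abstract refers to.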
Nature Communications, Journal Year: 2024, Volume and Issue: 15(1)
Published: Aug. 28, 2024
Prediction methods inputting embeddings from protein language models have reached or even surpassed state-of-the-art performance on many protein prediction tasks. In natural language processing, fine-tuning large language models has become the de facto standard. In contrast, most protein language model-based predictions do not back-propagate to the language model. Here, we compare the fine-tuning of three state-of-the-art models (ESM2, ProtT5, Ankh) on eight different tasks. Two results stand out. Firstly, task-specific supervised fine-tuning almost always improves downstream predictions. Secondly, parameter-efficient fine-tuning can reach similar improvements while consuming substantially fewer resources, at up to 4.5-fold acceleration of training over fine-tuning full models. Our results suggest trying fine-tuning, in particular for problems with small datasets, such as the fitness landscape of a single protein. For ease of adaptability, we provide easy-to-use notebooks to fine-tune all models used during this work for per-protein (pooling) and per-residue predictions.
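The parameter-efficient fine-tuning contrasted with full fine-tuning above can be illustrated with a LoRA-style low-rank adapter. This is a minimal sketch of the general technique, not the notebooks the authors provide; the layer sizes, rank, and scaling factor are assumptions:

```python
import numpy as np

# LoRA-style parameter-efficient fine-tuning sketch: the pre-trained weight
# matrix W stays frozen; only a low-rank update B @ A (rank r) is trained.
d_out, d_in, r = 1024, 1024, 8
rng = np.random.default_rng(0)
W = rng.normal(size=(d_out, d_in))     # frozen pre-trained weights
A = rng.normal(size=(r, d_in)) * 0.01  # trainable down-projection
B = np.zeros((d_out, r))               # trainable up-projection (zero init,
                                       # so fine-tuning starts at the base model)

def lora_forward(x, alpha=16):
    """Forward pass with the low-rank adapter added to the frozen weights."""
    return x @ (W + (alpha / r) * (B @ A)).T

# Only A and B are updated during training; for these sizes that is
# roughly 1.6% of the parameters full fine-tuning would touch.
trainable_fraction = (A.size + B.size) / W.size
```

Training far fewer parameters is where the resource savings reported in the abstract come from: gradients and optimizer state are only needed for the small adapter matrices.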
bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2023, Volume and Issue: unknown
Published: June 22, 2023
Abstract
The relationship between pH and enzyme catalytic activity, especially the optimal pH (pHopt) at which enzymes function, is critical for biotechnological applications. Hence, computational methods to predict pHopt will enhance enzyme discovery and design by facilitating the accurate identification of enzymes that function optimally at specific pH levels and by elucidating sequence-function relationships. In this study, we proposed and evaluated various machine-learning methods for predicting pHopt, conducting extensive hyperparameter optimization and training over 11,000 model instances. Our results demonstrate that models utilizing language model embeddings markedly outperform other methods in predicting pHopt. We present EpHod, the best-performing model, making it publicly available to researchers. From sequence data, EpHod directly learns structural and biophysical features that relate to pHopt, including the proximity of residues to the catalytic center and their accessibility to solvent molecules. Overall, EpHod presents a promising advancement in pHopt prediction and could potentially speed up the development of enzyme technologies.
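The basic pattern the abstract describes, regressing a scalar property such as pHopt on per-protein language-model embeddings, can be sketched as below. Ridge regression is a deliberately simple stand-in for EpHod's actual architecture, and the embeddings and labels are synthetic:

```python
import numpy as np

# Sketch: predict a scalar property (e.g. optimal pH) from fixed-length
# protein embeddings. Random vectors stand in for real PLM embeddings and
# a linear ground truth generates synthetic labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 64))                     # stand-in embeddings
w_true = rng.normal(size=64)
y = X @ w_true + rng.normal(scale=0.1, size=200)   # synthetic pHopt labels

# Ridge regression closed form: w = (X^T X + lam*I)^(-1) X^T y
lam = 1.0
w = np.linalg.solve(X.T @ X + lam * np.eye(64), X.T @ y)
pred = X @ w
```

Swapping the ridge head for a deeper model, and the synthetic arrays for real embeddings and measured pHopt values, recovers the general shape of the pipeline evaluated in the study.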
bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown
Published: Jan. 31, 2024
Abstract
Protein language models (PLMs) have emerged as powerful approaches for mapping protein sequences into embeddings suitable for various applications. As representation schemes, PLMs generate per-token (i.e., per-residue) representations, resulting in variable-sized outputs based on protein length. This variability poses a challenge for protein-level prediction tasks that require uniform-sized embeddings for consistent analysis across different proteins. Previous work has typically used average pooling to summarize token-level PLM outputs, but it is unclear whether this method effectively prioritizes the relevant information in the resulting representations. We introduce a novel method utilizing optimal transport to convert variable-length PLM outputs into fixed-length representations. We conceptualize per-token representations as samples from a probabilistic distribution and employ sliced-Wasserstein distances to map these samples against a reference set, creating a Euclidean embedding in the output space. The resulting embedding is agnostic to the length of the input and represents the entire protein. We demonstrate the superiority of our method over average pooling on several downstream prediction tasks, particularly with constrained model sizes, enabling smaller-scale PLMs to match or exceed the performance of average-pooled larger-scale PLMs. Our aggregation scheme is especially effective for longer proteins by capturing essential information that might be lost through average pooling.
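The core idea, treating per-residue embeddings as samples from a distribution and comparing them to a reference set along random 1-D projections, can be sketched as below. This is an illustrative simplification under assumed slice and quantile counts, not the paper's exact construction:

```python
import numpy as np

def sw_embed(residue_embs, reference, n_slices=64, n_quantiles=16, seed=0):
    """Fixed-length embedding of a variable-length set of per-residue
    vectors: project residues onto random 1-D directions, take sorted
    quantiles of each projection, and record the displacement from the
    reference set's quantiles (the 1-D Wasserstein picture). Output size
    is n_slices * n_quantiles regardless of protein length."""
    rng = np.random.default_rng(seed)
    d = residue_embs.shape[1]
    dirs = rng.normal(size=(n_slices, d))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)   # unit directions
    q = np.linspace(0, 1, n_quantiles)
    feats = []
    for u in dirs:
        proj = np.quantile(residue_embs @ u, q)   # protein's 1-D profile
        ref = np.quantile(reference @ u, q)       # reference's 1-D profile
        feats.append(proj - ref)                  # quantile displacement
    return np.concatenate(feats)

rng = np.random.default_rng(1)
reference = rng.normal(size=(32, 128))                 # shared reference set
e_short = sw_embed(rng.normal(size=(50, 128)), reference)   # 50 residues
e_long = sw_embed(rng.normal(size=(400, 128)), reference)   # 400 residues
```

A 50-residue and a 400-residue protein both map to the same fixed output size, which is the length-agnostic property the abstract emphasizes; unlike average pooling, the quantile profile retains information about the spread of the residue distribution, not just its mean.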
bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2025, Volume and Issue: unknown
Published: March 10, 2025
Protein Language Models (PLMs) trained on large databases of protein sequences have proven effective in modeling protein biology across a wide range of applications. However, while PLMs excel at capturing individual protein properties, they face challenges in natively representing protein–protein interactions (PPIs), which are crucial to understanding cellular processes and disease mechanisms. Here, we introduce MINT, a PLM specifically designed to model sets of interacting proteins in a contextual and scalable manner. Using unsupervised training on a curated PPI dataset derived from the STRING database, MINT outperforms existing PLMs in diverse tasks relating to protein interactions, including binding affinity prediction and estimation of mutational effects. Beyond these core capabilities, it excels at modeling complex protein assemblies and surpasses specialized models in antibody–antigen modeling and T cell receptor–epitope binding prediction. MINT's predictions of the impacts of mutations on oncogenic PPIs align with experimental studies, and it provides reliable estimates of the potential cross-neutralization of antibodies against SARS-CoV-2 variants of concern. These findings position MINT as a powerful tool for elucidating protein–protein interactions, with significant implications for biomedical research and therapeutic discovery.
Briefings in Bioinformatics, Journal Year: 2024, Volume and Issue: 25(5)
Published: July 25, 2024
Abstract
Protein–protein interactions (PPIs) are important for many biological processes, but predicting them from sequence data remains challenging. Existing deep learning models often cannot generalize to proteins not present in the training set and do not provide uncertainty estimates for their predictions. To address these limitations, we introduce TUnA, a Transformer-based uncertainty-aware model for PPI prediction. TUnA uses ESM-2 embeddings with Transformer encoders and incorporates a Spectral-normalized Neural Gaussian Process. TUnA achieves state-of-the-art performance and, importantly, evaluates uncertainty for unseen sequences. We demonstrate that TUnA’s uncertainty estimates can effectively identify the most reliable predictions, significantly reducing false positives. This capability is crucial for bridging the gap between computational predictions and experimental validation.
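The downstream use of such uncertainty estimates, keeping only high-confidence positive calls, can be sketched as below. This toy filter is not TUnA's Gaussian-process machinery; the thresholds, scores, and function name are all hypothetical:

```python
import numpy as np

def reliable_positives(probs, uncertainties, p_min=0.5, max_unc=0.2):
    """Return indices of predicted interactions (prob > p_min) whose
    model uncertainty falls below a threshold, mirroring the idea of
    using uncertainty to retain only the most reliable PPI calls."""
    probs = np.asarray(probs)
    unc = np.asarray(uncertainties)
    return np.flatnonzero((probs > p_min) & (unc < max_unc))

# Toy example: four candidate pairs with predicted interaction
# probabilities and (hypothetical) uncertainty scores
keep = reliable_positives([0.9, 0.8, 0.95, 0.6], [0.05, 0.3, 0.1, 0.5])
# keep → indices 0 and 2: confident predictions with low uncertainty
```

Discarding confident-looking predictions that carry high uncertainty is what cuts false positives before expensive experimental validation.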
bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2025, Volume and Issue: unknown
Published: Jan. 29, 2025
Abstract
Linking sequence variation to phenotypic effects is critical for efficient exploitation of large genomic datasets. Here we present a novel approach combining directed evolution with protein language modeling to characterize naturally-evolved variants of a rice immune receptor. Using high-throughput directed evolution, we engineered the receptor Pik-1 to bind and recognize the fungal proteins Avr-PikC and Avr-PikF, which evade detection by currently characterized Pik-1 alleles. A protein language model was fine-tuned on this data to correlate receptor sequence with ligand binding behavior. This model was then used to score Pik-1 variants found in the 3,000 Rice Genomes Project dataset. Two variants scored highly against Avr-PikC, and in vitro analyses confirmed their improved binding over the wild-type receptor. Overall, this machine learning approach identified promising sources of disease resistance and shows potential utility for exploring other proteins of interest.
Eurasia Journal of Mathematics Science and Technology Education, Journal Year: 2025, Volume and Issue: 21(3), P. em2598 - em2598
Published: Feb. 25, 2025
This study aims to fill the gap in understanding the trends, methods, content, and impacts of technology implementation in differentiated biology education at the secondary and higher education levels. The methodology employed is a systematic literature review on the use of technology in differentiated biology education. The search was conducted using the terms ‘technology’ AND (‘differentiated instruction’ OR ‘personalized learning’ OR ‘adaptive teaching’ OR ‘learning style’) AND ‘biology education’ in the Scopus database, yielding 922 articles, of which only 18 met the criteria for further analysis. The findings indicate a rapid increase in publications, with 61% of the articles published between 2022 and 2024. The majority of publications come from journals in the fields of <i>social sciences/education</i>, while contributions from biochemistry, genetics, and molecular biology remain limited, suggesting a need for cross-disciplinary collaboration. Most studies (78%) used quantitative or mixed methods, with 72% focusing on secondary education. The most commonly used technologies include hands-on tools, data analysis tools, and collaborative platforms, with animal anatomy and physiology as dominant topics. These technologies support differentiated learning by enhancing understanding, engagement, and outcomes, as well as observation and scientific explanation skills at the school level, and research and bioinformatics skills at the higher education level.