Chemical Reviews, Journal Year: 2024, Volume and Issue: unknown, Published: Dec. 10, 2024

Transition metals function as structural and catalytic cofactors for a large diversity of proteins and enzymes that collectively comprise the metalloproteome. Metallostasis considers all cellular processes, notably metal sensing, metalloproteome remodeling, and metal trafficking (or allocation), that collectively ensure the functional integrity and adaptability of the metalloproteome. Bacteria employ both protein and RNA-based mechanisms that sense intracellular transition metal bioavailability and orchestrate systems-level outputs that maintain metallostasis. In this review, we contextualize metallostasis by briefly discussing the metalloproteome and the specialized roles that metals play in biology. We then offer a comprehensive perspective on the diversity of metalloregulatory proteins and metal-sensing riboswitches, defining general principles within each sensor superfamily that capture how specificity is encoded in the sequence, and how selectivity can be leveraged in downstream synthetic biology and biotechnology applications. This is followed by a discussion of recent work that highlights selected metalloregulatory outputs, including metalloproteome remodeling and metal allocation by metallochaperones to both client proteins and compartments. We close by briefly discussing places where more work is needed to fill gaps in our understanding of metallostasis.
Science, Journal Year: 2024, Volume and Issue: unknown, Published: Nov. 21, 2024

Directed protein evolution is central to biomedical applications but faces challenges like experimental complexity, inefficient multi-property optimization, and local maxima traps. While in silico methods using protein language models (PLMs) can provide modeled fitness landscape guidance, they struggle to generalize across diverse protein families and map to protein activity. We present EVOLVEpro, a few-shot active learning framework that combines PLMs and regression models to rapidly improve protein activity. EVOLVEpro surpasses current methods, yielding up to 100-fold improvements in desired properties. We demonstrate its effectiveness across six proteins in RNA production, genome editing, and antibody binding applications. These results highlight the advantages of few-shot active learning with minimal experimental data over zero-shot predictions. EVOLVEpro opens new possibilities for AI-guided protein engineering in biology and medicine.
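The few-shot active learning idea in this abstract (measure a small batch, fit a regressor on sequence embeddings, pick the next batch from the model's top predictions) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the embeddings are random toy vectors rather than PLM embeddings, a k-nearest-neighbor average stands in for whatever regression model the authors use, and the names `knn_predict`, `active_learning`, and `toy_assay` are hypothetical.

```python
import random


def knn_predict(train_x, train_y, query, k=3):
    """Predict activity as the mean label of the k nearest labeled embeddings."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, query)), y)
        for x, y in zip(train_x, train_y)
    )
    nearest = dists[:k]
    return sum(y for _, y in nearest) / len(nearest)


def active_learning(embeddings, assay, rounds=4, batch=8, seed=0):
    """Alternate between measuring a small batch and refitting the regressor."""
    rng = random.Random(seed)
    labeled = {}  # variant index -> measured activity
    # Round 0: no model exists yet, so measure a random batch.
    for i in rng.sample(range(len(embeddings)), batch):
        labeled[i] = assay(i)
    for _ in range(rounds):
        train_idx = list(labeled)
        train_x = [embeddings[i] for i in train_idx]
        train_y = [labeled[i] for i in train_idx]
        # Rank every unmeasured variant by its predicted activity.
        candidates = [i for i in range(len(embeddings)) if i not in labeled]
        candidates.sort(
            key=lambda i: knn_predict(train_x, train_y, embeddings[i]),
            reverse=True,
        )
        # "Measure" the top predictions and add them to the training set.
        for i in candidates[:batch]:
            labeled[i] = assay(i)
    return labeled


# Toy data (assumption): 200 variants with 4-dimensional stand-in embeddings.
_rng = random.Random(1)
toy_embeddings = [[_rng.uniform(-1, 1) for _ in range(4)] for _ in range(200)]


def toy_assay(i):
    """Hidden activity: highest for embeddings closest to the origin."""
    return -sum(v * v for v in toy_embeddings[i])


measured = active_learning(toy_embeddings, toy_assay)
best = max(measured, key=measured.get)
print(len(measured), "variants measured; best activity:", round(measured[best], 3))
```

With these defaults only 40 of the 200 toy variants are ever "assayed", which is the point of the few-shot setting: the model decides where the limited experimental budget goes.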
Polymers, Journal Year: 2024, Volume and Issue: 16(23), P. 3368, Published: Nov. 29, 2024

The integration of machine learning (ML) into material manufacturing has driven advancements in optimizing biopolymer production processes. ML techniques, applied across various stages of biopolymer production, enable the analysis of complex data generated throughout production, identifying patterns and insights not easily observed through traditional methods. As sustainable alternatives to petrochemical-based plastics, biopolymers present unique challenges due to their reliance on variable bio-based feedstocks and complex processing conditions. This review systematically summarizes the current applications of ML techniques in biopolymer production, aiming to provide a comprehensive reference for future research while highlighting the potential of ML to enhance efficiency, reduce costs, and improve product quality. This review also shows the role of ML algorithms, including supervised, unsupervised, and deep learning.
Science Advances, Journal Year: 2025, Volume and Issue: 11(7), Published: Feb. 12, 2025

Machine learning (ML) is changing the world of computational protein design, with data-driven methods surpassing biophysical-based methods in experimental success. However, they are most often reported as case studies, lack integration and standardization, and are therefore hard to objectively compare. In this study, we established a streamlined and diverse toolbox for methods that predict amino acid probabilities inside the Rosetta software framework that allows for the side-by-side comparison of these models. Subsequently, existing protein fitness landscapes were used to benchmark novel ML methods in realistic protein design settings. We focused on the traditional problems of protein design: sampling and scoring. A major finding of our study is that ML approaches are better at purging the sampling space from deleterious mutations. Nevertheless, scoring the resulting mutations without model fine-tuning showed no clear improvement over scoring with Rosetta. We conclude that ML now complements, rather than replaces, biophysical methods in protein design.
bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown, Published: May 28, 2024

Abstract
Training and deploying deep learning models pose challenges for users without machine learning (ML) expertise. SaprotHub offers a user-friendly platform that democratizes the process of training, utilizing, storing, and sharing protein ML models, fostering collaboration within the biology community, all without requiring extensive machine learning expertise. At its core, Saprot is an advanced, foundational protein language model. Through its ColabSaprot framework, it supports potentially hundreds of protein training and prediction applications, enabling the co-construction and co-sharing of these trained models. This enhances user engagement and drives community-wide innovation.
bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown, Published: Oct. 15, 2024

Abstract
Proteins perform their functions by folding amino acid sequences into dynamic structural ensembles. Despite the important role of protein dynamics, their complexity and the absence of efficient representation methods have limited their integration into studies on protein function and mutation fitness, especially in deep learning applications. To address this, we present SeqDance, a protein language model designed to learn representations of protein dynamic properties directly from sequence alone. SeqDance is pre-trained on dynamic biophysical properties derived from over 30,400 molecular dynamics trajectories and 28,600 normal mode analyses. Our results show that SeqDance effectively captures local dynamic interactions, co-movement patterns, and global conformational features, even for proteins lacking homologs in the pre-training set. Additionally, we showed that SeqDance enhances the prediction of protein fitness landscapes, disorder-to-order transition binding regions, and phase-separating proteins. By learning dynamic properties from sequence, SeqDance complements conventional evolution- and static structure-based methods, offering new insights into protein behavior and function.
bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown, Published: Oct. 24, 2024

Summary
Various machine learning-assisted directed evolution (MLDE) strategies have been shown to identify high-fitness protein variants more efficiently than typical wet-lab directed evolution approaches. However, limited understanding of the factors influencing MLDE performance across diverse proteins has hindered optimal strategy selection for wet-lab campaigns. To address this, we systematically analyzed multiple MLDE strategies, including active learning and focused training using six distinct zero-shot predictors, across 16 diverse protein fitness landscapes. By quantifying landscape navigability with six attributes, we found that MLDE offers a greater advantage on landscapes which are more challenging for directed evolution, especially when focused training is combined with active learning. Despite varying levels of advantage across landscapes, focused training with zero-shot predictors leveraging distinct evolutionary, structural, and stability knowledge sources consistently outperforms random sampling for both binding interactions and enzyme activities. Our findings provide practical guidelines for selecting MLDE strategies for protein engineering.
bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown, Published: June 3, 2024

ABSTRACT
Mutational changes that affect the binding of the C2 fragment of Streptococcal protein G (GB1) to the Fc domain of human IgG (IgG-Fc) have been extensively studied using deep mutational scanning (DMS), and the binding affinity of all single mutations has been measured experimentally in the literature. To investigate the underlying molecular basis, we perform in-silico mutational scanning for all possible single mutations, along with 2-µs-long molecular dynamics (WT-MD) of wild-type (WT) GB1 in both unbound and IgG-Fc bound forms. We compute the hydrogen bonds between GB1 and IgG-Fc in WT-MD to identify the dominant hydrogen bonds for binding, which we then assess in conformations produced by Mutation and Minimization (MuMi) to explain the fitness landscape of GB1 and IgG-Fc binding. Furthermore, we analyze MuMi and WT-MD focusing on the relative solvent accessibility (RSA) of residues and the probability of residues being located at the binding interface. With these analyses, we explain the interactions between GB1 and IgG-Fc and display the structural features of binding. Our findings pave the way for improved predictive accuracy in protein stability and interaction studies, which are crucial for advancements in drug design and synthetic biology.
The Journal of Physical Chemistry B, Journal Year: 2024, Volume and Issue: 128(33), P. 7987 - 7996, Published: Aug. 8, 2024

Mutational changes that affect the binding of the C2 fragment of Streptococcal protein G (GB1) to the Fc domain of human IgG (IgG-Fc) have been extensively studied using deep mutational scanning (DMS), and the binding affinity of all single mutations has been measured experimentally in the literature. To investigate the underlying molecular basis, we perform in silico mutational scanning for all possible single mutations, along with 2 μs-long molecular dynamics (WT-MD) of wild-type (WT) GB1 in both unbound and IgG-Fc bound forms. We compute the hydrogen bonds between GB1 and IgG-Fc in WT-MD to identify the dominant hydrogen bonds for binding, which we then assess in conformations produced by Mutation and Minimization (MuMi) to explain the fitness landscape of GB1 and IgG-Fc binding. Furthermore, we analyze MuMi and WT-MD focusing on the relative solvent accessibility of residues and the probability of residues being located at the binding interface. With these analyses, we explain the interactions between GB1 and IgG-Fc and display the structural features of binding. In sum, our findings highlight the potential of MuMi as a reliable and computationally efficient tool for predicting protein fitness landscapes, offering significant advantages over traditional methods. The methodologies and results presented in this study pave the way for improved predictive accuracy in protein stability and interaction studies, which are crucial for advancements in drug design and synthetic biology.
Abstract
We present a novel protein engineering approach to directed evolution with machine learning that integrates a new semi-supervised neural network fitness prediction model, Seq2Fitness, and an innovative optimization algorithm, biphasic annealing for diverse adaptive sequence sampling (BADASS), to design sequences. Seq2Fitness leverages protein language models to predict fitness landscapes, combining evolutionary data with experimental labels, while BADASS efficiently explores these landscapes by dynamically adjusting temperature and mutation energies to prevent premature convergence and find diverse high-fitness sequences. Seq2Fitness predictions improve the Spearman correlation with fitness measurements over alternative model predictions, e.g., from 0.34 to 0.55 for sequences with mutations at residues that are absent from the training set. BADASS requires less memory and computation compared to gradient-based Markov Chain Monte Carlo methods, while finding more higher-fitness sequences and maintaining sequence diversity in protein design tasks for two different protein families with hundreds of amino acids. For example, for both protein families 100% of the top 10,000 sequences found have higher Seq2Fitness predictions than the wildtype sequence, versus a broad range between 3% and 99% for competing approaches, often with many fewer than 10,000 sequences found. The fitness predictions for the top, 100th, and 1,000th sequences found are all also higher. In addition, we developed a theoretical framework to explain where BADASS comes from, why it works, and how it behaves. Although we only evaluate BADASS here on amino acid sequences, it may be broadly useful for exploration of other sequence spaces, including DNA and RNA. To ensure reproducibility and facilitate adoption, our code is publicly available.
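The Spearman correlation quoted in this abstract is the Pearson correlation computed on ranks, so it rewards a model for ordering variants correctly even when its predicted values are on a different scale from the measurements. A minimal sketch follows; the helper names (`average_ranks`, `pearson`, `spearman`) and the four-variant toy data are assumptions for illustration and are not taken from the paper.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Plain Pearson correlation of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(predicted, measured):
    """Spearman correlation: Pearson correlation of the two rank vectors."""
    return pearson(average_ranks(predicted), average_ranks(measured))


# Toy data (assumption): the model swaps the order of the two middle variants.
predicted = [0.10, 0.40, 0.35, 0.80]
measured = [1.0, 2.0, 3.0, 4.0]
rho = spearman(predicted, measured)
print(round(rho, 3))
```

In practice a library routine such as `scipy.stats.spearmanr` would normally be used instead; the hand-written version is here only to make the rank step explicit.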
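Author summary
Designing proteins with enhanced properties is essential for many applications, from industrial enzymes to therapeutic molecules. However, traditional protein engineering methods often fail to explore the vast sequence space effectively, partly due to the rarity of high-fitness sequences. In this work, we introduce BADASS, an optimization algorithm that samples sequences from a probability distribution with mutation energies and a temperature parameter that are updated dynamically, alternating between cooling and heating phases, to discover high-fitness proteins while maintaining sequence diversity. This stands in contrast to traditional approaches like simulated annealing, which often converge on lower fitness solutions, and gradient-based Markov Chain Monte Carlo (MCMC), also converging on lower fitness solutions and at a significantly higher computational cost. Our approach requires only forward model evaluations and no gradient computations, enabling the rapid design of high-performing proteins that can be validated in the lab, especially when combined with our Seq2Fitness models. BADASS represents a significant advance in computational protein engineering, opening new possibilities for diverse applications.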
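The general idea described in this summary, sampling sequences from a temperature-controlled distribution whose per-site scores are updated from forward evaluations only, with the temperature alternating between cooling and heating, can be sketched in a few lines. This is not the authors' BADASS algorithm: the update rule, the diversity thresholds that switch phases, the toy four-letter alphabet, and the names `sample_site`, `biphasic_search`, and `toy_fitness` are all assumptions chosen to keep the illustration short.

```python
import math
import random

ALPHABET = "ACDE"  # toy alphabet (assumption); real proteins use 20 amino acids


def sample_site(scores, temperature, rng):
    """Draw one residue with probability proportional to exp(score / T)."""
    top = max(scores.values())  # subtract the maximum for numerical stability
    weights = [math.exp((scores[a] - top) / temperature) for a in ALPHABET]
    return rng.choices(ALPHABET, weights=weights)[0]


def biphasic_search(fitness, length, iters=40, batch=30, seed=0):
    """Sample batches while alternating between cooling and heating phases."""
    rng = random.Random(seed)
    scores = [{a: 0.0 for a in ALPHABET} for _ in range(length)]
    temperature, cooling = 1.0, True
    seen = {}  # sequence -> fitness from forward evaluations only
    for _ in range(iters):
        seqs = [
            "".join(sample_site(site, temperature, rng) for site in scores)
            for _ in range(batch)
        ]
        for seq in seqs:
            seen[seq] = fitness(seq)
        # Move each site score toward the mean fitness of the sampled
        # sequences that carry that residue at that position.
        for pos in range(length):
            for a in ALPHABET:
                vals = [seen[s] for s in seqs if s[pos] == a]
                if vals:
                    scores[pos][a] += 0.5 * (sum(vals) / len(vals) - scores[pos][a])
        # Switch to heating when the batch collapses onto few sequences,
        # and back to cooling once diversity has recovered.
        diversity = len(set(seqs)) / batch
        if cooling and diversity < 0.3:
            cooling = False
        elif not cooling and diversity > 0.8:
            cooling = True
        temperature *= 0.7 if cooling else 1.5
        temperature = min(max(temperature, 0.05), 5.0)
    return seen


TARGET = "ACDEACDE"  # hidden optimum of the toy landscape (assumption)


def toy_fitness(seq):
    """Toy fitness: number of positions matching the hidden target."""
    return sum(a == b for a, b in zip(seq, TARGET))


results = biphasic_search(toy_fitness, len(TARGET))
top = sorted(results, key=results.get, reverse=True)[:5]
for seq in top:
    print(seq, results[seq])
```

The loop needs only forward calls to `fitness`, never gradients, which mirrors the property the summary emphasizes; how the real method sets its phase switches and energy updates is described in the paper itself.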