ChemCatChem, Journal Year: 2024, Volume and Issue: unknown
Published: Dec. 23, 2024
Abstract
The advent of machine learning (ML) has significantly advanced enzyme engineering, particularly through zero-shot (ZS) predictors that forecast the effects of amino acid mutations on enzyme properties without requiring additional labeled data for the target enzyme. This review comprehensively summarizes ZS predictors developed over the past decade, categorizing them into predictors of kinetic parameters, stability, solubility/aggregation, and fitness. It details the algorithms used, encompassing traditional ML approaches and deep learning models, emphasizing their predictive performance. Practical applications of ZS predictors in engineering specific enzymes are discussed. Despite notable advancements, challenges persist, including limited training data and the necessity to incorporate environmental factors (e.g., pH, temperature) and enzyme dynamics into these models. Future directions are proposed to advance prediction-guided enzyme engineering, thereby enhancing the practical utility of ZS predictors.

Nature Communications, Journal Year: 2025, Volume and Issue: 16(1)
Published: Feb. 28, 2025
Estimation of enzymatic activities still heavily relies on experimental assays, which can be cost- and time-intensive. We present CatPred, a deep learning framework for predicting in vitro enzyme kinetic parameters, including turnover numbers (kcat), Michaelis constants (Km), and inhibition constants (Ki). CatPred addresses key challenges such as the lack of standardized datasets, performance evaluation on enzyme sequences that are dissimilar to those used during training, and model uncertainty quantification. We explore diverse learning architectures and feature representations, including pretrained protein language models and three-dimensional structural features, to enable robust predictions. CatPred provides accurate predictions with query-specific uncertainty estimates, with lower predicted variances correlating with higher accuracy. Pretrained protein language model features particularly enhance performance on out-of-distribution samples. CatPred also introduces benchmark datasets with extensive coverage (~23 k, 41 k, and 12 k data points for kcat, Km, and Ki, respectively). Our framework performs competitively with existing methods while offering reliable uncertainty quantification.
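CatPred is a deep learning framework for predicting enzyme kinetic parameters (kcat, Km, Ki) from sequence and structural features. It improves accuracy on unseen enzymes using pretrained protein language models, advancing computational enzyme characterization.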
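For orientation, the three parameters named above (kcat, Km, Ki) can be related through the textbook Michaelis-Menten rate law with competitive inhibition. The sketch below is not taken from the CatPred paper. The function name, argument names, and numbers are illustrative assumptions, and competitive inhibition is only one of several possible inhibition modes.

```python
def michaelis_menten_rate(kcat, km, enzyme_conc, substrate_conc,
                          inhibitor_conc=0.0, ki=None):
    """Initial rate v = kcat*[E]*[S] / (Km_app + [S]).

    Assumes textbook Michaelis-Menten kinetics. Units are the caller's
    responsibility (e.g. kcat in 1/s, concentrations in mM).
    """
    if ki is None:
        apparent_km = km
    else:
        # Competitive inhibition (assumed mode) scales Km by (1 + [I]/Ki).
        apparent_km = km * (1.0 + inhibitor_conc / ki)
    return kcat * enzyme_conc * substrate_conc / (apparent_km + substrate_conc)


# Illustrative numbers only; they are not values from the paper.
v_free = michaelis_menten_rate(kcat=10.0, km=2.0,
                               enzyme_conc=0.01, substrate_conc=2.0)
v_inhibited = michaelis_menten_rate(kcat=10.0, km=2.0,
                                    enzyme_conc=0.01, substrate_conc=2.0,
                                    inhibitor_conc=1.0, ki=1.0)
print(v_free, v_inhibited)
```

With the substrate concentration equal to Km, the uninhibited rate is half of kcat times the enzyme concentration, which is a quick sanity check on any predicted parameter pair.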

PLoS Computational Biology, Journal Year: 2025, Volume and Issue: 21(3), P. e1012109
Published: March 12, 2025
Advancements with cost-effective, high-throughput omics technologies have had a transformative effect on both fundamental and translational research in the medical sciences. These advancements have facilitated a departure from the traditional view of human red blood cells (RBCs) as mere carriers of hemoglobin, devoid of significant biological complexity. Over the past decade, proteomic analyses have identified a growing number of different proteins present within RBCs, enabling systems biology analysis of their physiological functions. Here, we introduce RBC-GEM, one of the most comprehensive, curated genome-scale metabolic reconstructions of a specific human cell type to date. It was developed through meta-analysis of proteomic data from 29 studies published over the past two decades, resulting in an RBC proteome composed of more than 4,600 distinct proteins. Through workflow-guided manual curation, we have compiled the metabolic reactions carried out by this proteome to form a genome-scale metabolic model (GEM) of the RBC. RBC-GEM is hosted on a version-controlled GitHub repository, ensuring adherence to standardized protocols for metabolic reconstruction quality control and data stewardship principles. RBC-GEM represents a metabolic network consisting of 820 genes encoding proteins acting on 1,685 unique metabolites through 2,723 biochemical reactions: a 740% size expansion over its predecessor. We demonstrated the utility of RBC-GEM by creating context-specific proteome-constrained models derived from proteomic data of stored RBCs for 616 blood donors, and classified reactions based on simulated abundance dependence. This up-to-date GEM can be used for contextualization of data and for the construction of computational whole-cell models of the human RBC.

Briefings in Bioinformatics, Journal Year: 2025, Volume and Issue: 26(2)
Published: March 1, 2025
Abstract
An accurate deep learning predictor is needed for enzyme optimal temperature ($T_{opt}$), which quantitatively describes how temperature affects the enzyme catalytic activity. In comparison with existing models, a new model developed in this study, Seq2Topt, reached superior accuracy on $T_{opt}$ prediction using only protein sequences (RMSE = 12.26 °C and $R^2$ = 0.57), and could capture key protein regions for enzyme $T_{opt}$ with multi-head attention on residues. Through case studies on thermophilic enzyme selection and predicting enzyme $T_{opt}$ shifts caused by point mutations, Seq2Topt was demonstrated as a promising computational tool for enzyme mining and in-silico enzyme design. Additionally, accurate deep learning predictors of enzyme optimal pH (Seq2pHopt, RMSE = 0.88 and $R^2$ = 0.42) and protein melting temperature (Seq2Tm, RMSE = 7.57 °C and $R^2$ = 0.64) were developed based on the model architecture of Seq2Topt, suggesting that the development of Seq2Topt could potentially give rise to a useful prediction platform for enzymes.
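The two metrics quoted above can be computed as in the following minimal sketch. It assumes that R2 denotes the coefficient of determination, which the abstract does not state; some papers instead report the squared Pearson correlation. The function names and the sample temperatures are illustrative and are not data from the study.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error, in the same unit as the inputs (e.g. degrees C)."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up optimal temperatures (degrees C) for four hypothetical enzymes.
measured = [30.0, 50.0, 70.0, 90.0]
predicted = [35.0, 45.0, 75.0, 85.0]
print(rmse(measured, predicted), r_squared(measured, predicted))
```

RMSE carries the unit of the target, so the values for optimal temperature and melting temperature are in °C, while the RMSE for optimal pH is in pH units and cannot be compared with them directly.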

Nature Communications, Journal Year: 2025, Volume and Issue: 16(1)
Published: March 20, 2025
Abstract
Accurate prediction of enzyme kinetic parameters is crucial for enzyme exploration and modification. Existing models face the problem of either low accuracy or poor generalization ability due to overfitting. In this work, we first developed unbiased datasets to evaluate the actual performance of these methods and proposed a deep learning model, CataPro, based on pre-trained models and molecular fingerprints to predict turnover number ($k_{cat}$), Michaelis constant ($K_m$), and catalytic efficiency ($k_{cat}/K_m$). Compared with previous baseline models, CataPro demonstrates clearly enhanced accuracy and generalization ability on the unbiased datasets. In a representational enzyme mining project, by combining CataPro with traditional methods, we identified an enzyme (SsCSO) with 19.53 times increased activity compared with the initial enzyme (CSO2) and then successfully engineered it to improve its activity by 3.34 times. This reveals the high potential of CataPro as an effective tool for future enzyme discovery and modification.

Research Square, Journal Year: 2025, Volume and Issue: unknown
Published: March 19, 2025
Abstract
Enzyme catalytic efficiency (kcat/Km) is a key parameter for identifying high-activity enzymes. Recently, deep learning techniques have demonstrated the potential for fast and accurate kcat/Km prediction. However, three challenges remain: (i) the limited size of the available kcat/Km dataset hinders the development of deep learning models; (ii) model predictions lack reliable confidence estimates; and (iii) models lack interpretable insights into enzyme-catalyzed reactions. To address these challenges, we proposed IECata, a kcat/Km prediction model that provides uncertainty estimation and interpretability. IECata collected two datasets from databases and the literature. By introducing evidential deep learning, IECata provides an uncertainty estimate for its predictions. Moreover, it uses a bilinear attention mechanism to focus on crucial local interactions and to interpret the key residues and substrate atoms in enzyme-catalyzed reactions. Testing results indicate that the performance of IECata exceeds that of state-of-the-art benchmark models. Case studies further highlight that the incorporation of uncertainty in screening for highly active enzymes can effectively reduce false positives, thereby improving the efficiency of experimental validation and accelerating directed enzyme evolution. For public usage of IECata, we developed an online platform: http://mathtc.nscc-tj.cn/cataai/.
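The idea of using uncertainty to reduce false positives during screening can be sketched as a simple filter-then-rank step. This is an illustration of the general idea only and not IECata's actual procedure or API. The `Prediction` type, the `select_candidates` function, the threshold, and all values are hypothetical.

```python
from typing import NamedTuple


class Prediction(NamedTuple):
    enzyme_id: str
    log_kcat_km: float  # predicted log10(kcat/Km); hypothetical values
    uncertainty: float  # predicted uncertainty in the same log units


def select_candidates(predictions, max_uncertainty, top_n):
    """Drop uncertain predictions, then keep the top_n by predicted efficiency."""
    confident = [p for p in predictions if p.uncertainty <= max_uncertainty]
    ranked = sorted(confident, key=lambda p: p.log_kcat_km, reverse=True)
    return ranked[:top_n]


# Hypothetical screening pool; "E2" scores highest but is very uncertain.
pool = [
    Prediction("E1", 5.1, 0.3),
    Prediction("E2", 6.0, 1.8),
    Prediction("E3", 4.7, 0.2),
    Prediction("E4", 3.9, 0.1),
]
shortlist = select_candidates(pool, max_uncertainty=0.5, top_n=2)
print([p.enzyme_id for p in shortlist])
```

The cutoff is a design choice: a stricter threshold sends fewer uncertain candidates to the bench but may discard some true high-activity enzymes.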

Briefings in Bioinformatics, Journal Year: 2025, Volume and Issue: 26(3)
Published: May 1, 2025
Abstract
The catalytic constant (kcat) describes the efficiency of enzyme-catalyzed reactions. The kcat value of an enzyme-substrate pair indicates the rate at which an enzyme converts saturated substrates into product during the catalytic process. However, it is challenging to construct robust prediction models for this important property. Most existing models, including the one recently published in Nature Catalysis (Li et al.), suffer from overfitting. In this study, we proposed a novel protocol for constructing kcat prediction models that introduces an intermediate step to separately develop substrate and protein processors. The substrate processor analyzes Simplified Molecular Input Line Entry System (SMILES) strings using a graph neural network model, Attentive FP, while the protein processor abstracts protein sequence information using a long short-term memory architecture. This protocol not only mitigates the impact of data imbalance in the original dataset but also provides greater flexibility in customizing the general-purpose kcat prediction model to enhance prediction accuracy for specific enzyme classes. Our general-purpose kcat prediction model demonstrates significantly enhanced stability and slightly better accuracy (R2 of 0.54 versus 0.50) in comparison with Li et al.'s model on the same dataset. Additionally, our modeling protocol enables personalization by fine-tuning the general-purpose model for specific enzyme categories through focused learning. Using Cytochrome P450 (CYP450) enzymes as a case study, we achieved the best R2 of 0.64 for the focused model. The high-quality performance and expandability of the model support its broad application in enzyme engineering and drug research & development.

International Journal of Molecular Sciences, Journal Year: 2024, Volume and Issue: 25(17), P. 9280
Published: Aug. 27, 2024
Predicting protein-ligand binding sites is an integral part of structural biology and drug design. A comprehensive understanding of these binding sites is essential for advancing drug innovation, elucidating mechanisms of biological function, and exploring the nature of disease. However, accurately identifying protein-ligand binding sites remains a challenging task. To address this, we propose PGpocket, a geometric deep learning-based framework to improve protein-ligand binding site prediction. Initially, the protein surface is converted into a point cloud, and then the geometric and chemical properties of each point are calculated. Subsequently, the point cloud graph is constructed based on inter-point distances, and a graph neural network (GNN) is applied to extract and analyze the protein surface information to predict potential binding sites. PGpocket is trained on the scPDB dataset, and its performance is verified on two independent test sets, Coach420 and HOLO4K. The results show that PGpocket achieves a 58% success rate on the Coach420 dataset and a 56% success rate on the HOLO4K dataset. These results surpass competing algorithms, demonstrating PGpocket's advancement and practicality in predicting protein-ligand binding sites.
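
The graph-construction step described above (a point cloud turned into a graph based on inter-point distances) can be sketched as follows. The abstract does not say whether PGpocket uses a fixed distance cutoff or a k-nearest-neighbour rule, so this sketch assumes a fixed cutoff. The function name, the cutoff value, and the coordinates are hypothetical.

```python
import math
from itertools import combinations


def build_radius_graph(points, cutoff):
    """Return undirected edges (i, j) with i < j for points within `cutoff`.

    Assumption: a fixed distance cutoff defines neighbours. The O(n^2)
    pairwise loop is fine for a sketch but too slow for large surfaces.
    """
    edges = []
    for (i, p), (j, q) in combinations(enumerate(points), 2):
        if math.dist(p, q) <= cutoff:
            edges.append((i, j))
    return edges


# Three hypothetical surface points (coordinates in angstroms).
surface_points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
edges = build_radius_graph(surface_points, cutoff=1.5)
print(edges)
```

For real protein surfaces with many thousands of points, a spatial index such as a k-d tree would replace the pairwise loop. The resulting edge list, together with the per-point features, is the usual input to a GNN layer.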