Advances in Zero‐Shot Prediction‐Guided Enzyme Engineering Using Machine Learning (Open Access)
Chang Liu, Junxian Wu, Yongbo Chen, et al.

ChemCatChem, Journal Year: 2024, Volume and Issue: unknown

Published: Dec. 23, 2024

Abstract: The advent of machine learning (ML) has significantly advanced enzyme engineering, particularly through zero‐shot (ZS) predictors that forecast the effects of amino acid mutations on enzyme properties without requiring additional labeled data for the target enzyme. This review comprehensively summarizes the ZS predictors developed over the past decade, categorizing them into predictors of kinetic parameters, stability, solubility/aggregation, and fitness. It details the algorithms used, encompassing traditional ML approaches and deep learning models, emphasizing their predictive performance. Practical applications in engineering specific enzymes are discussed. Despite notable advancements, challenges persist, including limited training data and the necessity to incorporate environmental factors (e.g., pH, temperature) and enzyme dynamics into these models. Future directions are proposed to advance prediction‐guided enzyme engineering, thereby enhancing the practical utility of ZS predictors.

Language: English

CatPred: a comprehensive framework for deep learning in vitro enzyme kinetic parameters (Creative Commons)
Veda Sheersh Boorla, Costas D. Maranas

Nature Communications, Journal Year: 2025, Volume and Issue: 16(1)

Published: Feb. 28, 2025

Abstract: Estimation of enzymatic activities still heavily relies on experimental assays, which can be cost- and time-intensive. We present CatPred, a deep learning framework for predicting in vitro enzyme kinetic parameters, including turnover numbers (kcat), Michaelis constants (Km), and inhibition constants (Ki). CatPred addresses key challenges such as the lack of standardized datasets, performance evaluation on enzyme sequences that are dissimilar to those used during training, and model uncertainty quantification. We explore diverse learning architectures and feature representations, including pretrained protein language models and three-dimensional structural features, to enable robust predictions. CatPred provides accurate predictions with query-specific uncertainty estimates, with lower predicted variances correlating with higher accuracy. Pretrained protein language model features particularly enhance performance on out-of-distribution samples. CatPred also introduces benchmark datasets with extensive coverage (~23 k, 41 k, and 12 k data points for kcat, Km, and Ki, respectively). Our framework performs competitively with existing methods while offering reliable uncertainty quantification. CatPred predicts kinetic parameters (kcat, Km, Ki) from sequence features and improves accuracy on unseen enzymes, advancing computational enzyme characterization.

Language: English

Citations: 3

Modern machine learning methods for protein property prediction
Arjun Dosajh, P. K. Agrawal, Prathit Chatterjee, et al.

Current Opinion in Structural Biology, Journal Year: 2025, Volume and Issue: 90, P. 102990

Published: Jan. 28, 2025

Language: English

Citations: 0

RBC-GEM: A genome-scale metabolic model for systems biology of the human red blood cell (Creative Commons)
Zachary B. Haiman, Alicia Key, Angelo D’Alessandro, et al.

PLoS Computational Biology, Journal Year: 2025, Volume and Issue: 21(3), P. e1012109

Published: March 12, 2025

Abstract: Advancements with cost-effective, high-throughput omics technologies have had a transformative effect on both fundamental and translational research in the medical sciences. These advancements have facilitated a departure from the traditional view of human red blood cells (RBCs) as mere carriers of hemoglobin, devoid of significant biological complexity. Over the past decade, proteomic analyses have identified a growing number of different proteins present within RBCs, enabling systems biology analysis of their physiological functions. Here, we introduce RBC-GEM, one of the most comprehensive, curated genome-scale metabolic reconstructions of a specific human cell type to date. It was developed through meta-analysis of proteomic data from 29 studies published over two decades, resulting in an RBC proteome composed of more than 4,600 distinct proteins. Through workflow-guided manual curation, we compiled the metabolic reactions carried out by this proteome to form a genome-scale metabolic model (GEM) of the RBC. RBC-GEM is hosted on a version-controlled GitHub repository, ensuring adherence to standardized protocols for metabolic reconstruction quality control and data stewardship principles. RBC-GEM represents a metabolic network consisting of 820 genes encoding proteins acting on 1,685 unique metabolites through 2,723 biochemical reactions: a 740% size expansion over its predecessor. We demonstrated the utility of RBC-GEM by creating context-specific proteome-constrained models derived from proteomic data of stored RBCs for 616 donors, classified based on simulated abundance dependence. This up-to-date GEM can be used for data contextualization and for the construction of computational whole-cell models.

Language: English

Citations: 0

Seq2Topt: a sequence-based deep learning predictor of enzyme optimal temperature (Creative Commons)
Sizhe Qiu, Bozhen Hu, Jing Zhao, et al.

Briefings in Bioinformatics, Journal Year: 2025, Volume and Issue: 26(2)

Published: March 1, 2025

Abstract: An accurate deep learning predictor is needed for enzyme optimal temperature ($T_{opt}$), which quantitatively describes how temperature affects the enzyme catalytic activity. In comparison with existing models, a new model developed in this study, Seq2Topt, reached superior accuracy on $T_{opt}$ prediction using protein sequences alone (RMSE = 12.26 °C and R² = 0.57), and could capture key protein regions for enzyme $T_{opt}$ through multi-head attention on residues. Through case studies on thermophilic enzyme selection and on predicting enzyme $T_{opt}$ shifts caused by point mutations, Seq2Topt was demonstrated to be a promising computational tool for enzyme mining and in-silico enzyme design. Additionally, deep learning predictors of enzyme optimal pH (Seq2pHopt, RMSE = 0.88 and R² = 0.42) and melting temperature (Seq2Tm, RMSE = 7.57 °C and R² = 0.64) were developed based on the model architecture of Seq2Topt, suggesting that the development of Seq2Topt could potentially give rise to a useful prediction platform for enzymes.
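The RMSE and R² figures quoted in this abstract are standard regression metrics. As a rough illustration only, the sketch below computes both for a handful of made-up optimal-temperature values; the numbers are hypothetical and are not taken from the paper, and R² is computed here as the coefficient of determination, which is an assumption (some studies report the squared Pearson correlation instead).

```python
import math

def rmse(y_true, y_pred):
    """Root-mean-square error between measured and predicted values."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(y_true))

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical optimal temperatures in °C (illustrative values only).
measured = [30.0, 40.0, 50.0, 60.0]
predicted = [32.0, 38.0, 53.0, 59.0]

print(f"RMSE = {rmse(measured, predicted):.2f} °C")
print(f"R2   = {r2_score(measured, predicted):.3f}")
```

RMSE is expressed in the units of the target (°C for temperature, pH units for pH), which is why the three predictors in the abstract cannot be compared on RMSE alone, whereas R² is unitless.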

Language: English

Citations: 0

Robust enzyme discovery and engineering with deep learning using CataPro (Creative Commons)
Zechen Wang, Dongqi Xie, Di Wu, et al.

Nature Communications, Journal Year: 2025, Volume and Issue: 16(1)

Published: March 20, 2025

Abstract: Accurate prediction of enzyme kinetic parameters is crucial for enzyme exploration and modification. Existing models face the problem of either low accuracy or poor generalization ability due to overfitting. In this work, we first developed unbiased datasets to evaluate the actual performance of these methods and proposed a deep learning model, CataPro, based on pre-trained models and molecular fingerprints to predict turnover number ($k_{cat}$), Michaelis constant ($K_m$), and catalytic efficiency ($k_{cat}/K_m$). Compared with previous baseline models, CataPro demonstrates clearly enhanced accuracy and generalization ability on the unbiased datasets. In a representational enzyme mining project, by combining CataPro with traditional methods, we identified an enzyme (SsCSO) with 19.53 times increased activity compared to the initial enzyme (CSO2) and then successfully engineered it to improve its activity by 3.34 times. This reveals the high potential of CataPro as an effective tool for future enzyme discovery.

Language: English

Citations: 0

IECata: Interpretable bilinear attention network and evidential deep learning improve the catalytic efficiency prediction of enzymes (Creative Commons)
Jingjing Wang, Yanpeng Zhao, Zhijiang Yang, et al.

Research Square, Journal Year: 2025, Volume and Issue: unknown

Published: March 19, 2025

Abstract: Enzyme catalytic efficiency (kcat/Km) is a key parameter for identifying high-activity enzymes. Recently, deep learning techniques have demonstrated the potential for fast and accurate kcat/Km prediction. However, three challenges remain: (i) the limited size of the available kcat/Km dataset hinders the development of deep learning models; (ii) model predictions lack reliable confidence estimates; and (iii) models lack interpretable insights into enzyme-catalyzed reactions. To address these challenges, we proposed IECata, a kcat/Km prediction model that provides uncertainty estimation and interpretability. IECata collected two datasets from databases and the literature. By introducing evidential deep learning, IECata provides an uncertainty estimate for its predictions. Moreover, it uses a bilinear attention mechanism to focus on crucial local interactions and to interpret key residues and substrate atoms in enzyme-catalyzed reactions. Testing results indicate that the performance of IECata exceeds that of state-of-the-art benchmark models. Case studies further highlight that incorporating uncertainty when screening highly active enzymes can effectively reduce false positives, thereby improving experimental validation efficiency and accelerating directed enzyme evolution. For public usage, we developed an online platform: http://mathtc.nscc-tj.cn/cataai/.

Language: English

Citations: 0

NNKcat: deep neural network to predict catalytic constants (Kcat) by integrating protein sequence and substrate structure with enhanced data imbalance handling (Creative Commons)
Jingchen Zhai, Xiguang Qi, Lianjin Cai, et al.

Briefings in Bioinformatics, Journal Year: 2025, Volume and Issue: 26(3)

Published: May 1, 2025

Abstract: The catalytic constant (Kcat) describes the efficiency of an enzyme in catalyzing reactions. The Kcat value of an enzyme-substrate pair indicates the rate at which an enzyme converts saturated substrates into product during the catalytic process. However, it is challenging to construct robust prediction models for this important property. Most existing models, including one recently published in Nature Catalysis (Li et al.), suffer from an overfitting issue. In this study, we proposed a novel protocol that introduces an intermediate step to separately develop substrate and protein processors. The substrate processor analyzes Simplified Molecular Input Line Entry System (SMILES) strings using a graph neural network model, Attentive FP, while the protein processor abstracts protein sequence information using a long short-term memory architecture. This protocol not only mitigates the impact of data imbalance in the original dataset but also provides greater flexibility in customizing the general-purpose Kcat prediction model to enhance prediction accuracy for specific enzyme classes. Our model demonstrates significantly enhanced stability and slightly better prediction accuracy (R² of 0.54 versus 0.50) in comparison with Li et al.'s model on the same dataset. Additionally, our modeling protocol enables personalization by fine-tuning for specific enzyme categories through focused learning. Using Cytochrome P450 (CYP450) enzymes as a case study, we achieved the best R² of 0.64 for the focused model. The high-quality performance and expandability of the model guarantee its broad applications in enzyme engineering and drug research & development.

Language: English

Citations: 0

A Point Cloud Graph Neural Network for Protein–Ligand Binding Site Prediction (Open Access)
Yanpeng Zhao, Song He, Yuting Xing, et al.

International Journal of Molecular Sciences, Journal Year: 2024, Volume and Issue: 25(17), P. 9280

Published: Aug. 27, 2024

Abstract: Predicting protein-ligand binding sites is an integral part of structural biology and drug design. A comprehensive understanding of these binding sites is essential for advancing drug innovation, elucidating mechanisms of biological function, and exploring the nature of disease. However, accurately identifying protein-ligand binding sites remains a challenging task. To address this, we propose PGpocket, a geometric deep learning-based framework to improve protein-ligand binding site prediction. Initially, the protein surface is converted into a point cloud, and then the chemical properties of each point are calculated. Subsequently, the point cloud graph is constructed based on inter-point distances, and a graph neural network (GNN) is applied to extract and analyze the protein surface information to predict potential binding sites. PGpocket is trained on the scPDB dataset, and its performance is verified on two independent test sets, Coach420 and HOLO4K. The results show that PGpocket achieves a 58% success rate on the Coach420 dataset and a 56% success rate on the HOLO4K dataset. These results surpass competing algorithms, demonstrating PGpocket's advancement and practicality.
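The graph-construction step described in this abstract (connecting surface points based on inter-point distances) can be sketched roughly as follows. This is a minimal illustration, not PGpocket's implementation: the function name `build_radius_graph`, the fixed distance cutoff, and the toy coordinates are all assumptions, since the abstract does not state how distances are turned into edges.

```python
import numpy as np

def build_radius_graph(points, cutoff):
    """Connect every pair of points closer than `cutoff`.

    Returns an (E, 2) integer array of undirected edges (i, j) with i < j.
    A fixed cutoff is an assumption; a k-nearest-neighbour rule would be
    an equally plausible reading of "based on inter-point distances".
    """
    points = np.asarray(points, dtype=float)
    # Pairwise Euclidean distances via broadcasting: shape (N, N).
    diff = points[:, None, :] - points[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    # Keep only the upper triangle so each edge appears once, no self-loops.
    i, j = np.where(np.triu(dist < cutoff, k=1))
    return np.stack([i, j], axis=1)

# Hypothetical surface points (coordinates are illustrative only).
surface_points = np.array([
    [0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [5.0, 5.0, 5.0],  # far from the others, so it stays isolated
])
edges = build_radius_graph(surface_points, cutoff=1.5)
print(edges)
```

The dense N × N distance matrix keeps the sketch short but scales quadratically; for a full protein surface a spatial index such as a k-d tree would be the more practical choice.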

Language: English

Citations: 3

Graph-Aware AURALSTM: An Attentive Unified Representation Architecture with BiLSTM for Enhanced Molecular Property Prediction (Creative Commons)
Muhammed Ali Pala

Molecular Diversity, Journal Year: 2025, Volume and Issue: unknown

Published: April 25, 2025

Language: English

Citations: 0

Termini and substrate cavity engineering of D-Carbamoylase coupled with reduction of ammonium ion inhibition enhanced bioproduction of D-P-Hydroxyphenylglycine
Magezi Joshua, Samaila Boyi Ajeje, Hero Nmeri Godspower, et al.

International Journal of Biological Macromolecules, Journal Year: 2025, Volume and Issue: unknown, P. 144250

Published: May 1, 2025

Language: English

Citations: 0