Cytochrome P450 Enzyme Design by Constraining the Catalytic Pocket in a Diffusion Model
Qian Wang, Xiaonan Liu, Hejian Zhang, et al.

Research, Journal Year: 2024, Volume and Issue: 7

Published: Jan. 1, 2024

Although cytochrome P450 enzymes are the most versatile biocatalysts in nature, there is insufficient comprehension of the molecular mechanism underlying their functional innovation process. Here, by combining ancestral sequence reconstruction, reverse mutation assay, and progressive forward accumulation, we identified 5 founder residues in the catalytic pocket of flavone 6-hydroxylase (F6H) and proposed a “3-point fixation” model to elucidate the functional innovation mechanisms of P450s in nature. According to this design principle of the catalytic pocket, we further developed a de novo diffusion model (P450Diffusion) to generate artificial P450s. Ultimately, among the 17 non-natural P450s generated, 10 designs exhibited significant F6H activity and 6 exhibited a 1.3- to 3.5-fold increase in catalytic capacity compared to the natural CYP706X1. This work not only explores the design principle of the catalytic pockets of P450s, but also provides an insight into the artificial design of P450 enzymes with desired functions.

Language: English

Machine Learning-Guided Protein Engineering
Petr Kouba, Pavel Kohout, Faraneh Haddadi, et al.

ACS Catalysis, Journal Year: 2023, Volume and Issue: 13(21), P. 13863 - 13895

Published: Oct. 13, 2023

Recent progress in engineering highly promising biocatalysts has increasingly involved machine learning methods. These methods leverage existing experimental and simulation data to aid in the discovery and annotation of promising enzymes, as well as in suggesting beneficial mutations for improving known targets. The field of machine learning for protein engineering is gathering steam, driven by recent success stories and notable progress in other areas. It already encompasses ambitious tasks such as understanding and predicting protein structure and function, catalytic efficiency, enantioselectivity, protein dynamics, stability, solubility, aggregation, and more. Nonetheless, the field is still evolving, with many challenges to overcome and questions to address. In this Perspective, we provide an overview of ongoing trends in this domain, highlight recent case studies, and examine the current limitations of machine learning-based methods. We emphasize the crucial importance of thorough experimental validation of emerging models before their use for rational protein design. We present our opinions on the fundamental problems and outline the potential directions for future research.

Language: English

Citations: 95

De novo protein design—From new structures to programmable functions
Tanja Kortemme

Cell, Journal Year: 2024, Volume and Issue: 187(3), P. 526 - 544

Published: Feb. 1, 2024

Methods from artificial intelligence (AI) trained on large datasets of sequences and structures can now "write" proteins with new shapes and molecular functions de novo, without starting from proteins found in nature. In this Perspective, I will discuss the state of the field of de novo protein design at the juncture of physics-based modeling approaches and AI. New protein folds and higher-order assemblies can be designed with considerable experimental success rates, and difficult problems requiring tunable control over protein conformations and precise shape complementarity for molecular recognition are coming into reach. Emerging approaches incorporate engineering principles (tunability, controllability, and modularity) into the design process from the beginning. Exciting frontiers lie in deconstructing cellular functions with de novo proteins and, conversely, constructing synthetic cellular signaling from the ground up. As methods improve, many more challenges remain unsolved.

Language: English

Citations: 90

Opportunities and Challenges for Machine Learning-Assisted Enzyme Engineering
Jason Yang, Francesca-Zhoufan Li, Frances H. Arnold, et al.

ACS Central Science, Journal Year: 2024, Volume and Issue: 10(2), P. 226 - 241

Published: Feb. 5, 2024

Enzymes can be engineered at the level of their amino acid sequences to optimize key properties such as expression, stability, substrate range, and catalytic efficiency, or even to unlock new catalytic activities not found in nature. Because the search space of possible proteins is vast, enzyme engineering usually involves discovering an enzyme starting point that has some level of the desired activity, followed by directed evolution to improve its "fitness" for a desired application. Recently, machine learning (ML) has emerged as a powerful tool to complement this empirical process. ML models can contribute to (1) starting point discovery by functional annotation of known protein sequences or generating novel protein sequences with desired functions and (2) navigating protein fitness landscapes for fitness optimization by learning mappings between protein sequences and their associated fitness values. In this Outlook, we explain how ML complements enzyme engineering and discuss its future potential to unlock improved engineering outcomes.

Language: English

Citations: 76

Bilingual Language Model for Protein Sequence and Structure
Michael Heinzinger, Konstantin Weißenow, Joaquin Gomez Sanchez, et al.

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2023, Volume and Issue: unknown

Published: July 25, 2023

Adapting large language models (LLMs) to protein sequences spawned the development of powerful protein language models (pLMs). Concurrently, AlphaFold2 broke through in protein structure prediction. Now we can systematically and comprehensively explore the dual nature of proteins that act and exist as three-dimensional (3D) machines and evolve as linear strings of one-dimensional (1D) sequences. Here, we leverage pLMs to simultaneously model both modalities by combining 1D sequences with 3D structure in a single model. We encode protein structures as token sequences using the 3Di-alphabet introduced by the 3D-alignment method Foldseek. This new foundation pLM extracts the features and patterns of the resulting “structure-sequence” representation. Toward this end, we built a non-redundant dataset from AlphaFoldDB and fine-tuned an existing pLM (ProtT5) to translate between 3Di and amino acid sequences. As a proof-of-concept for our novel approach, dubbed Protein structure-sequence T5 (ProstT5), we showed improved performance for subsequent prediction tasks and for “inverse folding”, namely the generation of novel protein sequences adopting a given structural scaffold (“fold”). Our work showcased the potential of pLMs to tap into the information-rich protein structure revolution fueled by AlphaFold2. ProstT5 paves the way to develop new tools integrating the vast resource of 3D predictions, and opens new research avenues in the post-AlphaFold2 era. Our model is freely available for all at https://github.com/mheinzinger/ProstT5

Language: English

Citations: 65

Protein generation with evolutionary diffusion: sequence is all you need
Sarah Alamdari, Nitya Thakkar, Rianne van den Berg, et al.

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2023, Volume and Issue: unknown

Published: Sept. 12, 2023

Deep generative models are increasingly powerful tools for the in silico design of novel proteins. Recently, a family of generative models called diffusion models has demonstrated the ability to generate biologically plausible proteins that are dissimilar to any actual proteins seen in nature, enabling unprecedented capability and control in de novo protein design. However, current state-of-the-art models generate protein structures, which limits the scope of their training data and restricts generations to a small and biased subset of protein design space. Here, we introduce a general-purpose diffusion framework, EvoDiff, that combines evolutionary-scale data with the distinct conditioning capabilities of diffusion models for controllable protein generation in sequence space. EvoDiff generates high-fidelity, diverse, and structurally-plausible proteins that cover natural sequence and functional space. We show experimentally that EvoDiff generations express, fold, and exhibit expected secondary structure elements. Critically, EvoDiff can generate proteins inaccessible to structure-based models, such as those with disordered regions, while maintaining the ability to design scaffolds for functional structural motifs. We validate the universality of our sequence-based formulation by experimentally characterizing intrinsically-disordered mitochondrial targeting signals, metal-binding proteins, and protein binders designed using EvoDiff. We envision that EvoDiff will expand capabilities in protein engineering beyond the structure-function paradigm toward programmable, sequence-first design.

Language: English

Citations: 65

Computational scoring and experimental evaluation of enzymes generated by neural networks
Sean R. Johnson, Xiaozhi Fu, Sandra Viknander, et al.

Nature Biotechnology, Journal Year: 2024, Volume and Issue: unknown

Published: April 23, 2024

In recent years, generative protein sequence models have been developed to sample novel sequences. However, predicting whether generated proteins will fold and function remains challenging. We evaluate a set of 20 diverse computational metrics to assess the quality of enzyme sequences produced by three contrasting generative models: ancestral sequence reconstruction, a generative adversarial network, and a protein language model. Focusing on two enzyme families, we expressed and purified over 500 natural and generated sequences with 70-90% identity to the most similar natural sequences to benchmark computational metrics for predicting in vitro enzyme activity. Over three rounds of experiments, we developed a computational filter that improved the rate of experimental success by 50-150%. The proposed metrics and models will drive protein engineering research by serving as a benchmark for generative protein sequence models and helping to select active variants for experimental testing.

Language: English

Citations: 27

Bilingual language model for protein sequence and structure
Michael Heinzinger, Konstantin Weißenow, Joaquin Gomez Sanchez, et al.

NAR Genomics and Bioinformatics, Journal Year: 2024, Volume and Issue: 6(4)

Published: Sept. 28, 2024

Adapting language models to protein sequences spawned the development of powerful protein language models (pLMs). Concurrently, AlphaFold2 broke through in protein structure prediction. Now we can systematically and comprehensively explore the dual nature of proteins that act and exist as three-dimensional (3D) machines and evolve as linear strings of one-dimensional (1D) sequences. Here, we leverage pLMs to simultaneously model both modalities in a single model. We encode protein structures as token sequences using the 3Di-alphabet introduced by the 3D-alignment method

Language: English

Citations: 27

Advances in microbial exoenzymes bioengineering for improvement of bioplastics degradation
Farzad Rahmati, Debadatta Sethi, Weixi Shu, et al.

Chemosphere, Journal Year: 2024, Volume and Issue: 355, P. 141749 - 141749

Published: March 21, 2024

Plastic pollution has become a major global concern, posing numerous challenges for the environment and wildlife. Most conventional ways of plastics degradation are inefficient and cause great damage to ecosystems. The development of biodegradable plastics offers a promising solution for waste management. These plastics are designed to break down under various conditions, opening up new possibilities to mitigate the negative impact of traditional plastics. Microbes, including bacteria and fungi, play a crucial role in the degradation of bioplastics by producing and secreting extracellular enzymes, such as cutinase, lipases, and proteases. However, these microbial enzymes are sensitive to extreme environmental conditions, such as temperature and acidity, affecting their functions and stability. To address these challenges, scientists have employed protein engineering and immobilization techniques to enhance enzyme stability and predict protein structures. Strategies such as improving enzyme and substrate interaction, increasing enzyme thermostability, reinforcing the bonding between the active site of the enzyme and the substrate, and refining enzyme activity are being utilized to boost enzyme functionality. Recently, bioengineering through gene cloning and expression in potential microorganisms has revolutionized the biodegradation of bioplastics. This review aimed to discuss the most recent protein engineering strategies for modifying bioplastic-degrading enzymes in terms of stability and functionality, including enzyme thermostability enhancement, reinforcing substrate binding to the enzyme active site, refining with other enzymes, and improvement of enzyme surface and substrate action. Additionally, exoenzymes discovered by metagenomics techniques were emphasized.

Language: English

Citations: 26

Accelerating Biocatalysis Discovery with Machine Learning: A Paradigm Shift in Enzyme Engineering, Discovery, and Design
Markus Braun, Christian C. Gruber, Andreas Krassnigg, et al.

ACS Catalysis, Journal Year: 2023, Volume and Issue: 13(21), P. 14454 - 14469

Published: Oct. 26, 2023

Emerging computational tools promise to revolutionize protein engineering for biocatalytic applications and accelerate the development timelines previously needed to optimize an enzyme to its more efficient variant. For over a decade, the benefits of predictive algorithms have helped scientists and engineers navigate the complexity of functional protein sequence space. More recently, spurred by dramatic advances in underlying computational tools, the promise of faster, cheaper, and more accurate enzyme identification, characterization, and engineering has catapulted terms such as artificial intelligence and machine learning to the must-have vocabulary in the field. This Perspective aims to showcase the current status of applications in the pharmaceutical industry and also to discuss and celebrate the innovative approaches in protein science by highlighting their potential in selected recent developments and offering thoughts on future opportunities for biocatalysis. It also critically assesses the technology's limitations, unanswered questions, and unmet challenges.

Language: English

Citations: 35

Computational Scoring and Experimental Evaluation of Enzymes Generated by Neural Networks
Sean R. Johnson, Xiaozhi Fu, Sandra Viknander, et al.

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2023, Volume and Issue: unknown

Published: March 4, 2023

In recent years, generative protein sequence models have been developed to sample novel sequences. However, predicting whether generated proteins will fold and function remains challenging. We evaluate computational metrics to assess the quality of enzyme sequences produced by three contrasting generative models: ancestral sequence reconstruction, a generative adversarial network, and a protein language model. Focusing on two enzyme families, we expressed and purified over 440 natural and generated sequences with 70-90% identity to the most similar natural sequences to benchmark computational metrics for predicting in vitro enzyme activity. Over three rounds of experiments, we developed a computational filter that improved experimental success rates by 44-100%. Surprisingly, neither sequence identity to natural sequences nor AlphaFold2 residue-confidence scores were predictive of enzyme activity. The proposed metrics and models will drive protein engineering research by serving as a benchmark for generative protein sequence models and helping to select active variants to test experimentally.

Language: English

Citations: 27