A Unified Framework of Scaffold-Lab for Critical Assessment of Protein Backbone Generation Methods DOI Creative Commons
Haifeng Chen, Zhuoqi Zheng, Bo Zhang

et al.

Research Square (Research Square), Journal Year: 2024, Volume and Issue: unknown

Published: May 16, 2024

De novo protein design has undergone rapid development in recent years, especially for backbone generation, which stands out as more challenging yet valuable, offering the ability to design novel folds with fewer constraints. However, a comprehensive delineation of its potential for practical application in protein engineering remains lacking, as does a standardized evaluation framework to accurately assess the diverse methodologies within this field. Here, we propose the Scaffold-Lab benchmark, focusing on evaluating unconditional generation across metrics like designability, novelty, diversity, efficiency and structural properties. We also extrapolate our benchmark to include the motif-scaffolding problem, demonstrating the utility of these conditional models. Our findings reveal that FrameFlow, RFdiffusion and GPDL-H showcased the most outstanding performances. Furthermore, we describe a systematic study to investigate these models and apply it to the motif-scaffolding task, offering a perspective on the analysis of such methods. All data and scripts are available at https://github.com/Immortals-33/Scaffold-Lab.

Language: English

Design of highly functional genome editors by modeling the universe of CRISPR-Cas sequences DOI Creative Commons
Jeffrey A. Ruffolo, Stephen Nayfach, Joseph P. Gallagher

et al.

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown

Published: April 22, 2024

Gene editing has the potential to solve fundamental challenges in agriculture, biotechnology, and human health. CRISPR-based gene editors derived from microbes, while powerful, often show significant functional tradeoffs when ported into non-native environments, such as human cells. Artificial intelligence (AI) enabled design provides a powerful alternative with the potential to bypass evolutionary constraints and generate editors with optimal properties. Here, using large language models (LLMs) trained on biological diversity at scale, we demonstrate the first successful precision editing of the human genome with a programmable gene editor designed with AI. To achieve this goal, we curated a dataset of over one million CRISPR operons through systematic mining of 26 terabases of assembled genomes and meta-genomes. We demonstrate the capacity of our models by generating 4.8x the number of protein clusters across CRISPR-Cas families found in nature and tailoring single-guide RNA sequences for Cas9-like effector proteins. Several of the generated gene editors show comparable or improved activity and specificity relative to SpCas9, the prototypical gene editing effector, while being 400 mutations away in sequence. Finally, we demonstrate that an AI-generated gene editor, denoted OpenCRISPR-1, exhibits compatibility with base editing. We release OpenCRISPR-1 publicly to facilitate broad, ethical usage across research and commercial applications.

Language: English

Citations

40

Multistate and functional protein design using RoseTTAFold sequence space diffusion DOI Creative Commons
Sidney Lisanza, Jacob Merle Gershon, S. Tipps

et al.

Nature Biotechnology, Journal Year: 2024, Volume and Issue: unknown

Published: Sept. 25, 2024

Protein denoising diffusion probabilistic models are used for the de novo generation of protein backbones but are limited in their ability to guide the generation of proteins with sequence-specific attributes and functional properties. To overcome this limitation, we developed ProteinGenerator (PG), a sequence space diffusion model based on RoseTTAFold that simultaneously generates protein sequences and structures. Beginning from a noised sequence representation, PG generates sequence and structure pairs by iterative denoising, guided by desired sequence and structural protein attributes. We designed thermostable proteins with varying amino acid compositions and internal sequence repeats, and cage bioactive peptides, such as melittin. By averaging sequence logits between diffusion trajectories with distinct structural constraints, we designed multistate parent-child protein triples in which the same sequence folds to different supersecondary structures when intact in the parent versus split into two child domains. PG design trajectories can be guided by experimental sequence-activity data, providing a general approach for integrated computational and experimental optimization of protein function.

Language: English

Citations

25

Accelerated enzyme engineering by machine-learning guided cell-free expression DOI Creative Commons
Grant M. Landwehr, Jonathan W. Bogart, Carol Magalhaes

et al.

Nature Communications, Journal Year: 2025, Volume and Issue: 16(1)

Published: Jan. 20, 2025

Language: English

Citations

11

Conditional language models enable the efficient design of proficient enzymes DOI Creative Commons
Geraldene Munsamy, R. Illanes, Silvia Fruncillo

et al.

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown

Published: May 5, 2024

The design of functional enzymes holds promise for transformative solutions across various domains but presents significant challenges. Inspired by the success of language models in generating nature-like proteins, we explored the potential of an enzyme-specific language model for designing catalytically active artificial enzymes. Here, we introduce ZymCTRL ('enzyme control'), a conditional language model trained on the enzyme sequence space, capable of generating enzymes based on user-defined specifications. Experimental validation at diverse data regimes and for different enzyme families demonstrated ZymCTRL's ability to generate active enzymes across various sequence identity ranges. Specifically, we describe carbonic anhydrases and lactate dehydrogenases designed zero-shot, without requiring further training of the model, showcasing activity at sequence identities below 40% compared to natural proteins. Biophysical analysis confirmed the globularity and well-folded nature of the generated sequences. Furthermore, fine-tuning the model enabled the generation of enzymes outside natural sequence space with activity comparable to their natural counterparts. Two enzymes were selected for scale production and successfully lyophilised, maintaining activity and demonstrating preliminary conversion in one-pot enzymatic cascades under extreme conditions. Our findings open a new door towards the rapid and cost-effective design of proficient artificial enzymes. The model and dataset are freely available to the community.

Language: English

Citations

13

Scalable protein design using optimization in a relaxed sequence space DOI
Christopher L. Frank, Ali Khoshouei, Lara Fuß

et al.

Science, Journal Year: 2024, Volume and Issue: 386(6720), P. 439 - 445

Published: Oct. 24, 2024

Machine learning (ML)-based design approaches have advanced the field of de novo protein design, with diffusion-based generative methods increasingly dominating protein design pipelines. Here, we report a "hallucination"-based protein design approach that functions in relaxed sequence space, enabling the efficient design of high-quality protein backbones over multiple scales and with a broad scope of application, without the need for any form of retraining. We experimentally produced and characterized more than 100 proteins. Three high-resolution crystal structures and two cryo-electron microscopy density maps of designed single-chain proteins comprising up to 1000 amino acids validate the accuracy of the method. Our pipeline can also be used to design synthetic protein-protein interactions, as validated experimentally by a set of protein heterodimers. Relaxed sequence optimization offers attractive performance with respect to designability, applicability to different design problems, and scalability across protein sizes.

Language: English

Citations

13

Using machine learning to enhance and accelerate synthetic biology DOI
Kshitij Rai, Yiduo Wang, Ronan W. O’Connell

et al.

Current Opinion in Biomedical Engineering, Journal Year: 2024, Volume and Issue: 31, P. 100553 - 100553

Published: Aug. 2, 2024

Language: English

Citations

4

Accelerated enzyme engineering by machine-learning guided cell-free expression DOI Creative Commons
Grant M. Landwehr, Jonathan W. Bogart, Carol Magalhaes

et al.

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown

Published: Aug. 2, 2024

Enzyme engineering is limited by the challenge of rapidly generating and using large datasets of sequence-function relationships for predictive design. To address this challenge, we developed a machine learning (ML)-guided platform that integrates cell-free DNA assembly, cell-free gene expression, and functional assays to rapidly map fitness landscapes across protein sequence space and optimize enzymes for multiple, distinct chemical reactions. We applied this platform to engineer amide synthetases by evaluating substrate preference for 1,217 enzyme variants in 10,953 unique reactions. We used these data to build augmented ridge regression ML models for predicting amide synthetase variants capable of making 9 small molecule pharmaceuticals. Our ML-guided, cell-free framework promises to accelerate enzyme engineering by enabling iterative exploration of protein sequence space to build specialized biocatalysts in parallel.

Language: English

Citations

4

Computational protein design DOI Creative Commons
Katherine I. Albanese, Sophie Barbe, Derek N. Woolfson

et al.

Nature Reviews Methods Primers, Journal Year: 2025, Volume and Issue: 5(1)

Published: Feb. 27, 2025

Citations

0

Assessing Generative Model Coverage of Protein Structures with SHAPES DOI Creative Commons
Tianyu Lu, Melissa Liu, Yilin Chen

et al.

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2025, Volume and Issue: unknown

Published: Jan. 14, 2025

Recent advances in generative modeling enable efficient sampling of protein structures, but their tendency to optimize for designability imposes a bias toward idealized structures at the expense of loops and other complex structural motifs critical for function. We introduce SHAPES (Structural and Hierarchical Assessment of Proteins with Embedding Similarity) to evaluate five state-of-the-art generative models of protein structures. Using structural embeddings across multiple structural hierarchies, ranging from local geometries to global protein architectures, we reveal substantial undersampling of the observed protein structure space by these models. We use the Fréchet Protein Distance (FPD) to quantify distributional coverage. Different models are distinct in their coverage behavior across different sampling noise scales and temperatures; the frequency of TERtiary Motifs (TERMs) further supports these observations. More robust sequence design and structure prediction methods are likely crucial in guiding the development of models with improved coverage of the designable protein space.

Language: English

Citations

0

Advances in designed bionanomolecular assemblies for biotechnological and biomedical applications DOI Creative Commons
Jaka Snoj, Weijun Zhou, Ajasja Ljubetič

et al.

Current Opinion in Biotechnology, Journal Year: 2025, Volume and Issue: 92, P. 103256 - 103256

Published: Jan. 18, 2025

Recent advances in protein engineering have revolutionized the design of bionanomolecular assemblies for functional therapeutic and biotechnological applications. This review highlights progress in creating complex protein architectures, encompassing both finite and extended assemblies. AI tools, including AlphaFold, RFDiffusion, and ProteinMPNN, have significantly enhanced the scalability and success of de novo designs. Finite assemblies, like nanocages and coiled-coil-based structures, enable precise molecular encapsulation or functional protein domain presentation. Extended assemblies, including filaments and 2D/3D lattices, offer unparalleled structural versatility for applications such as vaccine development, responsive biomaterials, and engineered cellular scaffolds. The convergence of artificial intelligence-driven design and experimental validation promises strong acceleration of the development of tailored protein assemblies, offering new opportunities in synthetic biology, materials science, biotechnology, and biomedicine.

Language: English

Citations

0