
Research Square (Research Square), Journal Year: 2024, Volume and Issue: unknown
Published: May 16, 2024
Language: English
bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown
Published: April 22, 2024
Gene editing has the potential to solve fundamental challenges in agriculture, biotechnology, and human health. CRISPR-based gene editors derived from microbes, while powerful, often show significant functional tradeoffs when ported into non-native environments, such as human cells. Artificial intelligence (AI) enabled design provides a powerful alternative with the potential to bypass evolutionary constraints and generate editors with optimal properties. Here, using large language models (LLMs) trained on biological diversity at scale, we demonstrate the first successful precision editing of the human genome with a programmable gene editor designed with AI. To achieve this goal, we curated a dataset of over one million CRISPR operons through systematic mining of 26 terabases of assembled genomes and meta-genomes. We demonstrate the capacity of our models by generating 4.8x the number of protein clusters across CRISPR-Cas families found in nature and by tailoring single-guide RNA sequences for Cas9-like effector proteins. Several of the generated gene editors show comparable or improved activity and specificity relative to SpCas9, the prototypical gene editing effector, while being 400 mutations away in sequence. Finally, we demonstrate that an AI-generated gene editor, denoted OpenCRISPR-1, exhibits compatibility with base editing. We release OpenCRISPR-1 publicly to facilitate broad, ethical usage across research and commercial applications.
Language: English
Citations: 40
Nature Biotechnology, Journal Year: 2024, Volume and Issue: unknown
Published: Sept. 25, 2024
Protein denoising diffusion probabilistic models are used for the de novo generation of protein backbones but are limited in their ability to guide the generation of proteins with sequence-specific attributes and functional properties. To overcome this limitation, we developed ProteinGenerator (PG), a sequence space diffusion model based on RoseTTAFold that simultaneously generates protein sequences and structures. Beginning from a noised sequence representation, PG generates sequence and structure pairs by iterative denoising, guided by desired sequence and structural protein attributes. We designed thermostable proteins with varying amino acid compositions and internal sequence repeats, and we caged bioactive peptides such as melittin. By averaging sequence logits between diffusion trajectories with distinct structural constraints, we designed multistate parent-child protein triples in which the same sequence folds to different supersecondary structures when intact in the parent versus split into two child domains. PG design trajectories can be guided by experimental sequence-activity data, providing a general approach for integrated computational and experimental optimization of protein function.
Language: English
Citations: 25
Nature Communications, Journal Year: 2025, Volume and Issue: 16(1)
Published: Jan. 20, 2025
Language: English
Citations: 11
bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown
Published: May 5, 2024
The design of functional enzymes holds promise for transformative solutions across various domains but presents significant challenges. Inspired by the success of language models in generating nature-like proteins, we explored the potential of an enzyme-specific language model for designing catalytically active artificial enzymes. Here, we introduce ZymCTRL ('enzyme control'), a conditional language model trained on the enzyme sequence space, capable of generating enzymes based on user-defined specifications. Experimental validation at diverse data regimes and for different enzyme families demonstrated ZymCTRL's ability to generate active enzymes across various sequence identity ranges. Specifically, we describe the design of carbonic anhydrases and lactate dehydrogenases in zero-shot, without requiring further training of the model, showcasing activity at sequence identities below 40% compared to natural proteins. Biophysical analysis confirmed the globularity and well-folded nature of the generated sequences. Furthermore, fine-tuning the model enabled the generation of enzymes outside the natural sequence space with activity comparable to their natural counterparts. Two enzymes were selected for scale-up production and successfully lyophilised, maintaining activity and demonstrating preliminary conversion in one-pot enzymatic cascades under extreme conditions. Our findings open a new door towards the rapid and cost-effective design of proficient artificial enzymes. The model and dataset are freely available to the community.
Language: English
Citations: 13
Science, Journal Year: 2024, Volume and Issue: 386(6720), P. 439 - 445
Published: Oct. 24, 2024
Machine learning (ML)–based design approaches have advanced the field of de novo protein design, with diffusion-based generative methods increasingly dominating protein design pipelines. Here, we report a “hallucination”-based protein design approach that functions in relaxed sequence space, enabling the efficient design of high-quality protein backbones over multiple scales and with a broad scope of application without the need for any form of retraining. We experimentally produced and characterized more than 100 proteins. Three high-resolution crystal structures and two cryo–electron microscopy density maps of designed single-chain proteins comprising up to 1000 amino acids validate the accuracy of the method. Our pipeline can also be used to design synthetic protein-protein interactions, as validated by a set of protein heterodimers. Relaxed sequence optimization offers attractive performance with respect to designability, applicability to different design problems, and scalability across protein sizes.
Language: English
Citations: 13
Current Opinion in Biomedical Engineering, Journal Year: 2024, Volume and Issue: 31, P. 100553 - 100553
Published: Aug. 2, 2024
Language: English
Citations: 4
bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown
Published: Aug. 2, 2024
Enzyme engineering is limited by the challenge of rapidly generating and using large datasets of sequence-function relationships for predictive design. To address this challenge, we developed a machine learning (ML)-guided platform that integrates cell-free DNA assembly, cell-free gene expression, and functional assays to rapidly map fitness landscapes across protein sequence space and optimize enzymes for multiple, distinct chemical reactions. We applied this platform to engineer amide synthetases by evaluating substrate preference for 1,217 enzyme variants in 10,953 unique reactions. We used these data to build augmented ridge regression ML models for predicting amide synthetase variants capable of making 9 small molecule pharmaceuticals. Our ML-guided, cell-free framework promises to accelerate enzyme engineering by enabling iterative exploration of protein sequence space to build specialized biocatalysts in parallel.
Language: English
Citations: 4
Nature Reviews Methods Primers, Journal Year: 2025, Volume and Issue: 5(1)
Published: Feb. 27, 2025
Citations: 0
bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2025, Volume and Issue: unknown
Published: Jan. 14, 2025
Recent advances in generative modeling enable efficient sampling of protein structures, but their tendency to optimize for designability imposes a bias toward idealized structures at the expense of loops and other complex structural motifs critical for function. We introduce SHAPES (Structural and Hierarchical Assessment of Proteins with Embedding Similarity) to evaluate five state-of-the-art generative models of protein structures. Using structural embeddings across multiple structural hierarchies, ranging from local geometries to global protein architectures, we reveal substantial undersampling of the observed protein structure space by these models. We use the Fréchet Protein Distance (FPD) to quantify distributional coverage. Different models show distinct coverage behavior at different noise scales and temperatures; the frequency of TERtiary Motifs (TERMs) further supports these observations. More robust sequence design and structure prediction methods are likely crucial in guiding the development of models with improved coverage of the designable protein space.
Language: English
Citations: 0
Current Opinion in Biotechnology, Journal Year: 2025, Volume and Issue: 92, P. 103256 - 103256
Published: Jan. 18, 2025
Recent advances in protein engineering have revolutionized the design of bionanomolecular assemblies for functional therapeutic and biotechnological applications. This review highlights progress in creating complex protein architectures, encompassing both finite and extended assemblies. AI tools, including AlphaFold, RFDiffusion, and ProteinMPNN, have significantly enhanced the scalability and success of de novo designs. Finite assemblies, like nanocages and coiled-coil-based structures, enable precise molecular encapsulation or functional protein domain presentation. Extended assemblies, including filaments and 2D/3D lattices, offer unparalleled structural versatility for applications such as vaccine development, responsive biomaterials, and engineered cellular scaffolds. The convergence of artificial intelligence-driven design and experimental validation promises a strong acceleration of the development of tailored protein assemblies, offering new opportunities in synthetic biology, materials science, biotechnology, and biomedicine.
Language: English
Citations: 0