Transferable deep generative modeling of intrinsically disordered protein conformations DOI Creative Commons
Giacomo Janson, Michael Feig

PLoS Computational Biology, Journal Year: 2024, Volume and Issue: 20(5), P. e1012144 - e1012144

Published: May 23, 2024

Intrinsically disordered proteins have dynamic structures through which they play key biological roles. The elucidation of their conformational ensembles is a challenging problem requiring an integrated use of computational and experimental methods. Molecular simulations are a valuable computational strategy for constructing structural ensembles of disordered proteins but are highly resource-intensive. Recently, machine learning approaches based on deep generative models that learn from simulation data have emerged as an efficient alternative for generating structural ensembles. However, such methods currently suffer from limited transferability when modeling sequences and conformations absent in the training data. Here, we develop a novel generative model that achieves high levels of transferability for intrinsically disordered protein ensembles. The approach, named idpSAM, is a latent diffusion model based on transformer neural networks. It combines an autoencoder to learn a representation of protein geometry and a diffusion model to sample novel conformations in the encoded space. IdpSAM was trained on a large dataset of simulations of disordered protein regions performed with the ABSINTH implicit solvent model. Thanks to the expressiveness of its neural networks and its training stability, idpSAM faithfully captures 3D structural ensembles of test sequences with no similarity to sequences in the training set. Our study also demonstrates the potential for generating full conformational ensembles from datasets with limited sampling and underscores the importance of training set size for generalization. We believe that idpSAM represents significant progress in transferable protein ensemble modeling with machine learning.

Language: English

Evolutionary-scale prediction of atomic-level protein structure with a language model DOI Creative Commons
Zeming Lin, Halil Akin, Roshan Rao

et al.

Science, Journal Year: 2023, Volume and Issue: 379(6637), P. 1123 - 1130

Published: March 16, 2023

Recent advances in machine learning have leveraged evolutionary information in multiple sequence alignments to predict protein structure. We demonstrate direct inference of full atomic-level protein structure from primary sequence using a large language model. As language models of protein sequences are scaled up to 15 billion parameters, an atomic-resolution picture of protein structure emerges in the learned representations. This results in an order-of-magnitude acceleration of high-resolution structure prediction, which enables large-scale structural characterization of metagenomic proteins. We apply this capability to construct the ESM Metagenomic Atlas by predicting structures for >617 million metagenomic protein sequences, including >225 million that are predicted with high confidence, which gives a view into the vast breadth and diversity of natural proteins.

Language: English

Citations

2225

AlphaFold Protein Structure Database in 2024: providing structure coverage for over 214 million protein sequences DOI Creative Commons
Mihály Váradi, Damian Bertoni, Paulyna Magaña

et al.

Nucleic Acids Research, Journal Year: 2023, Volume and Issue: 52(D1), P. D368 - D375

Published: Nov. 2, 2023

The AlphaFold Protein Structure Database (AlphaFold DB, https://alphafold.ebi.ac.uk) has significantly impacted structural biology by amassing over 214 million predicted protein structures, expanding from the initial 300k structures released in 2021. Enabled by the groundbreaking AlphaFold2 artificial intelligence (AI) system, the predictions archived in AlphaFold DB have been integrated into primary data resources such as PDB, UniProt, Ensembl, InterPro and MobiDB. Our manuscript details subsequent enhancements in data archiving, covering successive releases encompassing model organisms, global health proteomes, Swiss-Prot integration, and a host of curated protein datasets. We detail the data access mechanisms of AlphaFold DB, from direct file access via FTP to advanced queries using Google Cloud Public Datasets and the programmatic access endpoints of the database. We also discuss the improvements and services added since its initial release, including enhancements to the Predicted Aligned Error viewer, customisation options for the 3D viewer and improvements in the search engine of AlphaFold DB.

Language: English

Citations

647

ChatGPT for shaping the future of dentistry: the potential of multi-modal large language model DOI Creative Commons
Hanyao Huang, Ou Zheng, Dongdong Wang

et al.

International Journal of Oral Science, Journal Year: 2023, Volume and Issue: 15(1)

Published: July 28, 2023

ChatGPT, a lite and conversational variant of Generative Pretrained Transformer 4 (GPT-4) developed by OpenAI, is one of the milestone Large Language Models (LLMs) with billions of parameters. LLMs have stirred up much interest among researchers and practitioners for their impressive skills in natural language processing tasks, which profoundly impact various fields. This paper mainly discusses the future applications of LLMs in dentistry. We introduce two primary LLM deployment methods in dentistry, automated dental diagnosis and cross-modal dental diagnosis, and examine their potential applications. In particular, equipped with a cross-modal encoder, a single LLM can manage multi-source data and conduct advanced natural language reasoning to perform complex clinical operations. We also present cases to demonstrate the potential of a fully automatic Multi-Modal LLM AI system for clinical application in dentistry. While LLMs offer significant potential benefits, challenges such as data privacy, data quality, and model bias need further study. Overall, LLMs have the potential to revolutionize dental diagnosis and treatment, which indicates a promising avenue for clinical application and research in dentistry.

Language: English

Citations

157

Modeling conformational states of proteins with AlphaFold DOI Creative Commons
Davide Sala, Felipe Engelberger, Hassane S. Mchaourab

et al.

Current Opinion in Structural Biology, Journal Year: 2023, Volume and Issue: 81, P. 102645 - 102645

Published: June 29, 2023

Language: English

Citations

110

Efficient and accurate prediction of protein structure using RoseTTAFold2 DOI Open Access
Minkyung Baek, Ivan Anishchenko, Ian R. Humphreys

et al.

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2023, Volume and Issue: unknown

Published: May 25, 2023

AlphaFold2 and RoseTTAFold predict protein structures with very high accuracy despite substantial architecture differences. We sought to develop an improved method combining features of both. The resulting method, RoseTTAFold2, extends the original three-track architecture of RoseTTAFold over the full network, incorporating the concepts of Frame-aligned point error, recycling during training, and the use of a distillation set from AlphaFold2. We also took from AlphaFold2 the idea of structurally coherent attention in updating pair features, but using a more computationally efficient structure-biased attention as opposed to triangle attention. The resulting model has the accuracy of AlphaFold2 on monomers, and of AlphaFold2-multimer on complexes, with better computational scaling for large proteins and complexes. This excellent performance is achieved without hallmark features of AlphaFold2, invariant point attention and triangle attention, indicating that these are not essential for high-accuracy prediction. Almost all recent work on protein structure prediction has re-used the basic AlphaFold2 architecture; our results show that excellent performance can be achieved with a broader class of models, opening the door for further exploration.

Language: English

Citations

100

Protein structure prediction with in-cell photo-crosslinking mass spectrometry and deep learning DOI Creative Commons
Kolja Stahl, Andrea Graziadei, Therese Dau

et al.

Nature Biotechnology, Journal Year: 2023, Volume and Issue: 41(12), P. 1810 - 1819

Published: March 20, 2023

While AlphaFold2 can predict accurate protein structures from the primary sequence, challenges remain for proteins that undergo conformational changes or for which few homologous sequences are known. Here we introduce AlphaLink, a modified version of the AlphaFold2 algorithm that incorporates experimental distance restraint information into its network architecture. By employing sparse experimental contacts as anchor points, AlphaLink improves on the performance of AlphaFold2 in predicting challenging targets. We confirm this experimentally by using the noncanonical amino acid photo-leucine to obtain information on residue-residue contacts inside cells by crosslinking mass spectrometry. The program can predict distinct conformations of proteins on the basis of the distance restraints provided, demonstrating the value of experimental data in driving protein structure prediction. The noise-tolerant framework for integrating data in protein structure prediction presented here opens a path to accurate characterization of protein structures from in-cell data.

Language: English

Citations

95

Progress at protein structure prediction, as seen in CASP15 DOI Creative Commons
Arne Elofsson

Current Opinion in Structural Biology, Journal Year: 2023, Volume and Issue: 80, P. 102594 - 102594

Published: April 14, 2023

In Dec 2020, the results of AlphaFold version 2 were presented at CASP14, sparking a revolution in the field of protein structure predictions. For the first time, a purely computational method could challenge experimental accuracy for structure prediction of single protein domains. The code of AlphaFold v2 was released in the summer of 2021, and since then, it has been shown that it can be used to accurately predict the structure of most ordered proteins and many protein–protein interactions. It has also sparked an explosion of development in the field, improving AI-based methods to predict protein complexes, disordered regions, and protein design. Here I will review some of the inventions sparked by the release of AlphaFold.

Language: English

Citations

72

Protein generation with evolutionary diffusion: sequence is all you need DOI Creative Commons
Sarah Alamdari, Nitya Thakkar, Rianne van den Berg

et al.

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2023, Volume and Issue: unknown

Published: Sept. 12, 2023

Deep generative models are increasingly powerful tools for the in silico design of novel proteins. Recently, a family of generative models called diffusion models has demonstrated the ability to generate biologically plausible proteins that are dissimilar to any actual proteins seen in nature, enabling unprecedented capability and control in de novo protein design. However, current state-of-the-art models generate protein structures, which limits the scope of their training data and restricts generations to a small and biased subset of protein design space. Here, we introduce a general-purpose diffusion framework, EvoDiff, that combines evolutionary-scale data with the distinct conditioning capabilities of diffusion models for controllable protein generation in sequence space. EvoDiff generates high-fidelity, diverse, and structurally-plausible proteins that cover natural sequence and functional space. We show experimentally that EvoDiff generations express, fold, and exhibit expected secondary structure elements. Critically, EvoDiff can generate proteins inaccessible to structure-based models, such as those with disordered regions, while maintaining the ability to design scaffolds for functional structural motifs. We validate the universality of our sequence-based formulation by experimentally characterizing intrinsically-disordered mitochondrial targeting signals, metal-binding proteins, and protein binders designed using EvoDiff. We envision that EvoDiff will expand capabilities in protein engineering beyond the structure-function paradigm toward programmable, sequence-first design.

Language: English

Citations

65

State-specific protein–ligand complex structure prediction with a multiscale deep generative model DOI
Zhuoran Qiao, Weili Nie, Arash Vahdat

et al.

Nature Machine Intelligence, Journal Year: 2024, Volume and Issue: 6(2), P. 195 - 208

Published: Feb. 12, 2024

Language: English

Citations

55

Evaluation of AlphaFold antibody–antigen modeling with implications for improving predictive accuracy DOI Creative Commons
Rui Yin, Brian G. Pierce

Protein Science, Journal Year: 2023, Volume and Issue: 33(1)

Published: Dec. 11, 2023

High resolution antibody–antigen structures provide critical insights into immune recognition and can inform therapeutic design. The challenges of experimental structural determination and the diversity of the immune repertoire underscore the necessity of accurate computational tools for modeling antibody–antigen complexes. Initial benchmarking showed that despite overall success in modeling protein–protein complexes, AlphaFold and AlphaFold‐Multimer have limited success in modeling antibody–antigen interactions. In this study, we performed a thorough analysis of AlphaFold's antibody–antigen modeling performance on 427 nonredundant antibody–antigen complex structures, identifying useful confidence metrics for predicting model quality, and features of complexes associated with improved modeling success. Notably, we found that the latest version of AlphaFold improves near‐native modeling success to over 30%, versus approximately 20% for a previous version, while increased AlphaFold sampling gives approximately 50% success. With this improved success, AlphaFold can generate accurate antibody–antigen models in many cases, while additional training or other optimization may further improve performance.

Language: English

Citations

52