Critical assessment of methods of protein structure prediction (CASP)—Round XV DOI Creative Commons

Andriy Kryshtafovych, Torsten Schwede, Maya Topf

et al.

Proteins Structure Function and Bioinformatics, Year: 2023, Volume: 91(12), Pages: 1539-1549

Published: Nov. 2, 2023

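Abstract: Computing protein structure from amino acid sequence information has been a long-standing grand challenge. Critical assessment of structure prediction (CASP) conducts community experiments aimed at advancing solutions to this and related problems. Experiments are conducted every 2 years. The 2020 experiment (CASP14) saw major progress, with the second generation of deep learning methods delivering accuracy comparable with experiment for many single proteins. There is an expectation that these methods will have much wider application in computational structural biology. Here we summarize results from the most recent experiment, CASP15, in 2022, with an emphasis on new deep learning-driven progress. Other papers in this special issue of Proteins provide more detailed analysis. For single protein structures, the AlphaFold2 deep learning method is still superior to other approaches, but there are two points of note. First, although AlphaFold2 was the core of all the most successful methods, there was a wide variety of implementation and combination with other methods. Second, using the standard AlphaFold2 protocol and default parameters only produces the highest quality result for about two thirds of the targets, and more extensive sampling is required for the others. The major advance in this CASP is the enormous increase in the accuracy of computed protein complexes, achieved by the use of deep learning methods, although overall these do not fully match the performance for single proteins. Here too, AlphaFold2-based methods perform best, and again more extensive sampling than the defaults is often required. Also of note are the encouraging early results on the use of deep learning to compute ensembles of macromolecular structures. Critically for the usability of computed structures, for both single proteins and protein complexes, deep learning-derived estimates of both local and global accuracy are of high quality; however, the estimates in interface regions are slightly less reliable. CASP15 also included computation of RNA structures for the first time. Here, the classical approaches produced better agreement with experiment than the new deep learning ones, and accuracy is limited. Also, for the first time, CASP included the computation of protein–ligand complexes, an area of special interest for drug design. Here too, classical methods were still superior to deep learning ones. Many new approaches were discussed at the CASP conference, and it is clear that methods will continue to advance.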

Language: English

Evolutionary-scale prediction of atomic level protein structure with a language model DOI Creative Commons
Zeming Lin, Halil Akin, Roshan Rao

et al.

bioRxiv (Cold Spring Harbor Laboratory), Year: 2022, Volume: unknown

Published: July 21, 2022

Abstract: Artificial intelligence has the potential to open insight into the structure of proteins at the scale of evolution. It has only recently been possible to extend protein structure prediction to two hundred million cataloged proteins. Characterizing the structures of the exponentially growing billions of protein sequences revealed by large-scale gene sequencing experiments would necessitate a breakthrough in the speed of folding. Here we show that direct inference of structure from primary sequence using a large language model enables an order of magnitude speed-up in high resolution structure prediction. Leveraging the insight that language models learn evolutionary patterns across millions of sequences, we train models up to 15B parameters, the largest language model of proteins to date. As the language models are scaled they learn information that enables prediction of the three-dimensional structure of a protein at the resolution of individual atoms. This results in prediction that is up to 60x faster than state-of-the-art while maintaining resolution and accuracy. Building on this, we present the ESM Metagenomic Atlas. This is the first large-scale structural characterization of metagenomic proteins, with more than 617 million structures. The atlas reveals more than 225 million high confidence predictions, including millions whose structures are novel in comparison with experimentally determined structures, giving an unprecedented view into the vast breadth and diversity of the structures of some of the least understood proteins on earth.

Language: English

Cited by: 265

AlphaFold predictions are valuable hypotheses and accelerate but do not replace experimental structure determination DOI Creative Commons
Thomas C. Terwilliger, Dorothée Liebschner, Tristan I. Croll

et al.

Nature Methods, Year: 2023, Volume: 21(1), Pages: 110-116

Published: Nov. 30, 2023

Abstract: Artificial intelligence-based protein structure prediction methods such as AlphaFold have revolutionized structural biology. The accuracies of these predictions vary, however, and they do not take into account ligands, covalent modifications or other environmental factors. Here, we evaluate how well AlphaFold predictions can be expected to describe the structure of a protein by comparing predictions directly with experimental crystallographic maps. In many cases, AlphaFold predictions matched experimental maps remarkably closely. In other cases, even very high-confidence predictions differed from experimental maps on a global scale through distortion and domain orientation, and on a local scale in backbone and side-chain conformation. We suggest considering AlphaFold predictions as exceptionally useful hypotheses. We further suggest that it is important to consider the confidence in a prediction when interpreting AlphaFold predictions, and to carry out experimental structure determination to verify structural details, particularly those that involve interactions not included in the prediction.

Language: English

Cited by: 168

SPEACH_AF: Sampling protein ensembles and conformational heterogeneity with Alphafold2 DOI Creative Commons
Richard A. Stein, Hassane S. Mchaourab

PLoS Computational Biology, Year: 2022, Volume: 18(8), Pages: e1010483

Published: Aug. 22, 2022

The unprecedented performance of Deepmind's Alphafold2 in predicting protein structure in CASP XIV and the creation of a database of structures for multiple proteomes and protein sequence repositories is reshaping structural biology. However, because this database returns a single structure, it brought into question Alphafold's ability to capture the intrinsic conformational flexibility of proteins. Here we present a general approach to drive Alphafold2 to model alternate protein conformations through simple manipulation of the multiple sequence alignment via in silico mutagenesis. The approach is grounded in the hypothesis that the multiple sequence alignment must also encode protein structural heterogeneity, thus its rational manipulation will enable Alphafold2 to sample alternate conformations. A systematic modeling pipeline is benchmarked against canonical examples of protein conformational flexibility and applied to interrogate the conformational landscape of membrane proteins. This work broadens the applicability of Alphafold2 by generating multiple protein conformations to be tested biologically, biochemically, biophysically, and for use in structure-based drug design.

Language: English

Cited by: 161

Can we predict T cell specificity with digital biology and machine learning? DOI Open Access
D. R. Hudson, Ricardo A. Fernandes, Mark Basham

et al.

Nature reviews. Immunology, Year: 2023, Volume: 23(8), Pages: 511-521

Published: Feb. 8, 2023

Language: English

Cited by: 122

ABlooper: fast accurate antibody CDR loop structure prediction with accuracy estimation DOI Creative Commons
Brennan Abanades, Guy Georges, Alexander Bujotzek

et al.

Bioinformatics, Year: 2022, Volume: 38(7), Pages: 1877-1880

Published: Jan. 27, 2022

Abstract: Motivation: Antibodies are a key component of the immune system and have been extensively used as biotherapeutics. Accurate knowledge of their structure is central to understanding their antigen-binding function. The key area for antigen binding and the main area of structural variation in antibodies are concentrated in the six complementarity determining regions (CDRs), with the most important for binding and most variable being the CDR-H3 loop. The sequence and structural variability of CDR-H3 make it particularly challenging to model. Recently deep learning methods have offered a step change in our ability to predict protein structures. Results: In this work, we present ABlooper, an end-to-end equivariant deep learning-based CDR loop structure prediction tool. ABlooper rapidly predicts the structure of CDR loops with high accuracy and provides a confidence estimate for each of its predictions. On the models of the Rosetta Antibody Benchmark, ABlooper makes predictions with an average CDR-H3 RMSD of 2.49 Å, which drops to 2.05 Å when considering only its 75% most confident predictions. Availability and implementation: https://github.com/oxpig/ABlooper. Supplementary information: Supplementary data are available at Bioinformatics online.

Language: English

Cited by: 113
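
The ABlooper entry above reports an average RMSD over all predictions and a lower average over the 75% most confident ones. The sketch below shows one way such figures could be computed; it is not ABlooper's code. The function names, the `(rmsd, uncertainty)` input layout, and the assumption that coordinates are already superposed are all hypothetical choices made for illustration.

```python
import math


def rmsd(pred, ref):
    """RMSD between two equal-length lists of (x, y, z) coordinates.

    Assumes the two structures are already superposed; no alignment is done.
    """
    if len(pred) != len(ref) or not pred:
        raise ValueError("coordinate lists must be non-empty and equal length")
    squared = sum(
        (p - r) ** 2
        for atom_p, atom_r in zip(pred, ref)
        for p, r in zip(atom_p, atom_r)
    )
    return math.sqrt(squared / len(pred))


def mean_rmsd_of_most_confident(results, keep_fraction=0.75):
    """Average RMSD over the most confident fraction of predictions.

    `results` is a list of (rmsd, uncertainty) pairs, where a lower
    uncertainty means a more confident prediction (hypothetical layout).
    """
    if not results:
        raise ValueError("results must not be empty")
    ranked = sorted(results, key=lambda item: item[1])
    n_keep = max(1, math.ceil(keep_fraction * len(ranked)))
    kept = ranked[:n_keep]
    return sum(value for value, _ in kept) / len(kept)
```

Calling `mean_rmsd_of_most_confident(results, 1.0)` gives the plain average, so the same helper covers both reported figures when the uncertainty score is informative.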

Intrinsic protein disorder and conditional folding in AlphaFoldDB DOI Creative Commons
Damiano Piovesan, Alexander Miguel Monzón, Silvio C. E. Tosatto

et al.

Protein Science, Year: 2022, Volume: 31(11)

Published: Oct. 10, 2022

Abstract: Intrinsically disordered regions (IDRs) defying the traditional protein structure–function paradigm have been difficult to analyze. The availability of accurate structure predictions on a large scale in AlphaFoldDB offers a fresh perspective on IDR prediction. Here, we establish three baselines for IDR prediction from AlphaFoldDB models based on the recent CAID dataset. Surprisingly, AlphaFoldDB is highly competitive for predicting both IDRs and conditionally folded binding regions, demonstrating the plasticity of the disorder to structure continuum.

Language: English

Cited by: 101

ProteinGym: Large-Scale Benchmarks for Protein Design and Fitness Prediction DOI Creative Commons
Pascal Notin, Aaron W. Kollasch, Daniel P. Ritter

et al.

bioRxiv (Cold Spring Harbor Laboratory), Year: 2023, Volume: unknown

Published: Dec. 8, 2023

Predicting the effects of mutations in proteins is critical to many applications, from understanding genetic disease to designing novel proteins that can address our most pressing challenges in climate, agriculture and healthcare. Despite a surge in machine learning-based protein models to tackle these questions, an assessment of their respective benefits is challenging due to the use of distinct, often contrived, experimental datasets, and the variable performance of models across different protein families. Addressing these challenges requires scale. To that end we introduce ProteinGym, a large-scale and holistic set of benchmarks specifically designed for protein fitness prediction and design. It encompasses both a broad collection of over 250 standardized deep mutational scanning assays, spanning millions of mutated sequences, as well as curated clinical datasets providing high-quality expert annotations about mutation effects. We devise a robust evaluation framework that combines metrics for both fitness prediction and design, factors in known limitations of the underlying experimental methods, and covers both zero-shot and supervised settings. We report the performance of a diverse set of over 70 high-performing models from various subfields (e.g., alignment-based, inverse folding) into a unified benchmark suite. We open source the corresponding codebase, datasets, MSAs, structures, and model predictions, and develop a user-friendly website that facilitates data access and analysis.

Language: English

Cited by: 100

Machine Learning-Guided Protein Engineering DOI Creative Commons
Petr Kouba, Pavel Kohout, Faraneh Haddadi

et al.

ACS Catalysis, Year: 2023, Volume: 13(21), Pages: 13863-13895

Published: Oct. 13, 2023

Recent progress in engineering highly promising biocatalysts has increasingly involved machine learning methods. These methods leverage existing experimental and simulation data to aid in the discovery and annotation of promising enzymes, as well as in suggesting beneficial mutations for improving known targets. The field of machine learning for protein engineering is gathering steam, driven by recent success stories and notable progress in other areas. It already encompasses ambitious tasks such as understanding and predicting protein structure and function, catalytic efficiency, enantioselectivity, protein dynamics, stability, solubility, aggregation, and more. Nonetheless, the field is still evolving, with many challenges to overcome and questions to address. In this Perspective, we provide an overview of ongoing trends in this domain, highlight recent case studies, and examine the current limitations of machine learning-based methods. We emphasize the crucial importance of thorough experimental validation of emerging models before their use for rational protein design. We present our opinions on the fundamental problems and outline the potential directions for future research.

Language: English

Cited by: 98

Improved AlphaFold modeling with implicit experimental information DOI Creative Commons
Thomas C. Terwilliger, Billy K. Poon, Pavel V. Afonine

et al.

Nature Methods, Year: 2022, Volume: 19(11), Pages: 1376-1382

Published: Oct. 20, 2022

Abstract: Machine-learning prediction algorithms such as AlphaFold and RoseTTAFold can create remarkably accurate protein models, but these models usually have some regions that are predicted with low confidence or poor accuracy. We hypothesized that by implicitly including new experimental information such as a density map, a greater portion of a model could be predicted accurately, and that this might synergistically improve parts of the model that were not fully addressed by either machine learning or experiment alone. An iterative procedure was developed in which AlphaFold models are automatically rebuilt on the basis of experimental density maps and the rebuilt models are used as templates in new AlphaFold predictions. We show that including experimental information improves prediction beyond the improvement obtained with simple rebuilding guided by the experimental data. This procedure for AlphaFold modeling with density has been incorporated into an automated procedure for interpretation of crystallographic and electron cryo-microscopy maps.

Language: English

Cited by: 97

Simulating 500 million years of evolution with a language model DOI
Thomas Hayes, Roshan Rao, Halil Akin

et al.

bioRxiv (Cold Spring Harbor Laboratory), Year: 2024, Volume: unknown

Published: July 2, 2024

Abstract: More than three billion years of evolution have produced an image of biology encoded into the space of natural proteins. Here we show that language models trained on tokens generated by evolution can act as evolutionary simulators to generate functional proteins that are far away from known proteins. We present ESM3, a frontier multimodal generative language model that reasons over the sequence, structure, and function of proteins. ESM3 can follow complex prompts combining its modalities and is highly responsive to biological alignment. We have prompted ESM3 to generate fluorescent proteins with a chain of thought. Among the generations that we synthesized, we found a bright fluorescent protein at far distance (58% identity) from known fluorescent proteins. Similarly distant natural fluorescent proteins are separated by over five hundred million years of evolution.

Language: English

Cited by: 92
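
The ESM3 entry above quotes a sequence distance as "58% identity". The abstract does not say how that figure was computed, so the sketch below only illustrates one common convention: the share of identical residues among the aligned, non-gap columns of a pairwise alignment. The function name and the gap character are assumptions, and other conventions use a different denominator (for example, the length of the shorter sequence).

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over columns where neither aligned sequence has a gap.

    Both inputs must come from the same pairwise alignment, so they have
    equal length. Returns 0.0 when no column is gap-free.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns in which both sequences carry a residue.
    columns = [
        (a, b) for a, b in zip(aligned_a, aligned_b) if a != gap and b != gap
    ]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)
```

For example, `percent_identity("ACDE-", "ACDFG")` compares four gap-free columns, three of which match, and returns 75.0.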