
Ankh ☥: Optimized Protein Language Model Unlocks General-Purpose Modelling
Ahmed Elnaggar, Hazem Essam, Wafaa Salah-Eldin et al.

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2023, Volume and Issue: unknown

Published: Jan. 18, 2023

Abstract As opposed to scaling-up protein language models (PLMs), we seek improving performance via protein-specific optimization. Although the proportionality between model size and the richness of its learned representations is validated, we prioritize accessibility and pursue a path of data-efficient, cost-reduced, and knowledge-guided optimization. Through over twenty experiments ranging from masking and architecture to pre-training data, we derive insights from protein-specific experimentation into building a model that interprets the language of life, optimally. We present Ankh, the first general-purpose PLM trained on Google’s TPU-v4, surpassing the state-of-the-art with fewer parameters (<10% for pre-training, <7% for inference, and <30% of the embedding dimension). We provide a representative range of structure and function benchmarks where Ankh excels. We further provide a protein variant generation analysis on High-N and One-N input data scales, where Ankh succeeds in learning evolutionary conservation-mutation trends and introducing functional diversity while retaining key structural-functional characteristics. We dedicate our work to promoting accessibility to research innovation via attainable resources.
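As an illustration of the kind of representation extraction the abstract describes, the sketch below pulls per-residue embeddings from an Ankh checkpoint through Hugging Face transformers. The checkpoint name (ElnaggarLab/ankh-base) and the character-level tokenization are assumptions taken from the project's public releases, not details stated in the paper.

```python
import torch
from transformers import AutoTokenizer, T5EncoderModel

# Checkpoint name is an assumption based on the Ankh project's public Hugging Face releases.
ckpt = "ElnaggarLab/ankh-base"
tokenizer = AutoTokenizer.from_pretrained(ckpt)
model = T5EncoderModel.from_pretrained(ckpt).eval()

seq = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"  # toy sequence, illustration only

# Ankh uses a residue-level vocabulary; splitting the sequence into single
# characters mirrors the usage shown in the project README (an assumption here).
inputs = tokenizer([list(seq)], is_split_into_words=True,
                   add_special_tokens=True, return_tensors="pt")

with torch.no_grad():
    # last_hidden_state has shape (batch, sequence_length, embedding_dim);
    # each row is a per-residue embedding usable for downstream tasks.
    emb = model(**inputs).last_hidden_state

print(emb.shape)
```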

Language: English

Citations: 78

AlphaFold2 reveals commonalities and novelties in protein structure space for 21 model organisms
Nicola Bordin, Ian Sillitoe, Vamsi Nallapareddy et al.

Communications Biology, Journal Year: 2023, Volume and Issue: 6(1)

Published: Feb. 8, 2023

Abstract Deep-learning (DL) methods like DeepMind’s AlphaFold2 (AF2) have led to substantial improvements in protein structure prediction. We analyse confident AF2 models from 21 model organisms using a new classification protocol (CATH-Assign) which exploits novel DL methods for structural comparison and classification. Of ~370,000 models, 92% can be assigned to 3253 superfamilies in our CATH domain superfamily classification. The remaining models cluster into 2367 putative novel superfamilies. Detailed manual analysis on 618 of these, each having at least one human relative, reveals extremely remote homologies and further unusual features; only 25 could be confirmed as novel. Although most models map to existing superfamilies, the new domains expand CATH by 67%, increase the number of unique ‘global’ folds by 36%, and will provide valuable insights into structure-function relationships. CATH-Assign will harness the huge expansion of structural data provided by DeepMind to rationalise evolutionary changes driving functional divergence.
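The analysis starts from "confident" AF2 models. A minimal sketch of that filtering step is shown below: it downloads a model from the public AlphaFold Database and averages the per-residue pLDDT values, which AF2 stores in the B-factor column of its PDB files. The example accession and the confidence cut-off are illustrative; the paper's exact filtering criteria may differ.

```python
import urllib.request

def mean_plddt(uniprot_acc: str, version: int = 4) -> float:
    """Fetch an AlphaFold DB model and return its mean per-residue pLDDT.

    AlphaFold2 PDB files store pLDDT in the B-factor column, so averaging
    the CA-atom B-factors gives a whole-model confidence estimate.
    """
    url = (f"https://alphafold.ebi.ac.uk/files/"
           f"AF-{uniprot_acc}-F1-model_v{version}.pdb")
    with urllib.request.urlopen(url) as handle:
        lines = handle.read().decode().splitlines()
    plddt = [float(line[60:66])                     # B-factor field, columns 61-66
             for line in lines
             if line.startswith("ATOM") and line[12:16].strip() == "CA"]
    return sum(plddt) / len(plddt)

if __name__ == "__main__":
    acc = "P69905"  # human haemoglobin alpha chain, used here only as an example
    score = mean_plddt(acc)
    # A cut-off of 70 is a common "confident" threshold for pLDDT (assumption,
    # not necessarily the criterion used by CATH-Assign).
    print(acc, round(score, 1), "confident" if score >= 70 else "low confidence")
```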

Language: English

Citations: 76

Bilingual Language Model for Protein Sequence and Structure
Michael Heinzinger, Konstantin Weißenow, Joaquin Gomez Sanchez et al.

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2023, Volume and Issue: unknown

Published: July 25, 2023

Abstract Adapting large language models (LLMs) to protein sequences spawned the development of powerful protein language models (pLMs). Concurrently, AlphaFold2 broke through in structure prediction. Now we can systematically and comprehensively explore the dual nature of proteins that act and exist as three-dimensional (3D) machines and evolve as linear strings of one-dimensional (1D) sequences. Here, we leverage pLMs to simultaneously model both modalities by combining 1D sequences with 3D structure in a single model. We encode protein structures as token sequences using the 3Di-alphabet introduced by the 3D-alignment method Foldseek. This new foundation pLM extracts the features and patterns of the resulting “structure-sequence” representation. Toward this end, we built a non-redundant dataset from AlphaFoldDB and fine-tuned an existing pLM (ProtT5) to translate between 3Di and amino acid sequences. As a proof-of-concept for our novel approach, dubbed Protein structure-sequence T5 (ProstT5), we showed improved performance for subsequent prediction tasks, and for “inverse folding”, namely the generation of novel protein sequences adopting a given structural scaffold (“fold”). Our work showcased the potential of pLMs to tap into the information-rich protein structure revolution fueled by AlphaFold2. ProstT5 paves the way to develop new tools integrating the vast resource of 3D structure predictions, and opens new research avenues in the post-AlphaFold2 era. The model is freely available for all at https://github.com/mheinzinger/ProstT5
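A minimal sketch of the sequence-to-structure "translation" described above, using the public ProstT5 checkpoint through Hugging Face transformers. The checkpoint name (Rostlab/ProstT5), the <AA2fold> direction prefix, and the preprocessing (spacing residues, mapping rare amino acids to X) follow the repository README and should be treated as assumptions rather than details given in the abstract.

```python
import re
import torch
from transformers import T5Tokenizer, AutoModelForSeq2SeqLM

# Checkpoint name and do_lower_case flag follow the ProstT5 README (assumptions).
tokenizer = T5Tokenizer.from_pretrained("Rostlab/ProstT5", do_lower_case=False)
model = AutoModelForSeq2SeqLM.from_pretrained("Rostlab/ProstT5").eval()

sequence = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"  # toy amino-acid sequence

# Map rare residues to X, separate residues with spaces, and prepend the
# "<AA2fold>" prefix that signals amino-acid -> 3Di translation.
prompt = "<AA2fold> " + " ".join(list(re.sub(r"[UZOB]", "X", sequence)))
ids = tokenizer(prompt, add_special_tokens=True, return_tensors="pt")

with torch.no_grad():
    # Greedy decoding of the 3Di structure-token string for the input sequence.
    out = model.generate(ids.input_ids,
                         attention_mask=ids.attention_mask,
                         max_length=len(sequence) + 5,
                         do_sample=False)

print(tokenizer.decode(out[0], skip_special_tokens=True))  # predicted 3Di tokens
```

The reverse direction ("inverse folding") uses the same pattern with the <fold2AA> prefix and a lower-case 3Di string as input, per the same README conventions.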

Language: English

Citations: 66

Simulating 500 million years of evolution with a language model
Thomas Hayes, Roshan Rao, Halil Akin et al.

Science, Journal Year: 2025, Volume and Issue: unknown

Published: Jan. 16, 2025

More than three billion years of evolution have produced an image of biology encoded into the space of natural proteins. Here we show that language models trained at scale on evolutionary data can generate functional proteins that are far away from known proteins. We present ESM3, a frontier multimodal generative language model that reasons over the sequence, structure, and function of proteins. ESM3 can follow complex prompts combining its modalities and is highly responsive to alignment to improve fidelity. We prompted ESM3 to generate fluorescent proteins. Among the generations that we synthesized, we found a bright fluorescent protein at a far distance (58% sequence identity) from known fluorescent proteins, which we estimate is equivalent to simulating five hundred million years of evolution.
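A rough sketch of the prompt-then-generate workflow the abstract describes, based on the small open-weights release in EvolutionaryScale's esm package. The class names, import paths, and model identifier below (ESM3.from_pretrained, ESMProtein, GenerationConfig, "esm3_sm_open_v1") follow that package's README and are assumptions that may differ between versions.

```python
# Assumed API of the open-source `esm` package from EvolutionaryScale (see its README).
from esm.models.esm3 import ESM3
from esm.sdk.api import ESMProtein, GenerationConfig

# Model identifier is an assumption; add .to("cuda") for GPU inference.
model = ESM3.from_pretrained("esm3_sm_open_v1")

# A partially specified prompt: "_" marks masked positions to be generated.
protein = ESMProtein(sequence="_" * 64)

# Iteratively decode the sequence track, then fill in the structure track
# for the generated sequence (two calls, one per modality).
protein = model.generate(protein, GenerationConfig(track="sequence",
                                                   num_steps=8,
                                                   temperature=0.7))
protein = model.generate(protein, GenerationConfig(track="structure",
                                                   num_steps=8))

print(protein.sequence)
```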

Language: English

Citations: 58

Critical assessment of methods of protein structure prediction (CASP)—Round XV

Andriy Kryshtafovych, Torsten Schwede, Maya Topf et al.

Proteins: Structure, Function, and Bioinformatics, Journal Year: 2023, Volume and Issue: 91(12), P. 1539 - 1549

Published: Nov. 2, 2023

Abstract Computing protein structure from amino acid sequence information has been a long‐standing grand challenge. Critical assessment of structure prediction (CASP) conducts community experiments aimed at advancing solutions to this and related problems. Experiments are conducted every 2 years. The 2020 experiment (CASP14) saw major progress, with the second generation of deep learning methods delivering accuracy comparable with experiment for many single proteins. There is an expectation that these methods will have much wider application in computational structural biology. Here we summarize the results of the most recent experiment, CASP15, held in 2022, with an emphasis on new deep-learning-driven progress. Other papers in this special issue of Proteins provide more detailed analysis. For single protein structures, the AlphaFold2 method is still superior to other approaches, but there are two points of note. First, although AlphaFold2 was at the core of all successful methods, there was a wide variety in its implementation and in its combination with other methods. Second, using the standard protocol with default parameters only produces the highest quality result for about two thirds of the targets, and extensive sampling is required for the others. A major advance at this CASP is an enormous increase in the accuracy of computed complexes, achieved by the use of deep learning methods, although overall these do not fully match the performance for single proteins. Here too, AlphaFold2-based methods perform best, and again more sampling than the defaults is often required. Also of note are encouraging early results on methods to compute ensembles of macromolecular structures. Critically for usability, both single-protein and complex models are accompanied by derived estimates of local and global accuracy that are of high quality; however, estimates for interface regions are slightly less reliable. CASP15 also included the computation of RNA structures for the first time. Here, classical approaches produced better agreement with experiment than the deep learning ones, and accuracy is limited. Also for the first time, CASP included protein–ligand complexes, an area of particular interest for drug design. Here too, the best performing methods were classical ones. Many of these topics were discussed at the conference, and it is clear that methods will continue to advance.
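CASP assessment rests on quantitative comparison of computed models against experimental structures, using measures such as GDT_TS and lDDT. As a minimal, self-contained illustration of this kind of model-versus-experiment comparison (not one of the official CASP metrics), the sketch below computes a Cα RMSD after optimal Kabsch superposition.

```python
import numpy as np

def kabsch_rmsd(pred: np.ndarray, ref: np.ndarray) -> float:
    """RMSD between predicted and reference (N, 3) Cα coordinates
    after optimal rigid-body superposition (Kabsch algorithm)."""
    P = pred - pred.mean(axis=0)        # centre both coordinate sets
    Q = ref - ref.mean(axis=0)
    H = P.T @ Q                         # 3x3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    D = np.diag([1.0, 1.0, d])          # guard against improper rotations
    R = Vt.T @ D @ U.T                  # optimal rotation matrix
    P_rot = P @ R.T
    return float(np.sqrt(((P_rot - Q) ** 2).sum(axis=1).mean()))

# Toy example with random coordinates; real use would parse Cα atoms
# from a predicted model and the corresponding experimental structure.
rng = np.random.default_rng(0)
ref = rng.normal(size=(50, 3))
pred = ref + rng.normal(scale=0.5, size=(50, 3))
print(round(kabsch_rmsd(pred, ref), 2), "Å")
```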

Language: English

Citations: 57