AlphaFold2 and its applications in the fields of biology and medicine
Zhenyu Yang, Xiaoxi Zeng, Yi Zhao, et al.

Signal Transduction and Targeted Therapy, Journal Year: 2023, Volume and Issue: 8(1)

Published: March 14, 2023

AlphaFold2 (AF2) is an artificial intelligence (AI) system developed by DeepMind that can predict three-dimensional (3D) structures of proteins from amino acid sequences with atomic-level accuracy. Protein structure prediction is one of the most challenging problems in computational biology and chemistry, and has puzzled scientists for 50 years. The advent of AF2 represents unprecedented progress in protein structure prediction and has attracted much attention. The subsequent release of structures of more than 200 million proteins predicted by AF2 further aroused great enthusiasm in the science community, especially in the fields of biology and medicine. AF2 is thought to have a significant impact on structural biology and on research areas that need protein structure information, such as drug discovery, protein design, and prediction of protein function. Though not much time has passed since AF2 was developed, there are already quite a few application studies of AF2 in the fields of biology and medicine, many of them having preliminarily proved the potential of AF2. To better understand AF2 and promote its applications, we will in this article summarize the principle and system architecture of AF2 as well as the recipe of its success, and particularly focus on reviewing its applications in the fields of biology and medicine. Limitations of current AF2 prediction will also be discussed.

Language: English

Protein Sequence Analysis Using the MPI Bioinformatics Toolkit
Felix Gabler, Seung‐Zin Nam, Sebastian Till, et al.

Current Protocols in Bioinformatics, Journal Year: 2020, Volume and Issue: 72(1)

Published: Dec. 1, 2020

The MPI Bioinformatics Toolkit (https://toolkit.tuebingen.mpg.de) provides interactive access to a wide range of the best-performing bioinformatics tools and databases, including the state-of-the-art protein sequence comparison methods HHblits and HHpred. The Toolkit currently includes 35 external and in-house tools, covering functionalities such as sequence similarity searching, prediction of sequence features, and sequence classification. Due to this breadth of functionality, the tight interconnection of its constituent tools, and its ease of use, the Toolkit has become an important resource for biomedical research and for teaching protein sequence analysis to students in the life sciences. In this article, we provide detailed information on utilizing the three most widely accessed tools within the Toolkit: HHpred for the detection of homologs, HHpred in conjunction with MODELLER for structure prediction and homology modeling, and CLANS for the visualization of relationships in large sequence datasets. © 2020 The Authors. Basic Protocol 1: Sequence similarity searching using HHpred. Alternate Protocol: Pairwise sequence comparison using HHpred. Support Protocol: Building a custom multiple sequence alignment using PSI-BLAST and forwarding it as input to HHpred. Basic Protocol 2: Calculation of homology models using HHpred and MODELLER. Basic Protocol 3: Cluster analysis using CLANS.

Language: English

Citations

681

ColabFold - Making protein folding accessible to all
Milot Mirdita, Konstantin Schütze, Yoshitaka Moriwaki, et al.

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2021, Volume and Issue: unknown

Published: Aug. 15, 2021

ColabFold offers accelerated protein structure and complex predictions by combining the fast homology search of MMseqs2 with AlphaFold2 or RoseTTAFold. ColabFold’s 40 - 60× faster search and optimized model use allows predicting close to a thousand structures per day on a server with one GPU. Coupled with Google Colaboratory, ColabFold becomes a free and accessible platform for protein folding. ColabFold is open-source software available at github.com/sokrypton/ColabFold. Its novel environmental databases are available at colabfold.mmseqs.com.

Language: English

Citations

555

Modeling aspects of the language of life through transfer-learning protein sequences
Michael Heinzinger, Ahmed Elnaggar, Yu Wang, et al.

BMC Bioinformatics, Journal Year: 2019, Volume and Issue: 20(1)

Published: Dec. 1, 2019

Predicting protein function and structure from sequence is one important challenge for computational biology. For 26 years, most state-of-the-art approaches combined machine learning and evolutionary information. However, for some applications retrieving related proteins is becoming too time-consuming. Additionally, evolutionary information is less powerful for small families, e.g. for proteins from the Dark Proteome. Both these problems are addressed by the new methodology introduced here. We introduced a novel way to represent protein sequences as continuous vectors (embeddings) by using the language model ELMo taken from natural language processing. By modeling protein sequences, ELMo effectively captured the biophysical properties of the language of life from unlabeled big data (UniRef50). We refer to these new embeddings as SeqVec (Sequence-to-Vector) and demonstrate their effectiveness by training simple neural networks for two different tasks. At the per-residue level, secondary structure (Q3 = 79% ± 1, Q8 = 68% ± 1) and regions with intrinsic disorder (MCC = 0.59 ± 0.03) were predicted significantly better than through one-hot encoding or through Word2vec-like approaches. At the per-protein level, subcellular localization was predicted in ten classes (Q10 = 68% ± 1) and membrane-bound proteins were distinguished from water-soluble proteins (Q2 = 87% ± 1). Although SeqVec embeddings generated the best predictions from single sequences, no solution improved over the best existing method using evolutionary information. Nevertheless, our approach improved over some popular methods using evolutionary information, and for some proteins even did beat the best. Thus, they prove to condense the underlying principles of protein sequences. Overall, the important novelty is speed: where the lightning-fast HHblits needed on average about two minutes to generate the evolutionary information for a target protein, SeqVec created embeddings on average in 0.03 s. As this speed-up is independent of the size of growing sequence databases, SeqVec provides a highly scalable approach for the analysis of big data in proteomics, i.e. microbiome or metaproteome analysis. Transfer-learning succeeded to extract information from unlabeled sequence databases relevant for various protein prediction tasks. SeqVec modeled the language of life, namely the principles underlying protein sequences, better than any features suggested by textbooks and prediction methods. The exception is evolutionary information; however, that information is not available on the level of a single sequence.

Language: English

Citations

512

Folding non-homologous proteins by coupling deep-learning contact maps with I-TASSER assembly simulations
Wei Zheng, Chengxin Zhang, Yang Li, et al.

Cell Reports Methods, Journal Year: 2021, Volume and Issue: 1(3), P. 100014 - 100014

Published: June 21, 2021

Structure prediction for proteins lacking homologous templates in the Protein Data Bank (PDB) remains a significant unsolved problem. We developed a protocol, C-I-TASSER, to integrate interresidue contact maps from deep neural-network learning with the cutting-edge I-TASSER fragment assembly simulations. Large-scale benchmark tests showed that C-I-TASSER can fold more than twice as many non-homologous proteins as I-TASSER, which does not use contacts. When applied to a folding experiment on 8,266 Pfam families, C-I-TASSER successfully folded 4,162 domain families, including 504 folds that are not found in the PDB. Furthermore, it created correct folds for 85% of proteins in the SARS-CoV-2 genome, despite the quick mutation rate of the virus and sparse sequence profiles. The results demonstrated the critical importance of coupling whole-genome and metagenome-based evolutionary information with optimal structure assembly simulations for solving the problem of non-homologous protein structure prediction.

Language: English

Citations

462

Clades of huge phages from across Earth’s ecosystems
Basem Al-Shayeb, Rohan Sachdeva, Lin-Xing Chen, et al.

Nature, Journal Year: 2020, Volume and Issue: 578(7795), P. 425 - 431

Published: Feb. 12, 2020

Bacteriophages typically have small genomes

Language: English

Citations

445

DALI shines a light on remote homologs: One hundred discoveries
Liisa Holm, Aleksi Laiho, Petri Törönen, et al.

Protein Science, Journal Year: 2022, Volume and Issue: 32(1)

Published: Nov. 24, 2022

Structural comparison reveals remote homology that often fails to be detected by sequence comparison. The DALI web server (http://ekhidna2.biocenter.helsinki.fi/dali) is a platform for structural analysis that provides database searches and interactive visualization, including structural alignments annotated with secondary structure, protein families and sequence logos, and 3D structure superimposition supported by color-coded sequence and structure conservation. Here, we use DALI to mine the AlphaFold Database version 1, which increased the structural coverage of protein families by 20%. We found 100 remote homologous relationships hitherto unreported in the current reference database for protein domains, Pfam 35.0. In particular, we linked 35 domains of unknown function (DUFs) to previously characterized families, generating functional hypotheses that can be explored downstream in structural biology studies. Other findings include gene fusions, tandem duplications, and adjustments to domain boundaries. The evidence for homology can be browsed interactively through live examples on DALI's website.

Language: English

Citations

372

Applying and improving AlphaFold at CASP14
John Jumper, Richard Evans, Alexander Pritzel, et al.

Proteins Structure Function and Bioinformatics, Journal Year: 2021, Volume and Issue: 89(12), P. 1711 - 1721

Published: Oct. 4, 2021

We describe the operation and improvement of AlphaFold, the system that was entered by the team AlphaFold2 to the "human" category in the 14th Critical Assessment of Protein Structure Prediction (CASP14). The AlphaFold system entered in CASP14 is entirely different from the one entered in CASP13. It used a novel end-to-end deep neural network trained to produce protein structures from amino acid sequence, multiple sequence alignments, and homologous proteins. In the assessors' ranking by summed z scores (>2.0), AlphaFold scored 244.0 compared to 90.8 by the next best group. The predictions made by AlphaFold had a median domain GDT_TS of 92.4; this is the first time that this level of average accuracy has been achieved during CASP, especially on the more difficult Free Modeling targets, and represents a significant improvement in the state of the art in protein structure prediction. We reported how AlphaFold was run as a human team during CASP14 and improved such that it now achieves an equivalent level of performance without intervention, opening the door to highly accurate large-scale structure prediction.

Language: English

Citations

333

Generalized biomolecular modeling and design with RoseTTAFold All-Atom
Rohith Krishna, Jue Wang, Woody Ahern, et al.

Science, Journal Year: 2024, Volume and Issue: 384(6693)

Published: March 7, 2024

Deep-learning methods have revolutionized protein structure prediction and design but are presently limited to protein-only systems. We describe RoseTTAFold All-Atom (RFAA), which combines a residue-based representation of amino acids and DNA bases with an atomic representation of all other groups to model assemblies that contain proteins, nucleic acids, small molecules, metals, and covalent modifications, given their sequences and chemical structures. By fine-tuning on structure denoising tasks, we developed RFdiffusion All-Atom (RFdiffusionAA), which builds protein structures around small molecules. Starting from random distributions of amino acid residues surrounding target small molecules, we designed and experimentally validated, through crystallography and binding measurements, proteins that bind the cardiac disease therapeutic digoxigenin, the enzymatic cofactor heme, and the light-harvesting molecule bilin.

Language: English

Citations

331

AlphaFold2 and the future of structural biology
Patrick Cramer

Nature Structural & Molecular Biology, Journal Year: 2021, Volume and Issue: 28(9), P. 704 - 705

Published: Aug. 10, 2021

Language: English

Citations

303

PHROG: families of prokaryotic virus proteins clustered using remote homology
Paul Terzian, Éric Olo Ndela, Clovis Galiez, et al.

NAR Genomics and Bioinformatics, Journal Year: 2021, Volume and Issue: 3(3)

Published: June 23, 2021

Viruses are abundant, diverse and ancestral biological entities. Their diversity is high, both in terms of the number of different protein families encountered and in the sequence heterogeneity of each protein family. The recent increase in sequenced viral genomes constitutes a great opportunity to gain new insights into this diversity and consequently urges the development of annotation resources to help functional and comparative analysis. Here, we introduce PHROG (Prokaryotic Virus Remote Homologous Groups), a library of viral protein families generated using a new clustering approach based on remote homology detection by HMM profile-profile comparisons. Considering 17 473 reference (pro)viruses of prokaryotes, 868 340 of the total 938 864 proteins were grouped into 38 880 clusters that proved to be a 2-fold deeper clustering than a classical strategy based on BLAST-like similarity searches, and yet remain homogeneous. Manual inspection of similarities to various reference sequence databases led to the annotation of 5108 clusters (containing 50.6 % of the total protein dataset) with 705 different annotation terms, included in 9 functional categories, specifically designed for viruses. Hopefully, PHROG will be a useful tool to better annotate future prokaryotic viral sequences, thus helping the scientific community to better understand the evolution and ecology of these entities.

Language: English

Citations

288