Importance of updated benchmark sets for statistically correct AlphaFold applications DOI Creative Commons
László Dobson, Gábor Tusnády, Péter Tompa

et al.

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown

Published: Aug. 6, 2024

AlphaFold2 changed structural biology by providing high-quality structure predictions for all possible proteins. Since its inception, a plethora of applications were built on AlphaFold2, expediting discoveries in virtually all areas related to protein science. In many cases, however, optimism seems to have made scientists forget about data leakage, a serious issue that needs to be addressed when evaluating machine learning methods. Here we provide a rigorous benchmark set that can be used in a broad range of applications built around AlphaFold2/3. Key Points: When building applications on AlphaFold, scientists should consider the possibility of data leakage between AlphaFold training and the independent test set of their method. BETA provides multiple datasets with structures and sequences not used during AlphaFold training. These diverse use cases The protocol was applied to a simple disordered prediction method, showing different parameters required optimize proteins

Language: English

DisProt in 2024: improving function annotation of intrinsically disordered proteins DOI Creative Commons
Maria Cristina Aspromonte, María Victoria Nugnes, Federica Quaglia

et al.

Nucleic Acids Research, Journal Year: 2023, Volume and Issue: 52(D1), P. D434 - D441

Published: Oct. 30, 2023

DisProt (URL: https://disprot.org) is the gold standard database for intrinsically disordered proteins and regions, providing valuable information about their functions. The latest version of DisProt brings significant advancements, including a broader representation of functions and an enhanced curation process. These improvements aim to increase both the quality of annotations and their coverage at the sequence level. Higher coverage has been achieved by adopting additional evidence codes. Quality of annotations has been improved by systematically applying Minimum Information About Disorder Experiments (MIADE) principles and reporting all the details of the experimental setup that could potentially influence the structural state of a protein. The DisProt database now includes new thematic datasets and has expanded the adoption of Gene Ontology terms, resulting in an extensive functional repertoire which is automatically propagated to UniProtKB. Finally, we show that DisProt's curated annotations strongly correlate with disorder predictions inferred from AlphaFold2 pLDDT (predicted Local Distance Difference Test) confidence scores. This comparison highlights the utility of DisProt in explaining the apparent uncertainty of certain well-defined predicted structures, which often correspond to folding-upon-binding fragments. Overall, DisProt serves as a comprehensive resource, combining experimental evidence of disorder information to enhance our understanding of intrinsically disordered proteins and their functional implications.

Language: English

Citations

58

A Perspective on the Prospective Use of AI in Protein Structure Prediction DOI
Raphaëlle Versini, Sujith Sritharan, Burcu Aykaç Fas

et al.

Journal of Chemical Information and Modeling, Journal Year: 2023, Volume and Issue: 64(1), P. 26 - 41

Published: Dec. 21, 2023

AlphaFold2 (AF2) and RoseTTaFold (RF) have revolutionized structural biology, serving as highly reliable and effective methods for predicting protein structures. This article explores their impact and limitations, focusing on their integration into experimental pipelines and their application in diverse protein classes, including membrane proteins, intrinsically disordered proteins (IDPs), and oligomers. In experimental pipelines, AF2 models help X-ray crystallography in resolving the phase problem, while complementarity with mass spectrometry and NMR data enhances structure determination and protein flexibility prediction. Predicting the structure of membrane proteins remains challenging for both AF2 and RF due to difficulties in capturing conformational ensembles and interactions with the membrane. Improvements in incorporating membrane-specific features and predicting the structural effect of mutations are crucial. For intrinsically disordered proteins, AF2's confidence score (pLDDT) serves as a competitive disorder predictor, but integrative approaches including molecular dynamics (MD) simulations or hydrophobic cluster analyses are advocated for accurate dynamics representation. AF2 and RF show promising results for oligomeric models, outperforming traditional docking methods, with AlphaFold-Multimer showing improved performance. However, some caveats remain, in particular for membrane proteins. Real-life examples demonstrate AF2's predictive capabilities for unknown protein structures, but models should be evaluated for their agreement with experimental data. Furthermore, AF2 models can be used complementarily with MD simulations. In this Perspective, we propose a "wish list" for improving deep-learning-based protein folding prediction models, including using experimental data as constraints and modifying models with binding partners or post-translational modifications. Additionally, a meta-tool for ranking and suggesting composite models is suggested, driving future advancements in this rapidly evolving field.

Language: English

Citations

16

MobiDB in 2025: integrating ensemble properties and function annotations for intrinsically disordered proteins DOI Creative Commons
Damiano Piovesan, Alessio Del Conte, Mahta Mehdiabadi

et al.

Nucleic Acids Research, Journal Year: 2024, Volume and Issue: 53(D1), P. D495 - D503

Published: Oct. 29, 2024

The MobiDB database (URL: https://mobidb.org/) aims to provide structural and functional information about intrinsic protein disorder, aggregating annotations from the literature, experimental data, and predictions for all known protein sequences. Here, we describe the improvements made to our resource to capture more information, simplify access to the aggregated data, and increase documentation of all MobiDB features. Compared to the previous release, all underlying pipeline modules were updated. The prediction module is ten times faster and can detect if a predicted disordered region is structurally extended or compact. The PDB component is now able to process large cryo-EM structures, extending the number of processed entries. The entry page has been restyled to highlight functional aspects of disorder, and all graphical modules have been completely reimplemented for better flexibility and faster rendering. The server has been improved to optimise bulk downloads. Annotation provenance has been standardised by adopting ECO terms. Finally, we propagated disorder function (IDPO and GO terms) from the DisProt database, exploiting sequence similarity and protein embeddings. These improvements, along with the addition of comprehensive training material, offer a more intuitive interface and novel functional knowledge about intrinsic disorder.

Language: English

Citations

6

Predicting Conformational Ensembles of Intrinsically Disordered Proteins: From Molecular Dynamics to Machine Learning DOI
Jana Aupič, Pavlína Pokorná, Sharon Ruthstein

et al.

The Journal of Physical Chemistry Letters, Journal Year: 2024, Volume and Issue: 15(32), P. 8177 - 8186

Published: Aug. 2, 2024

Intrinsically disordered proteins and regions (IDP/IDRs) are ubiquitous across all domains of life. Characterized by a lack of a stable tertiary structure, IDP/IDRs populate a diverse set of transiently formed structural states that can promiscuously adapt upon binding with specific interaction partners and/or certain alterations in environmental conditions. This malleability is foundational for their role as tunable interaction hubs in core cellular processes such as signaling, transcription, and translation. Tracing the conformational ensemble of an IDP/IDR and its perturbation in response to regulatory cues is thus paramount for illuminating its function. However, the conformational heterogeneity of IDP/IDRs poses several challenges. Here, we review experimental and computational methods devised to disentangle the conformational landscape of IDP/IDRs, highlighting recent computational advances that permit proteome-wide scans of IDP/IDR conformations. We briefly evaluate selected computational methods using the disordered N-terminal of the human copper transporter 1 as a test case and outline further challenges in IDP/IDR ensemble prediction.

Language: English

Citations

5

Regularly updated benchmark sets for statistically correct evaluations of AlphaFold applications DOI Creative Commons
László Dobson, Gábor Tusnády, Péter Tompa

et al.

Briefings in Bioinformatics, Journal Year: 2025, Volume and Issue: 26(2)

Published: March 1, 2025

AlphaFold2 changed structural biology by providing high-quality structure predictions for all possible proteins. Since its inception, a plethora of applications were built on AlphaFold2, expediting discoveries in virtually all areas related to protein science. In many cases, however, optimism seems to have made scientists forget about data leakage, a serious issue that needs to be addressed when evaluating machine learning methods. Here we provide a rigorous benchmark set that can be used in a broad range of applications built around AlphaFold2/3.

Language: English

Citations

0

The 2024 Nucleic Acids Research database issue and the online molecular biology database collection DOI Creative Commons
Daniel J. Rigden, Xosé M. Fernández

Nucleic Acids Research, Journal Year: 2023, Volume and Issue: 52(D1), P. D1 - D9

Published: Nov. 30, 2023

The 2024 Nucleic Acids Research database issue contains 180 papers from across biology and neighbouring disciplines. There are 90 papers reporting on new databases and 83 updates from resources previously published in the Issue. Updates from databases most recently published elsewhere account for a further seven. Nucleic acid databases include the new NAKB for structural information and updates from Genbank, ENA, GEO, Tarbase and JASPAR. The Issue's Breakthrough Article concerns NMPFamsDB for novel prokaryotic protein families, and the AlphaFold Protein Structure Database has an important update. Metabolism is covered by updates from Reactome, Wikipathways and Metabolights. Microbes are covered by RefSeq, UNITE, SPIRE and P10K; viruses are covered by ViralZone and PhageScope. Medically-oriented databases include the familiar COSMIC, Drugbank and TTD. Genomics-related resources include Ensembl, UCSC Genome Browser and Monarch. New arrivals cover plant imaging (OPIA and PlantPAD) and crop plants (SoyMD, TCOD and CropGS-Hub). The entire Database Issue is freely available online on the Nucleic Acids Research website (https://academic.oup.com/nar). Over the last year the NAR online Molecular Biology Database Collection has been updated, reviewing 1060 entries, adding 97 new resources and eliminating 388 discontinued URLs, bringing the current total to 1959 databases. It is available at http://www.oxfordjournals.org/nar/database/c/.

Language: English

Citations

10

The Origin of Discrepancies between Predictions and Annotations in Intrinsically Disordered Proteins DOI Creative Commons
Mátyás Pajkos, Gábor Erdős, Zsuzsanna Dosztányi

et al.

Biomolecules, Journal Year: 2023, Volume and Issue: 13(10), P. 1442 - 1442

Published: Sept. 25, 2023

Disorder prediction methods that can discriminate between ordered and disordered regions have contributed fundamentally to our understanding of the properties and prevalence of intrinsically disordered proteins (IDPs) in proteomes as well as their functional roles. However, a recent large-scale assessment of the performance of these methods indicated that there is still room for further improvements, necessitating novel approaches to understand the strengths and weaknesses of individual methods. In this study, we compared two methods, IUPred and disorder prediction based on pLDDT scores derived from AlphaFold2 (AF2) models. We evaluated these methods using a dataset from the DisProt database, consisting of experimentally characterized disordered regions and subsets associated with diverse experimental methods and functions. IUPred and AF2 provided consistent predictions in 79% of cases for long disordered regions; however, for 15% of these cases, they both suggested order, in disagreement with annotations. These discrepancies arose primarily due to weak experimental support, the presence of intermediate states, or context-dependent behavior, such as binding-induced transitions. Furthermore, AF2 tended to predict helical regions with high pLDDT scores within disordered segments, while IUPred had limitations in identifying linker regions. These results provide valuable insights into the inherent limitations and potential biases of disorder prediction methods.

Language: English

Citations

6

Best practices for the manual curation of intrinsically disordered proteins in DisProt DOI Creative Commons
Federica Quaglia, Anastasia Chasapi, María Victoria Nugnes

et al.

Database, Journal Year: 2024, Volume and Issue: 2024

Published: Jan. 1, 2024

The DisProt database is a resource containing manually curated data on experimentally validated intrinsically disordered proteins (IDPs) and intrinsically disordered regions (IDRs) from the literature. Developed in 2005, its primary goal was to collect structural and functional information on proteins that lack a fixed three-dimensional structure. Today, DisProt has evolved into a major repository that not only collects experimental data but also contributes to our understanding of the roles of IDPs/IDRs in various biological processes, such as autophagy or the life cycle mechanisms of viruses, and their involvement in diseases (such as cancer and neurodevelopmental disorders). DisProt offers detailed information on the structural states of IDPs/IDRs, including state transitions, interactions and their functions, all provided as curated annotations. One of the central activities of DisProt is the meticulous curation of experimental data from the literature. For this reason, to ensure that every expert and volunteer curator possesses the requisite knowledge for data evaluation, collection and integration, training courses and curation materials are available. However, biocuration guidelines concur on the importance of developing robust guidelines that provide critical information about data consistency and acquisition. This guideline aims to provide both biocurators and external users with best practices for manually curating IDPs and IDRs in DisProt. It describes every step of the literature curation process and provides use cases of IDP curation within DisProt. Database URL: https://disprot.org/

Language: English

Citations

1

Are Protein Conformational Ensembles in Agreement with Experimental Data? A Geometrical Interpretation of the Problem DOI
Letizia Fiorucci, Marco Schiavina, Isabella C. Felli

et al.

Journal of Chemical Information and Modeling, Journal Year: 2024, Volume and Issue: 64(14), P. 5392 - 5401

Published: July 3, 2024

The conformational variability of biological macromolecules can play an important role in their biological function. Therefore, understanding conformational variability is expected to be key for predicting the behavior of a particular molecule in the context of organism-wide studies. Several experimental methods have been developed and deployed for accessing this information, and computational methods are continuously updated for the profitable integration of different experimental sources. The outcome of this endeavor is conformational ensembles, which may vary significantly in properties and composition when different ensemble reconstruction methods are used, and this raises the issue of comparing the predicted ensembles against experimental data. In this article, we discuss a geometrical formulation to provide a framework for understanding the agreement of a prediction with the experimental observations.

Language: English

Citations

0
