AlphaFold2's training set powers its predictions of fold-switched conformations DOI Creative Commons
Joseph W. Schafer, Lauren L. Porter

bioRxiv (Cold Spring Harbor Laboratory), Year: 2024, No. unknown

Published: Oct. 15, 2024

ABSTRACT AlphaFold2 (AF2), a deep-learning based model that predicts protein structures from their amino acid sequences, has recently been used to predict multiple conformations. In some cases, AF2 successfully predicted both dominant and alternative conformations of fold-switching proteins, which remodel their secondary and tertiary structures in response to cellular stimuli. Whether AF2 has learned enough protein folding principles to reliably predict alternative conformations outside of its training set is unclear. Here, we address this question by assessing whether CFold, an implementation of the AF2 network trained on a more limited subset of experimentally determined protein structures, predicts alternative conformations of eight fold switchers from six protein families. Previous work suggests that AF2 predicted these alternative conformations by memorizing them during training. Unlike AF2, CFold's training set contains only one of these alternative conformations. Despite sampling 1300-4400 structures/protein with various sequence sampling techniques, CFold predicted only one alternative structure accurately and with high confidence while also generating experimentally inconsistent structures with higher confidence. Though these results indicate that AF2's current success in predicting alternative conformations of fold switchers stems largely from its training data, results from a sequence pruning technique suggest developments that could lead to a more reliable generative model in the future.

Language: English

AlphaFold predictions of fold-switched conformations are driven by structure memorization DOI Creative Commons
Devlina Chakravarty, Joseph W. Schafer, Ethan A. Chen, et al.

Nature Communications, Year: 2024, No. 15(1)

Published: Aug. 24, 2024

Abstract Recent work suggests that AlphaFold (AF), a deep learning-based model that can accurately infer protein structure from sequence, may discern important features of folded protein energy landscapes, defined by the diversity and frequency of different conformations in the folded state. Here, we test the limits of its predictive power on fold-switching proteins, which assume two structures with regions of distinct secondary and/or tertiary structure. We find that (1) AF is a weak predictor of fold switching and (2) some of its successes result from memorization of training-set structures rather than learned protein energetics. Combining >280,000 models from several implementations of AF2 and AF3, a 35% success rate was achieved for fold switchers likely in AF's training sets. AF2's confidence metrics selected against models consistent with experimentally determined fold-switching structures and failed to discriminate between low and high energy conformations. Further, AF captured only one out of seven experimentally confirmed fold switchers outside of its training sets despite extensive sampling of an additional ~280,000 models. Several observations indicate that AF has memorized structural information during training, and AF3 misassigns coevolutionary restraints. These limitations constrain the scope of successful predictions, highlighting the need for physically based methods that readily predict multiple protein conformations.

Language: English

Cited by: 35

Proteins with alternative folds reveal blind spots in AlphaFold-based protein structure prediction DOI Creative Commons
Devlina Chakravarty, Myeongsang Lee, Lauren L. Porter, et al.

Current Opinion in Structural Biology, Year: 2025, No. 90, p. 102973

Published: Jan. 5, 2025

In recent years, advances in artificial intelligence (AI) have transformed structural biology, particularly protein structure prediction. Though AI-based methods, such as AlphaFold (AF), often predict single conformations of proteins with high accuracy and confidence, predictions of alternative folds are often inaccurate, low-confidence, or simply not predicted at all. Here, we review three blind spots that alternative folds reveal about AF-based protein structure prediction. First, proteins that assume conformations distinct from their training-set homologs can be mispredicted. Second, AF overrelies on its training set to predict alternative conformations. Third, degeneracies in pairwise representations can lead to high-confidence predictions inconsistent with experiment. These weaknesses suggest approaches to predict alternative folds more reliably.

Language: English

Cited by: 10

Information Bottleneck Approach for Markov Model Construction DOI
Dedi Wang, Yunrui Qiu, Eric R. Beyerle, et al.

Journal of Chemical Theory and Computation, Year: 2024, No. 20(12), pp. 5352 - 5367

Published: June 11, 2024

Markov state models (MSMs) have proven valuable in studying the dynamics of protein conformational changes via statistical analysis of molecular dynamics (MD) simulations. In MSMs, the complex configuration space is coarse-grained into conformational states, with the dynamics modeled by a series of Markovian transitions among these states at discrete lag times. Constructing the Markovian model at a specific lag time necessitates defining states that circumvent significant internal energy barriers, enabling internal dynamics relaxation within the lag time. This process effectively coarse-grains time and space, integrating out rapid motions within metastable states. Thus, MSMs possess a multi-resolution nature, where the granularity of states can be adjusted according to the time-resolution, offering flexibility in capturing system dynamics. This work introduces a continuous embedding approach for molecular conformations using the state predictive information bottleneck (SPIB), a framework that unifies dimensionality reduction and state space partitioning via a continuous, machine learned basis set. Without explicit optimization of VAMP-based scores, SPIB demonstrates state-of-the-art performance in identifying slow dynamical processes and constructing predictive multi-resolution Markovian models. Through applications to well-validated mini-proteins, SPIB showcases unique advantages compared to competing methods. It autonomously and self-consistently adjusts the number of metastable states based on a specified minimal time resolution, eliminating the need for manual tuning. While maintaining efficacy in dynamical properties, SPIB excels in accurately distinguishing metastable states and capturing numerous well-populated macrostates. This contrasts with existing VAMP-based methods, which often emphasize slow dynamics at the expense of incorporating numerous sparsely populated states. Furthermore, SPIB's ability to learn a low-dimensional continuous embedding of the underlying MSMs enhances the interpretation of dynamic pathways. With these benefits, we propose SPIB as an easy-to-implement methodology for end-to-end MSM construction.

Language: English

Cited by: 12

Sequence clustering confounds AlphaFold2 DOI
Joseph W. Schafer, Myeongsang Lee, Devlina Chakravarty, et al.

Nature, Year: 2025, No. 638(8051), pp. E8 - E12

Published: Feb. 19, 2025

Language: English

Cited by: 2

AlphaFold2 has more to learn about protein energy landscapes DOI Open Access
Devlina Chakravarty, Joseph W. Schafer, Ethan A. Chen, et al.

bioRxiv (Cold Spring Harbor Laboratory), Year: 2023, No. unknown

Published: Dec. 13, 2023

Abstract Recent work suggests that AlphaFold2 (AF2), a deep learning-based model that can accurately infer protein structure from sequence, may discern important features of folded protein energy landscapes, defined by the diversity and frequency of different conformations in the folded state. Here, we test the limits of its predictive power on fold-switching proteins, which assume two structures with regions of distinct secondary and/or tertiary structure. Using several implementations of AF2, including published enhanced sampling approaches, we generated >280,000 models of 93 fold-switching proteins whose experimentally determined conformations were likely in AF2's training set. Combining all models, AF2 predicted fold switching with a modest success rate of ∼25%, indicating that it does not readily sample both experimentally characterized conformations of most fold switchers. Further, AF2's confidence metrics selected against models consistent with experimentally determined fold-switching conformations in favor of inconsistent models. Accordingly, these confidence metrics, though suggested to evaluate protein energetics reliably, did not discriminate between low and high energy states of fold-switching proteins. We then evaluated AF2's performance on seven fold-switching proteins outside of its training set, generating >159,000 models in total. Fold switching was accurately predicted in one of the seven targets with moderate confidence. Further, AF2 demonstrated no ability to predict alternative conformations of newly discovered targets without homologs in the set of 93 fold switchers. These results indicate that AF2 has more to learn about the underlying energetics of protein ensembles and highlight the need for further developments of methods that readily predict multiple protein conformations.

Language: English

Cited by: 20

Evolutionary sequence and structural basis for the distinct conformational landscapes of Tyr and Ser/Thr kinases DOI Creative Commons

Joan Gizzio, Abhishek Thakur, Allan Haldane, et al.

Nature Communications, Year: 2024, No. 15(1)

Published: Aug. 2, 2024

Protein kinases are molecular machines with rich sequence variation that distinguishes the two main evolutionary branches, tyrosine kinases (TKs) from serine/threonine kinases (STKs). Using a sequence co-variation Potts statistical energy model, we previously concluded that TK catalytic domains are more likely than STKs to adopt an inactive conformation with the activation loop in an autoinhibitory folded conformation, due to intrinsic sequence effects. Here we investigate the structural basis for this phenomenon by integrating the sequence-based model with structure-based molecular dynamics (MD) to determine the effects of mutations on the free energy difference between active and inactive conformations, using a thermodynamic cycle involving many (n = 108) protein-mutation free energy perturbation (FEP) simulations in the active and inactive conformations. The sequence-based and structure-based results are consistent and support the hypothesis that the inactive conformation, DFG-out Activation Loop Folded, is a functional regulatory state that has been stabilized in TKs relative to STKs over the course of their evolution via the accumulation of residue substitutions that facilitate distinct substrate binding modes in trans and additional modes of regulation in cis for TKs. In this study, the authors identify a mechanism for the conformational preferences of TKs vs STKs and suggest that kinase function can explain these differences.

Language: English

Cited by: 9

A comprehensive exploration of the druggable conformational space of protein kinases using AI-predicted structures DOI Creative Commons
Noah B. Herrington, Yan Chak Li, David Stein, et al.

PLoS Computational Biology, Year: 2024, No. 20(7), p. e1012302

Published: July 24, 2024

Protein kinase function and interactions with drugs are controlled in part by the movement of the DFG and ɑC-Helix motifs that are related to the catalytic activity of the kinase. Small molecule ligands elicit therapeutic effects with distinct selectivity profiles and residence times that often depend on the active or inactive kinase conformation(s) they bind. Modern AI-based structural modeling methods have the potential to expand upon the limited availability of experimentally determined kinase structures in inactive states. Here, we first explored the conformational space of kinases in the PDB and models generated by AlphaFold2 (AF2) and ESMFold, two prominent AI-based protein structure prediction methods. Our investigation of AF2's ability to explore the conformational diversity of the kinome at various multiple sequence alignment (MSA) depths showed a bias within the predicted structures of kinases in DFG-in conformations, particularly those controlled by the DFG motif, based on their overabundance in the PDB. We demonstrate that predicting kinase structures using AF2 at lower MSA depths explored these alternative conformations more extensively, including identifying previously unobserved conformations for 398 kinases. Ligand enrichment analyses for 23 kinases showed that, on average, docked models distinguished between active molecules and decoys better than random (average AUC (avgAUC) of 64.58), but select models perform well (e.g., avgAUCs for PTK2 and JAK2 were 79.28 and 80.16, respectively). Further analysis explained the ligand enrichment discrepancy between low- and high-performing kinase models as binding site occlusions that would preclude docking. The overall results of our analyses suggested that, although AF2 explored previously uncharted regions of the kinase conformational space and select models exhibited enrichment scores suitable for rational drug discovery, rigorous refinement of AF2 models is likely still necessary for drug discovery campaigns.

Language: English

Cited by: 7

Modeling Boltzmann-weighted structural ensembles of proteins using artificial intelligence–based methods DOI Creative Commons
Akashnathan Aranganathan, Xinyu Gu, Dedi Wang, et al.

Current Opinion in Structural Biology, Year: 2025, No. 91, p. 103000

Published: Feb. 8, 2025

Language: English

Cited by: 1

Investigating the conformational landscape of AlphaFold2-predicted protein kinase structures DOI Creative Commons

Carmen Al-Masri, Francesco Trozzi, Shu-Hang Lin, et al.

Bioinformatics Advances, Year: 2023, No. 3(1)

Published: Jan. 1, 2023

Protein kinases are a family of signaling proteins, crucial for maintaining cellular homeostasis. When dysregulated, kinases drive the pathogenesis of several diseases, and are thus one of the largest target categories for drug discovery. Kinase activity is tightly controlled by switching through several active and inactive conformations in their catalytic domain. Kinase inhibitors have been designed to engage kinases in specific conformational states, where each conformation presents a unique physico-chemical environment for therapeutic intervention. Thus, modeling kinases across conformations can enable the design of novel and optimally selective kinase drugs. Due to the recent success of AlphaFold2 in accurately predicting the 3D structure of proteins based on sequence, we investigated the conformational landscape of protein kinases as modeled by AlphaFold2. We observed that AlphaFold2 is able to model several kinase conformations across the kinome; however, certain conformations are only observed in specific kinase families. Furthermore, we show that the per residue predicted local distance difference test can capture information describing the structural flexibility of kinases. Finally, we evaluated the docking performance of AlphaFold2 kinase structures for enriching known ligands. Taken together, we see an opportunity to leverage AlphaFold2 models for structure-based drug discovery against kinases in several pharmacologically relevant conformational states.

Language: English

Cited by: 17

ColabFold predicts alternative protein structures from single sequences, coevolution unnecessary for AF-cluster DOI Open Access
Lauren L. Porter, Devlina Chakravarty, Joseph W. Schafer, et al.

bioRxiv (Cold Spring Harbor Laboratory), Year: 2023, No. unknown

Published: Nov. 22, 2023

Abstract Though typically associated with a single folded state, globular proteins are dynamic and often assume alternative or transient structures important for their functions [1,2]. Wayment-Steele, et al. steered ColabFold [3] to predict alternative structures of several proteins using a method they call AF-cluster [4]. They propose that AF-cluster "enables ColabFold to sample alternate states of known metamorphic proteins with high confidence" by first clustering multiple sequence alignments (MSAs) in a way that "deconvolves" coevolutionary information specific to different conformations and then using these clusters as input for ColabFold. Contrary to this Coevolution Assumption, clustered MSAs are not needed to make these predictions. Rather, these alternative structures can be predicted from single sequences and/or sequence similarity, indicating that coevolutionary information is unnecessary for predictive success and may not be used at all. These results suggest that AF-cluster's predictive scope is likely limited to sampling distinct-yet-homologous structures within ColabFold's training set.

Language: English

Cited by: 14