On the variation of structural divergence among residues in enzyme evolution DOI Open Access
Julián Echave, Mathilde Carpentier

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown

Published: Dec. 23, 2024

Abstract Structural divergence varies among protein residues. Unlike the classic problem of substitution rate variation, this structural variation has been largely ignored. Here we show that in enzymes structural divergence increases with both residue flexibility and distance from the active site. Although these factors are correlated, we demonstrate through modelling that the pattern arises from two independent types of constraints, non-functional and functional. Their relative importance varies across enzyme families: as functional constraints increase from 4% to 85%, non-functional constraints decrease from 96% to 15%, reshaping the divergence pattern. This analysis overturns two accepted views of enzyme evolution. First, evolutionary divergence is thought to mirror protein dynamics generally, but this similarity exists only when non-functional constraints dominate. Second, active site conservation is attributed to functional constraints alone, but it also stems from their location in rigid regions where non-functional constraints are high.

Language: English

Bilingual Language Model for Protein Sequence and Structure DOI Creative Commons
Michael Heinzinger, Konstantin Weißenow, Joaquin Gomez Sanchez

et al.

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2023, Volume and Issue: unknown

Published: July 25, 2023

Abstract Adapting large language models (LLMs) to protein sequences spawned the development of powerful protein language models (pLMs). Concurrently, AlphaFold2 broke through in protein structure prediction. Now we can systematically and comprehensively explore the dual nature of proteins that act and exist as three-dimensional (3D) machines and evolve as linear strings of one-dimensional (1D) sequences. Here, we leverage pLMs to simultaneously model both modalities by combining 1D sequences with 3D structure in a single model. We encode protein structures as token sequences using the 3Di-alphabet introduced by the 3D-alignment method Foldseek. This new foundation pLM extracts the features and patterns of the resulting “structure-sequence” representation. Toward this end, we built a non-redundant dataset from AlphaFoldDB and fine-tuned an existing pLM (ProtT5) to translate between 3Di and amino acid sequences. As a proof-of-concept for our novel approach, dubbed Protein structure-sequence T5 (ProstT5), we showed improved performance for subsequent prediction tasks and for “inverse folding”, namely the generation of novel protein sequences adopting a given structural scaffold (“fold”). Our work showcased the potential of pLMs to tap into the information-rich protein structure revolution fueled by AlphaFold2. ProstT5 paves the way to develop new tools integrating the vast resource of 3D predictions, and opens new research avenues in the post-AlphaFold2 era. Our model is freely available for all at https://github.com/mheinzinger/ProstT5

Language: English

Citations: 65

Bilingual language model for protein sequence and structure DOI Creative Commons
Michael Heinzinger, Konstantin Weißenow, Joaquin Gomez Sanchez

et al.

NAR Genomics and Bioinformatics, Journal Year: 2024, Volume and Issue: 6(4)

Published: Sept. 28, 2024

Adapting language models to protein sequences spawned the development of powerful protein language models (pLMs). Concurrently, AlphaFold2 broke through in protein structure prediction. Now we can systematically and comprehensively explore the dual nature of proteins that act and exist as three-dimensional (3D) machines and evolve as linear strings of one-dimensional (1D) sequences. Here, we leverage pLMs to simultaneously model both modalities in a single model. We encode protein structures as token sequences using the 3Di-alphabet introduced by the 3D-alignment method

Language: English

Citations: 27

Viroid-like colonists of human microbiomes DOI Creative Commons
Ivan N. Zheludev, R. C. Edgar, María José López-Galiano

et al.

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown

Published: Jan. 21, 2024

Here, we describe the "Obelisks," a previously unrecognised class of viroid-like elements that we first identified in human gut metatranscriptomic data. "Obelisks" share several properties: (i) apparently circular RNA ~1kb genome assemblies, (ii) predicted rod-like secondary structures encompassing the entire genome, and (iii) open reading frames coding for a novel protein superfamily, which we call the "Oblins". We find that Obelisks form their own distinct phylogenetic group with no detectable sequence or structural similarity to known biological agents. Further, Obelisks are prevalent in tested human microbiome metatranscriptomes, with representatives detected in ~7% of analysed stool metatranscriptomes (29/440) and in ~50% of analysed oral metatranscriptomes (17/32). Obelisk compositions appear to differ between anatomic sites and are capable of persisting in individuals, with continued presence over >300 days observed in one case. Large scale searches identified 29,959 Obelisks (clustered at 90% nucleotide identity), with examples from all seven continents and in diverse ecological niches. From this search, a subset of Obelisks were found to code for Obelisk-specific variants of the hammerhead type-III self-cleaving ribozyme. Lastly, we identified one case of a bacterial species (Streptococcus sanguinis) in which a subset of defined laboratory strains harboured a specific Obelisk RNA population. As such, Obelisks comprise a class of diverse RNAs that have colonised, and gone unnoticed in, human and global microbiomes.

Language: English

Citations: 19

Multiple Protein Structure Alignment at Scale with FoldMason DOI Creative Commons
Cameron L. M. Gilchrist, Milot Mirdita, Martin Steinegger

et al.

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown

Published: Aug. 1, 2024

Abstract Protein structure is conserved beyond sequence, making multiple structural alignment (MSTA) essential for analyzing distantly related proteins. Computational prediction methods have vastly extended our repository of available protein structures, requiring fast and accurate MSTA methods. Here, we introduce FoldMason, a progressive MSTA method that leverages the structural alphabet from Foldseek, a pairwise structural aligner, to align hundreds of thousands of protein structures, exceeding the alignment quality of state-of-the-art methods while being two orders of magnitude faster than other MSTA methods. FoldMason computes confidence scores, offers interactive visualizations, and provides the speed and accuracy needed for large-scale protein structure analysis in the era of accurate structure prediction. Using Flaviviridae glycoproteins, we demonstrate how FoldMason’s MSTAs support phylogenetic analysis below the twilight zone. FoldMason is free open-source software: foldmason.foldseek.com and webserver: search.foldseek.com/foldmason.

Language: English

Citations: 16

Mapping glycoprotein structure reveals Flaviviridae evolutionary history DOI Creative Commons
Jonathon C.O. Mifsud, Spyros Lytras, Michael R. Oliver

et al.

Nature, Journal Year: 2024, Volume and Issue: 633(8030), P. 695 - 703

Published: Sept. 4, 2024

Language: English

Citations: 16

Viroid-like colonists of human microbiomes DOI
Ivan N. Zheludev, R. C. Edgar, María José López-Galiano

et al.

Cell, Journal Year: 2024, Volume and Issue: 187(23), P. 6521 - 6536.e18

Published: Oct. 31, 2024

Language: English

Citations: 15

The structural landscape and diversity of Pyricularia oryzae MAX effectors revisited DOI Creative Commons
Mounia Lahfa, Philippe Barthe, Karine de Guillen

et al.

PLoS Pathogens, Journal Year: 2024, Volume and Issue: 20(5), P. e1012176 - e1012176

Published: May 6, 2024

Magnaporthe AVRs and ToxB-like (MAX) effectors constitute a family of secreted virulence proteins in the fungus Pyricularia oryzae (syn. Magnaporthe oryzae), which causes blast disease on numerous cereals and grasses. In spite of high sequence divergence, MAX effectors share a common fold characterized by a ß-sandwich core stabilized by a conserved disulfide bond. In this study, we investigated the structural landscape and diversity within the MAX effector repertoire of P. oryzae. Combining experimental protein structure determination and in silico structure modeling, we validated the presence of a MAX effector domain in 77 out of 94 groups of orthologs (OG) identified in a previous population genomic study. Four novel MAX effector structures determined by NMR were in remarkably good agreement with AlphaFold2 (AF2) predictions. Based on the comparison of the AF2-generated 3D models, we propose a classification of the MAX effector superfamily into 20 structural groups that vary in the canonical MAX fold, disulfide bond patterns, and additional secondary structures in N- and C-terminal extensions. About one-third of the MAX family members remain singletons, without a strong structural relationship to other MAX effectors. Analysis of the surface properties of the AF2 MAX models also highlights the high variability within the MAX family at the structural level, potentially reflecting the wide diversity of their virulence functions and host targets.

Language: English

Citations: 8

A fast approach for structural and evolutionary analysis based on energetic profile protein comparison DOI Creative Commons
Peyman Choopanian, Jaan‐Olle Andressoo, Mehdi Mirzaie

et al.

Nature Communications, Journal Year: 2025, Volume and Issue: 16(1)

Published: March 6, 2025

In structural bioinformatics, the efficiency of predicting protein similarity, function, and evolutionary relationships is crucial. Our approach proposed herein leverages protein energy profiles derived from a knowledge-based potential, deviating from traditional methods relying on structural alignment or atomic distances. This method assigns unique energy profiles to individual proteins, facilitating rapid comparative analysis for both structural similarities and evolutionary relationships across various hierarchical levels. Our study demonstrates that energy profiles contain substantial information about protein structure at the class, fold, superfamily, and family levels. Notably, these profiles accurately distinguish proteins across species, as illustrated by the classification of coronavirus spike glycoproteins and bacteriocin proteins. Introducing a separation measure based on energy profile similarity, our approach shows significant correlation with a network-based approach, emphasizing the potential of energy profiles as efficient predictors of drug combinations with faster computational requirements. Our key insight is that the sequence-based energy profile strongly correlates with structure-derived energy, enabling protein comparisons based solely on sequences.

Language: English

Citations: 0

Viro3D: a comprehensive database of virus protein structure predictions DOI Creative Commons
Ulad Litvin, Spyros Lytras, Andrew Jack

et al.

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown

Published: Dec. 20, 2024

Abstract Viruses are intracellular parasites of organisms from all domains of life. They infect and cause disease in humans, animals and plants but also play crucial roles in the ecology of microbial communities. Tolerance to genetic change, high mutation rates, adaptations to hosts and immune escape have driven high divergence of viral genes, hampering their functional annotation and phylogenetic inference. Protein structure is more conserved than sequence and can be used for searches of distant homologs and evolutionary analysis of divergent proteins. Structures of viral proteins are traditionally underrepresented in public databases, but recent advances in protein structure prediction allow us to address this issue. Combining two state-of-the-art approaches, AlphaFold2-ColabFold and ESMFold, we predicted models for 85,000 proteins from 4,400 human and animal viruses, expanding the structural coverage of viral proteins by 30 times compared to experimental structures. We also performed structural and network analyses of the models to demonstrate their utility for functional annotation and inference of deep evolutionary relationships. Taking this approach, we examined the deep evolutionary history of viral class-I fusion glycoproteins, gaining insights on the origins of the coronavirus spike protein. To enable further discoveries, we have created Viro3D (https://viro3d.cvr.gla.ac.uk/), a virus species-centred protein structure database. It allows users to search, browse and download protein models from a virus of interest and explore similar structures present in other virus species. This resource will facilitate fundamental molecular virology and investigation of virus evolution, and may enable structure-informed design of therapies and vaccines.

Language: English

Citations: 1

Molecular and structural innovations of the stator motor complex at the dawn of flagellar motility DOI Creative Commons
Caroline Puente-Lelièvre, Pietro Ridone, Jordan Douglas

et al.

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown

Published: July 23, 2024

Abstract The rotation of the bacterial flagellum is powered by the MotAB stator complex, which converts ion flux into torque. The origin and evolution of this remarkable complex are understudied. Here, we perform the first phylogenetic and structural characterisation and classification of MotAB and its nonflagellar relatives. Using 193 genomes sampled across 27 phyla, we estimated phylogenies and ancestral sequences, and generated AlphaFold predictions for all extant and reconstructed proteins. We then mapped them onto the phylogeny to determine the patterns of diversity and the distribution of structural innovations. We identify two discrete groups: Flagellar Ion Transporters (FIT) and Generic Ion Transporters (GIT). FIT proteins are structurally conserved and have a square fold domain and a torque-generating interface (TGI). FIT is divided into two clades, termed TGI4 and TGI5, referring to whether there are 4 or 5 short helices in the TGI. TGI5 motors are predominantly found in Proteobacteria and include the well-studied E. coli K12 system, while TGI4 motors are found in diverse phyla and include the Na+-powered polar motor of Vibrio (PomAB). GIT proteins, on the other hand, lack these attributes. The interaction between the A and B subunits is jointly necessary for function, with the genes typically adjacent within an operon. Motility assays show that structural elements unique to FIT play an important role in flagellar motility. Our results indicate that the flagellar stator motor has a single origin and shares motility-related traits. Significance Statement: Bacterial motility is a key feature of pathogenicity and survival. It allows bacteria to propel themselves and direct their movement according to environmental conditions. We investigated the molecular motors that provide the motive force to power flagellar rotation. This study integrates phylogenetics, 3D protein structure modeling, and ancestral state reconstruction (ASR) to provide insights into the mechanisms of the motor.

Language: English

Citations: 0