Structural mechanism of bridge RNA-guided recombination
Masahiro Hiraizumi, Nicholas T. Perry, Matthew G. Durrant, et al.

Nature, Journal Year: 2024, Volume and Issue: 630(8018), P. 994 - 1002

Published: June 26, 2024

Insertion sequence (IS) elements are the simplest autonomous transposable elements found in prokaryotic genomes.

Language: English

Citations: 27

Computational scoring and experimental evaluation of enzymes generated by neural networks
Sean R. Johnson, Xiaozhi Fu, Sandra Viknander, et al.

Nature Biotechnology, Journal Year: 2024, Volume and Issue: unknown

Published: April 23, 2024

In recent years, generative protein sequence models have been developed to sample novel sequences. However, predicting whether generated proteins will fold and function remains challenging. We evaluate a set of 20 diverse computational metrics to assess the quality of enzyme sequences produced by three contrasting generative models: ancestral sequence reconstruction, a generative adversarial network and a protein language model. Focusing on two enzyme families, we expressed and purified over 500 natural and generated sequences with 70-90% identity to the most similar natural sequences to benchmark computational metrics for predicting in vitro enzyme activity. Over three rounds of experiments, we developed a computational filter that improved the rate of experimental success by 50-150%. The proposed metrics and models will drive protein engineering research by serving as a benchmark for generative protein sequence models and helping to select active variants for experimental testing.

Language: English

Citations: 26

TemStaPro: protein thermostability prediction using sequence representations from protein language models
Ieva Pudžiuvelytė, Kliment Olechnovič, Eglė Godliauskaitė, et al.

Bioinformatics, Journal Year: 2024, Volume and Issue: 40(4)

Published: March 18, 2024

Abstract. Motivation: Reliable prediction of protein thermostability from its sequence is valuable for both academic and industrial research. This prediction problem can be tackled using machine learning and by taking advantage of the recent blossoming of deep learning methods for sequence analysis. These methods can facilitate training on more data and, possibly, enable the development of more versatile thermostability predictors for multiple ranges of temperatures. Results: We applied the principle of transfer learning to predict protein thermostability using embeddings generated by protein language models (pLMs) from an input protein sequence. We used large pLMs that were pre-trained on hundreds of millions of known sequences. The embeddings from such models allowed us to efficiently train and validate a high-performing prediction method using over one million sequences that we collected from organisms with annotated growth temperatures. Our method, TemStaPro (Temperatures of Stability for Proteins), was used to predict the thermostability of CRISPR-Cas Class II effector proteins (C2EPs). Predictions indicated sharp differences among groups of C2EPs in terms of thermostability and were largely in tune with previously published and our newly obtained experimental data. Availability and implementation: TemStaPro software and the related data are freely available from https://github.com/ievapudz/TemStaPro and https://doi.org/10.5281/zenodo.7743637.

Language: English

Citations: 22

Hybracter: enabling scalable, automated, complete and accurate bacterial genome assemblies
George Bouras, Ghais Houtak, Ryan R. Wick, et al.

Microbial Genomics, Journal Year: 2024, Volume and Issue: 10(5)

Published: May 8, 2024

Improvements in the accuracy and availability of long-read sequencing mean that complete bacterial genomes are now routinely reconstructed using hybrid (i.e. short- and long-read) assembly approaches. Complete genomes allow a deeper understanding of bacterial evolution and genomic variation beyond single nucleotide variants. They are also crucial for identifying plasmids, which often carry medically significant antimicrobial resistance genes. However, small plasmids are often missed or misassembled by long-read assembly algorithms. Here, we present Hybracter, which allows for the fast, automatic and scalable recovery of near-perfect complete bacterial genomes using a long-read first assembly approach. Hybracter can be run either as a hybrid assembler or as a long-read only assembler. We compared Hybracter to existing automated hybrid and long-read only assembly tools using a diverse panel of samples of varying levels of long-read accuracy with manually curated ground truth reference genomes. We demonstrate that Hybracter as a hybrid assembler is more accurate and faster than the existing gold standard automated hybrid assembler Unicycler. We also show that Hybracter with long-reads only is the most accurate long-read only assembler and is comparable to hybrid methods in accurately recovering small plasmids.

Language: English

Citations: 20

Viroid-like colonists of human microbiomes
Ivan N. Zheludev, R. C. Edgar, María José López-Galiano, et al.

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown

Published: Jan. 21, 2024

Here, we describe the "Obelisks," a previously unrecognised class of viroid-like elements that we first identified in human gut metatranscriptomic data. "Obelisks" share several properties: (i) apparently circular RNA ~1kb genome assemblies, (ii) predicted rod-like secondary structures encompassing the entire genome, and (iii) open reading frames coding for a novel protein superfamily, which we call the "Oblins". We find that Obelisks form their own distinct phylogenetic group with no detectable sequence or structural similarity to known biological agents. Further, Obelisks are prevalent in tested human microbiome metatranscriptomes, with representatives detected in ~7% of analysed stool metatranscriptomes (29/440) and in ~50% of analysed oral metatranscriptomes (17/32). Obelisk compositions appear to differ between the anatomic sites and are capable of persisting in individuals, with continued presence over >300 days observed in one case. Large scale searches identified 29,959 Obelisks (clustered at 90% nucleotide identity), with examples from all seven continents and in diverse ecological niches. From this search, a subset of Obelisks are identified to code for Obelisk-specific variants of the hammerhead type-III self-cleaving ribozyme. Lastly, we identified one case of a bacterial species (Streptococcus sanguinis) in which a subset of defined laboratory strains harboured a specific Obelisk RNA population. As such, Obelisks comprise a class of diverse RNAs that have colonised, and gone unnoticed in, human, and global microbiomes.

Language: English

Citations: 19