The long and short of it: benchmarking viromics using Illumina, Nanopore and PacBio sequencing technologies
Ryan Cook, Nathan Brown, Branko Rihtman

et al.

Microbial Genomics, Journal Year: 2024, Volume and Issue: 10(2)

Published: Feb. 20, 2024

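Viral metagenomics has fuelled a rapid change in our understanding of global viral diversity and ecology. Long-read sequencing and hybrid assembly approaches that combine long- and short-read technologies are now being widely implemented in bacterial genomics and metagenomics. However, the use of long-read sequencing to investigate viral communities is still in its infancy. While both Nanopore and PacBio technologies have been applied to viral metagenomics, it is not known to what extent different technologies will impact the reconstruction of the viral community. Thus, we constructed a mock bacteriophage community of previously sequenced phage genomes and sequenced them using Illumina, Nanopore and PacBio sequencing technologies, and tested a number of different assembly approaches. When using a single sequencing technology, Illumina assemblies were the best at recovering phage genomes. Nanopore- and PacBio-only assemblies performed poorly in comparison to Illumina in both genome recovery and error rates, which both varied with the assembler used. The best Nanopore assembly had errors that manifested as SNPs and INDELs at frequencies 41 and 157 % higher than found in Illumina only assemblies, respectively. The best PacBio assembly had SNPs and INDELs at frequencies 12 and 78 % higher than found in Illumina-only assemblies, respectively. Despite high-read coverage, long-read-only assemblies recovered a maximum of one complete genome from any assembly, unless reads were down-sampled prior to assembly. Overall, the best approach was assembly by a combination of Illumina and Nanopore reads, which reduced error rates to levels comparable with short-read-only assemblies. When using a single technology, Illumina only was the best approach. The differences in genome recovery and error rates between technology and assembler had downstream impacts on gene prediction, viral prediction, and subsequent estimates of diversity within a sample. These findings will provide a starting point for others in the choice of reads and assembly algorithms for the analysis of viromes.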

Language: English

INfrastructure for a PHAge REference Database: Identification of Large-Scale Biases in the Current Collection of Cultured Phage Genomes
Ryan Cook, Nathan Brown, Tamsin Redgwell

et al.

PHAGE, Journal Year: 2021, Volume and Issue: 2(4), P. 214 - 223

Published: Oct. 6, 2021

Background: With advances in sequencing technology and decreasing costs, the number of phage genomes that have been sequenced has increased markedly in the past decade. Materials and Methods: We developed an automated retrieval and analysis system for phage genomes (https://github.com/RyanCook94/inphared) to produce the INfrastructure for a PHAge REference Database (INPHARED) of phage genomes and associated metadata. Results: As of January 2021, 14,244 complete phage genomes have been sequenced. The INPHARED data set is dominated by phages that infect a small number of bacterial genera, with 75% of phages isolated on only 30 bacterial genera. There is further bias, with significantly more lytic phage genomes (∼70%) than temperate (∼30%) within our database. Collectively, this results in ∼54% of temperate phage genomes originating from just three host genera. With much debate on the carriage of antibiotic resistance genes and their potential safety in phage therapy, we searched for putative antibiotic resistance genes. Frequency of antibiotic resistance gene carriage was found to be higher in temperate phages than in lytic phages and again varied with host. Conclusions: Given the bias of currently sequenced phage genomes, we suggest that to fully understand phage diversity, efforts should be made to isolate and sequence a larger number of phages, in particular temperate phages, from a greater diversity of hosts.

Language: English

Citations

248

IMG/VR v4: an expanded database of uncultivated virus genomes within a framework of extensive functional, taxonomic, and ecological metadata
Antônio Pedro Camargo, Stephen Nayfach, I-Min A. Chen

et al.

Nucleic Acids Research, Journal Year: 2022, Volume and Issue: 51(D1), P. D733 - D743

Published: Nov. 18, 2022

Viruses are widely recognized as critical members of all microbiomes. Metagenomics enables large-scale exploration of the global virosphere, progressively revealing the extensive genomic diversity of viruses on Earth and highlighting the myriad ways by which viruses impact biological processes. IMG/VR provides access to the largest collection of viral sequences obtained from (meta)genomes, along with functional annotation and rich metadata. A web interface enables users to efficiently browse and search viruses based on genome features and/or sequence similarity. Here, we present the fourth version of IMG/VR, composed of >15 million virus genomes and genome fragments, a ≈6-fold increase in size compared to the previous version. These are clustered into 8.7 million viral operational taxonomic units, including 231 408 with at least one high-quality representative. Viral sequences in IMG/VR are now systematically identified from genomes, metagenomes, and metatranscriptomes using a new detection approach (geNomad), and IMG standard annotation is complemented with genome quality estimation using CheckV, taxonomic classification reflecting the latest taxonomic standards, and microbial host taxonomy prediction. IMG/VR v4 is available at https://img.jgi.doe.gov/vr, and the underlying data are available to download at https://genome.jgi.doe.gov/portal/IMG_VR.

Language: English

Citations

235

Expansion of the global RNA virome reveals diverse clades of bacteriophages
Uri Neri, Yuri I. Wolf, Simon Roux

et al.

Cell, Journal Year: 2022, Volume and Issue: 185(21), P. 4023 - 4037.e18

Published: Sept. 28, 2022

High-throughput RNA sequencing offers broad opportunities to explore the Earth RNA virome. Mining 5,150 diverse metatranscriptomes uncovered >2.5 million RNA virus contigs. Analysis of >330,000 RNA-dependent RNA polymerases (RdRPs) shows that this expansion corresponds to a 5-fold increase of the known RNA virus diversity. Gene content analysis revealed multiple protein domains previously not found in RNA viruses and implicated in virus-host interactions. Extended RdRP phylogeny supports the monophyly of the five established phyla and reveals two putative additional bacteriophage phyla and numerous putative additional classes and orders. The dramatically expanded phylum Lenarviricota, consisting of bacterial and related eukaryotic viruses, now accounts for a third of the RNA virome. Identification of CRISPR spacer matches and bacteriolytic proteins suggests that subsets of picobirnaviruses and partitiviruses, previously associated with eukaryotes, infect prokaryotic hosts.

Language: English

Citations

203

iPHoP: An integrated machine learning framework to maximize host prediction for metagenome-derived viruses of archaea and bacteria
Simon Roux, Antônio Pedro Camargo, Felipe H. Coutinho

et al.

PLoS Biology, Journal Year: 2023, Volume and Issue: 21(4), P. e3002083 - e3002083

Published: April 21, 2023

The extraordinary diversity of viruses infecting bacteria and archaea is now primarily studied through metagenomics. While metagenomes enable high-throughput exploration of the viral sequence space, metagenome-derived sequences lack key information compared to isolated viruses, in particular host association. Different computational approaches are available to predict the host(s) of uncultivated viruses based on their genome sequences, but thus far individual approaches are limited either in precision or in recall, i.e., for a number of viruses they yield erroneous predictions or no prediction at all. Here, we describe iPHoP, a two-step framework that integrates multiple methods to reliably predict host taxonomy at the genus rank for a broad range of viruses infecting bacteria and archaea, while retaining a low false discovery rate. Based on a large dataset of metagenome-derived virus genomes from the IMG/VR database, we illustrate how iPHoP can provide extensive host prediction and guide further characterization of uncultivated viruses.

Language: English

Citations

202

Identification of mobile genetic elements with geNomad
Antônio Pedro Camargo, Simon Roux, Frederik Schulz

et al.

Nature Biotechnology, Journal Year: 2023, Volume and Issue: 42(8), P. 1303 - 1312

Published: Sept. 21, 2023

Identifying and characterizing mobile genetic elements in sequencing data is essential for understanding their diversity, ecology, biotechnological applications and impact on public health. Here we introduce geNomad, a classification and annotation framework that combines information from gene content and a deep neural network to identify sequences of plasmids and viruses. geNomad uses a dataset of more than 200,000 marker protein profiles to provide functional gene annotation and taxonomic assignment of viral genomes. Using a conditional random field model, geNomad also detects proviruses integrated into host genomes with high precision. In benchmarks, geNomad achieved high classification performance for diverse plasmids and viruses (Matthews correlation coefficient of 77.8% and 95.3%, respectively), substantially outperforming other tools. Leveraging geNomad's speed and scalability, we processed over 2.7 trillion base pairs of sequencing data, leading to the discovery of millions of viruses and plasmids that are available through the IMG/VR and IMG/PR databases. geNomad is available at https://portal.nersc.gov/genomad .

Language: English

Citations

197

BACPHLIP: predicting bacteriophage lifestyle from conserved protein domains
Adam J. Hockenberry, Claus O. Wilke

PeerJ, Journal Year: 2021, Volume and Issue: 9, P. e11396 - e11396

Published: May 6, 2021

Bacteriophages are broadly classified into two distinct lifestyles: temperate and virulent. Temperate phages are capable of a latent phase of infection within a host cell (lysogenic cycle), whereas virulent phages directly replicate and lyse host cells upon infection (lytic cycle). Accurate lifestyle identification is critical for determining the role of individual phage species within ecosystems and their effect on host evolution. Here, we present BACPHLIP, a BACterioPHage LIfestyle Predictor. BACPHLIP detects the presence of a set of conserved protein domains within an input genome and uses this data to predict lifestyle via a Random Forest classifier that was trained on a dataset of 634 phage genomes. On an independent test set of 423 phages, BACPHLIP has an accuracy of 98%, greatly exceeding that of the previously existing tools (79%). BACPHLIP is freely available on GitHub ( https://github.com/adamhockenberry/bacphlip ) and the code used to build and test the model is provided in a separate repository ( https://github.com/adamhockenberry/bacphlip-model-dev ) for users wishing to interrogate and re-train the underlying classification model.

Language: English

Citations

192

Deep sea sediments associated with cold seeps are a subsurface reservoir of viral diversity
Zexin Li, Donald Pan, Guangshan Wei

et al.

The ISME Journal, Journal Year: 2021, Volume and Issue: 15(8), P. 2366 - 2378

Published: March 1, 2021

In marine ecosystems, viruses exert control on the composition and metabolism of microbial communities, influencing overall biogeochemical cycling. Deep sea sediments associated with cold seeps are known to host taxonomically diverse microbial communities, but little is known about viruses infecting these microorganisms. Here, we probed metagenomes from seven geographically diverse cold seeps across global oceans to assess viral diversity, virus–host interaction, and virus-encoded auxiliary metabolic genes (AMGs). Gene-sharing network comparisons with viruses inhabiting other ecosystems reveal that cold seep sediments harbour considerable unexplored viral diversity. Most cold seep viruses display high degrees of endemism, with seep fluid flux being one of the main drivers of viral community composition. In silico predictions linked 14.2% of the viruses to microbial host populations, with many belonging to poorly understood candidate bacterial and archaeal phyla. Lysis was predicted to be a predominant viral lifestyle based on lineage-specific virus/host abundance ratios. Metabolic predictions of prokaryotic host genomes and viral AMGs suggest that viruses influence microbial hydrocarbon biodegradation at cold seeps, as well as other carbon, sulfur and nitrogen cycling via virus-induced mortality and/or metabolic augmentation. Overall, these findings reveal the global diversity and biogeography of cold seep viruses and indicate how viruses may manipulate seep microbial ecology and biogeochemistry.

Language: English

Citations

170

The oral microbiome: diversity, biogeography and human health
Jonathon L. Baker, Jessica L. Mark Welch, Kathryn M. Kauffman

et al.

Nature Reviews Microbiology, Journal Year: 2023, Volume and Issue: 22(2), P. 89 - 104

Published: Sept. 12, 2023

Language: English

Citations

164

Soil viral diversity, ecology and climate change
Janet Jansson, Ruonan Wu

Nature Reviews Microbiology, Journal Year: 2022, Volume and Issue: 21(5), P. 296 - 311

Published: Nov. 9, 2022

Language: English

Citations

118

Mutualistic interplay between bacteriophages and bacteria in the human gut
Andrey N. Shkoporov, Christopher Turkington, Colin Hill

et al.

Nature Reviews Microbiology, Journal Year: 2022, Volume and Issue: 20(12), P. 737 - 749

Published: June 30, 2022

Language: English

Citations

105