Recovery of highly contiguous genomes from complex terrestrial habitats reveals over 15,000 novel prokaryotic species and expands characterization of soil and sediment microbial communities
Mantas Sereika, Aaron J. Mussig, Chenjing Jiang, et al.

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown

Published: Dec. 21, 2024

Genomes are fundamental to understanding microbial ecology and evolution. The emergence of high-throughput, long-read DNA sequencing has enabled recovery of microbial genomes from environmental samples at scale. However, expanding the microbial genome catalogue of soils and sediments has been challenging due to the enormous complexity of these environments. Here, we performed deep, long-read Nanopore sequencing of 154 soil and sediment samples collected across Denmark and, through an optimised bioinformatics pipeline, recovered genomes of 15,314 novel microbial species, including 4,757 high-quality genomes. The recovered genomes span 1,086 novel genera and provide the first high-quality reference genomes for 612 previously known genera, expanding the phylogenetic diversity of the prokaryotic tree of life by 8%. The long-read assemblies also enabled the recovery of thousands of complete rRNA operons, biosynthetic gene clusters and CRISPR-Cas systems, all of which were underrepresented and highly fragmented in previous terrestrial genome catalogues. Furthermore, incorporation of the recovered MAGs into public genome databases significantly improved species-level classification rates for soil and sediment metagenomic datasets, thereby enhancing terrestrial microbiome characterization. With this study, we demonstrate that long-read sequencing, combined with optimised bioinformatics, allows cost-effective recovery of high-quality microbial genomes from highly complex ecosystems, which remain the largest untapped source of biodiversity for expanding genome databases and filling the gaps in the tree of life.

Language: English

A metagenomic perspective on the microbial prokaryotic genome census
Dongying Wu, R. Seshadri, Nikos C. Kyrpides, et al.

Science Advances, Journal Year: 2025, Volume and Issue: 11(3)

Published: Jan. 17, 2025

Following 30 years of sequencing, we assessed the phylogenetic diversity (PD) of >1.5 million microbial genomes in public databases, including metagenome-assembled genomes (MAGs) of uncultivated microbes. As compared to the vast diversity uncovered by metagenomic sequences, cultivated taxa account for a modest portion of the overall diversity, 9.73% in bacteria and 6.55% in archaea, while MAGs contribute 48.54% and 57.05%, respectively. Therefore, a substantial fraction of bacterial (41.73%) and archaeal PD (36.39%) still lacks any genomic representation. This unrepresented diversity manifests primarily at lower taxonomic ranks, exemplified by 134,966 species identified in 18,087 metagenomic samples. Our study exposes diversity hotspots in freshwater, marine subsurface, sediment, soil, and other environments, whereas human samples yielded minimal novelty within the context of existing datasets. These results offer a roadmap for future genome recovery efforts, delineating uncaptured taxa in underexplored environments and underscoring the necessity for renewed isolation and sequencing.

Language: English

Citations: 5

From soil to sequence: filling the critical gap in genome-resolved metagenomics is essential to the future of soil microbial ecology
Winston Anthony, Steven Allison, Caitlin M. Broderick, et al.

Environmental Microbiome, Journal Year: 2024, Volume and Issue: 19(1)

Published: Aug. 2, 2024

Soil microbiomes are heterogeneous, complex microbial communities. Metagenomic analysis is generating vast amounts of data, creating immense challenges in sequence assembly and analysis. Although advances in technology have resulted in the ability to easily collect large amounts of sequence data, soil samples containing thousands of unique taxa are often poorly characterized. These challenges reduce the usefulness of genome-resolved metagenomic (GRM) analysis seen in other fields of microbiology, such as the creation of high quality metagenomic assembled genomes and the adoption of genome scale modeling approaches. The absence of these resources restricts the scale of future research, limiting hypothesis generation and predictive modeling of microbial communities. Creating publicly available databases of soil MAGs, similar to those produced for other microbiomes, has the potential to transform scientific insights about soil microbiomes without requiring the computational and domain expertise for assembly and binning.

Language: English

Citations: 5

GenomeOcean: An Efficient Genome Foundation Model Trained on Large-Scale Metagenomic Assemblies
Zhihan Zhou, Robert Riley, Satria A. Kautsar, et al.

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2025, Volume and Issue: unknown

Published: Feb. 5, 2025

Genome foundation models hold transformative potential for precision medicine, drug discovery, and understanding complex biological systems. However, existing models are often inefficient, constrained by suboptimal tokenization and architectural design, and biased toward reference genomes, limiting their representation of low-abundance, uncultured microbes in the rare biosphere. To address these challenges, we developed GenomeOcean, a 4-billion-parameter generative genome foundation model trained on over 600 Gbp of high-quality contigs derived from 220 TB of metagenomic datasets collected from diverse habitats across Earth’s ecosystems. A key innovation of GenomeOcean is training directly on large-scale co-assemblies of metagenomic samples, enabling enhanced representation of rare microbial species and improving generalizability beyond genome-centric approaches. We implemented a byte-pair encoding (BPE) tokenization strategy for genome sequence generation, alongside architectural optimizations, achieving up to 150× faster sequence generation while maintaining high fidelity. GenomeOcean excels in representing microbial species and generating protein-coding genes constrained by evolutionary principles. Additionally, its fine-tuned model demonstrates the ability to discover novel biosynthetic gene clusters (BGCs) in natural genomes and to perform zero-shot synthesis of biochemically plausible, complete BGCs. GenomeOcean sets a new benchmark for metagenomic research, natural product discovery, and synthetic biology, offering a robust foundation for advancing these fields.

Language: English

Citations: 0

A review of neural networks for metagenomic binning
Jair Herazo-Álvarez, Marco Mora, Sara Cuadros-Orellana, et al.

Briefings in Bioinformatics, Journal Year: 2025, Volume and Issue: 26(2)

Published: March 1, 2025

One of the main goals of metagenomic studies is to describe the taxonomic diversity of microbial communities. A crucial step in metagenomic analysis is metagenomic binning, which involves the (supervised) classification or (unsupervised) clustering of metagenomic sequences. Various machine learning models have been applied to address this task. In this review, the contributions of artificial neural networks (ANN) in the context of metagenomic binning are detailed, addressing supervised, unsupervised, and semi-supervised approaches. In total, 34 ANN-based binning tools are systematically compared, detailing their architectures, input features, datasets, advantages, disadvantages, and other relevant aspects. The findings reveal that deep learning approaches, such as convolutional neural networks and autoencoders, achieve higher accuracy and scalability than traditional methods. Gaps in benchmarking practices are highlighted, and future directions are proposed, including standardized datasets and optimization for third-generation sequencing. This review provides support to researchers in identifying trends and selecting suitable tools for the metagenomic binning problem.

Language: English

Citations: 0

Recovery of 679 metagenome-assembled genomes from different soil depths along a precipitation gradient
Anna Kazarina, Hallie Wiechman, Soumyadev Sarkar, et al.

Scientific Data, Journal Year: 2025, Volume and Issue: 12(1)

Published: March 28, 2025

Soil contains a diverse community of organisms; these can include archaea, fungi, viruses, and bacteria. In situ identification of soil microorganisms is challenging. The use of genome-centric metagenomics enables the assembly and identification of microbial populations, allowing the categorization and exploration of the potential functions of organisms living in the complex soil environment. However, the heterogeneity of soil-inhabiting microbes poses a tremendous challenge, with their functions left unknown and many of them difficult to culture in lab settings. In this study, using genome assembling strategies from both field core samples and enriched monolith samples, we assembled 679 highly complete metagenome-assembled genomes (MAGs). The ability to identify these MAGs across a precipitation gradient in the state of Kansas (USA) provided insights into the impact of precipitation levels on soil microbial populations. Metabolite modeling of the MAGs revealed that more than 80% of the microbial populations possessed carbohydrate-active enzymes capable of breaking down chitin and starch.

Language: English

Citations: 0

The dynamic history of prokaryotic phyla: discovery, diversity and division
Mark J. Pallen

International Journal of Systematic and Evolutionary Microbiology, Journal Year: 2024, Volume and Issue: 74(9)

Published: Sept. 9, 2024

Here, I review the dynamic history of prokaryotic phyla. Following leads set by Darwin, Haeckel and Woese, the concept of phylum has evolved from a group sharing common phenotypes to a group of organisms sharing common ancestry, with modern taxonomy based on phylogenetic classifications drawn from macromolecular sequences. Phyla came as surprising latecomers to the formalities of prokaryotic nomenclature in 2021. Since then, names have been validly published for 46 prokaryotic phyla, replacing some established names with neologisms and prompting criticism and debate within the scientific community. Molecular barcoding enabled phylogenetic analysis of microbial ecosystems without cultivation, leading to the identification of candidate divisions (or phyla) from diverse environments. The introduction of metagenome-assembled genomes marked a significant advance in identifying and classifying uncultured microbial phyla. The lumper–splitter dichotomy has led to disagreements, with experts cautioning against the pressure to create a profusion of new phyla and prominent databases adopting a conservative stance. The Candidatus designation has been widely used to provide provisional status to uncultured prokaryotic taxa, with phyla named under this convention now clearly surpassing those with validly published names. The Genome Taxonomy Database (GTDB) has offered a stable, standardized prokaryotic taxonomy with normalized taxonomic ranks, which has led to both lumping and splitting of pre-existing phyla. The GTDB framework introduced unwieldy alphanumeric placeholder labels, prompting the recent publication of over 100 user-friendly Latinate names for unnamed prokaryotic phyla. Most candidate phyla remain ‘known unknowns’, with limited knowledge of their genomic diversity, ecological roles, or environments. Whether phyla still reflect significant evolutionary and ecological partitions across prokaryotic life remains an area of active debate. However, phyla remain of practical importance for microbiome analyses, particularly in clinical research. Despite potential diminishing returns in the discovery of biodiversity, prokaryotic phyla offer extensive research opportunities for microbiologists for the foreseeable future.

Language: English

Citations: 2

GenomeFace: a deep learning-based metagenome binner trained on 43,000 microbial genomes
Richard Lettich, Robert W. Egan, Robert Riley, et al.

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown

Published: Feb. 8, 2024

Metagenomic binning, the process of grouping DNA sequences into taxonomic units, is critical for understanding the functions, interactions, and evolutionary dynamics of microbial communities. We propose a deep learning approach to binning using two neural networks, one based on composition and another on environmental abundance, dynamically weighting the contribution of each based on characteristics of the input data. Trained on over 43,000 prokaryotic genomes, our network for composition-based binning is inspired by metric learning techniques used for facial recognition. Using a task-specific, multi-GPU accelerated algorithm to cluster the embeddings produced by our network, our binner leverages marker genes observed to be universally present in nearly all taxa to grade and select optimal clusters from a hierarchy of candidates. We evaluate our approach on four simulated datasets with known ground truth. Our linear time integration of marker genes recovers more near complete genomes than state of the art but computationally infeasible solutions using them, while being more than an order of magnitude faster. Finally, we demonstrate the scalability and acuity of our approach by testing it on three of the largest metagenome assemblies ever performed. Compared to other binners, we produced 47%-183% more near complete genomes. From these datasets, we find over 3000 new candidate species which have never been previously cataloged, representing a potential 4% expansion of the bacterial tree of life.

Language: English

Citations: 1

Deep long-read metagenomic sequencing reveals niche differentiation in carbon cycling potential between benthic and planktonic microbial populations
Tomeu Viver, Katrin Knittel, Rudolf Amann, et al.

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown

Published: June 4, 2024

Coastal marine sediments function as large-scale natural biocatalytic filters, remineralizing and transforming organic matter. Benthic microbiomes exhibit remarkable temporal stability, contrasting with the dynamic, substrate-driven successions of bacterioplankton. Nonetheless, understanding of their role in carbon cycling and of the interactions between these microbial groups is limited due to the complexity of benthic microbiomes. Here, we used a seasonally resolved, deep short- and long-read metagenomic approach to examine the distinctive genomic features of populations recovered from sediment, the overlaying water column, and particle-attached bacteria and archaea in the North Sea. We recovered 115 metagenome-assembled genomes (MAGs) that belonged to Woeseiales, Rhizobiales, Planctomycetia, Gemmatimonadota and Desulfobacterota species. While Proteobacteria and Actinobacteriota were characteristic phyla of sediments, Acidimicrobiia and Desulfocapsaceae species were shared between fractions, indicative of significant bentho-pelagic coupling. Predominant members of the family Woeseiaceae carried polysaccharide utilization loci (PULs) predicted to target laminarin, alginate, and α-glucan in sediments. In contrast, members from the water column lacked PULs and encoded a significantly higher fraction of sulfatases and peptidases, indicating degradation of protein-rich and sulfated organic matter. Our findings disentangle family-level adaptations and niche differentiation of globally distributed populations involved in organic matter storage.

Language: English

Citations: 1

Soil redox drives virus-host community dynamics and plant biomass degradation in tropical rainforest soils
Gareth Trubl, Ikaia Leleiwi, Ashley Campbell, et al.

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown

Published: Sept. 14, 2024

Background: Wet tropical forest soils store a vast amount of organic carbon and cycle over a third of terrestrial net primary production. The microbiomes of these soils have a global impact on greenhouse gases and tolerate a remarkably dynamic redox environment, driven by high availability of reductant, high soil moisture, and fine-textured soils that limit oxygen diffusion. Yet these microbiomes, particularly virus-host interactions, remain poorly characterized, and we have little understanding of how they will shape future soil carbon cycling as high-intensity drought and precipitation events make soil redox conditions less predictable. Results: To investigate the effects of shifting soil redox conditions on active viral communities and virus-microbe interactions, we conducted a 44-day redox manipulation experiment using soils from the Luquillo Experimental Forest, Puerto Rico, amended with 13C-enriched plant biomass. We sequenced 10 bulk metagenomes and 85 stable isotope probing targeted metagenomes, generated by extracting whole community DNA, performing density fractionation, and conducting shotgun sequencing. Viral and microbial genomes were assembled, resulting in 5,420 viral populations (vOTUs) and 927 medium-to-high-quality metagenome-assembled genomes across 25 bacterial phyla. Notably, over half (54%) of the vOTUs were 13C-enriched, highlighting their active role in the degradation of plant litter. These active vOTUs primarily infected the bacterial phyla Pseudomonadota, Acidobacteriota, and Actinomycetota, and 57% were unique to a particular redox treatment. The anoxic samples exhibited the most distinct viral communities, with an increased potential for modulating host metabolism by carrying redox-specific glycoside hydrolases. However, vOTUs were also present in all redox conditions, suggesting that selection for cosmopolitan viruses occurs in soils that naturally experience dynamic redox conditions. Conclusions: Our study demonstrates how redox conditions shape virus-host interactions in tropical soils. By applying different DNA assembly methods and incubating soils under various redox regimes, we identified active viral populations and observed significant variations in community composition and function. These findings highlight the specialized roles of viruses in diverse environmental conditions, providing important insights into their contributions to carbon cycling and broader implications for climate change.

Language: English

Citations: 1

Exabiome: Advancing Microbial Science through Exascale Computing
Steven Hofmeyr, Aydın Buluç, Robert Riley, et al.

Computing in Science & Engineering, Journal Year: 2024, Volume and Issue: 26(2), P. 8 - 15

Published: April 1, 2024

The Exabiome project seeks to improve the understanding of microbiomes through the development of methods for accelerating metagenomic science using exascale computing. This article gives an overview of the scientific impact of the three components of the project: metagenome assembly, protein family detection and comparative analysis of metagenomes. The project developed MetaHipMer, the only metagenome assembler capable of scaling to full exascale systems. MetaHipMer has enabled ground-breaking assemblies on the Frontier supercomputer, with many scientific benefits, such as the discovery of rare species and viral genomes. To investigate protein families, the project developed two tools, PASTIS and HipMCL. Together, these can utilize exascale resources to understand the functional diversity of billions of "dark matter" proteins and discover novel families. For comparative analysis, the project developed kmerprof, a tool that can be used to compare huge metagenomes for different purposes, for example, grouping human metagenomes according to body location.

Language: English

Citations: 0