Expanding the genome information on Bacillales for biosynthetic gene cluster discovery DOI Creative Commons
Lijie Song, Lasse Johan Dyrbye Nielsen, Xinming Xu

et al.

Scientific Data, Journal Year: 2024, Volume and Issue: 11(1)

Published: Nov. 21, 2024

This study showcases 121 new genomes of spore-forming Bacillales from strains collected globally from a variety of habitats, assembled using Oxford Nanopore long-read and MGI short-read sequences. Bacilli are renowned for their capacity to produce diverse secondary metabolites with use in agriculture, biotechnology, and medicine. These metabolites are encoded within biosynthetic gene clusters (smBGCs). smBGCs are of significant research interest due to their potential as sources of bioactive compounds. Our dataset includes 62 complete genomes, 2 at chromosome level, and 57 at contig level, covering a genomic size range of 3.50 Mb to 7.15 Mb. Phylotaxonomic analysis revealed that these genomes span 16 genera, with 69 of them belonging to Bacillus. A total of 1,176 predicted BGCs were identified by in silico genome mining. We anticipate that the open-access data presented here will expand the reported genomic information and facilitate a deeper understanding of the genetic basis of Bacillales' metabolite production.

Language: English

A treasure trove of 1034 actinomycete genomes DOI Creative Commons
Tue Sparholt Jørgensen, Omkar S. Mohite, Eva Baggesgaard Sterndorff

et al.

Nucleic Acids Research, Journal Year: 2024, Volume and Issue: 52(13), P. 7487 - 7503

Published: June 22, 2024

Filamentous Actinobacteria, recently renamed Actinomycetia, are the most prolific source of microbial bioactive natural products. Studies on biosynthetic gene clusters benefit from or require chromosome-level assemblies. Here, we provide DNA sequences from >1000 isolates: 881 complete genomes and 153 near-complete genomes, representing 28 genera and 389 species, including 244 likely novel species. All genomes are from filamentous isolates of the class Actinomycetia from the NBC culture collection. The largest genus is Streptomyces with 886 genomes, including 742 complete assemblies. We use this data to show that analysis of complete genomes can bring biological understanding not previously derived from more fragmented or less systematic datasets. We document the central and structured location of core genes and the distal location of specialized metabolite biosynthetic gene clusters and duplicate core genes on the linear Streptomyces chromosome, and analyze the content and length of the terminal inverted repeats which are characteristic for Streptomyces. We then analyze the diversity of trans-AT polyketide synthase biosynthetic gene clusters, which encode the machinery of a biotechnologically highly interesting compound class. These insights have both ecological and biotechnological implications in understanding the importance of high quality genomic resources and the complex role synteny plays in Actinomycetia biology.

Language: English

Citations

9

Pangenome mining of the Streptomyces genus redefines species’ biosynthetic potential DOI Creative Commons
Omkar S. Mohite, Tue Sparholt Jørgensen, Thomas Booth

et al.

Genome Biology, Journal Year: 2025, Volume and Issue: 26(1)

Published: Jan. 14, 2025

Background: Streptomyces is a highly diverse genus known for the production of secondary or specialized metabolites with a wide range of applications in the medical and agricultural industries. Several thousand complete or nearly complete Streptomyces genome sequences are now available, affording the opportunity to deeply investigate the biosynthetic potential within these organisms and to advance natural product discovery initiatives. Results: We perform pangenome analysis on 2371 Streptomyces genomes, including approximately 1200 complete assemblies. Employing a data-driven approach based on genome similarities, the Streptomyces genus was classified into 7 primary and 42 secondary Mash-clusters, forming the basis for comprehensive pangenome mining. A refined workflow for grouping biosynthetic gene clusters (BGCs) redefines their diversity across different Mash-clusters. This workflow also reassigns 2729 known BGC families to only 440 families, a reduction caused by inaccuracies in BGC boundary detections. When the genomic location of BGCs is included in the analysis, a conserved genomic structure, or synteny, among BGCs becomes apparent within species and Mash-clusters. This synteny suggests that vertical inheritance is a major factor in the diversification of BGCs. Conclusions: Our analysis of a genomic dataset at the scale of thousands of genomes refines predictions of BGC diversity using Mash-clusters as a basis for pangenome analysis. The observed conservation in the order of BGCs' genomic locations shows that the BGCs are vertically inherited. The presented workflow and the in-depth analysis pave the way for large-scale pangenome investigations and enhance our understanding of the biosynthetic potential of the Streptomyces genus.

Language: English

Citations

1

Targeted genome mining with GATOR-GC maps the evolutionary landscape of biosynthetic diversity DOI Creative Commons
José D. D. Cediel-Becerra, Andrés Cumsille, Sebastian Guerra

et al.

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2025, Volume and Issue: unknown

Published: Feb. 28, 2025

Gene clusters, groups of physically adjacent genes that work collectively, are pivotal to bacterial fitness and valuable in biotechnology and medicine. While various genome mining tools can identify and characterize gene clusters, they often overlook their evolutionary diversity, a crucial factor in revealing novel cluster functions and applications. To address this gap, we developed GATOR-GC, a targeted genome mining tool that enables comprehensive and flexible exploration of gene clusters in a single execution. We show that GATOR-GC identified a diversity of over 4 million gene clusters similar to experimentally validated biosynthetic gene clusters (BGCs) that other tools fail to detect. To highlight the utility of GATOR-GC, we identified previously uncharacterized co-occurring conserved genes potentially involved in mycosporine-like amino acid biosynthesis and mapped the taxonomic and evolutionary patterns of genomic islands that modify DNA with 7-deazapurines. Additionally, with its proximity-weighted similarity scoring, GATOR-GC successfully differentiated BGCs of FK-family metabolites (e.g., rapamycin, FK506/520) according to their chemistries. We anticipate that GATOR-GC will be a valuable tool to assess gene cluster diversity for targeted, exploratory, and flexible genome mining. GATOR-GC is available at https://github.com/chevrettelab/gator-gc.

Language: English

Citations

1

Context matters: assessing the impacts of genomic background and ecology on microbial biosynthetic gene cluster evolution DOI Creative Commons
Rauf Salamzade, Lindsay Kalan

mSystems, Journal Year: 2025, Volume and Issue: unknown

Published: Feb. 24, 2025

Encoded within many microbial genomes, biosynthetic gene clusters (BGCs) underlie the synthesis of various secondary metabolites that often mediate ecologically important functions. Several studies and bioinformatics methods developed over the past decade have advanced our understanding of both microbial pangenomes and BGC evolution. In this minireview, we first highlight challenges in the broad evolutionary analysis of BGCs, including the delineation of BGC boundaries and the clustering of BGCs across genomes. We further summarize key findings from comparative genomics studies on BGC conservation across taxa and habitats and discuss the potential fitness effects of BGCs in different settings. Afterward, recent research showing the importance of genomic context for the production of secondary metabolites and the evolution of BGCs is highlighted. These studies draw parallels to recent, broader investigations of gene-to-gene associations within microbial pangenomes. Finally, we describe mechanisms by which microbial pangenomes and BGCs evolve, ranging from the acquisition or origination of entire BGCs to micro-evolutionary trends of individual biosynthetic genes. An outlook on how expansions in the biosynthetic capabilities of some taxa might support theories that open pangenomes are the result of adaptive evolution is also discussed. We conclude with remarks about how future work leveraging longitudinal metagenomics across diverse ecosystems is likely to significantly improve our understanding of the evolution of microbial genomes and BGCs.

Language: English

Citations

0

antiSMASH 8.0: extended gene cluster detection capabilities and analyses of chemistry, enzymology, and regulation DOI Creative Commons
Kai Blin, Simon J. Shaw, Lisa Vader

et al.

Nucleic Acids Research, Journal Year: 2025, Volume and Issue: unknown

Published: April 25, 2025

Microorganisms synthesize small bioactive compounds through their secondary or specialized metabolism. Those compounds play an important role in microbial interactions and soil health, but are also crucial for the development of pharmaceuticals and agrochemicals. Over the past decades, advancements in genome sequencing have enabled the identification of large numbers of biosynthetic gene clusters directly from microbial genomes. Since its inception in 2011, antiSMASH (https://antismash.secondarymetabolites.org/) has become the leading tool for detecting and characterizing these gene clusters in bacteria and fungi. This paper introduces version 8 of antiSMASH, which has increased the number of detectable cluster types from 81 to 101 and has improved analysis support for terpenoids and tailoring enzymes, as well as improvements in the analysis of modular enzymes like polyketide synthases and nonribosomal peptide synthetases. These modifications keep antiSMASH up-to-date with developments in the field and extend its overall predictive capabilities for natural product genome mining.

Language: English

Citations

0

PanKB: An interactive microbial pangenome knowledgebase for research, biotechnological innovation, and knowledge mining DOI Creative Commons
Sun Boying, Liubov Pashkova, Pascal A. Pieters

et al.

Nucleic Acids Research, Journal Year: 2024, Volume and Issue: 53(D1), P. D806 - D818

Published: Nov. 22, 2024

The exponential growth of microbial genome data presents unprecedented opportunities for unlocking the potential of microorganisms. The burgeoning field of pangenomics offers a framework for extracting insights from this big biological data. Recent advances in microbial pangenomic research have generated substantial data and literature, yielding valuable knowledge across diverse microbial species. PanKB (pankb.org), a knowledgebase designed for microbial pangenomics research and biotechnological applications, was built to capitalize on this wealth of information. PanKB currently includes 51 pangenomes from 8 industrially relevant microbial families, comprising 8402 genomes, over 500 000 genes, and over 7M mutations. To describe this data, PanKB implements four main components: (1) Interactive pangenomic analytics to facilitate exploration, intuition, and potential discoveries; (2) Alleleomic analytics, a pangenomic-scale analysis of variants, providing insights into intra-species sequence variation and potential mutations for applications; (3) A global search function enabling broad and deep investigations across pangenomes to power research and bioengineering workflows; (4) A bibliome of 833 open-access pangenomic papers and an interface with an LLM that can answer in-depth questions using its knowledge. PanKB empowers researchers and bioengineers to harness the potential of microbial pangenomics and serves as a valuable resource bridging the gap between pangenomic data and practical applications.

Language: English

Citations

2

Pan-genome-scale metabolic modeling of Bacillus subtilis reveals functionally distinct groups DOI Creative Commons
M J Neal, William Brakewood, Michael J. Betenbaugh

et al.

mSystems, Journal Year: 2024, Volume and Issue: 9(11)

Published: Oct. 4, 2024

Bacillus subtilis is an important industrial and environmental microorganism known to occupy many niches and produce many compounds of interest. Although it is one of the best-studied organisms, much of this focus, including the reconstruction of genome-scale metabolic models, has been placed on a few key laboratory strains. Here, we substantially expand these prior models to pan-genome-scale, representing 481 genomes of B. subtilis with 2,315 orthologous gene clusters, 1,874 metabolites, and 2,239 reactions. Furthermore, we incorporate data from carbon utilization experiments for eight strains to refine and validate its metabolic predictions. This comprehensive pan-genome model enables the assessment of strain-to-strain differences related to nutrient utilization, fermentation outputs, robustness, and other metabolic aspects. Using the model and phenotypic predictions, we divide B. subtilis strains into five groups with distinct patterns of behavior that correlate across these features. The pan-genome model offers deep insights into B. subtilis' metabolism as it varies across environments and provides an understanding of how different strains have adapted to dynamic habitats. IMPORTANCE: As the volume of genomic data and computational power have increased, so has the number of genome-scale metabolic models. These models encapsulate the totality of metabolic functions for a given organism. Bacillus subtilis strain 168 is one of the first bacteria for which a metabolic network was reconstructed. Since then, several updated reconstructions have been generated for this model microorganism. Here, we expand the model for a single strain into a pan-genome-scale model that consists of individual strain models. By evaluating differences between these strains, we identified five distinct groups of strains, allowing for the rapid classification of any particular strain. This classification aids the identification of suitable strains for any application.

Language: English

Citations

1

Predicting metallophore structure and function through genome mining DOI
Zachary L. Reitz

Methods in enzymology on CD-ROM/Methods in enzymology, Journal Year: 2024, Volume and Issue: unknown, P. 371 - 401

Published: Jan. 1, 2024

Language: English

Citations

0

PanKB: An interactive microbial pangenome knowledgebase for research, biotechnological innovation, and knowledge mining DOI Creative Commons
Sun Boying, Лариса Пашкова, PA Pieters

et al.

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown

Published: Aug. 19, 2024

The exponential growth of microbial genome data presents unprecedented opportunities for mining the potential of microorganisms. The burgeoning field of pangenomics offers a framework for extracting insights from this big biological data. Recent advances in microbial pangenomic research have generated substantial data and literature, yielding valuable knowledge across diverse microbial species. PanKB (pankb.org), a knowledgebase designed for microbial pangenomics research and biotechnological applications, was built to capitalize on this wealth of information. PanKB currently includes 51 pangenomes from 8 industrially relevant microbial families, comprising 8,402 genomes, over 500,000 genes, and over 7M mutations. To describe this data, PanKB implements four main components: 1) Interactive pangenomic analytics to facilitate exploration, intuition, and potential discoveries; 2) Alleleomic analytics, a pangenomic-scale analysis of variants, providing insights into intra-species sequence variation and potential mutations for applications; 3) A global search function enabling broad and deep investigations across pangenomes to power research and bioengineering workflows; 4) A bibliome of 833 open-access pangenomic papers and an interface with an LLM that can answer in-depth questions using their knowledge. PanKB empowers researchers and bioengineers to harness the full potential of microbial pangenomics and serves as a valuable resource bridging the gap between pangenomic data and practical applications.

Language: English

Citations

0

HiFiBGC: an ensemble approach for improved biosynthetic gene cluster detection in PacBio HiFi-read metagenomes DOI Creative Commons
Amit Kumar Yadav, Srikrishna Subramanian

BMC Genomics, Journal Year: 2024, Volume and Issue: 25(1)

Published: Nov. 16, 2024

Microbes produce diverse bioactive natural products with applications in fields such as medicine and agriculture. In their genomes, these natural products are encoded by physically clustered genes known as biosynthetic gene clusters (BGCs). Genome and metagenome sequencing advances have enabled high-throughput identification of BGCs as a promising avenue for natural product discovery. BGC mining from (meta)genomes using in silico tools has allowed access to a vast diversity of potentially novel natural products. However, a fundamental limitation has been the ability to assemble complete BGCs, especially from complex metagenomes. With their fragmented assemblies, short-read technologies struggle to recover long and repetitive BGCs, such as those encoding nonribosomal peptide synthetase (NRPS) and polyketide synthase (PKS). Recent advances in long-read sequencing, such as the High Fidelity (HiFi) technology from PacBio, have reduced this limitation and can help retrieve both accurate and complete BGCs from metagenomes, warranting improvement of the existing BGC identification approach for better utilization of HiFi data. Here, we present HiFiBGC, a command-line-based workflow to identify BGCs in PacBio HiFi metagenomes. HiFiBGC leverages an ensemble of assemblies from three HiFi-tailored metagenome assemblers and the reads not represented in these assemblies. Based on our analyses of four HiFi metagenomic datasets from different environments, we show that HiFiBGC identifies, on average, 78% more BGCs than the top-performing single-assembler-based method. This increase is due to HiFiBGC's ensemble assembly approach, which improves recovery by 25%, as well as to the inclusion of mostly fragmented BGCs identified in unmapped reads. HiFiBGC is a computational workflow for identifying BGCs in HiFi metagenomes, implemented mainly in the Python programming language and the workflow manager Snakemake. HiFiBGC is available on GitHub at https://github.com/ay-amityadav/HiFiBGC under the MIT license. The code related to the figures presented in the manuscript is available at https://github.com/ay-amityadav/HiFiBGC_analyses.

Language: English

Citations

0