Taxonomic resolution of the ribosomal RNA operon in bacteria: implications for its use with long-read sequencing
Leonardo de Oliveira Martins, Andrew J. Page, Alison E. Mather, et al.

NAR Genomics and Bioinformatics, 2019, 2(1)

Published: Nov. 14, 2019

DNA barcoding through the use of amplified regions of the ribosomal operon, such as the 16S gene, is a routine method to gain an overview of the microbial taxonomic diversity within a sample without the need to isolate and culture the microbes present. However, bacterial cells usually have multiple copies of this operon, and choosing the 'wrong' copy could provide a misleading species classification. While this presents less of a problem for well-characterized organisms with large sequence databases to interrogate, it is a significant challenge for lesser known organisms with an unknown number of copies and unknown diversity. Using the entire length of the ribosomal operon, which encompasses the 16S, 23S and 5S genes and the internal transcribed spacer regions, should provide greater resolution but has not been well explored. Here, we use publicly available reference genomes to explore the theoretical boundaries when using concatenated genes and full-length operons, made possible by the development and uptake of long-read sequencing technologies. We quantify the issues with both the choice of operon copy and the phylogenetic context, and demonstrate that longer regions improve the signal while maintaining accuracy.

Language: English
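The copy-choice problem described in this abstract can be made concrete with a toy sketch. Assuming the operon copies and the reference sequences are already aligned to equal length, each copy is assigned to its nearest reference by p-distance (the proportion of differing sites); copies from the same genome that land on different references show how relying on a single copy can mislead. The function names, sequences and species labels are hypothetical, and this is an illustration only, not the authors' method.

```python
from typing import Dict, List


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two pre-aligned sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and aligned to equal length")
    diffs = sum(1 for x, y in zip(a, b) if x != y)
    return diffs / len(a)


def nearest_reference(copy: str, refs: Dict[str, str]) -> str:
    """Name of the reference with the smallest p-distance to this copy."""
    return min(refs, key=lambda name: p_distance(copy, refs[name]))


def classify_copies(copies: List[str], refs: Dict[str, str]) -> List[str]:
    """Classify every operon copy of one genome independently."""
    return [nearest_reference(c, refs) for c in copies]


if __name__ == "__main__":
    # Toy, hypothetical aligned sequences (real 16S genes are about 1.5 kb).
    refs = {"species_A": "ACGTACGTAC", "species_B": "ACGTTCGAAC"}
    copies = ["ACGTACGTAC", "ACGTTCGAAG"]  # two copies from the same genome
    labels = classify_copies(copies, refs)
    print(labels)  # ['species_A', 'species_B']
    if len(set(labels)) > 1:
        print("copies disagree: a single-copy classification may mislead")
```

Longer regions (for example, concatenated 16S, ITS and 23S sequences) would simply be longer aligned strings here, giving more informative sites per comparison.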

Open-access bacterial population genomics: BIGSdb software, the PubMLST.org website and their applications
Keith A. Jolley, James E. Bray, Martin Maiden, et al.

Wellcome Open Research, 2018, 3, p. 124

Published: Sept. 24, 2018

The PubMLST.org website hosts a collection of open-access, curated databases that integrate population sequence data with provenance and phenotype information for over 100 different microbial species and genera. Although PubMLST was conceived as part of the development of the first multi-locus sequence typing (MLST) scheme in 1998, the software it uses, the Bacterial Isolate Genome Sequence database (BIGSdb, published in 2010), enables it to include all levels of sequence data, from single gene sequences up to and including complete, finished genomes. Here we describe developments in BIGSdb made from its publication up to June 2018, and show how the platform realises population genomics for a wide range of applications. The system is based on the gene-by-gene analysis of genomes, with each deposited genome annotated to identify the genes present and systematically catalogue their variation. Originally intended as a means of characterising isolates with typing schemes, the synthesis of sequence records with genetic variation permits highly scalable analyses (whole genome, tens of thousands of isolates) addressing functional questions, including: prediction of antimicrobial resistance; likely cross-reactivity of vaccine antigens; and the activities of variants that lead to key phenotypes. There are no limitations on the number of sequences, loci, allelic variants or schemes (combinations of loci) that can be included, enabling each database to represent the expanding diversity of the organism in question. In addition to providing web-accessible analyses and links to third-party visualisation tools, the system includes a RESTful application programming interface (API) providing access to the underlying data for use in other applications and pipelines.

Language: English

Citations: 2374
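Gene-by-gene comparison, as described for BIGSdb, reduces each genome to an allelic profile: one allele number per locus. A minimal sketch, assuming profiles are plain dictionaries and that loci missing in either genome are skipped, counts the loci at which two isolates carry different alleles. The function name and data layout are hypothetical and do not reflect the BIGSdb software or its API; the locus names are used only for illustration.

```python
from typing import Dict, Optional

# An allelic profile maps a locus name to an allele number; None marks a locus
# that could not be called (missing or incomplete) in that genome.
Profile = Dict[str, Optional[int]]


def allelic_distance(a: Profile, b: Profile) -> int:
    """Count loci called in both profiles whose allele numbers differ.

    Loci absent from, or uncalled in, either profile are ignored.
    """
    shared = set(a) & set(b)
    return sum(
        1
        for locus in shared
        if a[locus] is not None and b[locus] is not None and a[locus] != b[locus]
    )


if __name__ == "__main__":
    # Hypothetical seven-locus profiles (allele numbers are illustrative only).
    isolate_1 = {"abcZ": 1, "adk": 3, "aroE": 1, "fumC": 1, "gdh": 1, "pdhC": 1, "pgm": 3}
    isolate_2 = {"abcZ": 1, "adk": 3, "aroE": 4, "fumC": 1, "gdh": None, "pdhC": 1, "pgm": 2}
    print(allelic_distance(isolate_1, isolate_2))  # 2 (aroE and pgm differ; gdh skipped)
```

The same counting applies unchanged to whole-genome schemes with thousands of loci; only the size of the dictionaries grows.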

Fast and flexible bacterial genomic epidemiology with PopPUNK
John A. Lees, Simon R. Harris, Gerry Tonkin‐Hill, et al.

Genome Research, 2019, 29(2), pp. 304-316

Published: Jan. 24, 2019

The routine use of genomics for disease surveillance provides the opportunity for high-resolution bacterial epidemiology. Current whole-genome clustering and multilocus typing approaches do not fully exploit core and accessory genomic variation, and they cannot both automatically identify, and subsequently expand, clusters of significantly similar isolates in large data sets spanning entire species. Here, we describe PopPUNK (

Language: English

Citations: 348

Update on the proposed minimal standards for the use of genome data for the taxonomy of prokaryotes
Raúl Riesco, Martha E. Trujillo

International Journal of Systematic and Evolutionary Microbiology, 2024, 74(3)

Published: March 21, 2024

The field of microbial taxonomy is dynamic, aiming to provide a stable and contemporary classification system for prokaryotes. Traditionally, reliance on phenotypic characteristics limited the comprehensive understanding of prokaryotic diversity and evolution. The introduction of molecular techniques, particularly DNA sequencing and genomics, has transformed our perception of prokaryotic diversity. In the past two decades, advancements in genome sequencing have driven a transition from traditional methods to a genome-based taxonomic framework, used not only to define species but also higher taxonomic ranks. As technology and databases rapidly expand, maintaining updated standards is crucial. This work seeks to revise the 2018 guidelines for applying genome data to prokaryotic taxonomy, adapting the minimal standard recommendations to reflect technological progress during this period.

Language: English

Citations: 156

Typing methods based on whole genome sequencing data
Laura Uelze, Josephine Grützke, Maria Borowiak, et al.

One Health Outlook, 2020, 2(1)

Published: Feb. 18, 2020

Whole genome sequencing (WGS) of foodborne pathogens has become an effective method for investigating the information contained in the genome sequence of bacterial pathogens. In addition, its highly discriminative power enables the comparison of genetic relatedness between bacteria even on a sub-species level. For this reason, WGS is being implemented worldwide and across sectors (human, veterinary, food and environment) for the investigation of disease outbreaks, source attribution and improved risk characterization models. In order to extract relevant information from the large quantity of complex data produced by WGS, a host of bioinformatics tools has been developed, allowing users to analyze and interpret WGS data, ranging from simple gene searches to complex phylogenetic studies. Depending on the research question, the complexity of the dataset and their bioinformatics skill set, users can choose from a great variety of tools for the analysis of WGS data. In this review, we describe relevant approaches for phylogenomic studies in outbreak investigations and give an overview of selected typing tools based on WGS data. Despite the efforts of the last years, harmonization and standardization of typing tools are still urgently needed to allow easy comparison of data between laboratories, moving towards a One Health surveillance system.

Language: English

Citations: 153

From Theory to Practice: Translating Whole-Genome Sequencing (WGS) into the Clinic
François Balloux, Ola Brynildsrud, Lucy van Dorp, et al.

Trends in Microbiology, 2018, 26(12), pp. 1035-1048

Published: Sept. 4, 2018

Language: English

Citations: 156

SKA: Split Kmer Analysis Toolkit for Bacterial Genomic Epidemiology
Simon R. Harris

bioRxiv (Cold Spring Harbor Laboratory), 2018

Published: Oct. 25, 2018

Genome sequencing is revolutionising infectious disease epidemiology, providing a huge step forward in sensitivity and specificity over more traditional molecular typing techniques. However, the complexity of genome data often means that its analysis and interpretation require high-performance compute infrastructure and dedicated bioinformatics support. Furthermore, current methods have limitations, can differ between analyses in ways that are opaque to the user, and rely on multiple external dependencies that make reproducibility difficult. Here I introduce SKA, a toolkit for sequence analysis of closely related, small, haploid genomes. SKA uses split kmers to rapidly identify variation between sequences, making it possible to analyse hundreds of genomes on a standard home computer. Tests on publicly available simulated and real-life data show SKA to be both faster and more efficient than the gold-standard methods used today, while retaining similar levels of accuracy for epidemiological purposes. SKA can take raw read data or assemblies as input, calculate pairwise distances, create single-linkage clusters, and align sequences either to a reference or using a reference-free approach. Few decisions need to be made by the user, which, along with its computational efficiency, makes SKA accessible to those with only basic bioinformatics training. SKA is also far more transparent than current approaches, making future improvements to mitigate these limitations possible. Overall, SKA is a powerful addition to the armoury of the genomic epidemiologist. The source code is available on Github (https://github.com/simonrharris/SKA).

Language: English

Citations: 84
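A split k-mer, as used by SKA, is a pair of k-mers flanking a single middle base; two genomes that share the flanks but carry different middle bases differ by a SNP at that position. The sketch below is a simplified illustration under stated assumptions (forward strand only, no reverse-complement canonicalisation, no read-quality filtering, a tiny toy k), and it also shows single-linkage clustering of samples under a SNP threshold. The names `split_kmers`, `snp_distance` and `single_linkage_clusters` are hypothetical and are not SKA's interface.

```python
from typing import Callable, Dict, List


def split_kmers(seq: str, k: int) -> Dict[str, str]:
    """Map each split k-mer (k left bases + k right bases) to its middle base.

    Simplification: forward strand only; a split k-mer seen with different
    middle bases within one sequence is ambiguous and is dropped.
    """
    seq = seq.upper()
    found: Dict[str, str] = {}
    ambiguous = set()
    for i in range(len(seq) - 2 * k):
        key = seq[i : i + k] + seq[i + k + 1 : i + 2 * k + 1]
        middle = seq[i + k]
        if found.setdefault(key, middle) != middle:
            ambiguous.add(key)
    for key in ambiguous:
        del found[key]
    return found


def snp_distance(seq_a: str, seq_b: str, k: int) -> int:
    """Count shared split k-mers whose middle bases differ (SNPs).

    Only shared split k-mers are compared, so genomes sharing few of them can
    look deceptively close; a real analysis should also check the overlap.
    """
    a, b = split_kmers(seq_a, k), split_kmers(seq_b, k)
    return sum(1 for key in a.keys() & b.keys() if a[key] != b[key])


def single_linkage_clusters(
    names: List[str], dist: Callable[[str, str], int], threshold: int
) -> List[List[str]]:
    """Join any two samples within `threshold` SNPs, using union-find."""
    parent = {n: n for n in names}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, a in enumerate(names):
        for b in names[i + 1 :]:
            if dist(a, b) <= threshold:
                parent[find(a)] = find(b)

    groups: Dict[str, List[str]] = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(groups.values())


if __name__ == "__main__":
    # Toy sequences with one substitution; real analyses use much longer k-mers.
    print(snp_distance("ATGCGTACCTGA", "ATGCGAACCTGA", k=3))  # 1
    # Clustering accepts any pairwise distance, e.g. precomputed SNP counts.
    snps = {frozenset(("a", "b")): 1, frozenset(("a", "c")): 12, frozenset(("b", "c")): 11}
    print(single_linkage_clusters(["a", "b", "c"], lambda x, y: snps[frozenset((x, y))], 2))
    # [['a', 'b'], ['c']]
```

Keying on the flanks rather than aligning to a reference is what lets this kind of comparison work directly on assemblies or reads without choosing a reference genome first.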

Robust analysis of prokaryotic pangenome gene gain and loss rates with Panstripe
Gerry Tonkin‐Hill, Rebecca A. Gladstone, Anna K. Pöntinen, et al.

Genome Research, 2023, 33(1), pp. 129-140

Published: Jan. 1, 2023

Horizontal gene transfer (HGT) plays a critical role in the evolution and diversification of many microbial species. The resulting dynamics of gene gain and loss can have important implications for the development of antibiotic resistance and the design of vaccine and drug interventions. Methods for the analysis of gene presence/absence patterns typically do not account for errors introduced in the automated annotation and clustering of gene sequences. In particular, methods adapted from ecological studies, including the pangenome gene accumulation curve, can be misleading, as they may reflect the underlying diversity in the temporal sampling of genomes rather than a difference in the dynamics of HGT. Here, we introduce Panstripe, a method based on generalized linear regression that is robust to population structure, sampling bias and errors in the predicted presence/absence of genes. We show using simulations that Panstripe can effectively identify differences in the rate and number of genes involved in HGT events, and illustrate its capability by analyzing several diverse bacterial genome data sets representing major human pathogens.

Language: English

Citations: 23

Contaminant DNA in bacterial sequencing experiments is a major source of false genetic variability
Galo A. Goig, Silvia Álvarez‐Blanco, Alberto L. García‐Basteiro, et al.

BMC Biology, 2020, 18(1)

Published: March 2, 2020

Background: Contaminant DNA is a well-known confounding factor in molecular biology and in genomic repositories. Strikingly, analysis workflows for whole-genome sequencing (WGS) data commonly do not account for errors potentially introduced by contamination, which could lead to the wrong assessment of allele frequency in both basic and clinical research. Results: We used a taxonomic filter to remove contaminant reads from more than 4000 bacterial samples from 20 different studies and performed a comprehensive evaluation of the extent and impact of contamination in WGS. We found that contamination is pervasive and can introduce large biases in variant analysis. We showed that these biases can result in hundreds of false positive and false negative SNPs, even with slight contamination. Studies investigating complex biological traits can be completely biased if contamination is neglected during bioinformatic analysis, and we demonstrate that removing contaminant reads with a taxonomic classifier permits more accurate variant calling. We used both real and simulated data to evaluate and implement reliable, contamination-aware analysis pipelines. Conclusion: As sequencing technologies consolidate as precision tools that are increasingly adopted in the research and clinical context, our results urge the implementation of contamination-aware analysis pipelines. Taxonomic classifiers are a powerful tool for implementing such pipelines.

Language: English

Citations: 69
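Removing contaminant reads with a taxonomic classifier amounts to keeping only the reads whose assigned taxon falls within the lineage of the target organism. A minimal sketch, assuming per-read assignments are already available and the taxonomy is given as a child-to-parent map, walks each read's lineage upwards. The toy taxonomy (heavily simplified, not real classifier IDs), the read identifiers and the `keep_unclassified` option are hypothetical; real classifier output formats differ.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Toy, simplified child -> parent taxonomy (names instead of numeric IDs).
TOY_PARENT: Dict[str, str] = {
    "Mycobacterium tuberculosis": "Mycobacterium tuberculosis complex",
    "Mycobacterium tuberculosis complex": "Mycobacterium",
    "Mycobacterium": "Bacteria",
    "Bacteria": "root",
    "Homo sapiens": "Eukaryota",
    "Eukaryota": "root",
}


def in_lineage(taxon: str, target: str, parent: Dict[str, str]) -> bool:
    """True if `taxon` is `target` or one of its descendants."""
    seen = set()
    current: Optional[str] = taxon
    while current is not None and current not in seen:
        if current == target:
            return True
        seen.add(current)  # guards against cycles in a malformed taxonomy
        current = parent.get(current)
    return False


def filter_reads(
    assignments: Iterable[Tuple[str, Optional[str]]],
    target: str,
    parent: Dict[str, str],
    keep_unclassified: bool = False,
) -> List[str]:
    """Return IDs of reads assigned within the target lineage.

    `assignments` holds (read_id, taxon) pairs; taxon is None if unclassified.
    Reads assigned only to a rank above the target (e.g. genus) are dropped.
    """
    kept = []
    for read_id, taxon in assignments:
        if taxon is None:
            if keep_unclassified:
                kept.append(read_id)
        elif in_lineage(taxon, target, parent):
            kept.append(read_id)
    return kept


if __name__ == "__main__":
    reads = [
        ("r1", "Mycobacterium tuberculosis"),
        ("r2", "Homo sapiens"),  # host contamination
        ("r3", None),  # unclassified
        ("r4", "Mycobacterium tuberculosis complex"),
        ("r5", "Mycobacterium"),  # genus-level only
    ]
    print(filter_reads(reads, "Mycobacterium tuberculosis complex", TOY_PARENT))
    # ['r1', 'r4']
```

Whether to keep unclassified or higher-rank reads is a policy choice; the stricter default here favours fewer false variant calls at the cost of discarding some genuine reads.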

A fast alignment-free bioinformatics procedure to infer accurate distance-based phylogenetic trees from genome assemblies
Alexis Criscuolo

Research Ideas and Outcomes, 2019, 5

Published: June 7, 2019

This paper describes a novel alignment-free, distance-based procedure for inferring phylogenetic trees from genome contig sequences using publicly available bioinformatics tools. For each pair of genomes, a dissimilarity measure is first computed and then transformed to obtain an estimate of the number of substitution events that have occurred during their evolution. These pairwise evolutionary distances are then used to infer a phylogenetic tree and to assess the confidence support of each internal branch. Analyses of both simulated and real genome datasets show that this procedure allows accurate phylogenetic trees to be reconstructed with fast running times, especially when launched on multiple threads. Implemented in a script named JolyTree, this procedure is a useful approach to quickly infer phylogenetic trees of species without the burden and potential biases of multiple sequence alignments.

Language: English

Citations: 68
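The two-step distance estimate in the JolyTree abstract (a dissimilarity, then a transformation into an estimated number of substitutions) can be sketched as follows. As an assumption, the dissimilarity here is a Mash-style distance computed from the exact Jaccard index of k-mer sets, and the transformation is the Jukes-Cantor (JC69) correction; JolyTree's actual measure and transformation may differ. The function names are hypothetical.

```python
import math
from typing import Set


def kmer_set(seq: str, k: int) -> Set[str]:
    """All k-mers of a sequence (forward strand only, for simplicity)."""
    seq = seq.upper()
    return {seq[i : i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Exact Jaccard index; Mash estimates this with MinHash sketches instead."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def mash_style_distance(j: float, k: int) -> float:
    """Mash-style dissimilarity: D = -(1/k) * ln(2j / (1 + j))."""
    if j <= 0.0:
        return math.inf  # no shared k-mers: distance is unbounded
    return -math.log(2.0 * j / (1.0 + j)) / k


def jukes_cantor(p: float) -> float:
    """JC69 correction of a proportion of differing sites for multiple hits."""
    if p >= 0.75:
        return math.inf  # saturated: distance undefined under JC69
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def evolutionary_distance(seq_a: str, seq_b: str, k: int = 21) -> float:
    """Dissimilarity first, then transformed into substitutions per site."""
    j = jaccard(kmer_set(seq_a, k), kmer_set(seq_b, k))
    return jukes_cantor(mash_style_distance(j, k))
```

Computing `evolutionary_distance` for every pair of genomes yields a distance matrix that a distance-based tree-building method (for example, neighbor joining) could take as input; the JC69 step always returns a value at least as large as its input, reflecting substitutions hidden by multiple hits at the same site.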

Seamless, rapid and accurate analyses of outbreak genomic data using Split K-mer Analysis (SKA)
Romain Derelle, Johanna von Wachsmann, Tommi Mäklin, et al.

bioRxiv (Cold Spring Harbor Laboratory), 2024

Published: March 29, 2024

Sequence variation observed in populations of pathogens can be used for important public health and evolutionary genomic analyses, especially outbreak analysis and transmission reconstruction. Identifying this variation is typically achieved by aligning sequence reads to a reference genome, but this approach is susceptible to reference biases and requires careful filtering of called genotypes. Additionally, while the volume of bacterial genomes continues to grow, tools that can accurately and quickly call genetic variation between sequences have not kept pace. There is a need for tools that can process large genomic datasets and provide rapid results, yet remain simple enough to be used without highly trained bioinformaticians, expensive data analysis, or long-term storage and processing of large files. Here we describe Split K-mer Analysis (SKA2), a method that supports both reference-free and reference-based mapping to quickly and accurately genotype populations of bacteria using sequencing reads or genome assemblies. SKA2 is highly accurate for closely related samples, and in outbreak simulations we show superior variant recall compared with reference-based methods, with no false positives. We also show that within bacterial strains, where it is possible to construct a clonal frame, SKA2 can map variants to a reference and be used with recombination detection methods to rapidly reconstruct vertical evolutionary history. SKA2 is many times faster than comparable methods and can add new genomes to an existing call set, allowing sequential use without the need to reanalyse entire collections. Given its robust implementation, inherent absence of reference bias and high accuracy, SKA2 has the potential to become the tool of choice for genotyping bacteria and to help expand the uses of genomic data in epidemiological analyses. SKA2 is implemented in Rust and is freely available at https://github.com/bacpop/ska.rust .

Language: English

Citations: 7
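SKA2's ability to add new genomes to an existing call set without reanalysing the whole collection can be illustrated with an incrementally merged table keyed by split k-mer. In this sketch (hypothetical class and method names, forward strand only, toy sequences and a tiny k), each added sample contributes only its own middle-base calls, and variant columns are the split k-mers whose middle base differs between samples. It is not SKA2's Rust data structure or file format.

```python
from collections import defaultdict
from typing import Dict, List


def split_kmers(seq: str, k: int) -> Dict[str, str]:
    """Split k-mer (left + right flank) -> middle base; ambiguous keys dropped."""
    seq = seq.upper()
    out: Dict[str, str] = {}
    conflicting = set()
    for i in range(len(seq) - 2 * k):
        key = seq[i : i + k] + seq[i + k + 1 : i + 2 * k + 1]
        mid = seq[i + k]
        if out.setdefault(key, mid) != mid:
            conflicting.add(key)
    for key in conflicting:
        del out[key]
    return out


class SplitKmerTable:
    """Incrementally merged table: split k-mer -> {sample: middle base}."""

    def __init__(self, k: int) -> None:
        self.k = k
        self.samples: List[str] = []
        self.table: Dict[str, Dict[str, str]] = defaultdict(dict)

    def add_sample(self, name: str, seq: str) -> None:
        # Only the new sample is processed; existing entries are left untouched.
        self.samples.append(name)
        for key, mid in split_kmers(seq, self.k).items():
            self.table[key][name] = mid

    def variant_columns(self) -> Dict[str, Dict[str, str]]:
        """Split k-mers whose middle base differs between samples; '-' = absent."""
        cols = {}
        for key, calls in self.table.items():
            if len(set(calls.values())) > 1:
                cols[key] = {s: calls.get(s, "-") for s in self.samples}
        return cols


if __name__ == "__main__":
    table = SplitKmerTable(k=3)  # toy flank length
    table.add_sample("s1", "ATGCGTACCTGA")
    table.add_sample("s2", "ATGCGAACCTGA")
    table.add_sample("s3", "ATGCGCACCTGA")  # added later; s1 and s2 are not re-read
    print(table.variant_columns())
    # {'GCGACC': {'s1': 'T', 's2': 'A', 's3': 'C'}}
```

Because every sample is indexed by the same flank keys, no reference genome is involved in building the table, which is one way to see why a split k-mer approach avoids reference bias.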