Accurate and fast graph-based pangenome annotation and clustering with ggCaller (Creative Commons)
Samuel Horsfield, Gerry Tonkin‐Hill, Nicholas J. Croucher

et al.

Genome Research, Journal Year: 2023, Volume and Issue: 33(9), P. 1622 - 1637

Published: Aug. 24, 2023

Bacterial genomes differ in both gene content and sequence mutations, which underlie extensive phenotypic diversity, including variation in susceptibility to antimicrobials or vaccine-induced immunity. To identify and quantify important variants, all genes within a population must be predicted, functionally annotated, and clustered, representing the “pangenome.” Despite the volume of genome data available, gene prediction and annotation are currently conducted in isolation on individual genomes, which is computationally inefficient and frequently inconsistent across genomes. Here, we introduce the open-source software graph-gene-caller (ggCaller). ggCaller combines gene prediction, functional annotation, and clustering into a single workflow using population-wide de Bruijn graphs, removing redundancy in gene annotation and resulting in more accurate gene predictions and orthologue clustering. We applied ggCaller to simulated and real-world bacterial data sets containing hundreds or thousands of genomes, comparing it to current state-of-the-art tools. ggCaller has considerable speed-ups with equivalent or greater accuracy, particularly with data sets containing complex sources of error, such as assembly contamination or fragmentation. ggCaller is also an important extension to bacterial genome-wide association studies, enabling querying of annotated graphs for functional analyses. We highlight this application by functionally annotating DNA sequences with significant associations to tetracycline and macrolide resistance in Streptococcus pneumoniae, identifying key resistance determinants that were missed when using only a single reference genome. ggCaller is a novel bacterial genome analysis tool with applications in bacterial evolution and epidemiology.

Language: English

Evaluation of phylogenetic reconstruction methods using bacterial whole genomes: a simulation based study (Creative Commons)
John A. Lees, Michelle Kendall, Julian Parkhill

et al.

Wellcome Open Research, Journal Year: 2018, Volume and Issue: 3, P. 33 - 33

Published: May 29, 2018

Background: Phylogenetic reconstruction is a necessary first step in many analyses which use whole genome sequence data from bacterial populations. There are many available methods to infer phylogenies, and these have various advantages and disadvantages, but few unbiased comparisons of the range of approaches have been made. Methods: We simulated data from a defined 'true tree' using a realistic evolutionary model. We built phylogenies from this data using a range of methods, and compared the reconstructed trees to the true tree using two measures, noting the computational time needed for different phylogenetic reconstructions. We also used real data from Streptococcus pneumoniae alignments to compare individual core gene trees to a core genome tree. Results: We found that, as expected, maximum likelihood trees from good quality alignments were the most accurate, but also the most computationally intensive. Using less accurate phylogenetic reconstruction methods, we were able to obtain results of comparable accuracy; we found that approximate results can rapidly be obtained using genetic distance based methods. In real data we found that highly conserved core genes, such as those involved in translation, gave an inaccurate tree topology, whereas genes involved in recombination events gave inaccurate branch lengths. We also show a tree-of-trees, relating the results of different phylogenetic reconstructions to each other. Conclusions: We recommend three approaches, depending on requirements for accuracy and computational time. For the most accurate tree, use of either RAxML or IQ-TREE with an alignment of variable sites produced by mapping to a reference genome is best. Quicker approaches that do not perform full maximum likelihood optimisation may be useful for many analyses requiring a phylogeny, as generating a high quality input alignment is likely to be the major limiting factor of accurate tree topology. We have publicly released our simulated data and code to enable further comparisons.

Language: English

Citations

55

Pneumococcal Vaccines: Host Interactions, Population Dynamics, and Design Principles (Open Access)
Nicholas J. Croucher, Alessandra Løchen, Stephen D. Bentley

et al.

Annual Review of Microbiology, Journal Year: 2018, Volume and Issue: 72(1), P. 521 - 549

Published: Sept. 8, 2018

Streptococcus pneumoniae (the pneumococcus) is a nasopharyngeal commensal and respiratory pathogen. Most isolates express a capsule, the species-wide diversity of which has been immunologically classified into ∼100 serotypes. Capsule polysaccharides have been combined into multivalent vaccines widely used in adults, but the T cell independence of the antibody response means they are not protective in infants. Polysaccharide conjugate vaccines (PCVs) trigger a T cell–dependent response through attaching a carrier protein to capsular polysaccharides. The immune response stimulated by PCVs in infants inhibits carriage of vaccine serotypes (VTs), resulting in population-wide herd immunity. These were replaced in carriage by non-VTs. Nevertheless, PCVs drove reductions in infant pneumococcal disease, due to the lower mean invasiveness of the postvaccination bacterial population; age-varying serotype invasiveness resulted in a smaller reduction in adult disease. Alternative vaccines being tested in trials are designed to provide species-wide protection through stimulating innate and cellular immune responses, alongside antibodies to conserved antigens.

Language: English

Citations

55

Visualizing variation within Global Pneumococcal Sequence Clusters (GPSCs) and country population snapshots to contextualize pneumococcal isolates (Creative Commons)
Rebecca A. Gladstone, Stephanie W. Lo, Richard Goater

et al.

Microbial Genomics, Journal Year: 2020, Volume and Issue: 6(5)

Published: May 1, 2020

Knowledge of pneumococcal lineages, their geographic distribution and antibiotic resistance patterns, can give insights into global pneumococcal disease. We provide interactive bioinformatic outputs to explore such topics, aiming to increase dissemination of genomic insights to the wider community, without the need for specialist training. We prepared 12 country-specific phylogenetic snapshots, and international phylogenetic snapshots of 73 common Global Pneumococcal Sequence Clusters (GPSCs) previously defined using PopPUNK, and present them in Microreact. Gene presence and absence defined using Roary, and recombination profiles derived from Gubbins, are presented in Phandango for each GPSC. Temporal phylogenetic signal was assessed for each GPSC using BactDating. We provide examples of how such resources can be used. In our example use of a country-specific phylogenetic snapshot, we determined that serotype 14 was observed in nine unrelated genetic backgrounds in South Africa. The international phylogenetic snapshot of GPSC9, in which most serotype 14 isolates from South Africa were observed, highlights that there were three independent sub-clusters represented by South African serotype 14 isolates. We estimated from the GPSC9-dated tree that the sub-clusters were each established in South Africa during the 1980s. We show how recombination plots allowed the identification of a 20 kb recombination spanning the capsular polysaccharide locus within GPSC97. This was consistent with a switch from serotype 6A to 19A estimated to have occurred in the 1990s from the GPSC97-dated tree. Plots of gene presence/absence of resistance genes (tet, erm, cat) across the GPSC23 phylogeny were consistent with acquisition of a composite transposon. We estimated from the GPSC23-dated tree that the acquisition occurred between 1953 and 1975. Finally, we demonstrate the assignment of GPSC31 to 17 externally generated pneumococcal serotype 1 assemblies from Utah via Pathogenwatch. Most of the Utah isolates clustered in a USA-specific clade with the most recent common ancestor estimated between 1958 and 1981. We have provided a collection of resources that can be used with genomic data to test a hypothesis or generate new hypotheses. The accessible assignment of GPSCs allows others to contextualize their own collections beyond the data presented here.

Language: English

Citations

46

Challenges in prokaryote pangenomics (Creative Commons)
Gerry Tonkin‐Hill, Jukka Corander, Julian Parkhill

et al.

Microbial Genomics, Journal Year: 2023, Volume and Issue: 9(5)

Published: May 25, 2023

Horizontal gene transfer (HGT) and the resulting patterns of gene gain and loss are a fundamental part of bacterial evolution. Investigating these patterns can help us to understand the role of selection in the evolution of bacterial pangenomes and how bacteria adapt to a new niche. Predicting the presence or absence of genes can be a highly error-prone process that can confound efforts to understand the dynamics of horizontal gene transfer. This review discusses both the challenges in accurately constructing a pangenome and the potential consequences errors can have on downstream analyses. We hope that by summarizing these issues researchers will be able to avoid potential pitfalls, leading to improved bacterial pangenome analyses.

Language: English

Citations

15
