Accurate and fast graph-based pangenome annotation and clustering with ggCaller
Samuel Horsfield, Gerry Tonkin‐Hill, Nicholas J. Croucher

et al.

Genome Research, Journal Year: 2023, Volume and Issue: 33(9), P. 1622 - 1637

Published: Aug. 24, 2023

Bacterial genomes differ in both gene content and sequence mutations, which underlie extensive phenotypic diversity, including variation in susceptibility to antimicrobials or vaccine-induced immunity. To identify and quantify important variants, all genes within a population must be predicted, functionally annotated, and clustered, representing the “pangenome.” Despite the volume of genome data available, gene prediction and annotation are currently conducted in isolation on individual genomes, which is computationally inefficient and frequently inconsistent across genomes. Here, we introduce the open-source software graph-gene-caller (ggCaller). ggCaller combines gene prediction, functional annotation, and clustering into a single workflow using population-wide de Bruijn graphs, removing redundancy in gene annotation and resulting in more accurate gene predictions and orthologue clustering. We applied ggCaller to simulated and real-world bacterial data sets containing hundreds or thousands of genomes, comparing it to current state-of-the-art tools. ggCaller has considerable speed-ups with equivalent or greater accuracy, particularly with data sets containing complex sources of error, such as assembly contamination or fragmentation. ggCaller is also an extension to bacterial genome-wide association studies, enabling querying of annotated graphs for functional analyses. We highlight this application by functionally annotating DNA sequences with significant associations to tetracycline and macrolide resistance in Streptococcus pneumoniae, identifying key resistance determinants that were missed when using only a single reference genome. ggCaller is a novel bacterial genome analysis tool with applications in bacterial evolution and epidemiology.

Language: English

Producing polished prokaryotic pangenomes with the Panaroo pipeline
Gerry Tonkin‐Hill, Neil MacAlasdair, Christopher Ruis

et al.

Genome biology, Journal Year: 2020, Volume and Issue: 21(1)

Published: July 22, 2020

Population-level comparisons of prokaryotic genomes must take into account the substantial differences in gene content resulting from horizontal gene transfer, gene duplication and gene loss. However, the automated annotation of prokaryotic genomes is imperfect, and errors due to fragmented assemblies, contamination, diverse gene families and mis-assemblies accumulate over the population, leading to profound consequences when analysing the set of all genes found in a species. Here, we introduce Panaroo, a graph-based pangenome clustering tool that is able to account for many of the sources of error introduced during the annotation of prokaryotic genome assemblies. Panaroo is available at https://github.com/gtonkinhill/panaroo.

Language: English

Citations

709

Horizontal gene transfer and adaptive evolution in bacteria
Brian J. Arnold, I-Ting Huang, William P. Hanage

et al.

Nature Reviews Microbiology, Journal Year: 2021, Volume and Issue: 20(4), P. 206 - 218

Published: Nov. 12, 2021

Language: English

Citations

498

Fast and flexible bacterial genomic epidemiology with PopPUNK
John A. Lees, Simon R. Harris, Gerry Tonkin‐Hill

et al.

Genome Research, Journal Year: 2019, Volume and Issue: 29(2), P. 304 - 316

Published: Jan. 24, 2019

The routine use of genomics for disease surveillance provides the opportunity for high-resolution bacterial epidemiology. Current whole-genome clustering and multilocus typing approaches do not fully exploit core and accessory genomic variation, and they cannot both automatically identify, and subsequently expand, clusters of significantly similar isolates in large data sets spanning entire species. Here, we describe PopPUNK (

Language: English

Citations

348

Population genomics of bacterial host adaptation
Samuel K. Sheppard, David S. Guttman, J. Ross Fitzgerald

et al.

Nature Reviews Genetics, Journal Year: 2018, Volume and Issue: 19(9), P. 549 - 565

Published: July 4, 2018

Language: English

Citations

223

International genomic definition of pneumococcal lineages, to contextualise disease, antibiotic resistance and vaccine impact
Rebecca A. Gladstone, Stephanie W. Lo, John A. Lees

et al.

EBioMedicine, Journal Year: 2019, Volume and Issue: 43, P. 338 - 346

Published: April 16, 2019

Pneumococcal conjugate vaccines have reduced the incidence of invasive pneumococcal disease, caused by vaccine serotypes, but non-vaccine-serotypes remain a concern. We used whole genome sequencing to study pneumococcal serotype, antibiotic resistance and invasiveness, in the context of genetic background. Our dataset of 13,454 genomes, combined with four published genomic datasets, represented Africa (40%), Asia (25%), Europe (19%), North America (12%), and South America (5%). These 20,027 pneumococcal genomes were clustered into lineages using PopPUNK, and named Global Pneumococcal Sequence Clusters (GPSCs). From our dataset, we additionally derived serotype and sequence type, and predicted antibiotic sensitivity. We then measured invasiveness using odds ratios relating prevalence in invasive pneumococcal disease to carriage. The combined collections (n = 20,027) were clustered into 621 GPSCs. Thirty-five GPSCs observed in our dataset were represented by >100 isolates, and subsequently classed as dominant-GPSCs. In 22/35 (63%) of dominant-GPSCs both non-vaccine serotypes and vaccine serotypes were observed in the years up until, and including, the first year of pneumococcal conjugate vaccine introduction. Penicillin and multidrug resistance were higher (p < .05) in a subset of dominant-GPSCs (14/35, 9/35 respectively), and resistance to an increasing number of antibiotic classes was associated with increased recombination (R2 = 0.27, p < .0001). In 28/35 dominant-GPSCs, the country of isolation was a significant predictor of its antibiogram (mean misclassification error 0.28, SD ± 0.13). We detected increased invasiveness of six genetic backgrounds, when compared to other genetic backgrounds expressing the same serotype. Up to 1.6-fold changes in invasiveness odds ratio were observed. We define GPSCs that can be assigned to any pneumococcal genomic dataset, to aid international comparisons. Existing non-vaccine-serotypes in most GPSCs preclude the removal of these lineages by pneumococcal conjugate vaccines; leaving potential for serotype replacement. A subset of GPSCs have increased resistance, and/or serotype-independent invasiveness.

Language: English

Citations

205

Atlas of group A streptococcal vaccine candidates compiled using large-scale comparative genomics
Mark R. Davies, Liam McIntyre, Ankur Mutreja

et al.

Nature Genetics, Journal Year: 2019, Volume and Issue: 51(6), P. 1035 - 1043

Published: May 27, 2019

Language: English

Citations

176

Genomics and pathotypes of the many faces of Escherichia coli
Jeroen Geurtsen, Mark de Been, Eveline Weerdenburg

et al.

FEMS Microbiology Reviews, Journal Year: 2022, Volume and Issue: 46(6)

Published: June 24, 2022

Escherichia coli is the most researched microbial organism in the world. Its varied impact on human health, consisting of commensalism, gastrointestinal disease, or extraintestinal pathologies, has generated a separation of the species into at least eleven pathotypes (also known as pathovars). These are broadly split into two groups, intestinal pathogenic E. coli (InPEC) and extraintestinal pathogenic E. coli (ExPEC). However, components of E. coli's infinite open accessory genome are horizontally transferred with substantial frequency, creating pathogenic hybrid strains that defy a clear pathotype designation. Here, we take a birds-eye view of the E. coli species, characterizing it from historical, clinical, and genetic perspectives. We examine the wide spectrum of human disease caused by E. coli, the genome content of the bacterium, and its propensity to acquire, exchange, and maintain antibiotic resistance genes and virulence traits. Our portrayal of the species also discusses elements that have shaped the overall population structure of E. coli and summarizes the current state of vaccine development targeted at the most frequent E. coli pathovars. In our conclusions, we advocate streamlining efforts for clinical reporting of ExPEC, and emphasize the pathogenic potential that exists throughout the entire species.

Language: English

Citations

100

Diversification of Colonization Factors in a Multidrug-Resistant Escherichia coli Lineage Evolving under Negative Frequency-Dependent Selection
Alan McNally, Teemu Kallonen, Christopher Connor

et al.

mBio, Journal Year: 2019, Volume and Issue: 10(2)

Published: April 22, 2019

Infections with multidrug-resistant (MDR) strains of Escherichia coli are a significant global public health concern. To combat these pathogens, we need a deeper understanding of how they evolved from their background populations. By understanding the processes that underpin their emergence, we can design new strategies to limit evolution of new clones and combat existing clones. By combining population genomics with modelling approaches, we show that dominant MDR clones of E. coli are under the influence of negative frequency-dependent selection, preventing them from rising to fixation in a population. Furthermore, we show that this selection acts on genes involved in anaerobic metabolism, suggesting that this key trait, and the ability to colonize human intestinal tracts, is a key step in the evolution of MDR clones of E. coli.

Language: English

Citations

121

Improved Prediction of Bacterial Genotype-Phenotype Associations Using Interpretable Pangenome-Spanning Regressions
John A. Lees, The Tien Mai, Marco Galardini

et al.

mBio, Journal Year: 2020, Volume and Issue: 11(4)

Published: July 6, 2020

Being able to identify the genetic variants responsible for specific bacterial phenotypes has been the goal of bacterial genetics since its inception and is fundamental to our current level of understanding of bacteria. This identification has been based primarily on painstaking experimentation, but the availability of large data sets of whole genomes with associated phenotype metadata promises to revolutionize this approach, not least for important clinical phenotypes that are not amenable to laboratory analysis. These models of phenotype-genotype association can in the future be used for rapid prediction of clinically important phenotypes such as antibiotic resistance and virulence by rapid-turnaround or point-of-care tests. However, despite much effort being put into adapting genome-wide association study (GWAS) approaches to cope with bacterium-specific problems, such as strong population structure and horizontal gene exchange, current approaches are not yet optimal. We describe a method that advances methodology for both association and generation of portable prediction models.

Language: English

Citations

101

Global emergence and population dynamics of divergent serotype 3 CC180 pneumococci
Taj Azarian, Patrick K. Mitchell, Maria Georgieva

et al.

PLoS Pathogens, Journal Year: 2018, Volume and Issue: 14(11), P. e1007438 - e1007438

Published: Nov. 26, 2018

Streptococcus pneumoniae serotype 3 remains a significant cause of morbidity and mortality worldwide, despite inclusion in the 13-valent pneumococcal conjugate vaccine (PCV13). Serotype 3 increased in carriage since the implementation of PCV13 in the USA, while invasive disease rates remain unchanged. We investigated the persistence of serotype 3 in carriage and disease, through genomic analyses of a global sample of 301 serotype 3 isolates of the Netherlands3-31 (PMEN31) clone CC180, combined with associated patient data and PCV utilization among countries of isolate collection. We assessed phenotypic variation between dominant clades in capsule charge (zeta potential), capsular polysaccharide shedding, and susceptibility to opsonophagocytic killing, which have previously been associated with carriage duration, invasiveness, and vaccine escape. We identified a recent shift in the CC180 population attributed to a lineage termed Clade II, which was estimated by Bayesian coalescent analysis to have first appeared in 1968 [95% HPD: 1939-1989] and increased in prevalence and effective population size thereafter. Clade II isolates are divergent from the pre-PCV13 serotype 3 population in non-capsular antigenic composition, competence, and antibiotic susceptibility, the last of which resulting from the acquisition of a Tn916-like conjugative transposon. Differences in recombination rates among clades correlated with variations in the ATP-binding subunit of Clp protease, as well as amino acid substitutions in the comCDE operon. Opsonophagocytic killing assays elucidated the low observed efficacy of PCV13 against serotype 3. Variation in PCV13 use among sampled countries was not independently correlated with the CC180 population shift; therefore, genotypic and phenotypic differences in protein antigens and, in particular, antibiotic resistance may have contributed to the increase of Clade II. Our analysis emphasizes the need for routine, representative sampling of isolates from disperse geographic regions, including historically under-sampled areas. We also highlight the value of genomics in resolving antigenic and epidemiological variations within a serotype, which may have implications for future vaccine development.

Language: English

Citations

92