NAR Genomics and Bioinformatics
Journal Year: 2019
Volume and Issue: 2(1)
Published: Nov. 14, 2019
Abstract
DNA barcoding through the use of amplified regions of the ribosomal operon, such as the 16S gene, is a routine method to gain an overview of the microbial taxonomic diversity within a sample without the need to isolate and culture the microbes present. However, bacterial cells usually have multiple copies of this operon, and choosing the ‘wrong’ copy could provide a misleading species classification. While this presents less of a problem for well-characterized organisms with large sequence databases to interrogate, it is a significant challenge for lesser known and unknown organisms with an unknown number and diversity of copies. Using the entire length of the ribosomal operon, which encompasses the 16S, 23S and 5S genes and the internal transcribed spacer regions, should provide greater resolution but has not been well explored. Here, we use publicly available reference genomes to explore the theoretical boundaries when using concatenated genes and full-length operons, made possible by the development and uptake of long-read sequencing technologies. We quantify the issues of both the choice of genes and the choice of operon copy in a phylogenetic context, and demonstrate that longer regions improve the signal while maintaining accuracy.
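
One way to see why longer regions should sharpen the signal: if the proportion p of differing sites between two sequences is estimated from L sites, its binomial standard error is sqrt(p(1 - p)/L), so tripling L shrinks it by a factor of sqrt(3). The sketch below is illustrative only. It assumes every site evolves at the same rate (in practice the spacer regions are usually more variable than 16S, so the real gain in informative sites may be larger), and the region lengths and divergence are assumed round numbers, not values from the study. It says nothing about the separate copy-choice problem.

```python
import math

def p_distance_se(p: float, length: int) -> float:
    """Binomial standard error of an observed proportion p of differing sites."""
    return math.sqrt(p * (1.0 - p) / length)

# Illustrative lengths (assumptions, not figures from the study):
# roughly 1,500 bp for the 16S gene alone, roughly 4,500 bp for a full operon.
REGION_LENGTHS = {"16S gene": 1500, "full-length operon": 4500}
TRUE_P = 0.01  # hypothetical per-site divergence between two strains

for region, length in REGION_LENGTHS.items():
    expected_diffs = TRUE_P * length
    se = p_distance_se(TRUE_P, length)
    print(f"{region}: ~{expected_diffs:.0f} differing sites, SE of p = {se:.5f}")
```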

Wellcome Open Research
Journal Year: 2018
Volume and Issue: 3, P. 124
Published: Sept. 24, 2018
The PubMLST.org website hosts a collection of open-access, curated databases that integrate population sequence data with provenance and phenotype information for over 100 different microbial species and genera. Although the PubMLST database was conceived as part of the development of the first multi-locus sequence typing (MLST) scheme in 1998, the software that it uses, the Bacterial Isolate Genome Sequence database (BIGSdb, published in 2010), enables PubMLST to include all levels of sequence data, from single gene sequences up to and including complete, finished genomes. Here we describe developments in BIGSdb made from publication to June 2018 and show how the platform realises microbial population genomics for a wide range of applications. The system is based on the gene-by-gene analysis of genomes, with each deposited genome annotated to identify the genes present and systematically catalogue their variation. Originally intended as a means of characterising isolates with typing schemes, the synthesis of records of genetic variation with provenance and phenotype information permits highly scalable analysis (of whole genome data from tens of thousands of isolates) addressing a wide range of functional questions, including: the prediction of antimicrobial resistance; the likely cross-reactivity of vaccine antigens; and the activities of variants that lead to key phenotypes. There are no limitations to the number of sequences, loci, allelic variants or schemes (combinations of loci) that can be included, enabling each database to represent an expanding catalogue of the variation in the population in question. In addition to providing web-accessible analyses and links to third-party analysis and visualisation tools, BIGSdb includes a RESTful application programming interface (API) that enables access to all the underlying data for third-party applications and analysis pipelines.
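
The gene-by-gene idea can be sketched as follows: every distinct sequence at a locus receives an integer allele number, and a scheme's sequence type (ST) is the combination of allele numbers at its loci. This is a minimal illustration, not BIGSdb code. The class and locus names are hypothetical, and it assumes the locus sequences have already been extracted from each genome (locating loci within a genome is not shown).

```python
from typing import Dict, Tuple

class LocusCatalogue:
    """Hypothetical allele catalogue for one locus: distinct sequence -> allele number."""

    def __init__(self) -> None:
        self.alleles: Dict[str, int] = {}

    def designate(self, sequence: str) -> int:
        seq = sequence.upper()
        if seq not in self.alleles:
            # A sequence never seen before becomes the next allele number.
            self.alleles[seq] = len(self.alleles) + 1
        return self.alleles[seq]


class Scheme:
    """Hypothetical typing scheme: an ordered set of loci plus a profile table."""

    def __init__(self, loci: Tuple[str, ...]) -> None:
        self.loci = loci
        self.catalogues = {locus: LocusCatalogue() for locus in loci}
        self.profiles: Dict[Tuple[int, ...], int] = {}

    def type_isolate(self, genes: Dict[str, str]) -> Tuple[Tuple[int, ...], int]:
        # Gene-by-gene: call an allele at every scheme locus...
        profile = tuple(self.catalogues[l].designate(genes[l]) for l in self.loci)
        # ...then the combination of alleles defines the sequence type.
        if profile not in self.profiles:
            self.profiles[profile] = len(self.profiles) + 1
        return profile, self.profiles[profile]


scheme = Scheme(("locA", "locB", "locC"))  # hypothetical locus names
print(scheme.type_isolate({"locA": "ACGT", "locB": "GGCC", "locC": "TTAA"}))  # -> ((1, 1, 1), 1)
print(scheme.type_isolate({"locA": "ACGT", "locB": "GGCA", "locC": "TTAA"}))  # -> ((1, 2, 1), 2)
print(scheme.type_isolate({"locA": "acgt", "locB": "GGCC", "locC": "TTAA"}))  # -> ((1, 1, 1), 1)
```

Because allele numbers are assigned per locus, a new variant at one locus changes only that position of the profile, which is what lets the catalogue grow without reprocessing earlier isolates.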

Genome Research
Journal Year: 2019
Volume and Issue: 29(2), P. 304-316
Published: Jan. 24, 2019
The routine use of genomics for disease surveillance provides the opportunity for high-resolution bacterial epidemiology. Current whole-genome clustering and multilocus typing approaches do not fully exploit core and accessory genomic variation, and they cannot both automatically identify, and subsequently expand, clusters of significantly similar isolates in large data sets spanning entire species. Here, we describe PopPUNK (

INTERNATIONAL JOURNAL OF SYSTEMATIC AND EVOLUTIONARY MICROBIOLOGY
Journal Year: 2024
Volume and Issue: 74(3)
Published: March 21, 2024
The field of microbial taxonomy is dynamic, aiming to provide a stable and contemporary classification system for prokaryotes. Traditionally, the reliance on phenotypic characteristics limited the comprehensive understanding of microbial diversity and evolution. The introduction of molecular techniques, particularly DNA sequencing and genomics, has transformed our perception of prokaryotic diversity. In the past two decades, advancements in genome sequencing have transitioned the field from traditional methods to a genome-based taxonomic framework, not only to define species, but also higher ranks. As technology and databases rapidly expand, maintaining updated standards is crucial. This work seeks to revise the 2018 guidelines for applying genome data to prokaryotic taxonomy, adapting the minimal standards and recommendations to reflect technological progress during this period.

One Health Outlook
Journal Year: 2020
Volume and Issue: 2(1)
Published: Feb. 18, 2020
Abstract
Whole genome sequencing (WGS) of foodborne pathogens has become an effective method for investigating the information contained in the genome sequence of bacterial pathogens. In addition, its highly discriminative power enables the comparison of genetic relatedness between bacteria even on a sub-species level. For this reason, WGS is being implemented worldwide and across sectors (human, veterinary, food, environment) for the investigation of disease outbreaks, source attribution, and improved risk characterization models. In order to extract relevant information from the large quantity of complex data produced by WGS, a host of bioinformatics tools has been developed, allowing users to analyze and interpret sequencing data, starting from simple gene-searches to complex phylogenetic studies. Depending on the research question, the complexity of the dataset and their skill set, users can choose from a great variety of tools for the analysis of WGS data. In this review, we describe approaches for phylogenomic studies in outbreak investigation and give an overview of selected tools based on WGS data. Despite the efforts of the last years, harmonization and standardization of typing tools are still urgently needed to allow for easy comparison of data between laboratories, moving towards a One Health surveillance system.
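
For allele-based typing (cgMLST and similar schemes), a common first step in outbreak analysis is a pairwise allele-distance matrix, which can then feed clustering or tree building (not shown). The sketch below is an illustration under stated assumptions: the profiles are hypothetical, and loci missing in either isolate are skipped (pairwise deletion). That is one convention among several; some tools instead count a missing call as a difference, which would change the distances.

```python
from itertools import combinations
from typing import Dict, List, Optional, Tuple

Profile = List[Optional[int]]  # allele number per locus; None = locus not called

def allele_distance(a: Profile, b: Profile) -> int:
    """Number of loci called in both isolates whose allele numbers differ."""
    return sum(1 for x, y in zip(a, b)
               if x is not None and y is not None and x != y)

def distance_matrix(profiles: Dict[str, Profile]) -> Dict[Tuple[str, str], int]:
    """Pairwise allele distances for every unordered pair of isolates."""
    return {(i, j): allele_distance(profiles[i], profiles[j])
            for i, j in combinations(sorted(profiles), 2)}

# Hypothetical five-locus profiles, for illustration only.
profiles = {
    "isolate_1": [1, 4, 2, 7, 3],
    "isolate_2": [1, 4, 2, 8, 3],
    "isolate_3": [2, 5, None, 7, 9],
}
for pair, d in distance_matrix(profiles).items():
    print(pair, d)
```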

bioRxiv (Cold Spring Harbor Laboratory)
Journal Year: 2018
Volume and Issue: unknown
Published: Oct. 25, 2018
Abstract
Genome sequencing is revolutionising infectious disease epidemiology, providing a huge step forward in sensitivity and specificity over more traditional molecular typing techniques. However, the complexity of genome data often means that its analysis and interpretation requires high-performance compute infrastructure and dedicated bioinformatics support. Furthermore, current methods have limitations that can differ between analyses and are opaque to the user, and their reliance on multiple external dependencies makes reproducibility difficult. Here I introduce SKA, a toolkit for sequence analysis from closely-related, small, haploid genomes. SKA uses split kmers to rapidly identify variation between sequences, making it possible to analyse hundreds of genomes on a standard home computer. Tests on publicly available simulated and real-life data show SKA to be both faster and more efficient than the gold standard methods used today, while retaining similar levels of accuracy for epidemiological purposes. SKA can take raw read data or assemblies as input, and can calculate pairwise distances, create single linkage clusters and align sequences to a reference or using a reference-free approach. Few decisions have to be made by the user, which, along with its computational efficiency, allows SKA to become accessible to those with only basic bioinformatics training. The SKA code is also far more transparent than current approaches, making future improvements to mitigate these limitations possible. Overall, SKA is a powerful addition to the armoury of the genomic epidemiologist. SKA source code is available on Github (https://github.com/simonrharris/SKA).
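
A split k-mer pairs the two flanking halves of an odd-length k-mer with the single base between them; when two genomes share the same flanks but differ at the middle base, that is a candidate SNP. The sketch below illustrates only that idea and is not SKA's implementation: it ignores reverse complements, uses a toy k = 7 (practical values are much larger), and simply discards split k-mers whose middle base is ambiguous within one genome. The function names are hypothetical.

```python
from typing import Dict

def split_kmers(seq: str, k: int) -> Dict[str, str]:
    """Map each split k-mer (both flanks, middle base removed) to its middle base."""
    if k % 2 == 0:
        raise ValueError("k must be odd so that there is a single middle base")
    half = k // 2
    middles: Dict[str, str] = {}
    ambiguous = set()
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        key = window[:half] + window[half + 1:]
        base = window[half]
        if key in middles and middles[key] != base:
            # Same flanks, different middle base within one genome (e.g. a repeat):
            # drop it, as a simplifying assumption.
            ambiguous.add(key)
        middles[key] = base
    for key in ambiguous:
        del middles[key]
    return middles

def snp_distance(a: str, b: str, k: int = 7) -> int:
    """Count split k-mers shared by both sequences whose middle bases differ."""
    ka, kb = split_kmers(a, k), split_kmers(b, k)
    return sum(1 for key in ka.keys() & kb.keys() if ka[key] != kb[key])

seq_a = "ACGTTGCATGCAGTCCATGACTGA"
seq_b = "ACGTTGCATGCAATCCATGACTGA"  # single G->A change at position 12
print(snp_distance(seq_a, seq_b))  # -> 1
```

Only the window centred on the SNP yields a shared split k-mer with differing middle bases; windows with the SNP in a flank produce keys unique to one genome and are not counted. Pairwise distances of this kind are what single-linkage clustering would then operate on.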

Genome Research
Journal Year: 2023
Volume and Issue: 33(1), P. 129-140
Published: Jan. 1, 2023
Horizontal gene transfer (HGT) plays a critical role in the evolution and diversification of many microbial species. The resulting dynamics of gene gain and loss can have important implications for the development of antibiotic resistance and the design of vaccine and drug interventions. Methods for the analysis of gene presence/absence patterns typically do not account for errors introduced in the automated annotation and clustering of gene sequences. In particular, methods adapted from ecological studies, including the pangenome accumulation curve, can be misleading, as they may reflect the underlying diversity in the temporal sampling of genomes rather than a difference in the dynamics of HGT. Here, we introduce Panstripe, a method based on generalized linear regression that is robust to population structure, sampling bias, and errors in the predicted presence/absence of genes. We show using simulations that Panstripe can effectively identify differences in the rate and number of genes involved in HGT events, and illustrate its capability by analyzing several diverse bacterial genome data sets representing major human pathogens.
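
A regression of this kind relates the number of gene gain and loss events on each branch of a phylogeny to covariates such as core-genome branch length. The sketch below swaps in the simplest member of the GLM family, a log-link Poisson GLM fitted by iteratively reweighted least squares with NumPy, on simulated data. The covariates are assumptions for illustration: an indicator for terminal (tip) branches is included because annotation and clustering errors specific to one genome tend to appear as apparent gene changes on tip branches, so a separate tip term can absorb them instead of inflating the branch-length effect. Panstripe's actual error distribution, link and covariates may differ.

```python
import numpy as np

def fit_poisson_glm(X: np.ndarray, y: np.ndarray,
                    max_iter: int = 100, tol: float = 1e-10) -> np.ndarray:
    """Fit a log-link Poisson GLM by iteratively reweighted least squares (IRLS)."""
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())  # start from the intercept-only fit (column 0 = 1s)
    for _ in range(max_iter):
        eta = X @ beta
        mu = np.exp(eta)
        z = eta + (y - mu) / mu          # working response for the log link
        XtW = X.T * mu                   # Poisson IRLS weights equal the mean
        new_beta = np.linalg.solve(XtW @ X, XtW @ z)
        if np.max(np.abs(new_beta - beta)) < tol:
            return new_beta
        beta = new_beta
    return beta

# Hypothetical simulated branches: gene gain/loss counts per branch,
# explained by core-genome branch length and a tip-branch indicator.
rng = np.random.default_rng(0)
n = 2000
branch_length = rng.uniform(0.0, 2.0, n)
is_tip = rng.integers(0, 2, n).astype(float)
X = np.column_stack([np.ones(n), branch_length, is_tip])
true_beta = np.array([0.2, 0.8, 0.5])
counts = rng.poisson(np.exp(X @ true_beta))

beta_hat = fit_poisson_glm(X, counts.astype(float))
print("intercept, branch length, tip:", np.round(beta_hat, 3))
```

A plain Poisson model assumes the variance equals the mean; real gene gain/loss counts are often overdispersed, which is one reason a more flexible error distribution may be preferred in practice.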

BMC Biology
Journal Year: 2020
Volume and Issue: 18(1)
Published: March 2, 2020
Abstract
Background
Contaminant DNA is a well-known confounding factor in molecular biology and in genomic repositories. Strikingly, analysis workflows for whole-genome sequencing (WGS) data commonly do not account for errors potentially introduced by contamination, which could lead to the wrong assessment of allele frequency in both basic and clinical research.
Results
We used a taxonomic filter to remove contaminant reads from more than 4000 bacterial samples from 20 different studies and performed a comprehensive evaluation of the extent and impact of contamination in WGS. We found that contamination is pervasive and can introduce large biases in variant analysis. We showed that these biases can result in hundreds of false positive and false negative SNPs, even with slight contamination. Studies investigating complex biological traits can be completely biased if contamination is neglected during bioinformatic analysis, and we demonstrate that removing contaminant reads with a taxonomic classifier permits more accurate variant calling. We used real and simulated data to evaluate and implement reliable, contamination-aware analysis pipelines.
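
A taxonomic read filter of the kind described reduces to one test per read: does the taxon assigned by a classifier lie inside the clade of the organism being sequenced? The sketch below assumes read-to-taxon assignments have already been produced by a classifier (Kraken-style output, where taxid 0 is taken to mean an unclassified read) and uses a hypothetical toy taxonomy. Whether to keep unclassified reads is a policy choice exposed as a parameter; the abstract does not specify it.

```python
from typing import Dict, Iterable, List, Tuple

def in_clade(taxid: int, clade_root: int, parent: Dict[int, int]) -> bool:
    """True if taxid is clade_root or descends from it in the parent map."""
    visited = set()
    while taxid not in visited:
        if taxid == clade_root:
            return True
        visited.add(taxid)
        taxid = parent.get(taxid, taxid)  # unknown IDs map to themselves and end the walk
    return False

def filter_reads(assignments: Iterable[Tuple[str, int]], clade_root: int,
                 parent: Dict[int, int], keep_unclassified: bool = False) -> List[str]:
    """Keep read IDs whose assigned taxon lies inside the target clade."""
    kept = []
    for read_id, taxid in assignments:
        if taxid == 0:  # assumption: 0 marks an unclassified read
            if keep_unclassified:
                kept.append(read_id)
        elif in_clade(taxid, clade_root, parent):
            kept.append(read_id)
    return kept

# Hypothetical toy taxonomy: 1 = root, 2 = a domain, 100 = a genus with
# species 101 (target) and 102, 200 = an unrelated taxon, 1011 = a strain of 101.
parent = {1: 1, 2: 1, 100: 2, 101: 100, 102: 100, 200: 2, 1011: 101}
reads = [("r1", 101), ("r2", 102), ("r3", 200), ("r4", 0), ("r5", 1011)]
print(filter_reads(reads, clade_root=101, parent=parent))  # -> ['r1', 'r5']
```

Only the reads that pass the filter would then go on to mapping and variant calling.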
Conclusion
As sequencing technologies consolidate as precision tools and are increasingly adopted in the research context, our results urge the implementation of contamination-aware analysis pipelines. Taxonomic classifiers are a powerful tool to implement such

Research Ideas and Outcomes
Journal Year: 2019
Volume and Issue: 5
Published: June 7, 2019
This paper describes a novel alignment-free, distance-based procedure for inferring phylogenetic trees from genome contig sequences using publicly available bioinformatics tools. For each pair of genomes, a dissimilarity measure is first computed and next transformed to obtain an estimation of the number of substitution events that have occurred during their evolution. These pairwise evolutionary distances are then used to infer a phylogenetic tree and assess the confidence support of each internal branch. Analyses of both simulated and real genome datasets show that this procedure allows accurate phylogenetic trees to be reconstructed with fast running times, especially when launched on multiple threads. Implemented in a script named JolyTree, this procedure is a useful approach to quickly infer phylogenetic trees of species without the burden and potential biases of multiple sequence alignments.
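
The two-step idea (a dissimilarity, then a transformation into an estimated number of substitutions per site) can be illustrated with a Mash-style distance, which converts the Jaccard index j of two k-mer sets into D = -ln(2j / (1 + j)) / k. This is an assumed stand-in: the abstract does not name the dissimilarity or transformation JolyTree uses, tools of this kind usually estimate j from MinHash sketches rather than exact k-mer sets, and the tree-inference and branch-support steps are not shown.

```python
import math
from typing import Set

def kmer_set(seq: str, k: int) -> Set[str]:
    """All distinct k-mers of a sequence (exact sets, no sketching)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard index |A & B| / |A | B|; the dissimilarity is 1 minus this."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def mash_distance(j: float, k: int) -> float:
    """Mash-style per-site substitution estimate: -ln(2j / (1 + j)) / k."""
    if j <= 0.0:
        return math.inf  # no shared k-mers: the estimate is unbounded
    return max(0.0, -math.log(2.0 * j / (1.0 + j)) / k)

seq_a = "ACGTTGCATGCAGTCCATGACTGA"
seq_b = "ACGTTGCATGCAATCCATGACTGA"  # hypothetical pair differing by one SNP
k = 5
j = jaccard(kmer_set(seq_a, k), kmer_set(seq_b, k))
print(f"Jaccard = {j:.3f}, estimated substitutions per site = {mash_distance(j, k):.4f}")
```

Computing this value for every pair of genomes yields the distance matrix from which a tree would be inferred.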

bioRxiv (Cold Spring Harbor Laboratory)
Journal Year: 2024
Volume and Issue: unknown
Published: March 29, 2024
Abstract
Sequence variation observed in populations of pathogens can be used for important public health and evolution genomic analyses, especially outbreak analysis and transmission reconstruction. Identifying this variation is typically achieved by aligning sequence reads to a reference genome, but this approach is susceptible to reference biases and requires careful filtering of called genotypes. Additionally, while the volume of bacterial genomes continues to grow, tools which can accurately and quickly call genetic variation between sequences have not kept pace. There is a need for tools which can process large genomic data, providing rapid results, but remain simple so they can be used without highly trained bioinformaticians, expensive data analysis, or long term storage and processing of large files. Here we describe Split K-mer Analysis (SKA2), a method which supports both reference-free and reference-based mapping to quickly and accurately genotype bacteria using sequencing reads or genome assemblies. SKA2 is accurate for closely related samples, and in simulations we show superior variant recall compared to reference-based methods, with no false positives. We also show that within strains, where it is possible to construct a clonal frame, SKA2 can map variants to a reference and be used with recombination detection methods to rapidly reconstruct vertical evolutionary history. SKA2 is many times faster than comparable methods and can add new genomes to an existing set, allowing sequential use without the need to reanalyse entire collections. Given its robust implementation, inherent absence of reference bias and high accuracy, SKA2 has the potential to become the tool of choice for genotyping bacteria and help expand the uses of genomic epidemiological analyses. SKA2 is implemented in Rust and is freely available at https://github.com/bacpop/ska.rust.
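
Adding genomes to an existing set without reanalysing the whole collection can be pictured as a table keyed by split k-mer, with one column per sample: a new genome extends every existing row and opens rows for split k-mers not seen before. The sketch below is a hypothetical in-memory illustration of that bookkeeping only; it is not SKA2's file format or API, ignores reverse complements, and uses a toy k. The class and method names are invented for this example.

```python
from typing import Dict, List, Optional, Set

class SplitKmerTable:
    """Hypothetical incremental store: split k-mer -> middle base in each sample."""

    def __init__(self, k: int) -> None:
        if k % 2 == 0:
            raise ValueError("k must be odd")
        self.k = k
        self.samples: List[str] = []
        self.rows: Dict[str, List[Optional[str]]] = {}

    def _middles(self, seq: str) -> Dict[str, Set[str]]:
        """Collect every middle base observed for each split k-mer in one genome."""
        half = self.k // 2
        seq = seq.upper()
        found: Dict[str, Set[str]] = {}
        for i in range(len(seq) - self.k + 1):
            window = seq[i:i + self.k]
            found.setdefault(window[:half] + window[half + 1:], set()).add(window[half])
        return found

    def add_sample(self, name: str, seq: str) -> None:
        """Add one genome without re-reading any previously added genome."""
        found = self._middles(seq)
        n_before = len(self.samples)
        self.samples.append(name)

        def call(key: str) -> Optional[str]:
            bases = found.get(key)
            # None if the split k-mer is absent or has conflicting middle bases
            return next(iter(bases)) if bases and len(bases) == 1 else None

        for key, row in self.rows.items():      # extend existing rows
            row.append(call(key))
        for key in found:                        # new rows, padded for older samples
            if key not in self.rows:
                self.rows[key] = [None] * n_before + [call(key)]

    def variable_sites(self) -> Dict[str, List[Optional[str]]]:
        """Rows where called middle bases differ between at least two samples."""
        return {key: row for key, row in self.rows.items()
                if len({b for b in row if b is not None}) > 1}


table = SplitKmerTable(k=7)
table.add_sample("a", "ACGTTGCATGCAGTCCATGACTGA")
table.add_sample("b", "ACGTTGCATGCAATCCATGACTGA")
print(table.variable_sites())  # -> {'GCATCC': ['G', 'A']}
table.add_sample("c", "ACGTTGCATGCAGTCCATGACTGA")  # later addition, no re-run of a or b
print(table.variable_sites())  # -> {'GCATCC': ['G', 'A', 'G']}
```

The cost of adding a genome here scales with the size of the table and the new genome, not with re-reading earlier inputs, which is the property the abstract highlights.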