PeerJ, 2018, Vol. 6, e4925
Published: May 28, 2018
High-throughput amplicon sequencing (HTAS) of conserved DNA regions is a powerful technique to characterize microbial communities. Recently, spike-in mock communities have been used to measure the accuracy of sequencing platforms and data analysis pipelines. To assess the ability of sequencing platforms and data processing pipelines using fungal internal transcribed spacer (ITS) amplicons, we created two ITS spike-in control mock communities composed of cloned ITS amplicons in plasmids: a biological mock community, consisting of ITS sequences from fungal taxa, and a synthetic mock community (SynMock), consisting of non-biological ITS-like sequences. Using these spike-in controls we show that: (1) a non-biological synthetic control (e.g., SynMock) is the best solution for parameterizing bioinformatics pipelines, (2) pre-clustering steps for variable length amplicons are critically important, and (3) a major source of bias is attributed to the initial polymerase chain reaction (PCR), and thus HTAS read abundances are typically not representative of starting values. We developed AMPtk, a versatile software solution equipped to deal with variable length amplicons and to quality filter HTAS data based on spike-in controls. While we describe herein a non-biological SynMock community for ITS sequences, the concept and the AMPtk software can be widely applied to any HTAS dataset to improve data quality.

PeerJ, 2016, Vol. 4, e2584
Published: October 18, 2016
VSEARCH is an open source and free of charge multithreaded 64-bit tool for processing and preparing metagenomics, genomics and population genomics nucleotide sequence data. It is designed as an alternative to the widely used USEARCH tool (Edgar, 2010), for which the source code is not publicly available, algorithm details are only rudimentarily described, and only a memory-confined 32-bit version is freely available for academic use.
When searching nucleotide sequences, VSEARCH uses a fast heuristic based on words shared by the query and target sequences in order to quickly identify similar sequences; a similar strategy is probably used in USEARCH. VSEARCH then performs optimal global sequence alignment of the query against potential target sequences, using full dynamic programming instead of the seed-and-extend heuristic used by USEARCH. Pairwise alignments are computed in parallel using vectorisation and multiple threads.
VSEARCH includes most commands for analysing nucleotide sequences available in USEARCH version 7 and several of those available in USEARCH version 8, including searching (exact or based on global alignment), clustering by similarity (using length pre-sorting, abundance pre-sorting or a user-defined order), chimera detection (reference-based or de novo), dereplication (full length or prefix), pairwise alignment, reverse complementation, sorting, and subsampling. VSEARCH also includes commands for FASTQ file processing, i.e., format detection, filtering, read quality statistics, and merging of paired reads. Furthermore, VSEARCH extends functionality with several new commands and improvements, including shuffling, rereplication, masking of low-complexity sequences with the well-known DUST algorithm, a choice among different similarity definitions, and FASTQ file format conversion. VSEARCH is here shown to be more accurate than USEARCH when performing searching, clustering, chimera detection and subsampling, while on a par with USEARCH for paired-ends read merging. VSEARCH is slower than USEARCH when performing clustering and chimera detection, but significantly faster when performing paired-end reads merging and dereplication. VSEARCH is available at https://github.com/torognes/vsearch under either the BSD 2-clause license or the GNU General Public License version 3.0.
VSEARCH has been shown to be a fast, accurate and full-fledged alternative to USEARCH. A free and open-source versatile tool for sequence analysis is now available to the metagenomics community.
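The two-stage search described in this abstract (a word-sharing heuristic to shortlist targets, then full dynamic programming on the shortlist) can be illustrated with a minimal Python sketch. This is not VSEARCH's implementation: the function names, the word length k=4, the unit-cost edit distance used in place of a scored alignment, and the toy sequences are all assumptions chosen for illustration.

```python
def words(seq, k=4):
    """Return the set of distinct k-letter words (k-mers) in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def shortlist_by_shared_words(query, targets, k=4, top=2):
    """Stage 1: rank targets by the number of distinct words shared with the query."""
    qwords = words(query, k)
    scored = [(len(qwords & words(seq, k)), name) for name, seq in targets.items()]
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [name for count, name in scored[:top] if count > 0]

def global_edit_distance(a, b):
    """Stage 2: full dynamic programming over both sequences, end to end.

    Unit costs for substitution, insertion and deletion (an assumption;
    real tools use tunable match/mismatch/gap scores).
    """
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # match or substitution
        prev = cur
    return prev[-1]

def search(query, targets, k=4, top=2):
    """Shortlist with shared words, then align only the shortlisted targets."""
    hits = []
    for name in shortlist_by_shared_words(query, targets, k, top):
        target = targets[name]
        dist = global_edit_distance(query, target)
        identity = 1 - dist / max(len(query), len(target))
        hits.append((name, identity))
    return sorted(hits, key=lambda hit: -hit[1])

# Hypothetical toy database: t2 differs from the query at one position,
# t3 shares no 4-letter word with it and is never aligned.
targets = {
    "t1": "ACGTACGTACGTACGT",
    "t2": "ACGTACGAACGTACGT",
    "t3": "TTTTGGGGCCCCAAAA",
}
query = "ACGTACGTACGTACGT"
hits = search(query, targets)
print(hits)
```

The point of the shortlist is that the expensive quadratic alignment runs only on targets that already look similar; the identity definition used here (one minus edit distance over the longer length) is just one of several possible definitions.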

The ISME Journal, 2017, Vol. 11(12), pp. 2639-2643
Published: July 21, 2017
Abstract
Recent advances have made it possible to analyze high-throughput marker-gene sequencing data without resorting to the customary construction of molecular operational taxonomic units (OTUs): clusters of sequencing reads that differ by less than a fixed dissimilarity threshold. New methods control errors sufficiently such that amplicon sequence variants (ASVs) can be resolved exactly, down to the level of single-nucleotide differences over the sequenced gene region. The benefits of finer resolution are immediately apparent, and arguments for ASV methods have focused on their improved resolution. Less obvious, but we believe more important, are the broad benefits that derive from the status of ASVs as consistent labels with intrinsic biological meaning identified independently from a reference database. Here we discuss how these features grant ASVs the combined advantages of closed-reference OTUs (including computational costs that scale linearly with study size, simple merging between independently processed data sets, and forward prediction) and of de novo OTUs (including accurate measurement of diversity and applicability to communities lacking deep coverage in reference databases). We argue that the improvements in reusability, reproducibility and comprehensiveness are sufficiently great that ASVs should replace OTUs as the standard unit of marker-gene analysis and reporting.
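The "consistent labels" argument in this abstract can be made concrete with a small Python sketch. When a table is keyed by the exact sequence, two independently processed data sets can be merged by summing counts under identical keys, with no re-clustering. The sequences, counts and function name below are hypothetical and chosen only for illustration.

```python
from collections import Counter

def merge_asv_tables(*tables):
    """Merge count tables whose keys are exact sequences (ASVs).

    Identical sequences from different studies receive the same key,
    so merging reduces to summing counts.
    """
    merged = Counter()
    for table in tables:
        merged.update(table)
    return dict(merged)

# Two hypothetical studies, processed independently.
study_a = {"ACGTTGCA": 120, "ACGTTGCT": 30}
study_b = {"ACGTTGCA": 75, "GGCATTAC": 10}

merged = merge_asv_tables(study_a, study_b)
print(merged)
```

By contrast, a de novo OTU label such as "OTU_1" is assigned separately within each study, so the same label in two tables need not refer to the same organism and the tables cannot be combined this way without clustering the pooled reads again.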

bioRxiv (Cold Spring Harbor Laboratory), 2016
Published: October 15, 2016
Abstract
Amplicon sequencing of tags such as 16S and ITS ribosomal RNA is a popular method for investigating microbial populations. In such experiments, sequence errors caused by PCR and sequencing are difficult to distinguish from true biological variation. I describe UNOISE2, an updated version of the UNOISE algorithm for denoising (error-correcting) Illumina amplicon reads, and show that it has comparable or better accuracy than DADA2.

Conservation Genetics, 2015, Vol. 17(1), pp. 1-17
Published: September 8, 2015
Environmental DNA (eDNA) refers to the genetic material that can be extracted from bulk environmental samples such as soil, water, and even air. The rapidly expanding study of eDNA has generated unprecedented ability to detect species and conduct genetic analyses for conservation, management, and research, particularly in scenarios where collection of whole organisms is impractical or impossible. While the number of studies demonstrating successful eDNA detection has increased rapidly in recent years, less research has explored the "ecology" of eDNA (the myriad interactions between extraorganismal genetic material and its environment) and its influence on eDNA detection, quantification, analysis, and application to conservation and research. Here, we outline a framework for understanding the ecology of eDNA, including the origin, state, transport, and fate of extraorganismal genetic material. Using this framework, we review and synthesize the findings of eDNA studies from diverse environments, taxa, and fields of study to highlight important concepts and knowledge gaps in eDNA study and application. Additionally, we identify frontiers of conservation-focused eDNA application where we see the most potential for growth, including the use of eDNA for estimating population size, population genetic and genomic analyses via eDNA, inclusion of other indicator biomolecules such as environmental RNA or proteins, automated sample collection and analysis, and consideration of an expanded array of creative environmental samples. We discuss how a more complete understanding of the ecology of eDNA is integral to advancing these frontiers and maximizing the future applications of eDNA for conservation and research.

F1000Research, 2016, Vol. 5, p. 1492
Published: November 2, 2016
High-throughput sequencing of PCR-amplified taxonomic markers (like the 16S rRNA gene) has enabled a new level of analysis of complex bacterial communities known as microbiomes. Many tools exist to quantify and compare abundance levels or OTU composition of communities in different conditions. The sequencing reads have to be denoised and assigned to the closest taxa from a reference database. Common approaches use a notion of 97% similarity and normalize the data by subsampling to equalize library sizes. In this paper, we show that statistical models allow more accurate abundance estimates. By providing a complete workflow in R, we enable the user to do sophisticated downstream statistical analyses, whether parametric or nonparametric. We provide examples of using the R packages dada2, phyloseq, DESeq2, ggplot2 and vegan to filter, visualize and test microbiome data. We also provide examples of supervised analyses using random forests and nonparametric testing using community networks and the ggnetwork package.
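For readers unfamiliar with the "subsampling to equalize library sizes" that this abstract names as the common approach (and argues statistical models can improve on), here is a minimal Python sketch of that baseline. The sample names, counts, seed and the function name rarefy are hypothetical; the paper's own workflow is in R and is not reproduced here.

```python
import random

def rarefy(counts, depth, seed=0):
    """Subsample a sample's reads without replacement down to `depth` reads.

    `counts` maps taxon -> read count. Raises ValueError if the sample
    has fewer than `depth` reads.
    """
    pool = [taxon for taxon, n in counts.items() for _ in range(n)]
    if depth > len(pool):
        raise ValueError("sample has fewer reads than the requested depth")
    rng = random.Random(seed)  # fixed seed so the draw is repeatable
    rarefied = {taxon: 0 for taxon in counts}
    for taxon in rng.sample(pool, depth):
        rarefied[taxon] += 1
    return rarefied

# Two hypothetical libraries of very different size.
samples = {
    "sample_1": {"taxon_A": 900, "taxon_B": 90, "taxon_C": 10},
    "sample_2": {"taxon_A": 60, "taxon_B": 30, "taxon_C": 10},
}
depth = min(sum(c.values()) for c in samples.values())
rarefied = {name: rarefy(c, depth) for name, c in samples.items()}
print(depth, rarefied)
```

Every library ends up with the same number of reads, but the larger library discards most of its data to get there, which is the loss of information that motivates model-based alternatives.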

F1000Research, 2016, Vol. 5, p. 1492
Published: June 24, 2016
High-throughput sequencing of PCR-amplified taxonomic markers (like the 16S rRNA gene) has enabled a new level of analysis of complex bacterial communities known as microbiomes. Many tools exist to quantify and compare abundance levels or microbial composition of communities in different conditions. The sequencing reads have to be denoised and assigned to the closest taxa from a reference database. Common approaches use a notion of 97% similarity and normalize the data by subsampling to equalize library sizes. In this paper, we show that statistical models allow more accurate abundance estimates. By providing a complete workflow in R, we enable the user to do sophisticated downstream statistical analyses, including both parametric and nonparametric methods. We provide examples of using the R packages dada2, phyloseq, DESeq2, ggplot2 and vegan to filter, visualize and test microbiome data. We also provide examples of supervised analyses using random forests, partial least squares and linear models as well as nonparametric testing using community networks and the ggnetwork package.

Protein & Cell, 2020, Vol. 12(5), pp. 315-330
Published: May 11, 2020
Abstract
Advances in high-throughput sequencing (HTS) have fostered rapid developments in the field of microbiome research, and massive microbiome datasets are now being generated. However, the diversity of software tools and the complexity of analysis pipelines make it difficult to access this field. Here, we systematically summarize the advantages and limitations of microbiome methods. Then, we recommend specific pipelines for amplicon and metagenomic analyses, and describe commonly-used software and databases, to help researchers select the appropriate tools. Furthermore, we introduce statistical and visualization methods suitable for microbiome analysis, including alpha- and beta-diversity, taxonomic composition, difference comparisons, correlation, networks, machine learning, evolution, source tracing, and common visualization styles to help researchers make informed choices. Finally, a step-by-step reproducible analysis guide is introduced. We hope this review will allow researchers to carry out data analysis more effectively and to quickly select the appropriate tools in order to efficiently mine the biological significance behind the data.

Bioinformatics, 2018, Vol. 34(14), pp. 2371-2375
Published: February 27, 2018
The 16S ribosomal RNA (rRNA) gene is widely used to survey microbial communities. Sequences are often clustered into Operational Taxonomic Units (OTUs) as proxies for species. The canonical clustering threshold is 97% identity, which was proposed in 1994 when few 16S rRNA sequences were available, motivating a reassessment on current data.
Using a large set of high-quality 16S rRNA sequences from finished genomes, I assessed the correspondence of OTUs to species for five representative clustering algorithms using four accuracy metrics. All algorithms had comparable accuracy when tuned to a given metric. Optimal identity thresholds were ∼99% for full-length sequences and ∼100% for the V4 hypervariable region.
Reference source code is provided in the Supplementary Material.
Supplementary data are available at Bioinformatics online.
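To show how the choice of identity threshold changes the number of OTUs, the following Python sketch runs a simple greedy centroid clustering at 97%, 99% and 100% identity. It is not one of the five algorithms assessed in the paper: the clustering procedure, the position-wise identity measure (which assumes equal-length, pre-aligned sequences) and the synthetic sequences are all assumptions made for illustration.

```python
def identity(a, b):
    """Fraction of matching positions; assumes equal-length, pre-aligned sequences."""
    if len(a) != len(b):
        raise ValueError("this sketch only compares equal-length sequences")
    return sum(x == y for x, y in zip(a, b)) / len(a)

def greedy_cluster(seqs, threshold):
    """Assign each sequence to the first centroid it matches at >= threshold,
    otherwise start a new cluster with the sequence as its centroid."""
    centroids, clusters = [], []
    for seq in seqs:
        for i, centroid in enumerate(centroids):
            if identity(seq, centroid) >= threshold:
                clusters[i].append(seq)
                break
        else:
            centroids.append(seq)
            clusters.append([seq])
    return clusters

def mutate(seq, positions):
    """Substitute the base at each listed position with a different base."""
    swap = {"A": "C", "C": "G", "G": "T", "T": "A"}
    chars = list(seq)
    for p in positions:
        chars[p] = swap[chars[p]]
    return "".join(chars)

# Synthetic 100-base sequences at 99%, 98% and 95% identity to the first one.
base = "ACGT" * 25
seqs = [
    base,
    mutate(base, [0]),
    mutate(base, [10, 20]),
    mutate(base, [30, 40, 50, 60, 70]),
]

otu_counts = {t: len(greedy_cluster(seqs, t)) for t in (0.97, 0.99, 1.0)}
print(otu_counts)
```

Raising the threshold splits sequences that a 97% cutoff lumps together, which is the behaviour behind the abstract's reassessment of the canonical threshold.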