Benchmarking reveals superiority of deep learning variant callers on bacterial nanopore sequence data
eLife, Journal Year: 2024, Volume and Issue: 13
Published: May 20, 2024
Variant calling is fundamental in bacterial genomics, underpinning the identification of disease transmission clusters, the construction of phylogenetic trees, and antimicrobial resistance detection. This study presents a comprehensive benchmarking of variant calling accuracy in bacterial genomes using Oxford Nanopore Technologies (ONT) sequencing data. We evaluated three ONT basecalling models and both simplex (single-strand) and duplex (dual-strand) read types across 14 diverse bacterial species. Our findings reveal that deep learning-based variant callers, particularly Clair3 and DeepVariant, significantly outperform traditional methods and even exceed the accuracy of Illumina sequencing, especially when applied to ONT’s super-high accuracy model. ONT’s superior performance is attributed to its ability to overcome Illumina’s errors, which often arise from difficulties in aligning reads in repetitive and variant-dense genomic regions. Moreover, the use of high-performing variant callers with ONT’s super-high accuracy data mitigates ONT’s traditional errors in homopolymers. We also investigated the impact of read depth on variant calling, demonstrating that 10× depth of ONT super-accuracy data can achieve precision and recall comparable to, or better than, full-depth Illumina sequencing. These results underscore the potential of ONT sequencing, combined with advanced variant calling algorithms, to replace traditional short-read sequencing methods in bacterial genomics, particularly in resource-limited settings.
Language: English
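The precision and recall figures mentioned in the abstract above follow from counts of true-positive, false-positive and false-negative calls. The following is a minimal sketch of that arithmetic, assuming variants are compared by exact match on (contig, position, ref, alt). The names `Variant`, `CallCounts` and `compare_calls` are hypothetical and are not taken from the study; real benchmarks typically also normalise variant representation before comparing, which this sketch does not attempt.

```python
from dataclasses import dataclass
from typing import NamedTuple, Set


class Variant(NamedTuple):
    """Hypothetical minimal variant record used only for this illustration."""
    contig: str
    pos: int
    ref: str
    alt: str


@dataclass(frozen=True)
class CallCounts:
    tp: int  # called variants that are in the truth set
    fp: int  # called variants that are not in the truth set
    fn: int  # truth variants that were not called

    @property
    def precision(self) -> float:
        called = self.tp + self.fp
        return self.tp / called if called else 0.0

    @property
    def recall(self) -> float:
        expected = self.tp + self.fn
        return self.tp / expected if expected else 0.0

    @property
    def f1(self) -> float:
        p, r = self.precision, self.recall
        return 2 * p * r / (p + r) if (p + r) else 0.0


def compare_calls(truth: Set[Variant], called: Set[Variant]) -> CallCounts:
    """Count TP/FP/FN by exact set comparison (an assumption of this sketch)."""
    return CallCounts(
        tp=len(truth & called),
        fp=len(called - truth),
        fn=len(truth - called),
    )
```

For instance, a call set that recovers two of three truth variants and adds one spurious call has precision 2/3 and recall 2/3.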
Characteristics and filtering of low-frequency artificial short deletion variations based on nanopore sequencing
GigaScience, Journal Year: 2025, Volume and Issue: 14
Published: Jan. 1, 2025
Nanopore sequencing is characterized by high portability and long reads, albeit accompanied by systematic errors causing short deletions. Few tools can filter low-frequency artificial deletions, especially in single samples. To solve this problem, we first synthesized or purchased 17 DNA/RNA standards for nanopore sequencing with R9 and R10 flowcells to obtain benchmarking datasets. False-positive (FP) deletions were prevalent (75.86%-96.26%), while the majority (62.07%-79.68%) were located in homopolymeric regions. The 10-mer base-quality scores (Q scores) and speeds flanking FP deletions marginally differed from true-positive (TP) deletions. We thus investigated raw current signals after normalizing them by length. We found more significant differences between reads with and without FP deletions. Indexes including MRPP A (Multiple Response Permutation Procedure, statistic A), accumulative difference of normalized signals, and Q score were tested for the power of distinguishing FP from TP deletions. MRPP A outperformed other indexes in homopolymeric regions and achieved the highest accuracy of 76.73% for challenging 1-base deletions. When depth was low, Q score performed better than MRPP A. We developed Delter (Deletion filter) to filter low-frequency artificial deletions in single samples, which removed 60.98% to 100% of FP deletions in real samples. Most low-frequency deletion variations could be effectively filtered using Delter according to the employed strategies.
Language: English
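The MRPP statistic A named in the abstract above is, in its general textbook form, a chance-corrected measure of within-group agreement: A = 1 - (observed weighted within-group distance) / (expected distance under random grouping). The sketch below illustrates that general definition only. It assumes one scalar value per read, absolute difference as the distance, and group weights n_i/N; none of these choices is stated in the abstract, and the function names are hypothetical rather than part of Delter.

```python
from itertools import combinations
from statistics import fmean
from typing import List, Sequence


def mean_pairwise_distance(values: Sequence[float]) -> float:
    """Average absolute difference over all unordered pairs of values."""
    if len(values) < 2:
        raise ValueError("need at least two values to form a pair")
    return fmean([abs(a - b) for a, b in combinations(values, 2)])


def mrpp_a(groups: List[Sequence[float]]) -> float:
    """Chance-corrected within-group agreement (MRPP statistic A).

    With weights n_i / N, the expected delta under random regrouping equals
    the mean pairwise distance of the pooled observations, so
    A = 1 - delta_observed / delta_expected.
    A near 1 means groups are internally homogeneous and well separated;
    A near 0 means grouping is no better than chance.
    """
    n_total = sum(len(g) for g in groups)
    delta_observed = sum(
        (len(g) / n_total) * mean_pairwise_distance(g) for g in groups
    )
    pooled = [v for g in groups for v in g]
    delta_expected = mean_pairwise_distance(pooled)
    if delta_expected == 0:
        raise ValueError("all observations are identical; A is undefined")
    return 1.0 - delta_observed / delta_expected
```

In a setting like the one described, the two groups could be per-read signal summaries for reads with and without a candidate deletion; how those summaries are derived is left open here.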
Transcriptomics in the era of long-read sequencing
Nature Reviews Genetics, Journal Year: 2025, Volume and Issue: unknown
Published: March 28, 2025
Language: English
Custom barcoded primers for influenza A nanopore sequencing: enhanced performance with reduced preparation time
Frontiers in Cellular and Infection Microbiology, Journal Year: 2025, Volume and Issue: 15
Published: April 15, 2025
Highly pathogenic avian influenza is endemic and widespread in wild birds and is causing major outbreaks in poultry worldwide and in U.S. dairy cows, with several recent human cases, highlighting the need for reliable and rapid sequencing to track mutations that may facilitate viral replication in different hosts. SNP analysis is a useful molecular epidemiology tool to track outbreaks, but it requires accurate whole-genome sequencing (WGS) with sufficient read depth across all eight segments. In outbreak situations, where timely data is critical for controlling the spread of the virus, reducing preparation time while maintaining high-quality sequencing standards is particularly important. In this study, we optimized a custom barcoded primer strategy for influenza A sequencing on the nanopore platform, combining the high performance of the Native Barcoding Kit with the prompt preparation of the Rapid Barcoding Kit. Custom barcoded primers were designed to perform barcode attachment during RT-PCR amplification, eliminating the need for separate barcoding and clean-up steps, thus reducing library preparation time. We compared the custom method with the Native and Rapid Barcoding kits in terms of read quality, read depth, and sequencing output. The results show that the custom barcoded primer method provided sequencing quality comparable to the Native Barcoding Kit while reducing library preparation time by 2.3X, and took only 15 minutes longer than the Rapid Barcoding Kit while giving better sequencing. Additionally, the method was evaluated on a variety of clinical sample types. This approach offers a promising solution for influenza A sequencing, providing both high throughput and efficiency, which significantly improves the time-to-result turnaround, making sequencing more accessible for real-time surveillance.
Language: English
Artificial intelligence in variant calling: a review
Frontiers in Bioinformatics, Journal Year: 2025, Volume and Issue: 5
Published: April 23, 2025
Artificial intelligence (AI) has revolutionized numerous fields, including genomics, where it has significantly impacted variant calling, a crucial process in genomic analysis. Variant calling involves the detection of genetic variants such as single nucleotide polymorphisms (SNPs), insertions/deletions (InDels), and structural variants from high-throughput sequencing data. Traditionally, statistical approaches have dominated this task, but the advent of AI has led to the development of sophisticated tools that promise higher accuracy, efficiency, and scalability. This review explores state-of-the-art AI-based variant calling tools, including DeepVariant, DNAscope, DeepTrio, Clair, Clairvoyante, Medaka, and HELLO. We discuss their underlying methodologies, strengths, limitations, and performance metrics across different sequencing technologies, alongside their computational requirements, focusing primarily on SNP and InDel detection. By comparing these AI-driven techniques with conventional methods, we highlight the transformative advancements AI has introduced and its potential to further enhance genomic research.
Language: English
Hybracter: Enabling Scalable, Automated, Complete and Accurate Bacterial Genome Assemblies
bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2023, Volume and Issue: unknown
Published: Dec. 13, 2023
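Abstract
Improvements in the accuracy and availability of long-read sequencing mean that complete bacterial genomes are now routinely reconstructed using hybrid (i.e. short- and long-reads) assembly approaches. Complete genomes allow a deeper understanding of bacterial evolution and genomic variation beyond single nucleotide variants (SNVs). They are also crucial for identifying plasmids, which often carry medically significant antimicrobial resistance (AMR) genes. However, small plasmids are often missed or misassembled by long-read assembly algorithms. Here, we present Hybracter which allows for the fast, automatic, and scalable recovery of near-perfect complete bacterial genomes using a long-read first assembly approach. Hybracter can be run either as a hybrid assembler or as a long-read only assembler. We compared Hybracter to existing automated hybrid and long-read only assembly tools using a diverse panel of samples of varying levels of long-read accuracy with manually curated ground truth reference genomes. We demonstrate that Hybracter as a hybrid assembler is more accurate and faster than the existing gold standard automated hybrid assembler Unicycler. We also show that Hybracter with long-reads only is the most accurate long-read only assembler and is comparable to hybrid methods in accurately recovering small plasmids.

Data Summary
Hybracter is developed using Python and Snakemake as a command-line software tool for Linux and MacOS systems. Hybracter is freely available under an MIT License on GitHub (https://github.com/gbouras13/hybracter) and the documentation is available at Read the Docs (https://hybracter.readthedocs.io/en/latest/). Hybracter is available to install via PyPI (https://pypi.org/project/hybracter/) and Bioconda (https://anaconda.org/bioconda/hybracter). A Docker/Singularity container is also available at https://quay.io/repository/gbouras13/hybracter. All code used to benchmark Hybracter, including the reference genomes, is publicly available on GitHub (https://github.com/gbouras13/hybracter_benchmarking) with released DOI (https://zenodo.org/doi/10.5281/zenodo.10910108) at Zenodo. The subsampled FASTQ files used for benchmarking are publicly available at Zenodo with DOI (https://doi.org/10.5281/zenodo.10906937). All super accuracy simplex ATCC FASTQ reads sequenced as a part of this study can be found under BioProject PRJNA1042815. All Hall et al. fast accuracy simplex and super accuracy duplex ATCC FASTQ read files (prior to subsampling) can be found in the SRA under BioProject PRJNA1087001. All raw Lermaniaux et al. FASTQ read files can be found under BioProject PRJNA1020811. All Staphylococcus aureus JKD6159 FASTQ read files can be found under BioProject PRJNA50759. All Mycobacterium tuberculosis H37R2 FASTQ read files can be found under BioProject PRJNA836783. The complete list of BioSample accession numbers for each benchmarked sample can be found in Supplementary Table 1. The benchmarking assembly output files are publicly available on Zenodo. All Pypolca benchmarking outputs are publicly available on Zenodo with DOI (https://zenodo.org/doi/10.5281/zenodo.10072192).

Impact Statement
Complete bacterial genome assembly using hybrid sequencing is a routine and vital part of bacterial genomics, especially for identification of mobile genetic elements and plasmids. As sequencing becomes cheaper, easier to access and more accurate, automated assembly methods are crucial. With Hybracter, we present a new long-read first automated assembly tool that is faster and more accurate than the widely-used Unicycler. Hybracter can be used both as a hybrid assembler and with long-reads only. Additionally, it solves the problems of long-read assemblers struggling with small plasmids, with plasmid recovery from long-reads only performing on par with hybrid methods. Hybracter can natively exploit the parallelisation of high-performance computing (HPC) clusters and cloud-based environments, enabling users to assemble hundreds or thousands of genomes with one line of code. Hybracter is available freely as source code on GitHub, via Bioconda or PyPi.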
Language: English
Rapid whole genome characterization of antimicrobial-resistant pathogens using long-read sequencing to identify potential healthcare transmission
Infection Control and Hospital Epidemiology, Journal Year: 2024, Volume and Issue: 46(2), P. 129 - 135
Published: Dec. 27, 2024
Whole genome sequencing (WGS) can help identify transmission of pathogens causing healthcare-associated infections (HAIs). However, the current gold standard of short-read, Illumina-based WGS is labor and time intensive. Given recent improvements in long-read Oxford Nanopore Technologies (ONT) sequencing, we sought to establish a low resource approach providing accurate WGS-pathogen comparison within a time frame allowing for infection prevention and control (IPC) interventions. WGS was prospectively performed on pathogens at increased risk of potential healthcare transmission using the ONT MinION sequencer with R10.4.1 flow cells and Dorado basecaller. Potential transmission was assessed via Ridom SeqSphere+ for core genome multilocus sequence typing and MINTyper for reference-based single nucleotide polymorphisms using previously published cutoff values. The accuracy of our ONT pipeline was determined relative to Illumina. Over a six-month period, 242 bacterial isolates from 216 patients were sequenced by a single operator. Compared to the Illumina gold standard, our ONT pipeline achieved a mean identity score of Q60 for assembled genomes, even with a coverage rate as low as 40×. The time from initiating DNA extraction to complete analysis was 2 days (IQR 2-3.25 days). We identified five potential transmission clusters comprising 21 isolates (8.7% of sequenced strains). Integrating available epidemiological data, >70% (15/21) of putative transmission cluster isolates originated from patients with potential healthcare transmission links. Via a stand-alone ONT pipeline, we detected potentially transmitted HAI pathogens rapidly and accurately, aligning closely with epidemiological data. Our low-resource method has the potential to assist in IPC efforts.
Language: English
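To put the "identity score of Q60" from the abstract above in concrete terms, the sketch below converts between a Phred-scaled score and a per-base error rate using the standard relation Q = -10 · log10(p). This assumes the reported score follows the usual Phred convention, which the abstract does not state explicitly. The function names and the 5 Mbp genome length are illustrative only.

```python
import math


def phred_to_error_rate(q: float) -> float:
    """Per-base error probability for a Phred-scaled score Q."""
    return 10 ** (-q / 10)


def error_rate_to_phred(p: float) -> float:
    """Phred-scaled score for a per-base error probability p (0 < p <= 1)."""
    if not 0 < p <= 1:
        raise ValueError("error rate must be in (0, 1]")
    return -10 * math.log10(p)


def expected_mismatches(q: float, genome_length: int) -> float:
    """Expected number of differing bases across a genome at score Q."""
    return genome_length * phred_to_error_rate(q)


if __name__ == "__main__":
    # Illustrative 5 Mbp bacterial genome (an assumed length, not from the study).
    print(phred_to_error_rate(60))
    print(expected_mismatches(60, 5_000_000))
```

Under that convention, Q60 corresponds to roughly one differing base per million, or on the order of five differences across a 5 Mbp genome.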