Uncalled4 improves nanopore DNA and RNA modification detection via fast and accurate signal alignment
bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown
Published: March 11, 2024
Abstract
Nanopore signal analysis enables detection of nucleotide modifications from native DNA and RNA sequencing, providing both accurate genetic/transcriptomic and epigenetic information without additional library preparation. Presently, only a limited set of modifications can be directly basecalled (e.g. 5-methylcytosine), while most others require exploratory methods that often begin with alignment of nanopore signal to a nucleotide reference. We present Uncalled4, a toolkit for nanopore signal alignment, analysis, and visualization. Uncalled4 features an efficient banded signal alignment algorithm, a BAM signal alignment file format, statistics for comparing signal alignment methods, and a reproducible de novo training method for k-mer-based pore models, revealing potential errors in ONT’s state-of-the-art DNA model. We apply Uncalled4 to RNA 6-methyladenine (m6A) detection in seven human cell lines, identifying 26% more modifications than Nanopolish using m6Anet, including in several genes where m6A has known implications in cancer. Uncalled4 is available open-source at github.com/skovaka/uncalled4.
Language: English
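The banded signal-to-reference alignment named in the abstract above can be illustrated with a generic banded dynamic time warping (DTW) sketch. This is not Uncalled4's implementation: the function name `banded_dtw`, the fixed band around the scaled diagonal, and the absolute-difference cost are all assumptions chosen for illustration, and `expected` stands in for the current levels a k-mer pore model would predict for the reference.

```python
import math

def banded_dtw(signal, expected, band=2):
    """Align signal samples to expected reference levels (hypothetical sketch).

    Only cells within `band` columns of the scaled diagonal are evaluated,
    which is what keeps the cost far below the full n * m matrix.
    Returns (total_cost, path) where path is a list of (signal_idx, ref_idx).
    """
    n, m = len(signal), len(expected)
    cost = [[math.inf] * m for _ in range(n)]
    back = [[None] * m for _ in range(n)]
    for i in range(n):
        center = i * (m - 1) // max(n - 1, 1)  # diagonal column for row i
        for j in range(max(0, center - band), min(m, center + band + 1)):
            d = abs(signal[i] - expected[j])
            if i == 0 and j == 0:
                cost[i][j] = d
                continue
            best, prev = math.inf, None
            # Predecessors: same reference position, diagonal step, or
            # same signal sample assigned to the previous reference position.
            for pi, pj in ((i - 1, j), (i - 1, j - 1), (i, j - 1)):
                if pi >= 0 and pj >= 0 and cost[pi][pj] < best:
                    best, prev = cost[pi][pj], (pi, pj)
            if prev is not None:
                cost[i][j] = best + d
                back[i][j] = prev
    if math.isinf(cost[n - 1][m - 1]):
        raise ValueError("band too narrow to reach the final cell")
    # Trace the optimal path back from the last cell to the first.
    path, cell = [], (n - 1, m - 1)
    while cell is not None:
        path.append(cell)
        cell = back[cell[0]][cell[1]]
    path.reverse()
    return cost[n - 1][m - 1], path

total, path = banded_dtw([1.0, 1.1, 2.0, 2.1, 3.0], [1.0, 2.0, 3.0], band=1)
print(path)
```

In this toy input, two signal samples map to each of the first two reference positions and one to the last, mirroring how several raw samples typically correspond to one reference k-mer.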
RawHash2: Mapping Raw Nanopore Signals Using Hash-Based Seeding and Adaptive Quantization
Bioinformatics, Journal Year: 2024, Volume and Issue: 40(8)
Published: July 30, 2024
Raw nanopore signals can be analyzed while they are being generated, a process known as real-time analysis. Real-time analysis of raw signals is essential to utilize the unique features that nanopore sequencing provides, enabling the early stopping of the sequencing of a read or the entire sequencing run based on the analysis. The state-of-the-art mechanism, RawHash, offers the first hash-based efficient and accurate similarity identification between raw signals and a reference genome by quickly matching their hash values. In this work, we introduce RawHash2, which provides major improvements over RawHash, including more sensitive quantization and chaining algorithms, weighted mapping decisions, frequency filters to reduce ambiguous seed hits, minimizers for hash-based sketching, and support for the R10.4 flow cell version and the POD5 and SLOW5 file formats. Compared to RawHash, RawHash2 provides better F1 accuracy (on average by 10.57% and up to 20.25%) and better throughput (on average by 4.0× and up to 9.9×).
Language: English
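The quantize, hash, and sketch steps listed in the RawHash2 abstract above can be illustrated roughly as follows. This is a simplified sketch, not RawHash2's code: it uses uniform buckets rather than the adaptive quantization the paper describes, and the function names and parameters (`lo`, `hi`, `buckets`, `k`, `w`) are hypothetical.

```python
def quantize(events, lo=-3.0, hi=3.0, buckets=16):
    """Map each normalized event value to an integer bucket in [0, buckets - 1]."""
    width = (hi - lo) / buckets
    out = []
    for x in events:
        clipped = min(max(x, lo), hi)  # clamp outliers into [lo, hi]
        out.append(min(int((clipped - lo) / width), buckets - 1))
    return out

def seed_hashes(q, k=3, buckets=16):
    """Pack each run of k consecutive bucket ids into one integer seed."""
    seeds = []
    for i in range(len(q) - k + 1):
        h = 0
        for b in q[i:i + k]:
            h = h * buckets + b
        seeds.append(h)
    return seeds

def minimizers(seeds, w=3):
    """Keep the smallest seed (leftmost on ties) in every window of w seeds."""
    picked = []
    for i in range(len(seeds) - w + 1):
        window = seeds[i:i + w]
        j = i + window.index(min(window))
        if not picked or picked[-1][0] != j:  # skip repeats from overlapping windows
            picked.append((j, seeds[j]))
    return picked

# Slightly noisy values fall into the same buckets, so they yield the same seeds.
print(quantize([0.10, -1.0]) == quantize([0.20, -0.95]))
```

Coarse buckets make seeds tolerant of small signal noise, and minimizer selection reduces how many seeds must be stored and looked up; the bucket count and window size here are arbitrary illustrative choices.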
Uncalled4 improves nanopore DNA and RNA modification detection via fast and accurate signal alignment
Nature Methods, Journal Year: 2025, Volume and Issue: unknown
Published: March 28, 2025
Nanopore signal analysis enables detection of nucleotide modifications from native DNA and RNA sequencing, providing both accurate genetic or transcriptomic and epigenetic information without additional library preparation. At present, only a limited set of modifications can be directly basecalled (for example, 5-methylcytosine), while most others require exploratory methods that often begin with alignment of nanopore signal to a nucleotide reference. We present Uncalled4, a toolkit for nanopore signal alignment, analysis and visualization. Uncalled4 features an efficient banded signal alignment algorithm, a BAM signal alignment file format, statistics for comparing signal alignment methods and a reproducible de novo training method for k-mer-based pore models, revealing potential errors in Oxford Nanopore Technologies' state-of-the-art DNA model. We apply Uncalled4 to RNA 6-methyladenine (m6A) detection in seven human cell lines, identifying 26% more modifications than Nanopolish using m6Anet, including in several genes where m6A has known implications in cancer. Uncalled4 is available open source at github.com/skovaka/uncalled4.
Language: English
Improved Pangenomic Classification Accuracy with Chain Statistics
Lecture Notes in Computer Science, Journal Year: 2025, Volume and Issue: unknown, P. 190 - 208
Published: Jan. 1, 2025
Language: English
Faster Maximal Exact Matches with Lazy LCP Evaluation
Published: March 19, 2024
MONI (Rossi et al.,
Language: English
Improved pangenomic classification accuracy with chain statistics
bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown
Published: Nov. 2, 2024
Abstract
Compressed full-text indexes enable efficient sequence classification against a pangenome or tree-of-life index. Past work on compressed-index classification used matching statistics or pseudo-matching lengths to capture the fine-grained co-linearity of exact matches. But these fail to capture coarse-grained information about whether seeds appear co-linearly in the reference. We present a novel approach that additionally obtains coarse-grained co-linearity (“chain”) statistics. We do this without using a chaining algorithm, which would require superlinear time in the number of matches. We start with a collection of strings, avoiding the multiple-alignment step required by graph approaches. We rapidly compute multi-maximal unique matches (multi-MUMs) and identify BWT sub-runs that correspond to these multi-MUMs. From these, we select those that can be “tunneled,” and mark these with the corresponding multi-MUM identifiers. This yields an O(r + n/d)-space index for a collection of d sequences having a length-n BWT consisting of r maximal equal-character runs. Using the index, we simultaneously compute fine-grained matching statistics and coarse-grained chain statistics in linear time with respect to query length. We found that this substantially improves classification accuracy compared to past compressed-indexing approaches and reaches the same level of accuracy as less efficient alignment-based methods.
Language: English