A Hitchhiker's Guide to long-read genomic analysis
Genome Research, Journal Year: 2025, Volume and Issue: 35(4), P. 545 - 558
Published: April 1, 2025
Over the past decade, long-read sequencing has evolved into a pivotal technology for uncovering the hidden and complex regions of the genome. Significant advancements in cost efficiency, scalability, and accuracy have driven this evolution. Concurrently, novel analytical methods have emerged to harness the full potential of long reads. These advancements have enabled milestones such as the first fully completed human genome, enhanced identification and understanding of complex genomic variants, and deeper insights into the interplay between epigenetics and genomic variation. This mini-review provides a comprehensive overview of the latest developments in long-read DNA sequencing analysis, encompassing reference-based and de novo assembly approaches. We explore the entire workflow, from initial data processing to variant calling and annotation, focusing on how these methods improve our ability to interpret a wide array of genomic variants. Additionally, we discuss the current challenges, limitations, and future directions in the field, offering a detailed examination of the state-of-the-art bioinformatics methods for long-read sequencing.
Language: English
FPGA-based accelerator for adaptive banded event alignment in nanopore sequencing data analysis
BMC Bioinformatics, Journal Year: 2025, Volume and Issue: 26(1)
Published: March 17, 2025
Adaptive Banded Event Alignment (ABEA) is a critical algorithmic component in sequence polishing and DNA methylation detection, employing dynamic programming to align raw Nanopore signal with reference reads. Motivated by the observation that, compared to CPUs and GPUs, cutting-edge FPGAs demonstrate, in certain cases, superior performance at reduced cost and energy consumption, this paper presents an efficient FPGA-based accelerator for ABEA, leveraging the inherent high parallelism and sequential access pattern within ABEA. Our proposed FPGA-based ABEA accelerator significantly enhances the performance of the original CPU-based implementation in Nanopolish as well as the state-of-the-art ABEA acceleration on GPU and FPGA platforms. Specifically, targeting the Xilinx VU9P, our accelerator achieves an average throughput speedup of 10.05× over the CPU-only implementation, 1.81× over the GPU acceleration with only 7.2% of the energy, and 10.11× over the existing FPGA accelerator. Our work demonstrates that computationally intensive genome analysis can benefit from FPGAs, offering improvements in both performance and energy consumption.
Language: English
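The ABEA abstract above describes dynamic programming restricted to a band of the alignment matrix. As a rough illustration only, the sketch below aligns a list of measured event levels to a list of expected reference levels using a fixed band around the diagonal and a plain absolute-difference cost. It is not ABEA itself: the band here does not adapt, the scoring is not Nanopolish's, and the names `banded_align`, `events`, `ref_levels`, and `band` are hypothetical.

```python
import math


def banded_align(events, ref_levels, band=2):
    """Minimum total |event - level| cost over monotone alignments inside a fixed band.

    Simplified stand-in for banded event alignment: fixed (not adaptive) band,
    absolute-difference cost (an assumption, not the scoring used by Nanopolish).
    """
    n, m = len(events), len(ref_levels)
    inf = math.inf
    # cost[i][j]: best cost of an alignment ending with event i matched to reference position j
    cost = [[inf] * m for _ in range(n)]
    for i in range(n):
        # The band follows the main diagonal, scaled for unequal lengths.
        center = round(i * (m - 1) / max(n - 1, 1))
        lo, hi = max(0, center - band), min(m - 1, center + band)
        for j in range(lo, hi + 1):
            d = abs(events[i] - ref_levels[j])
            if i == 0 and j == 0:
                best = 0.0
            else:
                best = min(
                    cost[i - 1][j] if i > 0 else inf,                # next event, same reference position
                    cost[i - 1][j - 1] if i > 0 and j > 0 else inf,  # advance both
                    cost[i][j - 1] if j > 0 else inf,                # same event, next reference position
                )
            cost[i][j] = best + d
    return cost[n - 1][m - 1]


if __name__ == "__main__":
    print(banded_align([1.0, 1.1, 2.0, 3.0], [1.0, 2.0, 3.0], band=1))
```

Only cells inside the band are filled, so the work per row is bounded by the band width rather than the reference length; cells outside the band stay at infinity and can never be chosen.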
TargetCall: eliminating the wasted computation in basecalling via pre-basecalling filtering
Frontiers in Genetics, Journal Year: 2024, Volume and Issue: 15
Published: Oct. 28, 2024
Basecalling is an essential step in nanopore sequencing analysis where the raw signals of nanopore sequencers are converted into nucleotide sequences, that is, reads. State-of-the-art basecallers use complex deep learning models to achieve high basecalling accuracy. This makes basecalling computationally inefficient and memory-hungry, bottlenecking the entire genome analysis pipeline. However, for many applications, most reads do not match the reference genome of interest (i.e., target reference) and are thus discarded in later steps of the genomics pipeline, wasting the basecalling computation. To overcome this issue, we propose TargetCall, the first pre-basecalling filter to eliminate the wasted computation in basecalling. TargetCall's key idea is to discard reads that will not match the target reference (i.e., off-target reads) prior to basecalling. TargetCall consists of two main components: (1) LightCall, a lightweight neural network basecaller that produces noisy reads, and (2) Similarity Check, which labels each of these noisy reads as on-target or off-target by matching them to the target reference. Our thorough experimental evaluations show that TargetCall 1) improves the end-to-end basecalling runtime performance of the state-of-the-art basecaller by 3.31× while maintaining high (98.88%) recall in keeping on-target reads, 2) maintains high accuracy in downstream analysis, and 3) achieves better runtime performance, throughput, recall, precision, and generality than prior works. TargetCall is available at https://github.com/CMU-SAFARI/TargetCall.
Language: English
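The two-stage idea in the TargetCall abstract above (a cheap, noisy basecall followed by a similarity check against the target reference) can be sketched as a small filter. Everything here is an assumption made for illustration: `light_call` is a caller-supplied placeholder for the lightweight basecaller, and shared k-mer fraction stands in for the Similarity Check, whose actual matching method the abstract does not specify. The names `kmers`, `is_on_target`, `filter_signals`, `k`, and `min_fraction` are hypothetical.

```python
def kmers(seq, k):
    """Set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def is_on_target(noisy_read, target_kmers, k=5, min_fraction=0.5):
    """Label a noisy read on-target if enough of its k-mers occur in the target.

    k-mer containment is a stand-in for the Similarity Check (an assumption).
    """
    read_kmers = kmers(noisy_read, k)
    if not read_kmers:
        return False  # read shorter than k: nothing to compare
    shared = len(read_kmers & target_kmers)
    return shared / len(read_kmers) >= min_fraction


def filter_signals(signals, light_call, target_reference, k=5, min_fraction=0.5):
    """Keep only the raw signals whose cheap, noisy basecall looks on-target.

    light_call is a hypothetical lightweight basecaller: signal -> noisy read string.
    Only the kept signals would be passed on to the expensive basecaller.
    """
    target_kmers = kmers(target_reference, k)
    kept = []
    for signal in signals:
        noisy_read = light_call(signal)
        if is_on_target(noisy_read, target_kmers, k, min_fraction):
            kept.append(signal)
    return kept


if __name__ == "__main__":
    target = "ACGTACGTTAGCCGATAGGCTA"
    # Strings stand in for raw signals; the identity function stands in for the light basecaller.
    print(filter_signals(["ACGTTAGCCGAT", "TTTTTTTTTTTT"], lambda s: s, target))
```

The design point this illustrates is that the filter returns raw signals, not reads: the noisy read is used only to make the keep-or-discard decision.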
ApHMM: Accelerating Profile Hidden Markov Models for Fast and Energy-efficient Genome Analysis
ACM Transactions on Architecture and Code Optimization, Journal Year: 2023, Volume and Issue: 21(1), P. 1 - 29
Published: Dec. 28, 2023
Profile hidden Markov models (pHMMs) are widely employed in various bioinformatics applications to identify similarities between biological sequences, such as DNA or protein sequences. In pHMMs, sequences are represented as graph structures, where states and edges capture modifications (i.e., insertions, deletions, and substitutions) by assigning probabilities to them. These probabilities are subsequently used to compute the similarity score between a sequence and a pHMM graph. The Baum-Welch algorithm, a prevalent and highly accurate method, utilizes these probabilities to optimize and compute similarity scores. Accurate computation of these probabilities is essential for the correct identification of sequence similarities. However, the Baum-Welch algorithm is computationally intensive, and existing solutions offer either software-only or hardware-only approaches with fixed pHMM designs. When we analyze state-of-the-art works, we identify an urgent need for a flexible, high-performance, and energy-efficient hardware-software co-design to address the major inefficiencies in the Baum-Welch algorithm for pHMMs. We introduce ApHMM, the first flexible acceleration framework designed to significantly reduce both computational and energy overheads associated with the Baum-Welch algorithm for pHMMs. ApHMM employs hardware-software co-design to tackle the major inefficiencies in the Baum-Welch algorithm by (1) designing flexible hardware to accommodate various pHMM designs, (2) exploiting predictable data dependency patterns through on-chip memory with memoization techniques, (3) rapidly filtering out unnecessary computations using a hardware-based filter, and (4) minimizing redundant computations. ApHMM achieves substantial speedups of 15.55×–260.03×, 1.83×–5.34×, and 27.97× when compared to CPU, GPU, and FPGA implementations of the Baum-Welch algorithm, respectively. ApHMM outperforms state-of-the-art CPU implementations in three key bioinformatics applications: (1) error correction, (2) protein family search, and (3) multiple sequence alignment, by 1.29×–59.94×, 1.03×–1.75×, and 1.03×–1.95×, respectively, while improving their energy efficiency by 64.24×–115.46×, 1.75×, and 1.96×.
Language: English
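The ApHMM abstract above centers on the Baum-Welch algorithm, which repeatedly runs forward and backward passes over the model's state graph. As a minimal sketch of the forward pass only, the function below computes the likelihood of an observed sequence under a small, generic HMM. It does not model the pHMM topology (match, insert, and delete states) or any of ApHMM's optimizations, and the names `forward_likelihood`, `init`, `trans`, and `emit` are hypothetical.

```python
def forward_likelihood(obs, init, trans, emit):
    """Probability of the observation sequence under a generic HMM (forward pass).

    obs   : non-empty sequence of symbols, e.g. "ACCA"
    init  : init[s] is the probability of starting in state s
    trans : trans[p][s] is the probability of moving from state p to state s
    emit  : emit[s][symbol] is the probability that state s emits symbol
    """
    n_states = len(init)
    # alpha[s]: probability of the symbols seen so far, ending in state s
    alpha = [init[s] * emit[s][obs[0]] for s in range(n_states)]
    for symbol in obs[1:]:
        alpha = [
            emit[s][symbol] * sum(alpha[p] * trans[p][s] for p in range(n_states))
            for s in range(n_states)
        ]
    return sum(alpha)


if __name__ == "__main__":
    init = [0.5, 0.5]
    trans = [[0.9, 0.1], [0.2, 0.8]]
    emit = [{"A": 0.8, "C": 0.2}, {"A": 0.3, "C": 0.7}]
    print(forward_likelihood("AC", init, trans, emit))
```

Each step depends only on the previous `alpha` vector, which is the kind of regular data dependency pattern the abstract refers to; probabilities for long sequences would underflow in this form, so practical implementations typically scale `alpha` or work in log space.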