A Hitchhiker's Guide to long-read genomic analysis
Genome Research, Journal Year: 2025, Volume and Issue: 35(4), P. 545 - 558
Published: April 1, 2025
Over the past decade, long-read sequencing has evolved into a pivotal technology for uncovering the hidden and complex regions of the genome. Significant cost efficiency, scalability, and accuracy advancements have driven this evolution. Concurrently, novel analytical methods have emerged to harness the full potential of long reads. These advancements have enabled milestones such as the first fully completed human genome, enhanced identification and understanding of complex genomic variants, and deeper insights into the interplay between epigenetics and genomic variation. This mini-review provides a comprehensive overview of the latest developments in long-read DNA sequencing analysis, encompassing reference-based and de novo assembly approaches. We explore the entire workflow, from initial data processing to variant calling and annotation, focusing on how these methods improve our ability to interpret a wide array of genomic variants. Additionally, we discuss the current challenges, limitations, and future directions in the field, offering a detailed examination of the state-of-the-art bioinformatics methods for long-read sequencing.
Language: English
TopoQual polishes circular consensus sequencing data and accurately predicts quality scores
BMC Bioinformatics, Journal Year: 2025, Volume and Issue: 26(1)
Published: Jan. 16, 2025
Abstract
Background
Pacific Biosciences (PacBio) circular consensus sequencing (CCS), also known as high fidelity (HiFi) technology, has revolutionized modern genomics by producing long (10+ kb) and highly accurate reads. This is achieved by sequencing circularized DNA molecules multiple times and combining them into a consensus sequence. Currently, the accuracy and quality value estimation provided by HiFi technology are more than sufficient for applications such as genome assembly and germline variant calling. However, there are limitations in the accuracy of the estimated quality scores when it comes to somatic variant calling on single reads.
Results
To address the challenge of inaccurate somatic variant calling, we introduce TopoQual, a novel tool designed to enhance the accuracy of base quality predictions. TopoQual leverages techniques including partial order alignments (POA), topologically parallel bases, and deep learning algorithms to polish consensus sequences. Our results demonstrate that TopoQual corrects approximately 31.9% of errors in PacBio consensus sequences. Additionally, it validates base qualities up to q59, which corresponds to one error in 0.9 million bases. These improvements will significantly enhance the reliability of somatic variant calling using HiFi data.
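The relationship between a quality value such as q59 and an error rate can be checked with a few lines of Python. This is a minimal sketch that assumes the standard Phred definition, Q = -10 * log10(p); the function names are illustrative and are not taken from TopoQual. Under that definition, q59 works out to roughly one error per 0.8 million bases, the same order of magnitude as the figure quoted in the abstract.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Convert a Phred quality score to a per-base error probability."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Convert a per-base error probability (0 < p <= 1) to a Phred score."""
    if not 0 < p <= 1:
        raise ValueError("error probability must be in (0, 1]")
    return -10 * math.log10(p)


def bases_per_error(q: float) -> float:
    """Expected number of bases per single error at quality q."""
    return 1 / phred_to_error_prob(q)


if __name__ == "__main__":
    # Print the error probability and bases-per-error for a few scores.
    for q in (20, 30, 59):
        print(f"Q{q}: p = {phred_to_error_prob(q):.3e}, "
              f"about 1 error per {bases_per_error(q):,.0f} bases")
```

Because the scale is logarithmic, each additional 10 points of quality means a tenfold drop in expected errors, which is why validating scores in the q50+ range matters for single-read somatic calling.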
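Conclusion
TopoQual represents a significant advancement in improving the accuracy of base quality predictions for PacBio HiFi sequencing data. By correcting a substantial proportion of errors and achieving high base quality validation, TopoQual enables confident somatic variant calling. This tool not only addresses a critical limitation of current HiFi technology but also opens new possibilities for precise genomic analysis in various research and clinical applications.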
Language: English
Severus detects somatic structural variation and complex rearrangements in cancer genomes using long-read sequencing
Nature Biotechnology, Journal Year: 2025, Volume and Issue: unknown
Published: April 4, 2025
Language: English
A personalized multi-platform assessment of somatic mosaicism in the human frontal cortex
bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown
Published: Dec. 21, 2024
Somatic mutations in individual cells lead to genomic mosaicism, contributing to the intricate regulatory landscape of genetic disorders and cancers. To evaluate and refine the detection of somatic mosaicism across different technologies with personalized donor-specific assembly (DSA), we obtained tissue from the dorsolateral prefrontal cortex (DLPFC) of a post-mortem neurotypical 31-year-old individual. We sequenced bulk DLPFC tissue using Oxford Nanopore Technologies (~60X), NovaSeq (~30X), and linked-read sequencing (~28X). Additionally, we applied Cas9 capture methodology coupled with long-read sequencing (TEnCATS), targeting active transposable elements.
We also isolated and amplified DNA from flow-sorted single DLPFC neurons using MALBAC, sequencing 115 of these MALBAC libraries on Nanopore and 94 on NovaSeq.
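We constructed a haplotype-resolved assembly with a total length of 5.77 Gb and a phase block length of 2.67 Mb (N50) to facilitate cross-platform analysis of somatic genetic variations. We observed an increase in phasing rate from 11.6% to 38.0% between short-read and long-read technologies. By generating a catalog of phased germline SNVs, CNVs, and TEs from the assembled genome, we applied standard approaches to recall these variants and achieved aggregated recall rates from 97.3% to 99.4% based on the sequencing data, setting an upper bound for detection limits. Moreover, utilizing haplotype-based analysis from DSA, we achieved a remarkable reduction in false positive somatic calls in bulk tissue, ranging from 14.9% to 72.4%. We developed pipelines leveraging DSA information to enhance somatic large variant calling in single cells. By examining somatic variation using long-reads in single neurons, we identified 468 candidate somatic heterozygous large deletions (1.5Mb - 20Mb), 137 of which intersected with short-read single-cell data. Additionally, we identified 61 putative somatic TEs (60 Alus, one LINE-1) in the single neurons. Collectively, our analysis spans personalized assembly to single-cell somatic variant calling, providing a comprehensive ab initio ad finem approach and resource in real human tissue.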
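The phase block length above is reported as an N50. As a minimal sketch, assuming the usual definition (the length L such that blocks of length L or longer together cover at least half of the total length), the statistic can be computed as follows. The function name and the toy input are illustrative only and are not taken from the study.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of block (or contig) lengths.

    Blocks are sorted from longest to shortest, and the function returns
    the length of the block at which the running total first reaches
    half of the summed length.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one length is required")
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if running * 2 >= total:
            return length
    raise AssertionError("unreachable: running total always reaches total")


if __name__ == "__main__":
    # Toy values, not data from the paper.
    toy_blocks = [2, 3, 4, 5, 6]
    print(n50(toy_blocks))
```

N50 is weighted toward long blocks, so a handful of very long phase blocks can raise it substantially even when many short blocks remain.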
Language: English