DeepSomatic: Accurate somatic small variant discovery for multiple sequencing technologies
bioRxiv (Cold Spring Harbor Laboratory),
Journal year: 2024, Issue: unknown
Published: Aug. 19, 2024
Somatic variant detection is an integral part of cancer genomics analysis. While most methods have focused on short-read sequencing, long-read technologies now offer potential advantages in terms of repeat mapping and variant phasing. We present DeepSomatic, a deep learning method for detecting somatic SNVs and insertions and deletions (indels) from both short-read and long-read data, with modes for whole-genome and exome sequencing, and able to run on tumor-normal, tumor-only, and FFPE-prepared samples. To help address the dearth of publicly available training and benchmarking data for somatic variant detection, we generated and make openly available a dataset of five matched tumor-normal cell line pairs sequenced with Illumina, PacBio HiFi, and Oxford Nanopore Technologies, along with benchmark variant sets. Across samples and technologies (short-read and long-read), DeepSomatic consistently outperforms existing callers, particularly for indels.
Language: English
Long-read sequencing of hundreds of diverse brains provides insight into the impact of structural variation on gene expression and DNA methylation
bioRxiv (Cold Spring Harbor Laboratory),
Journal year: 2024, Issue: unknown
Published: Dec. 18, 2024
Structural variants (SVs) drive gene expression in the human brain and are causative of many neurological conditions. However, most existing genetic studies have been based on short-read sequencing methods, which capture fewer than half of the SVs present in any one individual. Long-read sequencing (LRS) enhances our ability to detect disease-associated and functionally relevant structural variants (SVs); however, its application in large-scale genomic studies has been limited by challenges in sample preparation and high costs. Here, we leverage a new scalable wet-lab protocol and computational pipeline for whole-genome Oxford Nanopore Technologies sequencing and apply it to neurologically normal control samples from the North American Brain Expression Consortium (NABEC) (European ancestry) and Human Brain Collection Core (HBCC) (African or African admixed ancestry) cohorts. Through this work, we present a publicly available long-read resource from 351 human brain samples (median N50: 27 Kbp, at an average depth of ~40x genome coverage). We discover approximately 234,905 SVs and produce locally phased assemblies that cover 95% of all protein-coding genes in GRCh38. Utilizing matched expression datasets for these samples, we apply quantitative trait locus (QTL) analyses and identify SVs that impact gene expression in post-mortem frontal cortex brain tissue. Further, we determine haplotype-specific methylation signatures at millions of CpGs and, with this data, identify cis-acting SVs. In summary, these results highlight that large-scale LRS can identify complex regulatory mechanisms in the brain that were inaccessible using previous approaches. We believe this new resource provides a critical step toward understanding the biological effects of genetic variation in the human brain.
Language: English
Detecting Somatic Mutations Without Matched Normal Samples Using Long Reads
bioRxiv (Cold Spring Harbor Laboratory),
Journal year: 2024, Issue: unknown
Published: Feb. 29, 2024
DNA sequencing of tumours to identify somatic mutations has become a critical tool to guide the type of treatment given to cancer patients. The gold standard for mutation calling is comparing sequencing data from the tumour to a matched normal sample to avoid mis-classifying inherited SNPs as mutations. This procedure works extremely well, but in certain situations only a tumour sample is available. While approaches have been developed to find mutations without a matched normal, they have limited accuracy or require specific types of input data (e.g. ultra-deep sequencing). Here we explore the application of single molecule long read sequencing to calling somatic mutations without matched normal samples. We develop a simple theoretical framework to show how haplotype phasing is an important source of information for determining whether a variant is a somatic mutation. We then use simulations to assess the range of experimental parameters (tumour purity, sequencing depth) where this approach is effective. These ideas are developed into a prototype somatic mutation caller, smrest, and its use is demonstrated on two highly mutated cancer cell lines. Finally, we argue that this approach has potential to measure clinically important biomarkers that are based on the genome-wide distribution of mutations: tumour mutation burden and mutation signatures.
Language: English
A personalized multi-platform assessment of somatic mosaicism in the human frontal cortex
bioRxiv (Cold Spring Harbor Laboratory),
Journal year: 2024, Issue: unknown
Published: Dec. 21, 2024
Somatic mutations in individual cells lead to genomic mosaicism, contributing to the intricate regulatory landscape of genetic disorders and cancers. To evaluate and refine the detection of somatic mosaicism across different technologies with personalized donor-specific assembly (DSA), we obtained tissue from the dorsolateral prefrontal cortex (DLPFC) of a post-mortem neurotypical 31-year-old individual. We sequenced bulk DLPFC tissue using Oxford Nanopore Technologies (~60X), NovaSeq (~30X), and linked-read sequencing (~28X). Additionally, we applied Cas9 capture methodology coupled with long-read sequencing (TEnCATS), targeting active transposable elements. We also isolated and amplified DNA from flow-sorted single DLPFC neurons using MALBAC, sequencing 115 of these MALBAC libraries on Nanopore and 94 on NovaSeq. We constructed a haplotype-resolved assembly with a total length of 5.77 Gb and a phase block length of 2.67 Mb (N50) to facilitate cross-platform analysis of somatic genetic variations. We observed an increase in phasing rate from 11.6% to 38.0% between short-read and long-read technologies. By generating a catalog of phased germline SNVs, CNVs, and TEs from the assembled genome, we applied standard approaches to recall these variants across sequencing technologies and achieved aggregated recall rates from 97.3% to 99.4% based on long-read bulk tissue data, setting an upper bound for detection limits. Moreover, utilizing haplotype-based analysis from DSA, we achieved a remarkable reduction in false positive somatic calls in bulk tissue, ranging from 14.9% to 72.4%. We developed pipelines leveraging DSA information to enhance somatic large variant calling in single cells. By examining somatic variation using long reads in single neurons, we identified 468 candidate somatic heterozygous large deletions (1.5 Mb to 20 Mb), 137 of which intersected with short-read single-cell data. We also identified 61 putative somatic TEs (60 Alus, one LINE-1). Collectively, our analysis spans from personalized assembly to single-cell somatic variant calling, providing a comprehensive ab initio ad finem approach and resource in real human tissue.
Language: English