bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown, Published: Jan. 2, 2024
Research and medical genomics require comprehensive and scalable solutions to drive the discovery of novel disease targets, evolutionary drivers, and genetic markers with clinical significance. This necessitates a framework to identify all types of variants independent of their size (e.g., SNV/SV) or location (e.g., repeats). Here we present DRAGEN, which utilizes methods based on multigenomes, hardware acceleration, and machine learning based variant detection to provide insights into individual genomes with ~30 min of computation time (from raw reads to variant detection). DRAGEN outperforms other state-of-the-art methods in speed and accuracy across all variant types (SNV, indel, STR, SV, CNV) and further incorporates specialized methods to obtain key insights into medically relevant genes (e.g., HLA, SMN, GBA). We showcase DRAGEN across 3,202 genomes and demonstrate its scalability, accuracy, and innovations to further advance the integration of comprehensive genomics for research and medical applications.

Nature Biotechnology, Journal Year: 2022, Volume and Issue: 40(7), P. 1035 - 1041, Published: March 28, 2022
Abstract
Whole-genome sequencing (WGS) can identify variants that cause genetic disease, but the time required for sequencing and analysis has been a barrier to its use in acutely ill patients. In the present study, we develop an approach for ultra-rapid nanopore WGS that combines an optimized sample preparation protocol, distributing sequencing over 48 flow cells, near real-time base calling and alignment, accelerated variant calling, and fast variant filtration for efficient manual review. Application to two example clinical cases identified a candidate variant in <8 h from sample preparation to variant identification. We show that this framework provides accurate variant calls and efficient prioritization, and accelerates diagnostic clinical genome sequencing twofold compared with previous approaches.

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2022, Volume and Issue: unknown, Published: July 9, 2022
Abstract
The Human Pangenome Reference Consortium (HPRC) presents a first draft of the human pangenome reference. The pangenome contains 47 phased, diploid assemblies from a cohort of genetically diverse individuals. These assemblies cover more than 99% of the expected sequence and are accurate at the structural and base-pair levels. Based on alignments of the assemblies, we generated a draft pangenome that captures known variants and haplotypes, reveals novel alleles at structurally complex loci, and adds 119 million base pairs of euchromatic polymorphic sequence and 1,529 gene duplications relative to the existing reference, GRCh38. Roughly 90 million of the additional base pairs derive from structural variation. Using our draft pangenome to analyze short-read data reduces errors when discovering small variants by 34% and boosts the detected structural variants per haplotype by 104% compared to GRCh38-based workflows, and also improves on using previous diversity sets of genome assemblies.

Nature Communications, Journal Year: 2024, Volume and Issue: 15(1), Published: Jan. 29, 2024
The All of Us (AoU) initiative aims to sequence the genomes of over one million Americans from diverse ethnic backgrounds to improve personalized medical care. In a recent technical pilot, we compare the performance of traditional short-read sequencing with long-read sequencing in a small cohort of samples from the HapMap project and two AoU control samples, representing eight datasets. Our analysis reveals substantial differences in the ability of these technologies to accurately sequence complex medically relevant genes, particularly in terms of gene coverage and pathogenic variant identification. We also consider the advantages and challenges of using low-coverage sequencing to increase sample numbers in large cohort analysis. Our results show that HiFi reads produce the most accurate results for both small and large variants. Further, we present a cloud-based pipeline to optimize SNV, indel and SV calling at scale for long-read analysis. These results lead to widespread improvements across AoU.

medRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown, Published: March 7, 2024
Less than half of individuals with a suspected Mendelian condition receive a precise molecular diagnosis after comprehensive clinical genetic testing. Improvements in data quality and costs have heightened interest in using long-read sequencing (LRS) to streamline clinical genomic testing, but the absence of control datasets for variant filtering and prioritization has made tertiary analysis of LRS data challenging. To address this, the 1000 Genomes Project ONT Sequencing Consortium aims to generate LRS data from at least 800 of the 1000 Genomes Project samples. Our goal is to use LRS to identify a broader spectrum of variation so we may improve our understanding of normal patterns of human variation. Here, we present data from analysis of the first 100 samples, representing all 5 superpopulations and 19 subpopulations. These samples, sequenced to an average depth of coverage of 37x and a sequence read N50 of 54 kbp, have high concordance with previous studies for identifying single nucleotide and indel variants outside of homopolymer regions. Using multiple structural variant (SV) callers, we identify an average of 24,543 high-confidence SVs per genome, including shared and private SVs likely to disrupt gene function as well as pathogenic expansions within disease-associated repeats that were not detected using short reads. Evaluation of methylation signatures revealed expected patterns at known imprinted loci, samples with skewed X-inactivation patterns, and novel differentially methylated regions. All raw sequencing data, processed data, and summary statistics are publicly available, providing a valuable resource for the clinical genetics community to discover pathogenic SVs.
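
The read N50 quoted in this abstract (54 kbp) is a length-weighted median: the read length L such that reads of length L or longer together contain at least half of all sequenced bases. The following is a minimal sketch of that calculation, assuming read lengths are already available as a list of integers. The function name `read_n50` is illustrative and does not come from the cited study or its pipeline.

```python
def read_n50(lengths):
    """Return the N50 of a collection of read lengths.

    N50 is the largest length L such that reads of length >= L
    together contain at least half of all bases.
    """
    if not lengths:
        raise ValueError("lengths must be non-empty")
    total = sum(lengths)
    running = 0
    # Walk from the longest read down, accumulating bases until
    # the running sum reaches half of the total.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


if __name__ == "__main__":
    # Toy example: 54 bases in total; 10 + 9 + 8 = 27 reaches half.
    print(read_n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))
```

Because N50 weights each read by its length, a few very long reads raise it far more than they would raise the mean or the ordinary median read length.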

Nature Biotechnology, Journal Year: 2024, Volume and Issue: unknown, Published: Oct. 25, 2024
Research and medical genomics require comprehensive, scalable methods for the discovery of novel disease targets, evolutionary drivers and genetic markers with clinical significance. This necessitates a framework to identify all types of variants independent of their size or location. Here we present DRAGEN, which uses multigenome mapping with pangenome references, hardware acceleration and machine learning-based variant detection to provide insights into individual genomes, with ~30 min of computation time from raw reads to variant detection. DRAGEN outperforms current state-of-the-art methods in speed and accuracy across all types of variants (single-nucleotide variations, insertions or deletions, short tandem repeats, structural variations and copy number variations) and incorporates specialized methods for analysis of medically relevant genes. We demonstrate the performance of DRAGEN across 3,202 whole-genome sequencing datasets by generating fully genotyped multisample variant call format files and demonstrate its scalability, accuracy and innovation to further advance the integration of comprehensive genomics. Overall, DRAGEN marks a major milestone in sequencing data analysis and will provide insights across various diseases, including Mendelian and rare diseases, with a highly comprehensive and scalable platform.

Nature Communications, Journal Year: 2020, Volume and Issue: 11(1), Published: Sept. 22, 2020
Abstract
Most human genomes are characterized by aligning individual reads to the reference genome, but accurate long reads and linked reads now enable us to construct accurate, phased de novo assemblies. We focus on a medically important, highly variable, 5 million base-pair (bp) region where diploid assembly is particularly useful: the Major Histocompatibility Complex (MHC). Here, we develop a human genome benchmark derived from a diploid assembly for the openly-consented Genome in a Bottle sample HG002. We assemble a single contig for each haplotype, align them to the reference, call phased small and structural variants, and define a small variant benchmark for the MHC, covering 94% of the MHC and 22,368 variants smaller than 50 bp, 49% more variants than a mapping-based benchmark. This benchmark reliably identifies errors in mapping-based callsets, and enables performance assessment in regions with much denser, complex variation than regions covered by previous benchmarks.
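
The abstract above separates small variants (smaller than 50 bp) from structural variants. The sketch below shows one way such a size-based split could be applied to REF/ALT allele strings. It is an illustration under stated assumptions, not the benchmark's actual procedure: the function name `classify_variant`, the use of the allele length difference as the variant size, and the "MNV" label for equal-length substitutions are all hypothetical choices made here; only the 50 bp threshold comes from the abstract.

```python
def classify_variant(ref, alt, sv_threshold=50):
    """Classify a variant by size from its REF and ALT allele strings.

    Assumption: variant size is the absolute difference in allele
    lengths; real callsets may define size differently.
    """
    if not ref or not alt:
        raise ValueError("ref and alt must be non-empty allele strings")
    if len(ref) == 1 and len(alt) == 1:
        return "SNV"
    size = abs(len(alt) - len(ref))
    if size == 0:
        # Same-length substitution spanning more than one base.
        return "MNV"
    if size < sv_threshold:
        return "indel"
    return "SV"


if __name__ == "__main__":
    print(classify_variant("A", "G"))             # single-base substitution
    print(classify_variant("A", "ATTT"))          # 3 bp insertion
    print(classify_variant("A", "A" + "T" * 60))  # 60 bp insertion
```

Symbolic alleles such as `<DEL>` are not handled here; a real workflow would read the variant length from the record's annotations instead of the allele strings.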

Genome Biology, Journal Year: 2021, Volume and Issue: 22(1), Published: Sept. 6, 2021
Long-read sequencing enables variant detection in genomic regions that are considered difficult-to-map by short-read sequencing. To fully exploit the benefits of longer reads, here we present a deep learning method, NanoCaller, which detects SNPs using long-range haplotype information, then phases long reads with called SNPs and calls indels with local realignment. Evaluation on 8 human genomes demonstrates that NanoCaller generally achieves better performance than competing approaches. We experimentally validate 41 novel variants in a widely used benchmarking genome, which could not be reliably detected previously. In summary, NanoCaller facilitates the discovery of novel variants in complex genomic regions from long-read sequencing.