bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown
Published: Jan. 2, 2024
Research and medical genomics require comprehensive and scalable solutions to drive the discovery of novel disease targets, evolutionary drivers, and genetic markers with clinical significance. This necessitates a framework to identify all types of variants independent of their size (e.g., SNV/SV) or location (e.g., repeats). Here we present DRAGEN that utilizes novel methods based on multigenomes, hardware acceleration, and machine learning based variant detection to provide novel insights into individual genomes with ~30min computation time (from raw reads to variant detection). DRAGEN outperforms all other state-of-the-art methods in speed and accuracy across all variant types (SNV, indel, STR, SV, CNV) and further incorporates specialized methods to obtain key insights in medically relevant genes (e.g., HLA, SMN, GBA). We showcase DRAGEN across 3,202 genomes and demonstrate its scalability, accuracy, and innovations to further advance the integration of comprehensive genomics for research and medical applications.
Cell, Journal Year: 2022, Volume and Issue: 185(18), P. 3426 - 3440.e19
Published: Sept. 1, 2022
The 1000 Genomes Project (1kGP) is the largest fully open resource of whole-genome sequencing (WGS) data consented for public distribution without access or use restrictions. The final, phase 3 release of the 1kGP included 2,504 unrelated samples from 26 populations and was based primarily on low-coverage WGS. Here, we present a high-coverage 3,202-sample WGS 1kGP resource, which now includes 602 complete trios, sequenced to a depth of 30X using Illumina. We performed single-nucleotide variant (SNV) and short insertion and deletion (INDEL) discovery and generated a comprehensive set of structural variants (SVs) by integrating multiple analytic methods through a machine learning model. We show gains in sensitivity and precision of variant calls compared to phase 3, especially among rare SNVs as well as INDELs and SVs spanning the frequency spectrum. We also generated an improved reference imputation panel, making variants discovered here accessible for association studies.
Nature, Journal Year: 2023, Volume and Issue: 617(7960), P. 312 - 324
Published: May 10, 2023
Here the Human Pangenome Reference Consortium presents a first draft of the human pangenome reference. The pangenome contains 47 phased, diploid assemblies from a cohort of genetically diverse individuals [1]. These assemblies cover more than 99% of the expected sequence in each genome and are more than 99% accurate at the structural and base pair levels. Based on alignments of the assemblies, we generate a draft pangenome that captures known variants and haplotypes and reveals new alleles at structurally complex loci. We also add 119 million base pairs of euchromatic polymorphic sequences and 1,115 gene duplications relative to the existing reference GRCh38. Roughly 90 million of the additional base pairs are derived from structural variation. Using our draft pangenome to analyse short-read data reduced small variant discovery errors by 34% and increased the number of structural variants detected per haplotype by 104% compared with GRCh38-based workflows, which enabled the typing of the vast majority of structural variant alleles per sample.
Cell Genomics, Journal Year: 2022, Volume and Issue: 2(5), P. 100129 - 100129
Published: April 27, 2022
The precisionFDA Truth Challenge V2 aimed to assess the state of the art of variant calling in challenging genomic regions. Starting with FASTQs, 20 challenge participants applied their variant-calling pipelines and submitted 64 variant call sets for one or more sequencing technologies (Illumina, PacBio HiFi, and Oxford Nanopore Technologies). Submissions were evaluated following best practices for benchmarking small variants with updated Genome in a Bottle benchmark sets and genome stratifications. Challenge submissions included numerous innovative methods, with graph-based and machine learning methods scoring best for short-read and long-read datasets, respectively. With machine learning approaches, combining multiple sequencing technologies performed particularly well. Recent developments in sequencing and variant calling have enabled benchmarking variants in challenging genomic regions, paving the way for the identification of previously unknown clinically relevant variants.
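
The small-variant benchmarking described in the abstract above is usually summarized with precision, recall, and F1 computed from true-positive, false-positive, and false-negative counts. The following is a minimal sketch of those standard definitions, not the challenge's actual evaluation code; the function name `benchmark_metrics` and the counts are hypothetical and chosen only for illustration.

```python
def benchmark_metrics(tp: int, fp: int, fn: int) -> dict[str, float]:
    """Return precision, recall, and F1 for one variant call set.

    tp: calls that match the benchmark (true positives)
    fp: calls absent from the benchmark (false positives)
    fn: benchmark variants that were not called (false negatives)
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts, not taken from any challenge submission.
metrics = benchmark_metrics(tp=90, fp=10, fn=10)
print(metrics)
```

In practice these counts would be produced per genome stratification by a comparison tool, so that performance in challenging regions can be reported separately from the genome-wide figure.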
Nature, Journal Year: 2022, Volume and Issue: 611(7936), P. 519 - 531
Published: Oct. 19, 2022
The current human reference genome, GRCh38, represents over 20 years of effort to generate a high-quality assembly, which has benefitted society [1,2]. However, it still has many gaps and errors, and does not represent a biological genome as it is a blend of multiple individuals [3,4]. Recently, a high-quality telomere-to-telomere reference, CHM13, was generated with the latest long-read technologies, but it was derived from a hydatidiform mole cell line with a nearly homozygous genome [5]. To address these limitations, the Human Pangenome Reference Consortium formed with the goal of creating high-quality, cost-effective, diploid genome assemblies for a pangenome reference that represents human genetic diversity [6]. Here, in our first scientific report, we determined which combination of current genome sequencing and assembly approaches yield the most complete and accurate diploid genome assembly with minimal manual curation. Approaches that used highly accurate long reads and parent–child data with graph-based haplotype phasing during assembly outperformed those that did not. Developing a combination of the top-performing methods, we generated our first high-quality diploid reference assembly, containing only approximately four gaps per chromosome on average, with most chromosomes within ±1% of the length of CHM13. Nearly 48% of protein-coding genes have non-synonymous amino acid changes between haplotypes, and centromeric regions showed the highest diversity. Our findings serve as a foundation for assembling near-complete diploid human genomes at scale for a pangenome reference to capture global genetic variation from single nucleotides to structural rearrangements.
The American Journal of Human Genetics, Journal Year: 2022, Volume and Issue: 109(9), P. 1605 - 1619
Published: Aug. 24, 2022
Newborn screening (NBS) dramatically improves outcomes in severe childhood disorders by treatment before symptom onset. In many genetic diseases, however, outcomes remain poor because NBS has lagged behind drug development. Rapid whole-genome sequencing (rWGS) is attractive for comprehensive NBS because it concomitantly examines almost all genetic diseases and is gaining acceptance for genetic disease diagnosis in ill newborns. We describe prototypic methods for scalable, parentally consented, feedback-informed NBS and diagnosis of genetic diseases by rWGS and virtual, acute management guidance (NBS-rWGS). Using established criteria and the Delphi method, we reviewed 457 genetic diseases for NBS-rWGS, retaining 388 (85%) with effective treatments. Simulated NBS-rWGS in 454,707 UK Biobank subjects with 29,865 pathogenic or likely pathogenic variants associated with 388 disorders had a true negative rate (specificity) of 99.7% following root cause analysis. In 2,208 critically ill children with suspected genetic disorders and 2,168 of their parents, simulated NBS-rWGS for 388 disorders identified 104 (87%) of 119 diagnoses previously made by rWGS and 15 findings not previously reported (NBS-rWGS negative predictive value 99.6%, true positive rate [sensitivity] 88.8%). Retrospective NBS-rWGS diagnosed 15 children with disorders that had been undetected by conventional NBS. In 43 of the 104 children, had NBS-rWGS-based interventions been started on day of life 5, the Delphi consensus was that symptoms could have been avoided completely in seven critically ill children, mostly in 21, and partially in 13. We invite groups worldwide to refine these NBS-rWGS conditions and join us to prospectively examine clinical utility and cost effectiveness.
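
Two of the percentages quoted in the abstract above can be checked directly from the counts it reports: 388 of 457 reviewed diseases retained, and 104 of 119 prior diagnoses recovered. The sketch below only redoes that arithmetic; the helper name `percent` is hypothetical. The abstract does not give the counts behind the 99.7%, 99.6%, and 88.8% figures, so those are not recomputed here.

```python
def percent(part: int, whole: int) -> float:
    """Return part/whole expressed as a percentage."""
    return 100.0 * part / whole


# Counts as stated in the abstract.
retained = percent(388, 457)   # diseases retained after Delphi review
recovered = percent(104, 119)  # prior rWGS diagnoses found by simulated NBS-rWGS

print(f"{retained:.1f}% retained, {recovered:.1f}% recovered")
```

Rounded to whole numbers, both values agree with the 85% and 87% given in the abstract.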