bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2023, Volume and Issue: unknown
Published: Sept. 12, 2023
Long-read sequencing technology has enabled variant detection in difficult-to-map regions of the genome and rapid genetic diagnosis in clinical settings. Rapidly evolving third-generation sequencing platforms like Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT) are introducing newer platforms and data types. It has been demonstrated that variant calling methods based on deep neural networks can use local haplotyping information with long-reads to improve the genotyping accuracy. However, using local haplotype information creates an overhead as variant calling needs to be performed multiple times, which ultimately makes it difficult to extend to new data types and platforms as they get introduced. In this work, we have developed a local haplotype approximate method that enables state-of-the-art variant calling performance with multiple sequencing platforms including PacBio Revio system, ONT R10.4 simplex and duplex data. This addition of local haplotype approximation makes DeepVariant a universal variant calling solution for long-read sequencing platforms.
Nature Communications, Journal Year: 2024, Volume and Issue: 15(1)
Published: Jan. 29, 2024
The All of Us (AoU) initiative aims to sequence the genomes of over one million Americans from diverse ethnic backgrounds to improve personalized medical care. In a recent technical pilot, we compare the performance of traditional short-read sequencing with long-read sequencing in a small cohort of samples from the HapMap project and two AoU control samples representing eight datasets. Our analysis reveals substantial differences in the ability of these technologies to accurately sequence complex medically relevant genes, particularly in terms of gene coverage and pathogenic variant identification. We also consider the advantages and challenges of using low coverage sequencing to increase sample numbers in large cohort analysis. Our results show that HiFi reads produce the most accurate results for both small and large variants. Further, we present a cloud-based pipeline to optimize SNV, indel and SV calling at scale for long-reads analysis. These results lead to widespread improvements across AoU.
medRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown
Published: March 7, 2024
Less than half of individuals with a suspected Mendelian condition receive a precise molecular diagnosis after comprehensive clinical genetic testing. Improvements in data quality and costs have heightened interest in using long-read sequencing (LRS) to streamline clinical genomic testing, but the absence of control datasets for variant filtering and prioritization has made tertiary analysis of LRS data challenging. To address this, the 1000 Genomes Project ONT Sequencing Consortium aims to generate LRS data from at least 800 of the 1000 Genomes Project samples. Our goal is to use LRS to identify a broader spectrum of variation so we may improve our understanding of normal patterns of human variation. Here, we present data from analysis of the first 100 samples, representing all 5 superpopulations and 19 subpopulations. These samples, sequenced to an average depth of coverage of 37x and sequence read N50 of 54 kbp, have high concordance with previous studies for identifying single nucleotide and indel variants outside of homopolymer regions. Using multiple structural variant (SV) callers, we identify an average of 24,543 high-confidence SVs per genome, including shared and private SVs likely to disrupt gene function as well as pathogenic expansions within disease-associated repeats that were not detected using short reads. Evaluation of methylation signatures revealed expected patterns at known imprinted loci, samples with skewed X-inactivation patterns, and novel differentially methylated regions. All raw sequencing data, processed data, and summary statistics are publicly available, providing a valuable resource for the clinical genetics community to discover pathogenic SVs.
Genome Biology, Journal Year: 2023, Volume and Issue: 24(1)
Published: Sept. 18, 2023
Abstract
Background: Structural variations (SVs) in individual genomes are major determinants of complex traits, including adaptability to environmental variables. The Mongolian and Hainan cattle breeds in East Asia are of taurine and indicine origins that have evolved to adapt to cold and hot environments, respectively. However, few studies have investigated SVs in East Asian cattle genomes and their roles in environmental adaptation, and little is known about adaptively introgressed SVs in East Asian cattle.

Results: In this study, we examine the roles of SVs in the climate adaptation of these two cattle lineages by generating highly contiguous chromosome-scale genome assemblies. Comparison of the two assemblies along with 18 Mongolian and Hainan cattle genomes obtained by long-read sequencing data provides a catalog of 123,898 nonredundant SVs. Several SVs detected from long reads are in exons of genes associated with epidermal differentiation, skin barrier, and bovine tuberculosis resistance. Functional investigations show that a 108-bp exonic insertion in SPN may affect the uptake of Mycobacterium tuberculosis by macrophages, which might contribute to the low susceptibility of Hainan cattle to bovine tuberculosis. Genotyping of 373 whole genomes from 39 breeds identifies 2610 SVs that are differentiated along a “north–south” gradient in China and overlap with 862 related genes that are enriched in pathways related to environmental adaptation. We identify 1457 Chinese indicine-stratified SVs that possibly originate from banteng and are frequent in Chinese indicine cattle.

Conclusions: Our findings highlight the unique contribution of SVs in East Asian cattle to environmental adaptation and disease resistance.
Genome Research, Journal Year: 2024, Volume and Issue: 34(2), P. 300 - 309
Published: Feb. 1, 2024
Expression and splicing quantitative trait loci (e/sQTL) are large contributors to phenotypic variability. Achieving sufficient statistical power for e/sQTL mapping requires large cohorts with both genotypes and molecular phenotypes, and so, the genomic variation is often called from short-read alignments, which are unable to comprehensively resolve structural variation. Here we build a pangenome from 16 HiFi haplotype-resolved cattle assemblies to identify small and structural variation and genotype them with PanGenie in 307 short-read samples. We find high (>90%) concordance of PanGenie-genotyped and DeepVariant-called small variation and confidently genotype close to 21 million small and 43,000 structural variants in the larger population. We validate 85% of these structural variants (with MAF > 0.1) directly with a subset of 25 short-read samples that also have medium coverage HiFi reads. We then conduct e/sQTL mapping with this comprehensive variant set in a subset of 117 cattle with testis transcriptome data, and find 92 structural variants as causal candidates for eQTL and 73 for sQTL. We find that roughly half of the top associated structural variants affecting expression or splicing are transposable elements, such as SV-eQTL for STN1 and MYH7 and SV-sQTL for CEP89 and ASAH2. Extensive linkage disequilibrium between small and structural variation results in only 28 additional eQTL and 17 sQTL discovered when including SVs, although many top associated SVs are compelling candidates.
medRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown
Published: May 4, 2024
Abstract
Solve-RD is a pan-European rare disease (RD) research program that aims to identify disease-causing genetic variants in previously undiagnosed RD families. We utilised 10-fold coverage HiFi long-read sequencing (LRS) for detecting causative structural variants (SVs), single nucleotide variants (SNVs), insertion-deletions (InDels), and short tandem repeat (STR) expansions in extensively studied RD families without clear molecular diagnoses. Our cohort includes 293 individuals from 114 genetically undiagnosed RD families selected by European Rare Disease Network (ERN) experts. Of these, 21 families were affected by so-called ‘unsolvable’ syndromes for which genetic causes remain unknown, and 93 families with at least one individual affected by a rare neurological, neuromuscular, or epilepsy disorder without genetic diagnosis despite extensive prior testing. Clinical interpretation and orthogonal validation of variants in known disease genes yielded thirteen novel genetic diagnoses due to de novo and rare inherited SNVs, InDels, SVs, and STR expansions. In an additional four families, we identified a candidate disease-causing SV affecting several genes including an MCF2/FGF13 fusion and PSMA3 deletion. However, no common genetic cause was identified in any of the ‘unsolvable’ syndromes. Taken together, we found (likely) disease-causing genetic variants in 13.0% of previously unsolved families and additional candidate disease-causing SVs in another 4.3% of these families. In conclusion, our results demonstrate the added value of HiFi long-read genome sequencing in undiagnosed rare diseases.
Genome Medicine, Journal Year: 2025, Volume and Issue: 17(1)
Published: March 21, 2025
The complex 2 Mb survival motor neuron (SMN) locus on chromosome 5q13, including the spinal muscular atrophy (SMA)-causing gene SMN1 and modifier SMN2, remains incompletely resolved due to numerous segmental duplications. Variation in SMN2 copy number, presumably influenced by SMN1 to SMN2 gene conversion, affects disease severity, though SMN2 copy number alone has insufficient prognostic value due to limited genotype–phenotype correlations. With advancements in newborn screening and SMN-targeted therapies, identifying genetic markers to predict disease progression and treatment response is crucial. Progress has thus far been limited by methodological constraints. To address this, we developed HapSMA, a method to perform polyploid phasing of the SMN locus to enable copy-specific analysis of SMN and its surrounding genes. We used HapSMA on publicly available Oxford Nanopore Technologies (ONT) sequencing data of 29 healthy controls and performed long-read, targeted ONT sequencing of the SMN locus of 31 patients with SMA. In healthy controls, we identified single nucleotide variants (SNVs) specific to SMN1 and SMN2 haplotypes that could serve as gene conversion markers. Broad phasing including the NAIP gene allowed for a more complete view of SMN locus variation. Genetic variation in SMN2 haplotypes was larger in SMA patients. Forty-two percent of SMN2 haplotypes of SMA patients showed varying SMN1 to SMN2 gene conversion breakpoints, serving as direct evidence of gene conversion as a common genetic characteristic in SMA and highlighting the importance of inclusion of SMA patients when investigating the SMN locus. Our findings illustrate that both methodological advances and the analysis of patient samples are required to advance our understanding of complex genetic loci and address critical clinical challenges.
Genome Research, Journal Year: 2025, Volume and Issue: 35(4), P. 545 - 558
Published: April 1, 2025
Over the past decade, long-read sequencing has evolved into a pivotal technology for uncovering the hidden and complex regions of the genome. Significant cost efficiency, scalability, and accuracy advancements have driven this evolution. Concurrently, novel analytical methods have emerged to harness the full potential of long reads. These advancements have enabled milestones such as the first fully completed human genome, enhanced identification and understanding of complex genomic variants, and deeper insights into the interplay between epigenetics and genomic variation. This mini-review provides a comprehensive overview of the latest developments in long-read DNA sequencing analysis, encompassing reference-based and de novo assembly approaches. We explore the entire workflow, from initial data processing to variant calling and annotation, focusing on how these methods improve our ability to interpret a wide array of genomic variants. Additionally, we discuss the current challenges, limitations, and future directions in the field, offering a detailed examination of the state-of-the-art bioinformatics methods for long-read sequencing.
Nature, Journal Year: 2023, Volume and Issue: 624(7992), P. 602 - 610
Published: Dec. 13, 2023
Abstract
Indigenous Australians harbour rich and unique genomic diversity. However, Aboriginal and Torres Strait Islander ancestries are historically under-represented in genomics research and almost completely missing from reference datasets [1–3]. Addressing this representation gap is critical, both to advance our understanding of global human genomic diversity and as a prerequisite for ensuring equitable outcomes in genomic medicine. Here we apply population-scale whole-genome long-read sequencing [4] to profile genomic structural variation across four remote Indigenous communities. We uncover an abundance of large insertion–deletion variants (20–49 bp; n = 136,797), structural variants (50 bp–50 kb; n = 159,912) and regions of variable copy number (>50 kb; n = 156). The majority of variants are composed of tandem repeat or interspersed mobile element sequences (up to 90%) and have not been previously annotated (up to 62%). A large fraction of structural variants appear to be exclusive to Indigenous Australians (12% lower-bound estimate) and most of these are found in only a single community, underscoring the need for broad and deep sampling to achieve a comprehensive catalogue of genomic structural variation across the Australian continent. Finally, we explore short tandem repeats throughout the genome to characterize allelic diversity at 50 known disease loci [5], uncover hundreds of novel repeat expansion sites within protein-coding genes, and identify unique patterns of diversity and constraint among short tandem repeat sequences. Our study sheds new light on the dimensions and dynamics of genomic structural variation within and beyond Australia.
Nature Communications, Journal Year: 2024, Volume and Issue: 15(1)
Published: July 13, 2024
Abstract
Long-read sequencing technology has enabled variant detection in difficult-to-map regions of the genome and rapid genetic diagnosis in clinical settings. Rapidly evolving third-generation sequencing platforms like Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT) are introducing newer platforms and data types. It has been demonstrated that variant calling methods based on deep neural networks can use local haplotyping information with long-reads to improve the genotyping accuracy. However, using local haplotype information creates an overhead as variant calling needs to be performed multiple times, which ultimately makes it difficult to extend to new data types and platforms as they get introduced. In this work, we have developed a local haplotype approximate method that enables state-of-the-art variant calling performance with multiple sequencing platforms including PacBio Revio system, ONT R10.4 simplex and duplex data. This addition of local haplotype approximation simplifies long-read variant calling with DeepVariant.