Local read haplotagging enables accurate long-read small variant calling DOI Creative Commons
Alexey Kolesnikov, Daniel E. Cook, Maria Nattestad

et al.

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2023, Volume and Issue: unknown

Published: Sept. 12, 2023

Long-read sequencing technology has enabled variant detection in difficult-to-map regions of the genome and rapid genetic diagnosis in clinical settings. Rapidly evolving third-generation sequencing platforms like Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT) are introducing newer platforms and data types. It has been demonstrated that variant calling methods based on deep neural networks can use local haplotyping information with long reads to improve genotyping accuracy. However, using local haplotype information creates an overhead, as variant calling needs to be performed multiple times, which ultimately makes it difficult to extend to new data types and platforms as they get introduced. In this work, we have developed a local haplotype approximate method that enables state-of-the-art variant calling performance with multiple sequencing platforms, including the PacBio Revio system and ONT R10.4 simplex and duplex data. This addition of local haplotype approximation makes DeepVariant a universal variant calling solution for long-read sequencing platforms.

Language: English

Utility of long-read sequencing for All of Us DOI Creative Commons
Medhat Mahmoud, Yongqing Huang, Kiran Garimella

et al.

Nature Communications, Journal Year: 2024, Volume and Issue: 15(1)

Published: Jan. 29, 2024

The All of Us (AoU) initiative aims to sequence the genomes of over one million Americans from diverse ethnic backgrounds to improve personalized medical care. In a recent technical pilot, we compare the performance of traditional short-read sequencing with long-read sequencing in a small cohort of samples from the HapMap project and two AoU control samples, representing eight datasets. Our analysis reveals substantial differences in the ability of these technologies to accurately sequence complex medically relevant genes, particularly in terms of gene coverage and pathogenic variant identification. We also consider the advantages and challenges of using low-coverage sequencing to increase sample numbers in large cohort analysis. Our results show that HiFi reads produce the most accurate results for both small and large variants. Further, we present a cloud-based pipeline to optimize SNV, indel and SV calling at scale for long-read analysis. These results lead to widespread improvements across AoU.

Language: English

Citations

39

Nanopore sequencing of 1000 Genomes Project samples to build a comprehensive catalog of human genetic variation DOI Creative Commons
Jonas A. Gustafson, Sophia B Gibson, Nikhita Damaraju

et al.

medRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown

Published: March 7, 2024

Less than half of individuals with a suspected Mendelian condition receive a precise molecular diagnosis after comprehensive clinical genetic testing. Improvements in data quality and costs have heightened interest in using long-read sequencing (LRS) to streamline clinical genomic testing, but the absence of control datasets for variant filtering and prioritization has made tertiary analysis of LRS data challenging. To address this, the 1000 Genomes Project ONT Sequencing Consortium aims to generate LRS data from at least 800 of the 1000 Genomes Project samples. Our goal is to use LRS to identify a broader spectrum of variation so we may improve our understanding of normal patterns of human variation. Here, we present data from analysis of the first 100 samples, representing all 5 superpopulations and 19 subpopulations. These samples, sequenced to an average depth of coverage of 37x and a sequence read N50 of 54 kbp, have high concordance with previous studies for identifying single nucleotide and indel variants outside of homopolymer regions. Using multiple structural variant (SV) callers, we identify an average of 24,543 high-confidence SVs per genome, including shared and private SVs likely to disrupt gene function as well as pathogenic expansions within disease-associated repeats that were not detected using short reads. Evaluation of methylation signatures revealed expected patterns at known imprinted loci, samples with skewed X-inactivation patterns, and novel differentially methylated regions. All raw sequencing data, processed data, and summary statistics are publicly available, providing a valuable resource for the clinical genetics community to discover pathogenic SVs.

Language: English

Citations

26

Structural variation and introgression from wild populations in East Asian cattle genomes confer adaptation to local environment DOI Creative Commons
Xiaoting Xia, Fengwei Zhang, Shuang Li

et al.

Genome biology, Journal Year: 2023, Volume and Issue: 24(1)

Published: Sept. 18, 2023

Abstract. Background: Structural variations (SVs) in individual genomes are major determinants of complex traits, including adaptability to environmental variables. The Mongolian and Hainan cattle breeds in East Asia are of taurine and indicine origins and have evolved to adapt to cold and hot environments, respectively. However, few studies have investigated SVs in East Asian cattle genomes and their roles in environmental adaptation, and little is known about adaptively introgressed SVs in East Asian cattle. Results: In this study, we examine the roles of SVs in the climate adaptation of these two cattle lineages by generating highly contiguous chromosome-scale genome assemblies. Comparison of the two assemblies, along with 18 cattle genomes obtained from long-read sequencing data, provides a catalog of 123,898 nonredundant SVs. Several SVs detected from long reads are in exons of genes associated with epidermal differentiation, skin barrier, and bovine tuberculosis resistance. Functional investigations show that a 108-bp exonic insertion in SPN may affect the uptake of Mycobacterium by macrophages, which might contribute to low susceptibility to bovine tuberculosis. Genotyping of 373 whole genomes from 39 breeds identifies 2610 SVs that are differentiated along a "north–south" gradient in China and overlap with 862 related genes that are enriched in pathways related to environmental adaptation. We identify 1457 Chinese indicine-stratified SVs that possibly originate from banteng and are frequent in Chinese indicine cattle. Conclusions: Our findings highlight the unique contribution of SVs in East Asian cattle to environmental adaptation and disease resistance.

Language: English

Citations

27

Pangenome-genotyped structural variation improves molecular phenotype mapping in cattle DOI
Alexander S. Leonard, Xena Marie Mapel, Hubert Pausch

et al.

Genome Research, Journal Year: 2024, Volume and Issue: 34(2), P. 300 - 309

Published: Feb. 1, 2024

Expression and splicing quantitative trait loci (e/sQTL) are large contributors to phenotypic variability. Achieving sufficient statistical power for e/sQTL mapping requires large cohorts with both genotypes and molecular phenotypes, and so the genomic variation is often called from short-read alignments, which are unable to comprehensively resolve structural variation. Here we build a pangenome from 16 HiFi haplotype-resolved cattle assemblies to identify small and structural variation and genotype them with PanGenie in 307 short-read samples. We find high (>90%) concordance of PanGenie-genotyped and DeepVariant-called small variation and confidently genotype close to 21 million small and 43,000 structural variants in the larger population. We validate 85% of these structural variants (with MAF > 0.1) directly with a subset of 25 short-read samples that also have medium-coverage HiFi reads. We then conduct e/sQTL mapping with this comprehensive variant set in a subset of 117 cattle that have testis transcriptome data, and find 92 structural variants as causal candidates for eQTL and 73 for sQTL. We find that roughly half of the top associated structural variants affecting expression or splicing are transposable elements, such as SV-eQTL for STN1 and MYH7 and SV-sQTL for CEP89 and ASAH2. Extensive linkage disequilibrium between small and structural variation results in only 28 additional eQTL and 17 sQTL discovered when including SVs, although many top associated SVs are compelling candidates.

Language: English

Citations

14

Unravelling undiagnosed rare disease cases by HiFi long-read genome sequencing DOI Open Access
Wouter Steyaert, Lydia Sagath, German Demidov

et al.

medRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown

Published: May 4, 2024

Abstract. Solve-RD is a pan-European rare disease (RD) research program that aims to identify disease-causing genetic variants in previously undiagnosed RD families. We utilised 10-fold coverage HiFi long-read sequencing (LRS) for detecting causative structural variants (SVs), single nucleotide variants (SNVs), insertion-deletions (InDels), and short tandem repeat (STR) expansions in extensively studied RD families without clear molecular diagnoses. Our cohort includes 293 individuals from 114 genetically undiagnosed RD families selected by European Rare Disease Network (ERN) experts. Of these, 21 families were affected by so-called 'unsolvable' syndromes for which genetic causes remain unknown, and 93 families had at least one individual affected by a rare neurological, neuromuscular, or epilepsy disorder without a genetic diagnosis despite extensive prior testing. Clinical interpretation and orthogonal validation of variants in known disease genes yielded thirteen novel genetic diagnoses due to de novo and rare inherited SNVs, InDels, SVs, and STR expansions. In an additional four families, we identified a candidate disease-causing SV affecting several genes, including an MCF2/FGF13 fusion and a PSMA3 deletion. However, no common genetic cause was identified in any of the 'unsolvable' syndromes. Taken together, we found (likely) disease-causing genetic variants in 13.0% of previously unsolved families and additional candidate disease-causing SVs in another 4.3% of these families. In conclusion, our results demonstrate the added value of HiFi long-read genome sequencing in undiagnosed rare diseases.

Language: English

Citations

11

Concordance of whole-genome long-read sequencing with standard clinical testing for Prader-Willi and Angelman syndromes DOI
Cate Paschal, Miranda Galey, Anita Beck

et al.

Journal of Molecular Diagnostics, Journal Year: 2025, Volume and Issue: unknown

Published: Jan. 1, 2025

Language: English

Citations

1

Long-read sequencing identifies copy-specific markers of SMN gene conversion in spinal muscular atrophy DOI Creative Commons
Maria M Zwartkruis, Martin Elferink, Demi Gommers

et al.

Genome Medicine, Journal Year: 2025, Volume and Issue: 17(1)

Published: March 21, 2025

The complex 2 Mb survival motor neuron (SMN) locus on chromosome 5q13, including the spinal muscular atrophy (SMA)-causing gene SMN1 and modifier SMN2, remains incompletely resolved due to numerous segmental duplications. Variation in SMN2 copy number, presumably influenced by SMN1 to SMN2 gene conversion, affects disease severity, though SMN2 copy number alone has insufficient prognostic value due to limited genotype–phenotype correlations. With advancements in newborn screening and SMN-targeted therapies, identifying genetic markers to predict disease progression and treatment response is crucial. Progress has thus far been limited by methodological constraints. To address this, we developed HapSMA, a method to perform polyploid phasing of the SMN locus to enable copy-specific analysis of SMN and its surrounding genes. We used HapSMA on publicly available Oxford Nanopore Technologies (ONT) sequencing data of 29 healthy controls and performed long-read, targeted ONT sequencing of the SMN locus of 31 patients with SMA. In healthy controls, we identified single nucleotide variants (SNVs) specific to SMN1 and SMN2 haplotypes that could serve as gene conversion markers. Broad phasing including the NAIP gene allowed for a more complete view of SMN locus variation. Genetic variation in SMN2 haplotypes was larger in SMA patients. Forty-two percent of SMN2 haplotypes of SMA patients showed varying SMN1 to SMN2 gene conversion breakpoints, serving as direct evidence of gene conversion as a common genetic characteristic in SMA and highlighting the importance of inclusion of SMA patients when investigating the SMN locus. Our findings illustrate that both methodological advances and the analysis of patient samples are required to advance our understanding of complex genetic loci and address critical clinical challenges.

Language: English

Citations

1

A Hitchhiker's Guide to long-read genomic analysis DOI
Medhat Mahmoud, Daniel Paiva Agustinho, Fritz J. Sedlazeck

et al.

Genome Research, Journal Year: 2025, Volume and Issue: 35(4), P. 545 - 558

Published: April 1, 2025

Over the past decade, long-read sequencing has evolved into a pivotal technology for uncovering the hidden and complex regions of the genome. Significant cost efficiency, scalability, and accuracy advancements have driven this evolution. Concurrently, novel analytical methods have emerged to harness the full potential of long reads. These advancements have enabled milestones such as the first fully completed human genome, enhanced identification and understanding of complex genomic variants, and deeper insights into the interplay between epigenetics and genomic variation. This mini-review provides a comprehensive overview of the latest developments in long-read DNA sequencing analysis, encompassing reference-based and de novo assembly approaches. We explore the entire workflow, from initial data processing to variant calling and annotation, focusing on how these methods improve our ability to interpret a wide array of genomic variants. Additionally, we discuss the current challenges, limitations, and future directions in the field, offering a detailed examination of the state-of-the-art bioinformatics methods for long-read sequencing.

Language: English

Citations

1

The landscape of genomic structural variation in Indigenous Australians DOI Creative Commons
Andre L. M. Reis, Melissa Rapadas, Jillian M. Hammond

et al.

Nature, Journal Year: 2023, Volume and Issue: 624(7992), P. 602 - 610

Published: Dec. 13, 2023

Abstract. Indigenous Australians harbour rich and unique genomic diversity. However, Aboriginal and Torres Strait Islander ancestries are historically under-represented in genomics research and almost completely missing from reference datasets [1–3]. Addressing this representation gap is critical, both to advance our understanding of global human genomic diversity and as a prerequisite for ensuring equitable outcomes in genomic medicine. Here we apply population-scale whole-genome long-read sequencing [4] to profile genomic structural variation across four remote Indigenous communities. We uncover an abundance of large insertion–deletion variants (20–49 bp; n = 136,797), structural variants (50 bp–50 kb; n = 159,912) and regions of variable copy number (>50 kb; n = 156). The majority of variants are composed of tandem repeat or interspersed mobile element sequences (up to 90%) and have not been previously annotated (up to 62%). A large fraction of structural variants appear to be exclusive to Indigenous Australians (12% lower-bound estimate) and most of these are found in only a single community, underscoring the need for broad and deep sampling to achieve a comprehensive catalogue of genomic structural variation across the Australian continent. Finally, we explore short tandem repeats throughout the genome to characterize allelic diversity at 50 known disease loci [5], uncover hundreds of novel repeat expansion sites within protein-coding genes, and identify unique patterns of diversity and constraint among short tandem repeat sequences. Our study sheds new light on the dimensions and dynamics of genomic structural variation within and beyond Australia.

Language: English

Citations

23

Local read haplotagging enables accurate long-read small variant calling DOI Creative Commons
Alexey Kolesnikov, Daniel E. Cook, Maria Nattestad

et al.

Nature Communications, Journal Year: 2024, Volume and Issue: 15(1)

Published: July 13, 2024

Abstract. Long-read sequencing technology has enabled variant detection in difficult-to-map regions of the genome and rapid genetic diagnosis in clinical settings. Rapidly evolving third-generation sequencing platforms like Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT) are introducing newer platforms and data types. It has been demonstrated that variant calling methods based on deep neural networks can use local haplotyping information with long reads to improve genotyping accuracy. However, using local haplotype information creates an overhead, as variant calling needs to be performed multiple times, which ultimately makes it difficult to extend to new data types and platforms as they get introduced. In this work, we have developed a local haplotype approximate method that enables state-of-the-art variant calling performance with multiple sequencing platforms, including the PacBio Revio system and ONT R10.4 simplex and duplex data. This addition of local haplotype approximation simplifies long-read variant calling with DeepVariant.

Language: English

Citations

4