Comprehensive and accurate genome analysis at scale using DRAGEN accelerated algorithms DOI Creative Commons
Sairam Behera, Severine Catreux, Massimiliano Rossi

et al.

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown

Published: Jan. 2, 2024

Research and medical genomics require comprehensive and scalable solutions to drive the discovery of novel disease targets, evolutionary drivers, and genetic markers with clinical significance. This necessitates a framework to identify all types of variants independent of their size (e.g., SNV/SV) or location (e.g., repeats). Here we present DRAGEN, which utilizes methods based on multigenomes, hardware acceleration, and machine learning-based variant detection to provide insights into individual genomes in ~30 min of computation time (from raw reads to variant detection). DRAGEN outperforms other state-of-the-art methods in speed and accuracy across variant types (SNV, indel, STR, SV, CNV) and further incorporates specialized methods to obtain key insights in medically relevant genes (e.g., HLA, SMN, GBA). We showcase DRAGEN across 3,202 genomes and demonstrate its scalability, accuracy, and innovations to advance the integration of comprehensive genomics for research and medical applications.

Language: English

Scalable Nanopore sequencing of human genomes provides a comprehensive view of haplotype-resolved variation and methylation DOI
Mikhail Kolmogorov, Kimberley Billingsley, Mira Mastoras

et al.

Nature Methods, Journal Year: 2023, Volume and Issue: 20(10), P. 1483 - 1492

Published: Sept. 14, 2023

Language: English

Citations

81

Accelerated identification of disease-causing variants with ultra-rapid nanopore genome sequencing DOI Creative Commons
Sneha D. Goenka, John E. Gorzynski, Kishwar Shafin

et al.

Nature Biotechnology, Journal Year: 2022, Volume and Issue: 40(7), P. 1035 - 1041

Published: March 28, 2022

Whole-genome sequencing (WGS) can identify variants that cause genetic disease, but the time required for sequencing and analysis has been a barrier to its use in acutely ill patients. In the present study, we develop an approach for ultra-rapid nanopore WGS that combines an optimized sample preparation protocol, distributing sequencing over 48 flow cells, near real-time base calling and alignment, accelerated variant calling and fast variant filtration for efficient manual review. Application to two example clinical cases identified a candidate variant in <8 h from sample preparation to variant identification. We show that this framework provides accurate variant calls and efficient prioritization, and accelerates diagnostic clinical genome sequencing twofold compared with previous approaches.

Language: English

Citations

74

A Draft Human Pangenome Reference DOI Creative Commons
Wen‐Wei Liao, Mobin Asri, Jana Ebler

et al.

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2022, Volume and Issue: unknown

Published: July 9, 2022

The Human Pangenome Reference Consortium (HPRC) presents a first draft human pangenome reference. The pangenome contains 47 phased, diploid assemblies from a cohort of genetically diverse individuals. These assemblies cover more than 99% of the expected sequence and are accurate at the structural and base-pair levels. Based on alignments of the assemblies, we generated a draft pangenome that captures known variants and haplotypes, reveals novel alleles at structurally complex loci, and adds 119 million base pairs of euchromatic polymorphic sequence and 1,529 gene duplications relative to the existing reference, GRCh38. Roughly 90 million of the additional base pairs derive from structural variation. Using our draft pangenome to analyze short-read data reduces errors when discovering small variants by 34% and boosts the detected structural variants per haplotype by 104% compared to GRCh38-based workflows, and also compared to using previous diversity sets of genome assemblies.

Language: English

Citations

73

Utility of long-read sequencing for All of Us DOI Creative Commons
Medhat Mahmoud, Yongqing Huang, Kiran Garimella

et al.

Nature Communications, Journal Year: 2024, Volume and Issue: 15(1)

Published: Jan. 29, 2024

The All of Us (AoU) initiative aims to sequence the genomes of over one million Americans from diverse ethnic backgrounds to improve personalized medical care. In a recent technical pilot, we compare the performance of traditional short-read sequencing with long-read sequencing in a small cohort of samples from the HapMap project and two AoU control samples, representing eight datasets. Our analysis reveals substantial differences in the ability of these technologies to accurately sequence complex medically relevant genes, particularly in terms of gene coverage and pathogenic variant identification. We also consider the advantages and challenges of using low-coverage sequencing to increase sample numbers in large cohort analysis. Our results show that HiFi reads produce the most accurate results for both small and large variants. Further, we present a cloud-based pipeline to optimize SNV, indel and SV calling at scale for long-read analysis. These results lead to widespread improvements across AoU.

Language: English

Citations

39

Nanopore sequencing of 1000 Genomes Project samples to build a comprehensive catalog of human genetic variation DOI Creative Commons
Jonas A. Gustafson, Sophia B Gibson, Nikhita Damaraju

et al.

medRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown

Published: March 7, 2024

Less than half of individuals with a suspected Mendelian condition receive a precise molecular diagnosis after comprehensive clinical genetic testing. Improvements in data quality and costs have heightened interest in using long-read sequencing (LRS) to streamline clinical genomic testing, but the absence of control datasets for variant filtering and prioritization has made tertiary analysis of LRS data challenging. To address this, the 1000 Genomes Project ONT Sequencing Consortium aims to generate LRS data from at least 800 of the 1000 Genomes Project samples. Our goal is to use LRS to identify a broader spectrum of variation so we may improve our understanding of normal patterns of human variation. Here, we present data from analysis of the first 100 samples, representing all 5 superpopulations and 19 subpopulations. These samples, sequenced to an average depth of coverage of 37x and a sequence read N50 of 54 kbp, have high concordance with previous studies for identifying single nucleotide and indel variants outside of homopolymer regions. Using multiple structural variant (SV) callers, we identify an average of 24,543 high-confidence SVs per genome, including shared and private SVs likely to disrupt gene function as well as pathogenic expansions within disease-associated repeats that were not detected using short reads. Evaluation of methylation signatures revealed expected patterns at known imprinted loci, samples with skewed X-inactivation patterns, and novel differentially methylated regions. All raw sequencing data, processed data, and summary statistics are publicly available, providing a valuable resource for the clinical genetics community to discover pathogenic SVs.

Language: English

Citations

26

Analysis and benchmarking of small and large genomic variants across tandem repeats DOI
Adam C. English, Egor Dolzhenko, Helyaneh Ziaei Jam

et al.

Nature Biotechnology, Journal Year: 2024, Volume and Issue: unknown

Published: April 26, 2024

Language: English

Citations

26

Comprehensive genome analysis and variant detection at scale using DRAGEN DOI Creative Commons
Sairam Behera, Severine Catreux, Massimiliano Rossi

et al.

Nature Biotechnology, Journal Year: 2024, Volume and Issue: unknown

Published: Oct. 25, 2024

Research and medical genomics require comprehensive, scalable methods for the discovery of novel disease targets, evolutionary drivers and genetic markers with clinical significance. This necessitates a framework to identify all types of variants independent of their size or location. Here we present DRAGEN, which uses multigenome mapping with pangenome references, hardware acceleration and machine learning-based variant detection to provide insights into individual genomes, with ~30 min of computation time from raw reads to variant detection. DRAGEN outperforms current state-of-the-art methods in speed and accuracy across variant types (single-nucleotide variations, insertions or deletions, short tandem repeats, structural variations and copy number variations) and incorporates specialized methods for analysis of medically relevant genes. We demonstrate the performance of DRAGEN across 3,202 whole-genome sequencing datasets by generating fully genotyped multisample variant call format files and demonstrate its scalability, accuracy and innovation to further advance the integration of comprehensive genomics. Overall, DRAGEN marks a major milestone in sequencing data analysis and will provide insights across various diseases, including Mendelian and rare diseases, with a highly comprehensive and scalable platform.

Language: English

Citations

17

Performance assessment of DNA sequencing platforms in the ABRF Next-Generation Sequencing Study DOI
Jonathan Foox, Scott Tighe, Charles M. Nicolet

et al.

Nature Biotechnology, Journal Year: 2021, Volume and Issue: 39(9), P. 1129 - 1140

Published: Sept. 1, 2021

Language: English

Citations

98

A diploid assembly-based benchmark for variants in the major histocompatibility complex DOI Creative Commons
Chen-Shan Chin, Justin Wagner, Qiandong Zeng

et al.

Nature Communications, Journal Year: 2020, Volume and Issue: 11(1)

Published: Sept. 22, 2020

Most human genomes are characterized by aligning individual reads to the reference genome, but accurate long reads and linked reads now enable us to construct accurate, phased de novo assemblies. We focus on a medically important, highly variable, 5 million base-pair (bp) region where diploid assembly is particularly useful - the Major Histocompatibility Complex (MHC). Here, we develop a human genome benchmark derived from a diploid assembly for the openly-consented Genome in a Bottle sample HG002. We assemble a single contig for each haplotype, align them to the reference, call phased small and structural variants, and define a small variant benchmark for the MHC, covering 94% of the MHC and 22,368 variants smaller than 50 bp, 49% more variants than a mapping-based benchmark. This benchmark reliably identifies errors in mapping-based callsets, and enables performance assessment in regions with much denser, complex variation than regions covered by previous benchmarks.

Language: English

Citations

76

NanoCaller for accurate detection of SNPs and indels in difficult-to-map regions from long-read sequencing by haplotype-aware deep neural networks DOI Creative Commons
Mian Umair Ahsan, Qian Liu, Fang Li

et al.

Genome biology, Journal Year: 2021, Volume and Issue: 22(1)

Published: Sept. 6, 2021

Long-read sequencing enables variant detection in genomic regions that are considered difficult-to-map by short-read sequencing. To fully exploit the benefits of longer reads, here we present a deep learning method, NanoCaller, which detects SNPs using long-range haplotype information, then phases long reads with called SNPs and calls indels with local realignment. Evaluation on 8 human genomes demonstrates that NanoCaller generally achieves better performance than competing approaches. We experimentally validate 41 novel variants in a widely used benchmarking genome, which could not be reliably detected previously. In summary, NanoCaller facilitates the discovery of variants in complex genomic regions from long-read sequencing.

Language: English

Citations

65