Comprehensive and accurate genome analysis at scale using DRAGEN accelerated algorithms DOI Creative Commons
Sairam Behera, Severine Catreux, Massimiliano Rossi

et al.

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown

Published: Jan. 2, 2024

Research and medical genomics require comprehensive and scalable solutions to drive the discovery of novel disease targets, evolutionary drivers, and genetic markers with clinical significance. This necessitates a framework to identify all types of variants independent of their size (e.g., SNV/SV) or location (e.g., repeats). Here we present DRAGEN, which utilizes methods based on multigenomes, hardware acceleration, and machine learning-based variant detection to provide insights into individual genomes with ~30 min of computation time (from raw reads to variant detection). DRAGEN outperforms other state-of-the-art methods in speed and accuracy across all variant types (SNV, indel, STR, SV, CNV) and further incorporates specialized methods to obtain key insights in medically relevant genes (e.g., HLA, SMN, GBA). We showcase DRAGEN across 3,202 genomes and demonstrate its scalability, accuracy, and innovations to further advance the integration of comprehensive genomics for research and medical applications.

Language: English

High-coverage whole-genome sequencing of the expanded 1000 Genomes Project cohort including 602 trios DOI Creative Commons
Marta Byrska-Bishop, Uday S. Evani, Xuefang Zhao

et al.

Cell, Journal Year: 2022, Volume and Issue: 185(18), P. 3426 - 3440.e19

Published: Sept. 1, 2022

The 1000 Genomes Project (1kGP) is the largest fully open resource of whole-genome sequencing (WGS) data consented for public distribution without access or use restrictions. The final, phase 3 release of the 1kGP included 2,504 unrelated samples from 26 populations and was based primarily on low-coverage WGS. Here, we present a high-coverage 3,202-sample WGS 1kGP resource, which now includes 602 complete trios, sequenced to a depth of 30X using Illumina. We performed single-nucleotide variant (SNV) and short insertion and deletion (INDEL) discovery and generated a comprehensive set of structural variants (SVs) by integrating multiple analytic methods through a machine learning model. We show gains in sensitivity and precision of variant calls compared to phase 3, especially among rare SNVs as well as INDELs and SVs spanning the frequency spectrum. We also generated an improved reference imputation panel, making variants discovered here accessible for association studies.

Language: English

Citations

645

A draft human pangenome reference DOI Creative Commons
Wen‐Wei Liao, Mobin Asri, Jana Ebler

et al.

Nature, Journal Year: 2023, Volume and Issue: 617(7960), P. 312 - 324

Published: May 10, 2023

Here the Human Pangenome Reference Consortium presents a first draft of the human pangenome reference. The pangenome contains 47 phased, diploid assemblies from a cohort of genetically diverse individuals. These assemblies cover more than 99% of the expected sequence in each genome and are more than 99% accurate at the structural and base pair levels. Based on alignments of the assemblies, we generate a draft pangenome that captures known variants and haplotypes and reveals new alleles at structurally complex loci. We also add 119 million base pairs of euchromatic polymorphic sequences and 1,115 gene duplications relative to the existing reference GRCh38. Roughly 90 million of the additional base pairs are derived from structural variation. Using our draft pangenome to analyse short-read data reduced small variant discovery errors by 34% and increased the number of structural variants detected per haplotype by 104% compared with GRCh38-based workflows, which enabled the typing of the vast majority of structural variant alleles per sample.

Language: English

Citations

589

Towards population-scale long-read sequencing DOI Open Access
Wouter De Coster, Matthias H. Weissensteiner, Fritz J. Sedlazeck

et al.

Nature Reviews Genetics, Journal Year: 2021, Volume and Issue: 22(9), P. 572 - 587

Published: May 28, 2021

Language: English

Citations

257

Haplotype-aware variant calling with PEPPER-Margin-DeepVariant enables high accuracy in nanopore long-reads DOI
Kishwar Shafin, Trevor Pesout, Pi-Chuan Chang

et al.

Nature Methods, Journal Year: 2021, Volume and Issue: 18(11), P. 1322 - 1332

Published: Nov. 1, 2021

Language: English

Citations

220

Symphonizing pileup and full-alignment for deep learning-based long-read variant calling DOI
Zhenxian Zheng, Shumin Li, Junhao Su

et al.

Nature Computational Science, Journal Year: 2022, Volume and Issue: 2(12), P. 797 - 803

Published: Dec. 19, 2022

Language: English

Citations

192

PrecisionFDA Truth Challenge V2: Calling variants from short and long reads in difficult-to-map regions DOI Creative Commons
Nathan D. Olson, Justin Wagner, Jennifer McDaniel

et al.

Cell Genomics, Journal Year: 2022, Volume and Issue: 2(5), P. 100129 - 100129

Published: April 27, 2022

The precisionFDA Truth Challenge V2 aimed to assess the state of the art of variant calling in challenging genomic regions. Starting with FASTQs, 20 challenge participants applied their variant-calling pipelines and submitted 64 variant call sets for one or more sequencing technologies (Illumina, PacBio HiFi, and Oxford Nanopore Technologies). Submissions were evaluated following best practices for benchmarking small variants with updated Genome in a Bottle benchmark sets and genome stratifications. Challenge submissions included numerous innovative methods, with graph-based and machine learning methods scoring best for short-read and long-read datasets, respectively. With machine learning approaches, combining multiple sequencing technologies performed particularly well. Recent developments in sequencing and variant calling have enabled benchmarking variants in challenging genomic regions, paving the way for the identification of previously unknown clinically relevant variants.

Language: English

Citations

146

Semi-automated assembly of high-quality diploid human reference genomes DOI Creative Commons
Erich D. Jarvis, Giulio Formenti, Arang Rhie

et al.

Nature, Journal Year: 2022, Volume and Issue: 611(7936), P. 519 - 531

Published: Oct. 19, 2022

The current human reference genome, GRCh38, represents over 20 years of effort to generate a high-quality assembly, which has benefitted society. However, it still has many gaps and errors, and does not represent a biological genome as it is a blend of multiple individuals. Recently, a high-quality telomere-to-telomere reference, CHM13, was generated with the latest long-read technologies, but it was derived from a hydatidiform mole cell line with a nearly homozygous genome. To address these limitations, the Human Pangenome Reference Consortium formed with the goal of creating high-quality, cost-effective, diploid genome assemblies for a pangenome reference that represents human genetic diversity. Here, in our first scientific report, we determined which combination of current genome sequencing and assembly approaches yield the most complete and accurate diploid genome assembly with minimal manual curation. Approaches that used highly accurate long reads and parent–child data with graph-based haplotype phasing during assembly outperformed those that did not. Developing a combination of the top-performing methods, we generated our first high-quality diploid reference assembly, containing only approximately four gaps per chromosome on average, with most chromosomes within ±1% of the length of CHM13. Nearly 48% of protein-coding genes have non-synonymous amino acid changes between haplotypes, and centromeric regions showed the highest diversity. Our findings serve as a foundation for assembling near-complete diploid human genomes at scale for a pangenome reference to capture global genetic variation from single nucleotides to structural rearrangements.

Language: English

Citations

141

A genome sequencing system for universal newborn screening, diagnosis, and precision medicine for severe genetic diseases DOI Creative Commons
Stephen F. Kingsmore, Laurie D. Smith, Chris M. Kunard

et al.

The American Journal of Human Genetics, Journal Year: 2022, Volume and Issue: 109(9), P. 1605 - 1619

Published: Aug. 24, 2022

Newborn screening (NBS) dramatically improves outcomes in severe childhood disorders by treatment before symptom onset. In many genetic diseases, however, outcomes remain poor because NBS has lagged behind drug development. Rapid whole-genome sequencing (rWGS) is attractive for comprehensive NBS because it concomitantly examines almost all genetic diseases and is gaining acceptance for genetic disease diagnosis in ill newborns. We describe prototypic methods for scalable, parentally consented, feedback-informed NBS and diagnosis of genetic diseases by rWGS and virtual, acute management guidance (NBS-rWGS). Using established criteria and the Delphi method, we reviewed 457 genetic diseases for NBS-rWGS, retaining 388 (85%) with effective treatments. Simulated NBS-rWGS in 454,707 UK Biobank subjects with 29,865 pathogenic or likely pathogenic variants associated with 388 disorders had a true negative rate (specificity) of 99.7% following root cause analysis. In 2,208 critically ill children with suspected genetic disorders and 2,168 of their parents, simulated NBS-rWGS for 388 disorders identified 104 (87%) of 119 diagnoses previously made by rWGS and 15 findings not previously reported (NBS-rWGS negative predictive value 99.6%, true positive rate [sensitivity] 88.8%). Retrospective NBS-rWGS diagnosed 15 children with disorders that had been undetected by conventional NBS. In 43 of the 104 children, had NBS-rWGS-based interventions been started on day of life 5, the Delphi consensus was that symptoms could have been avoided completely in seven critically ill children, mostly in 21, and partially in 13. We invite groups worldwide to refine these NBS-rWGS conditions and join us to prospectively examine clinical utility and cost effectiveness.

Language: English

Citations

110

DeepConsensus improves the accuracy of sequences with a gap-aware sequence transformer DOI
Gunjan Baid, Daniel E. Cook, Kishwar Shafin

et al.

Nature Biotechnology, Journal Year: 2022, Volume and Issue: unknown

Published: Sept. 1, 2022

Language: English

Citations

100

Variant calling and benchmarking in an era of complete human genome sequences DOI
Nathan D. Olson, Justin Wagner, Nathan Dwarshuis

et al.

Nature Reviews Genetics, Journal Year: 2023, Volume and Issue: 24(7), P. 464 - 483

Published: April 14, 2023

Language: English

Citations

81