Comprehensive genome analysis and variant detection at scale using DRAGEN
Sairam Behera, Severine Catreux, Massimiliano Rossi

et al.

Nature Biotechnology, Year: 2024, Volume: unknown

Published: Oct. 25, 2024

Research and medical genomics require comprehensive, scalable methods for the discovery of novel disease targets, evolutionary drivers and genetic markers with clinical significance. This necessitates a framework to identify all types of variants independent of their size or location. Here we present DRAGEN, which uses multigenome mapping with pangenome references, hardware acceleration and machine learning-based variant detection to provide insights into individual genomes, with ~30 min of computation time from raw reads to variant detection. DRAGEN outperforms current state-of-the-art methods in speed and accuracy across all variant types (single-nucleotide variations, insertions or deletions, short tandem repeats, structural variations and copy number variations) and incorporates specialized methods for analysis of medically relevant genes. We demonstrate the performance of DRAGEN across 3,202 whole-genome sequencing datasets by generating fully genotyped multisample variant call format files and demonstrate its scalability, accuracy and innovation to further advance the integration of comprehensive genomics. Overall, DRAGEN marks a major milestone in sequencing data analysis and will provide insights across various diseases, including Mendelian and rare diseases, with a highly comprehensive and scalable platform.

Language: English

High-coverage whole-genome sequencing of the expanded 1000 Genomes Project cohort including 602 trios
Marta Byrska-Bishop, Uday S. Evani, Xuefang Zhao

et al.

Cell, Year: 2022, Volume: 185(18), Pages: 3426-3440.e19

Published: Sep. 1, 2022

The 1000 Genomes Project (1kGP) is the largest fully open resource of whole-genome sequencing (WGS) data consented for public distribution without access or use restrictions. The final, phase 3 release of the 1kGP included 2,504 unrelated samples from 26 populations and was based primarily on low-coverage WGS. Here, we present a high-coverage 3,202-sample WGS 1kGP resource, which now includes 602 complete trios, sequenced to a depth of 30X using Illumina. We performed single-nucleotide variant (SNV) and short insertion and deletion (INDEL) discovery and generated a comprehensive set of structural variants (SVs) by integrating multiple analytic methods through a machine learning model. We show gains in sensitivity and precision of variant calls compared to phase 3, especially among rare SNVs as well as INDELs and SVs spanning the frequency spectrum. We also generated an improved reference imputation panel, making variants discovered here accessible for association studies.

Language: English

Cited by

654

A draft human pangenome reference
Wen‐Wei Liao, Mobin Asri, Jana Ebler

et al.

Nature, Year: 2023, Volume: 617(7960), Pages: 312-324

Published: May 10, 2023

Here the Human Pangenome Reference Consortium presents a first draft of the human pangenome reference. The pangenome contains 47 phased, diploid assemblies from a cohort of genetically diverse individuals. These assemblies cover more than 99% of the expected sequence in each genome and are more than 99% accurate at the structural and base pair levels. Based on alignments of the assemblies, we generate a draft pangenome that captures known variants and haplotypes and reveals new alleles at structurally complex loci. We also add 119 million base pairs of euchromatic polymorphic sequences and 1,115 gene duplications relative to the existing reference GRCh38. Roughly 90 million of the additional base pairs are derived from structural variation. Using our draft pangenome to analyse short-read data reduced small variant discovery errors by 34% and increased the number of structural variants detected per haplotype by 104% compared with GRCh38-based workflows, which enabled the typing of the vast majority of structural variant alleles per sample.

Language: English

Cited by

598

Towards population-scale long-read sequencing
Wouter De Coster, Matthias H. Weissensteiner, Fritz J. Sedlazeck

et al.

Nature Reviews Genetics, Year: 2021, Volume: 22(9), Pages: 572-587

Published: May 28, 2021

Language: English

Cited by

258

Haplotype-aware variant calling with PEPPER-Margin-DeepVariant enables high accuracy in nanopore long-reads
Kishwar Shafin, Trevor Pesout, Pi-Chuan Chang

et al.

Nature Methods, Year: 2021, Volume: 18(11), Pages: 1322-1332

Published: Nov. 1, 2021

Language: English

Cited by

221

Symphonizing pileup and full-alignment for deep learning-based long-read variant calling
Zhenxian Zheng, Shumin Li, Junhao Su

et al.

Nature Computational Science, Year: 2022, Volume: 2(12), Pages: 797-803

Published: Dec. 19, 2022

Language: English

Cited by

197

PrecisionFDA Truth Challenge V2: Calling variants from short and long reads in difficult-to-map regions
Nathan D. Olson, Justin Wagner, Jennifer McDaniel

et al.

Cell Genomics, Year: 2022, Volume: 2(5), Article: 100129

Published: April 27, 2022

The precisionFDA Truth Challenge V2 aimed to assess the state of the art of variant calling in challenging genomic regions. Starting with FASTQs, 20 challenge participants applied their variant-calling pipelines and submitted 64 variant call sets for one or more sequencing technologies (Illumina, PacBio HiFi, and Oxford Nanopore Technologies). Submissions were evaluated following best practices for benchmarking small variants with updated Genome in a Bottle benchmark sets and genome stratifications. Challenge submissions included numerous innovative methods, with graph-based and machine learning methods scoring best for short-read and long-read datasets, respectively. With machine learning approaches, combining multiple sequencing technologies performed particularly well. Recent developments in sequencing and variant calling have enabled benchmarking variants in challenging genomic regions, paving the way for the identification of previously unknown clinically relevant variants.

Language: English

Cited by

147

Semi-automated assembly of high-quality diploid human reference genomes
Erich D. Jarvis, Giulio Formenti, Arang Rhie

et al.

Nature, Year: 2022, Volume: 611(7936), Pages: 519-531

Published: Oct. 19, 2022

The current human reference genome, GRCh38, represents over 20 years of effort to generate a high-quality assembly, which has benefitted society. However, it still has many gaps and errors, and does not represent a biological genome as it is a blend of multiple individuals. Recently, a high-quality telomere-to-telomere reference, CHM13, was generated with the latest long-read technologies, but it was derived from a hydatidiform mole cell line with a nearly homozygous genome. To address these limitations, the Human Pangenome Reference Consortium formed with the goal of creating high-quality, cost-effective, diploid genome assemblies for a pangenome reference that represents human genetic diversity. Here, in our first scientific report, we determined which combination of current genome sequencing and assembly approaches yield the most complete and accurate diploid genome assembly with minimal manual curation. Approaches that used highly accurate long reads and parent–child data with graph-based haplotype phasing during assembly outperformed those that did not. Developing a combination of the top-performing methods, we generated our first high-quality diploid reference assembly, containing only approximately four gaps per chromosome on average, with most chromosomes within ±1% of the length of CHM13. Nearly 48% of protein-coding genes have non-synonymous amino acid changes between haplotypes, and centromeric regions showed the highest diversity. Our findings serve as a foundation for assembling near-complete diploid human genomes at scale for a pangenome reference to capture global genetic variation from single nucleotides to structural rearrangements.

Language: English

Cited by

143

A genome sequencing system for universal newborn screening, diagnosis, and precision medicine for severe genetic diseases
Stephen F. Kingsmore, Laurie D. Smith, Chris M. Kunard

et al.

The American Journal of Human Genetics, Year: 2022, Volume: 109(9), Pages: 1605-1619

Published: Aug. 24, 2022

Newborn screening (NBS) dramatically improves outcomes in severe childhood disorders by treatment before symptom onset. In many genetic diseases, however, outcomes remain poor because NBS has lagged behind drug development. Rapid whole-genome sequencing (rWGS) is attractive for comprehensive NBS because it concomitantly examines almost all genetic diseases and is gaining acceptance for genetic disease diagnosis in ill newborns. We describe prototypic methods for scalable, parentally consented, feedback-informed NBS and diagnosis of genetic diseases by rWGS and virtual, acute management guidance (NBS-rWGS). Using established criteria and the Delphi method, we reviewed 457 genetic diseases for NBS-rWGS, retaining 388 (85%) with effective treatments. Simulated NBS-rWGS in 454,707 UK Biobank subjects with 29,865 pathogenic or likely pathogenic variants associated with the 388 disorders had a true negative rate (specificity) of 99.7% following root cause analysis. In 2,208 critically ill children with suspected genetic disorders and 2,168 of their parents, simulated NBS-rWGS for the 388 disorders identified 104 (87%) of 119 diagnoses previously made by rWGS and 15 findings not previously reported (NBS-rWGS negative predictive value 99.6%, true positive rate [sensitivity] 88.8%). Retrospective NBS-rWGS diagnosed 15 children with disorders that had been undetected by conventional NBS. In 43 of the 104 children, had NBS-rWGS-based interventions been started on day of life 5, the Delphi consensus was that symptoms could have been avoided completely in seven critically ill children, mostly in 21, and partially in 13. We invite groups worldwide to refine these NBS-rWGS conditions and join us to prospectively examine clinical utility and cost effectiveness.

Language: English

Cited by

113

DeepConsensus improves the accuracy of sequences with a gap-aware sequence transformer
Gunjan Baid, Daniel E. Cook, Kishwar Shafin

et al.

Nature Biotechnology, Year: 2022, Volume: unknown

Published: Sep. 1, 2022

Language: English

Cited by

100

Scalable Nanopore sequencing of human genomes provides a comprehensive view of haplotype-resolved variation and methylation
Mikhail Kolmogorov, Kimberley Billingsley, Mira Mastoras

et al.

Nature Methods, Year: 2023, Volume: 20(10), Pages: 1483-1492

Published: Sep. 14, 2023

Language: English

Cited by

82