A 25-year odyssey of genomic technology advances and structural variant discovery
David Porubský, Evan E. Eichler

Cell, Journal Year: 2024, Volume and Issue: 187(5), P. 1024 - 1037

Published: Jan. 29, 2024

Language: English

A draft human pangenome reference
Wen‐Wei Liao, Mobin Asri, Jana Ebler et al.

Nature, Journal Year: 2023, Volume and Issue: 617(7960), P. 312 - 324

Published: May 10, 2023

Abstract: Here the Human Pangenome Reference Consortium presents a first draft of the human pangenome reference. The pangenome contains 47 phased, diploid assemblies from a cohort of genetically diverse individuals [1]. These assemblies cover more than 99% of the expected sequence in each genome and are more than 99% accurate at the structural and base pair levels. Based on alignments of the assemblies, we generate a draft pangenome that captures known variants and haplotypes and reveals new alleles at structurally complex loci. We also add 119 million base pairs of euchromatic polymorphic sequences and 1,115 gene duplications relative to the existing reference GRCh38. Roughly 90 million of the additional base pairs are derived from structural variation. Using our draft pangenome to analyse short-read data reduced small variant discovery errors by 34% and increased the number of structural variants detected per haplotype by 104% compared with GRCh38-based workflows, which enabled the typing of the vast majority of structural variant alleles per sample.

Language: English

Citations: 585

GENCODE: reference annotation for the human and mouse genomes in 2023
Adam Frankish, Sílvia Carbonell Sala, Mark Diekhans et al.

Nucleic Acids Research, Journal Year: 2022, Volume and Issue: 51(D1), P. D942 - D949

Published: Nov. 24, 2022

GENCODE produces high quality gene and transcript annotation for the human and mouse genomes. All GENCODE annotation is supported by experimental data and serves as a reference for genome biology and clinical genomics. The consortium generates targeted data, develops bioinformatic tools and carries out analyses that, along with externally produced methods, support the identification of transcript structures and the determination of their function. Here, we present an update on the annotation of human and mouse genes, including developments in tools and the major collaborations which underpin this progress. For example, we report the creation of a set of non-canonical ORFs identified in GENCODE transcripts, the LRGASP collaboration to assess the use of long transcriptomic data to build transcript models, progress with RefSeq and UniProt to increase convergence in the annotation of protein-coding genes, the propagation of annotation across the human pan-genome and the development of new regulatory features in GENCODE. Our annotation is accessible via Ensembl, the UCSC Genome Browser and https://www.gencodegenes.org.

Language: English

Citations: 367

The complete sequence of a human Y chromosome
Arang Rhie, Sergey Nurk, Monika Čechová et al.

Nature, Journal Year: 2023, Volume and Issue: 621(7978), P. 344 - 354

Published: Aug. 23, 2023

Language: English

Citations: 238

Telomere-to-telomere assembly of diploid chromosomes with Verkko
Mikko Rautiainen, Sergey Nurk, Brian P. Walenz et al.

Nature Biotechnology, Journal Year: 2023, Volume and Issue: 41(10), P. 1474 - 1482

Published: Feb. 16, 2023

Language: English

Citations: 215

Method of the year: long-read sequencing

Vivien Marx

Nature Methods, Journal Year: 2023, Volume and Issue: 20(1), P. 6 - 11

Published: Jan. 1, 2023

Language: English

Citations: 205

Pangenome-based genome inference allows efficient and accurate genotyping across a wide spectrum of variant classes
Jana Ebler, Peter Ebert, Wayne E. Clarke et al.

Nature Genetics, Journal Year: 2022, Volume and Issue: 54(4), P. 518 - 525

Published: April 1, 2022

Abstract: Typical genotyping workflows map reads to a reference genome before identifying genetic variants. Generating such alignments introduces reference biases and comes with a substantial computational burden. Furthermore, short-read lengths limit the ability to characterize repetitive genomic regions, which are particularly challenging for fast k-mer-based genotypers. In the present study, we propose a new algorithm, PanGenie, that leverages a haplotype-resolved pangenome reference together with k-mer counts from short-read sequencing data to genotype a wide spectrum of genetic variation, a process we refer to as genome inference. Compared with mapping-based approaches, PanGenie is more than 4 times faster at 30-fold coverage and achieves better genotype concordances for almost all variant types and coverages tested. Improvements are especially pronounced for large insertions (≥50 bp) and variants in repetitive regions, enabling the inclusion of these classes of variants in genome-wide association studies. PanGenie efficiently leverages the increasing amount of haplotype-resolved assemblies to unravel the functional impact of previously inaccessible variants while being faster compared with alignment-based workflows.

Language: English

Citations: 165
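To make the idea of genotyping from k-mer counts concrete, here is a toy Python sketch under strong simplifying assumptions: a single biallelic site, error-free forward-strand reads, and a genotype chosen from the relative read support for k-mers unique to each allele. It is not PanGenie's algorithm (PanGenie combines k-mer counts with the haplotype structure of the pangenome in a hidden Markov model), and the function names and the 0.2/0.8 allele-fraction thresholds are hypothetical.

```python
from collections import Counter


def kmers(seq: str, k: int) -> list[str]:
    """Return all overlapping k-mers of seq (empty if seq is shorter than k)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def count_read_kmers(reads: list[str], k: int) -> Counter:
    """Count k-mers across all reads (forward strand only, for simplicity)."""
    counts: Counter = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts


def genotype_biallelic(ref_allele: str, alt_allele: str, reads: list[str],
                       k: int = 5, het_low: float = 0.2,
                       het_high: float = 0.8) -> str:
    """Toy genotyper comparing read support for allele-specific k-mers.

    Returns a VCF-style genotype: "0/0", "0/1", "1/1", or "./." when
    neither allele has any k-mer support in the reads.
    """
    ref_k, alt_k = set(kmers(ref_allele, k)), set(kmers(alt_allele, k))
    # k-mers shared by both alleles carry no information about the genotype.
    ref_unique, alt_unique = ref_k - alt_k, alt_k - ref_k
    counts = count_read_kmers(reads, k)

    def mean_support(unique: set[str]) -> float:
        # Average count per allele-specific k-mer, so an allele is not
        # favoured merely because it has more unique k-mers.
        return sum(counts[km] for km in unique) / len(unique) if unique else 0.0

    ref_cov, alt_cov = mean_support(ref_unique), mean_support(alt_unique)
    if ref_cov + alt_cov == 0:
        return "./."
    alt_fraction = alt_cov / (ref_cov + alt_cov)
    if alt_fraction < het_low:
        return "0/0"
    if alt_fraction > het_high:
        return "1/1"
    return "0/1"
```

For example, with `ref = "AAAAACCCCC"` and `alt = "AAAAAGGGGG"`, reads `[ref, alt]` yield `"0/1"`. Real k-mer genotypers must also handle reverse-complement k-mers, sequencing errors and k-mers that occur elsewhere in the genome, which this sketch ignores.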

NextDenovo: an efficient error correction and accurate assembly tool for noisy long reads
Jiang Hu, Zhuo Wang, Zongyi Sun et al.

Genome Biology, Journal Year: 2024, Volume and Issue: 25(1)

Published: April 26, 2024

Long-read sequencing data, particularly those derived from the Oxford Nanopore platform, tend to exhibit high error rates. Here, we present NextDenovo, an efficient error correction and assembly tool for noisy long reads, which achieves a high level of accuracy in genome assembly. We apply NextDenovo to assemble 35 diverse human genomes from around the world using long-read data. These genomes allow us to identify the landscape of segmental duplication and gene copy number variation in modern human populations. The use of NextDenovo should pave the way for population-scale long-read assembly.

Language: English

Citations: 159

Semi-automated assembly of high-quality diploid human reference genomes
Erich D. Jarvis, Giulio Formenti, Arang Rhie et al.

Nature, Journal Year: 2022, Volume and Issue: 611(7936), P. 519 - 531

Published: Oct. 19, 2022

Abstract: The current human reference genome, GRCh38, represents over 20 years of effort to generate a high-quality assembly, which has benefitted society [1,2]. However, it still has many gaps and errors, and does not represent a biological genome as it is a blend of multiple individuals [3,4]. Recently, a high-quality telomere-to-telomere reference, CHM13, was generated with the latest long-read technologies, but it was derived from a hydatidiform mole cell line with a nearly homozygous genome [5]. To address these limitations, the Human Pangenome Reference Consortium formed with the goal of creating high-quality, cost-effective, diploid genome assemblies for a pangenome reference that represents human genetic diversity [6]. Here, in our first scientific report, we determined which combination of sequencing and assembly approaches yields the most complete and accurate diploid assemblies with minimal manual curation. Approaches that used highly accurate long reads and parent–child data with graph-based haplotype phasing during assembly outperformed those that did not. Developing a combination of the top-performing methods, we generated our first high-quality diploid reference assembly, containing only approximately four gaps per chromosome on average, with most chromosomes within ±1% of the length of CHM13. Nearly 48% of protein-coding genes have non-synonymous amino acid changes between haplotypes, and centromeric regions showed the highest diversity. Our findings serve as a foundation for assembling near-complete diploid human genomes at scale to capture global genetic variation, from single nucleotides to structural rearrangements.

Language: English

Citations: 141

Detection of mosaic and population-level structural variants with Sniffles2
Moritz Smolka, Luis F. Paulin, Christopher M. Grochowski et al.

Nature Biotechnology, Journal Year: 2024, Volume and Issue: 42(10), P. 1571 - 1580

Published: Jan. 2, 2024

Abstract: Calling structural variations (SVs) is technically challenging, but using long reads remains the most accurate way to identify complex genomic alterations. Here we present Sniffles2, which improves over current methods by implementing a repeat aware clustering coupled with a fast consensus sequence and coverage-adaptive filtering. Sniffles2 is 11.8 times faster and 29% more accurate than state-of-the-art SV callers across different coverages (5–50×), sequencing technologies (ONT and HiFi) and SV types. Furthermore, Sniffles2 solves the problem of family-level to population-level SV calling to produce fully genotyped VCF files. Across 11 probands, we accurately identified causative SVs around MECP2, including highly complex alleles with three overlapping SVs. Sniffles2 also enables the detection of mosaic SVs in bulk long-read data. As a result, we identified multiple mosaic SVs in brain tissue from a patient with multiple system atrophy. The identified SVs showed a remarkable diversity within the cingulate cortex, impacting both genes involved in neuron function and repetitive elements.

Language: English

Citations: 136
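The two ideas named in the Sniffles2 abstract, repeat-aware clustering and coverage-adaptive filtering, can be illustrated with a toy Python sketch. Everything here is an assumption for illustration: SV signals are taken as already extracted from read alignments (position, length, repeat flag), clustering is greedy along the reference with a wider distance window inside repeats, and the minimum read support scales with sequencing coverage. The class names, window sizes and thresholds are hypothetical and do not describe Sniffles2's actual implementation.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class SVSignal:
    pos: int         # breakpoint position on the reference (bp)
    length: int      # SV length in bp; assumed > 0
    in_repeat: bool  # True if the signal falls in an annotated repeat (assumed known)


@dataclass
class SVCall:
    pos: int
    length: int
    support: int  # number of signals (e.g. reads) in the cluster


def cluster_signals(signals: list[SVSignal], base_window: int = 100,
                    repeat_window: int = 500,
                    min_len_ratio: float = 0.7) -> list[list[SVSignal]]:
    """Greedy clustering by position; the distance window widens inside repeats."""
    clusters: list[list[SVSignal]] = []
    for sig in sorted(signals, key=lambda s: s.pos):
        if clusters:
            anchor = clusters[-1][-1]  # last signal added to the newest cluster
            window = repeat_window if (sig.in_repeat or anchor.in_repeat) else base_window
            len_ratio = min(sig.length, anchor.length) / max(sig.length, anchor.length)
            if sig.pos - anchor.pos <= window and len_ratio >= min_len_ratio:
                clusters[-1].append(sig)
                continue
        clusters.append([sig])
    return clusters


def call_svs(signals: list[SVSignal], coverage: float, min_support: int = 2,
             min_fraction: float = 0.1, **cluster_kwargs) -> list[SVCall]:
    """Summarize each cluster and keep it only if its support reaches the
    coverage-adaptive threshold max(min_support, round(min_fraction * coverage))."""
    threshold = max(min_support, round(min_fraction * coverage))
    calls = []
    for cluster in cluster_signals(signals, **cluster_kwargs):
        if len(cluster) >= threshold:
            calls.append(SVCall(
                pos=int(median(s.pos for s in cluster)),     # consensus breakpoint
                length=int(median(s.length for s in cluster)),  # consensus length
                support=len(cluster),
            ))
    return calls
```

The wider window in repeats reflects the fact that breakpoints placed by different reads scatter more in repetitive sequence, while tying the support threshold to coverage keeps a fixed read count from being too strict at low depth or too lenient at high depth.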

An efficient error correction and accurate assembly tool for noisy long reads
Jiang Hu, Zhuo Wang, Zongyi Sun et al.

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2023, Volume and Issue: unknown

Published: March 12, 2023

Abstract: Long-read sequencing data, particularly those derived from the Oxford Nanopore (ONT) platform, tend to exhibit a high error rate. Here, we present NextDenovo, a highly efficient error correction and assembly tool for noisy long reads, which achieves a high level of accuracy in genome assembly. NextDenovo can rapidly correct reads; these corrected reads contain fewer errors than those produced by other comparable tools and are characterized by fewer chimeric alignments. We applied NextDenovo to assemble high-quality reference genomes for 35 diverse humans from across the world using ONT data. Based on these de novo assemblies, we were able to identify the landscape of segmental duplications and gene copy number variation in the modern human population. The use of this program should pave the way for population-scale long-read assembly, thereby facilitating the construction of pan-genomes, …

Language: English

Citations: 103