
Cell, Journal Year: 2024, Volume and Issue: 187(5), P. 1024 - 1037
Published: Jan. 29, 2024
Language: English
Nature, Journal Year: 2023, Volume and Issue: 617(7960), P. 312 - 324
Published: May 10, 2023
Abstract Here the Human Pangenome Reference Consortium presents a first draft of the human pangenome reference. The pangenome contains 47 phased, diploid assemblies from a cohort of genetically diverse individuals¹. These assemblies cover more than 99% of the expected sequence in each genome and are more than 99% accurate at the structural and base pair levels. Based on alignments of the assemblies, we generate a draft pangenome that captures known variants and haplotypes and reveals new alleles at structurally complex loci. We also add 119 million base pairs of euchromatic polymorphic sequences and 1,115 gene duplications relative to the existing reference GRCh38. Roughly 90 million of the additional base pairs are derived from structural variation. Using our draft pangenome to analyse short-read data reduced small variant discovery errors by 34% and increased the number of structural variants detected per haplotype by 104% compared with GRCh38-based workflows, which enabled the typing of the vast majority of structural variant alleles per sample.
Language: English
Citations
585
Nucleic Acids Research, Journal Year: 2022, Volume and Issue: 51(D1), P. D942 - D949
Published: Nov. 24, 2022
GENCODE produces high quality gene and transcript annotation for the human and mouse genomes. All GENCODE annotation is supported by experimental data and serves as a reference for genome biology and clinical genomics. The GENCODE consortium generates targeted experimental data, develops bioinformatic tools and carries out analyses that, along with externally produced data and methods, support the identification of transcript structures and the determination of their function. Here, we present an update on the annotation of human and mouse genes, including developments in the tools, data, analyses and major collaborations which underpin this progress. For example, we report the creation of a set of non-canonical ORFs identified in GENCODE transcripts, the LRGASP collaboration to assess the use of long transcriptomic reads to build transcript models, the progress in collaborations with RefSeq and UniProt to increase convergence in the annotation of human and mouse protein-coding genes, the propagation of GENCODE across the human pan-genome and the development of new tools to support annotation of regulatory features by GENCODE. Our annotation is accessible via Ensembl, the UCSC Genome Browser and https://www.gencodegenes.org.
Language: English
Citations
367
Nature, Journal Year: 2023, Volume and Issue: 621(7978), P. 344 - 354
Published: Aug. 23, 2023
Language: English
Citations
238
Nature Biotechnology, Journal Year: 2023, Volume and Issue: 41(10), P. 1474 - 1482
Published: Feb. 16, 2023
Language: English
Citations
215
Nature Methods, Journal Year: 2023, Volume and Issue: 20(1), P. 6 - 11
Published: Jan. 1, 2023
Language: English
Citations
205
Nature Genetics, Journal Year: 2022, Volume and Issue: 54(4), P. 518 - 525
Published: April 1, 2022
Abstract Typical genotyping workflows map reads to a reference genome before identifying genetic variants. Generating such alignments introduces reference biases and comes with substantial computational burden. Furthermore, short-read lengths limit the ability to characterize repetitive genomic regions, which are particularly challenging for fast k-mer-based genotypers. In the present study, we propose a new algorithm, PanGenie, that leverages a haplotype-resolved pangenome reference together with k-mer counts from short-read sequencing data to genotype a wide spectrum of genetic variation, a process we refer to as genome inference. Compared with mapping-based approaches, PanGenie is more than 4 times faster at 30-fold coverage and achieves better genotype concordances for almost all variant types and coverages tested. Improvements are especially pronounced for large insertions (≥50 bp) and variants in repetitive regions, enabling the inclusion of these classes of variants in genome-wide association studies. PanGenie efficiently leverages the increasing amount of haplotype-resolved assemblies to unravel the functional impact of previously inaccessible variants while being faster compared with alignment-based workflows.
Language: English
Citations
165
Genome Biology, Journal Year: 2024, Volume and Issue: 25(1)
Published: April 26, 2024
Long-read sequencing data, particularly those derived from the Oxford Nanopore sequencing platform, tend to exhibit high error rates. Here, we present NextDenovo, an efficient error correction and assembly tool for noisy long reads, which achieves a high level of accuracy in genome assembly. We apply NextDenovo to assemble 35 diverse human genomes from around the world using Nanopore long-read data. These genomes allow us to identify the landscape of segmental duplication and gene copy number variation in modern human populations. The use of NextDenovo should pave the way for population-scale long-read assembly using Nanopore long-read data.
Language: English
Citations
159
Nature, Journal Year: 2022, Volume and Issue: 611(7936), P. 519 - 531
Published: Oct. 19, 2022
Abstract The current human reference genome, GRCh38, represents over 20 years of effort to generate a high-quality assembly, which has benefitted society¹,². However, it still has many gaps and errors, and does not represent a biological genome as it is a blend of multiple individuals³,⁴. Recently, a high-quality telomere-to-telomere reference, CHM13, was generated with the latest long-read technologies, but it was derived from a hydatidiform mole cell line with a nearly homozygous genome⁵. To address these limitations, the Human Pangenome Reference Consortium formed with the goal of creating high-quality, cost-effective, diploid genome assemblies for a pangenome reference that represents human genetic diversity⁶. Here, in our first scientific report, we determined which combination of current genome sequencing and assembly approaches yield the most complete and accurate diploid genome assembly with minimal manual curation. Approaches that used highly accurate long reads and parent–child data with graph-based haplotype phasing during assembly outperformed those that did not. Developing a combination of the top-performing methods, we generated our first high-quality diploid reference assembly, containing only approximately four gaps per chromosome on average, with most chromosomes within ±1% of the length of CHM13. Nearly 48% of protein-coding genes have non-synonymous amino acid changes between haplotypes, and centromeric regions showed the highest diversity. Our findings serve as a foundation for assembling near-complete diploid human genomes at scale for a pangenome reference to capture global genetic variation from single nucleotides to structural rearrangements.
Language: English
Citations
141
Nature Biotechnology, Journal Year: 2024, Volume and Issue: 42(10), P. 1571 - 1580
Published: Jan. 2, 2024
Abstract Calling structural variations (SVs) is technically challenging, but using long reads remains the most accurate way to identify complex genomic alterations. Here we present Sniffles2, which improves over current methods by implementing a repeat aware clustering coupled with a fast consensus sequence and coverage-adaptive filtering. Sniffles2 is 11.8 times faster and 29% more accurate than state-of-the-art SV callers across different coverages (5–50×), sequencing technologies (ONT and HiFi) and SV types. Furthermore, Sniffles2 solves the problem of family-level to population-level SV calling to produce fully genotyped VCF files. Across 11 probands, we accurately identified causative SVs around MECP2, including highly complex alleles with three overlapping SVs. Sniffles2 also enables the detection of mosaic SVs in bulk long-read data. As a result, we identified multiple mosaic SVs in brain tissue from a patient with multiple system atrophy. The identified SVs showed a remarkable diversity within the cingulate cortex, impacting both genes involved in neuron function and repetitive elements.
Language: English
Citations
136
bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2023, Volume and Issue: unknown
Published: March 12, 2023
Abstract Long read sequencing data, particularly those derived from the Oxford Nanopore (ONT) sequencing platform, tend to exhibit a high error rate. Here, we present NextDenovo, a highly efficient error correction and assembly tool for noisy long reads, which achieves a high level of accuracy in genome assembly. NextDenovo can rapidly correct reads; these corrected reads contain fewer errors than those from other comparable tools and are characterized by fewer chimeric alignments. We applied NextDenovo to assemble high quality reference genomes of 35 diverse humans from across the world using ONT long read sequencing data. Based on these de novo genome assemblies, we were able to identify the landscape of segmental duplications and gene copy number variation in the modern human population. The use of the NextDenovo program should pave the way for population-scale long-read assembly, thereby facilitating the construction of human pan-genomes, using Nanopore long read sequencing data.
Language: English
Citations
103