Semi-automated assembly of high-quality diploid human reference genomes (Creative Commons)
Erich D. Jarvis, Giulio Formenti, Arang Rhie et al.

Nature, Year: 2022, Volume: 611(7936), pp. 519-531

Published: Oct. 19, 2022

Abstract: The current human reference genome, GRCh38, represents over 20 years of effort to generate a high-quality assembly, which has benefitted society [1,2]. However, it still has many gaps and errors, and does not represent a biological genome as it is a blend of multiple individuals [3,4]. Recently, a high-quality telomere-to-telomere reference, CHM13, was generated with the latest long-read technologies, but it was derived from a hydatidiform mole cell line with a nearly homozygous genome [5]. To address these limitations, the Human Pangenome Reference Consortium formed with the goal of creating high-quality, cost-effective, diploid genome assemblies for a pangenome reference that represents human genetic diversity [6]. Here, in our first scientific report, we determined which combination of current genome sequencing and assembly approaches yield the most complete and accurate diploid genome assembly with minimal manual curation. Approaches that used highly accurate long reads and parent–child data with graph-based haplotype phasing during assembly outperformed those that did not. Developing a combination of the top-performing methods, we generated our first high-quality diploid reference assembly, containing only approximately four gaps per chromosome on average, with most chromosomes within ±1% of the length of CHM13. Nearly 48% of protein-coding genes have non-synonymous amino acid changes between haplotypes, and centromeric regions showed the highest diversity. Our findings serve as a foundation for assembling near-complete diploid human genomes at scale for a pangenome reference to capture global genetic variation from single nucleotides to structural rearrangements.

Language: English

YaHS: yet another Hi-C scaffolding tool (Creative Commons)
Chenxi Zhou, Shane McCarthy, Richard Durbin et al.

Bioinformatics, Year: 2022, Volume: 39(1)

Published: Dec. 16, 2022

Abstract: Summary: We present YaHS, a user-friendly command-line tool for the construction of chromosome-scale scaffolds from Hi-C data. It can be run with a single-line command, requires minimal input from users (an assembly file and an alignment file) which is compatible with similar tools, and provides assembly results in multiple formats, thereby enabling rapid, robust and scalable construction of high-quality genome assemblies with high accuracy and contiguity. Availability and implementation: YaHS is implemented in C and licensed under the MIT License. The source code, documentation and tutorial are available at https://github.com/sanger-tol/yahs. Supplementary information: Supplementary data are available at Bioinformatics online.

Language: English

Cited by: 1266

Ensembl 2023 (Creative Commons)
Fergal J. Martin, M Ridwan Amode, Alisha Aneja et al.

Nucleic Acids Research, Year: 2022, Volume: 51(D1), pp. D933-D941

Published: Oct. 14, 2022

Ensembl (https://www.ensembl.org) has produced high-quality genomic resources for vertebrates and model organisms for more than twenty years. During that time, our resources, services and tools have continually evolved in line with both the publicly available genome data and the downstream research and applications that utilise the Ensembl platform. In recent years we have witnessed a dramatic shift in the genomic landscape. There has been a large increase in the number of high-quality reference genomes through global biodiversity initiatives. In parallel, there have been major advances towards pangenome representations of higher species, where many alternative genome assemblies representing different breeds, cultivars, strains and haplotypes are now available. In order to support these efforts and accelerate downstream research, it is our goal at Ensembl to create high-quality annotations, tools and services for species across the tree of life. Here, we report our resources for popular reference genomes, the dramatic growth of our annotations (including haplotypes from the first human pangenome graphs), updates to the Ensembl Variant Effect Predictor (VEP), interactive protein structure predictions from AlphaFold DB, and the beta release of our new website.

Language: English

Cited by: 730

A draft human pangenome reference (Creative Commons)
Wen‐Wei Liao, Mobin Asri, Jana Ebler et al.

Nature, Year: 2023, Volume: 617(7960), pp. 312-324

Published: May 10, 2023

Abstract: Here the Human Pangenome Reference Consortium presents a first draft of the human pangenome reference. The pangenome contains 47 phased, diploid assemblies from a cohort of genetically diverse individuals [1]. These assemblies cover more than 99% of the expected sequence in each genome and are more than 99% accurate at the structural and base pair levels. Based on alignments of the assemblies, we generate a draft pangenome that captures known variants and haplotypes and reveals new alleles at structurally complex loci. We also add 119 million base pairs of euchromatic polymorphic sequences and 1,115 gene duplications relative to the existing reference GRCh38. Roughly 90 million of the additional base pairs are derived from structural variation. Using our draft pangenome to analyse short-read data reduced small variant discovery errors by 34% and increased the number of structural variants detected per haplotype by 104% compared with GRCh38-based workflows, which enabled the typing of the vast majority of structural variant alleles per sample.

Language: English

Cited by: 585

Automated assembly scaffolding using RagTag elevates a new tomato system for high-throughput genome editing (Creative Commons)
Michael Alonge, Ludivine Lebeigle, Melanie Kirsche et al.

Genome Biology, Year: 2022, Volume: 23(1)

Published: Dec. 15, 2022

Advancing crop genomics requires efficient genetic systems enabled by high-quality personalized genome assemblies. Here, we introduce RagTag, a toolset for automating assembly scaffolding and patching, and we establish chromosome-scale reference genomes for the widely used tomato genotype M82 along with Sweet-100, a new rapid-cycling genotype that we developed to accelerate functional genomics and genome editing in tomato. This work outlines strategies to rapidly expand genetic systems and genomic resources in other plant species.

Language: English

Cited by: 473

The UCSC Genome Browser database: 2023 update (Creative Commons)
Luis R Nassar, Galt P Barber, Anna Benet‐Pagès et al.

Nucleic Acids Research, Year: 2022, Volume: 51(D1), pp. D1188-D1195

Published: Oct. 25, 2022

Abstract: The UCSC Genome Browser (https://genome.ucsc.edu) is an omics data consolidator, graphical viewer, and general bioinformatics resource that continues to serve the community as it enters its 23rd year. This year has seen an emphasis in clinical data, with new tracks and an expanded Recommended Track Sets feature on hg38 as well as the addition of a single cell track group. SARS-CoV-2 continues to remain a focus, with regular annotation updates to the browser and continued curation of our phylogenetic sequence placing tool, hgPhyloPlace, whose tree has now reached over 12M sequences. Our GenArk resource has also grown, offering over 2500 hubs and a system for users to request any absent assemblies. We have expanded our bigBarChart display type and created new ways to visualize data via bigRmsk and dynseq display. Displaying custom annotations is now easier due to our chromAlias system, which eliminates the requirement for renaming sequence names to the UCSC standard. Users involved in data generation may also be interested in our new tools and trackDb settings, which facilitate the creation and display of their custom annotations.

Language: English

Cited by: 456

Complete genomic and epigenetic maps of human centromeres
Nicolas Altemose, Glennis A. Logsdon, Andrey V. Bzikadze et al.

Science, Year: 2022, Volume: 376(6588)

Published: March 31, 2022

Existing human genome assemblies have almost entirely excluded repetitive sequences within and near centromeres, limiting our understanding of their organization, evolution, and functions, which include facilitating proper chromosome segregation. Now, a complete, telomere-to-telomere human genome assembly (T2T-CHM13) has enabled us to comprehensively characterize pericentromeric and centromeric repeats, which constitute 6.2% of the genome (189.9 megabases). Detailed maps of these regions revealed multimegabase structural rearrangements, including in active centromeric repeat arrays. Analysis of centromere-associated sequences uncovered a strong relationship between the position of the centromere and the evolution of the surrounding DNA through layered repeat expansions. Furthermore, comparisons of chromosome X centromeres across a diverse panel of individuals illuminated high degrees of structural, epigenetic, and sequence variation in these complex and rapidly evolving regions.

Language: English

Cited by: 379

GENCODE: reference annotation for the human and mouse genomes in 2023 (Creative Commons)
Adam Frankish, Sílvia Carbonell Sala, Mark Diekhans et al.

Nucleic Acids Research, Year: 2022, Volume: 51(D1), pp. D942-D949

Published: Nov. 24, 2022

GENCODE produces high quality gene and transcript annotation for the human and mouse genomes. All GENCODE annotation is supported by experimental data and serves as a reference for genome biology and clinical genomics. The GENCODE consortium generates targeted experimental data, develops bioinformatic tools and carries out analyses that, along with externally produced data and methods, support the identification and annotation of transcript structures and the determination of their function. Here, we present an update on the annotation of human and mouse genes, including developments in the tools, data, analyses and major collaborations which underpin this progress. For example, we report the creation of a set of non-canonical ORFs identified in GENCODE transcripts, the LRGASP collaboration to assess the use of long transcriptomic data to build transcript models, the progress in collaborations with RefSeq and UniProt to increase convergence in the annotation of human and mouse protein-coding genes, the propagation of GENCODE across the human pan-genome and the development of new tools to support annotation of regulatory features by GENCODE. Our annotation is accessible via Ensembl, the UCSC Genome Browser and https://www.gencodegenes.org.

Language: English

Cited by: 367

Database resources of the National Center for Biotechnology Information in 2023 (Creative Commons)
Eric W Sayers, Evan Bolton, J. Rodney Brister et al.

Nucleic Acids Research, Year: 2022, Volume: 51(D1), pp. D29-D38

Published: Nov. 12, 2022

The National Center for Biotechnology Information (NCBI) provides online information resources for biology, including the GenBank® nucleic acid sequence database and the PubMed® database of citations and abstracts published in life science journals. NCBI provides search and retrieval operations for most of these data from 35 distinct databases. The E-utilities serve as the programming interface for most of these databases. New resources include the Comparative Genome Resource (CGR) and the BLAST ClusteredNR database. Resources receiving significant updates in the past year include PubMed, PMC, Bookshelf, IgBLAST, GDV, RefSeq, NCBI Virus, GenBank type assemblies, iCn3D, ClinVar, GTR, dbGaP, ALFA, ClinicalTrials.gov, Pathogen Detection, antimicrobial resistance resources, and PubChem. These resources can be accessed through the NCBI home page at https://www.ncbi.nlm.nih.gov.

Language: English

Cited by: 341

A complete reference genome improves analysis of human genetic variation
Sergey Aganezov, Stephanie M. Yan, Daniela C. Soto et al.

Science, Year: 2022, Volume: 376(6588)

Published: March 31, 2022

Compared to its predecessors, the Telomere-to-Telomere CHM13 genome adds nearly 200 million base pairs of sequence, corrects thousands of structural errors, and unlocks the most complex regions of the human genome for clinical and functional study. We show how this reference universally improves read mapping and variant calling for 3202 and 17 globally diverse samples sequenced with short and long reads, respectively. We identify hundreds of thousands of variants per sample in previously unresolved regions, showcasing the promise of the T2T-CHM13 reference for evolutionary and biomedical discovery. Simultaneously, this reference eliminates tens of thousands of spurious variants per sample, including reduction of false positives in 269 medically relevant genes by up to a factor of 12. Because of these improvements in variant discovery coupled with population and functional genomic resources, T2T-CHM13 is positioned to replace GRCh38 as the prevailing reference for human genetics.

Language: English

Cited by: 275

From telomere to telomere: The transcriptional and epigenetic state of human repeat elements
Savannah J. Hoyt, Jessica M. Storer, Gabrielle A. Hartley et al.

Science, Year: 2022, Volume: 376(6588)

Published: March 31, 2022

Mobile elements and repetitive genomic regions are sources of lineage-specific genomic innovation and uniquely fingerprint individual genomes. Comprehensive analyses of such repeat elements, including those found in more complex regions of the genome, require a complete, linear genome assembly. We present a de novo repeat discovery and annotation of the T2T-CHM13 human reference genome. We identified previously unknown satellite arrays, expanded the catalog of variants and families for repeats and mobile elements, characterized classes of complex composite repeats, and located retroelement transduction events. We detected nascent transcription and delineated CpG methylation profiles to define the structure of transcriptionally active retroelements in humans, including those in centromeres. These data expand our insight into the diversity, distribution, and evolution of repetitive regions that have shaped the human genome.

Language: English

Cited by: 261