
Cell, Journal Year: 2018, Volume and Issue: 176(4), P. 816 - 830.e18
Published: Dec. 27, 2018
Language: English
Science, Journal Year: 2022, Volume and Issue: 376(6588), P. 44 - 53
Published: March 31, 2022
Since its initial release in 2000, the human reference genome has covered only the euchromatic fraction of the genome, leaving important heterochromatic regions unfinished. Addressing the remaining 8% of the genome, the Telomere-to-Telomere (T2T) Consortium presents a complete 3.055 billion–base pair sequence of a human genome, T2T-CHM13, that includes gapless assemblies for all chromosomes except Y, corrects errors in the prior references, and introduces nearly 200 million base pairs of sequence containing 1956 gene predictions, 99 of which are predicted to be protein coding. The completed regions include all centromeric satellite arrays, recent segmental duplications, and the short arms of all five acrocentric chromosomes, unlocking these complex regions of the genome to variational and functional studies.
Language: English
Citations: 2175
Nature, Journal Year: 2021, Volume and Issue: 592(7856), P. 737 - 746
Published: April 28, 2021
High-quality and complete reference genome assemblies are fundamental for the application of genomics to biology, disease, and biodiversity conservation. However, such assemblies are available for only a few non-microbial species [1–4]. To address this issue, the international Genome 10K (G10K) consortium [5,6] has worked over a five-year period to evaluate and develop cost-effective methods for assembling highly accurate and nearly complete reference genomes. Here we present lessons learned from generating assemblies for 16 species that represent six major vertebrate lineages. We confirm that long-read sequencing technologies are essential for maximizing genome quality, and that unresolved complex repeats and haplotype heterozygosity are major sources of assembly error when not handled correctly. Our assemblies correct substantial errors, add missing sequence in some of the best historical reference genomes, and reveal biological discoveries. These include the identification of many false gene duplications, increases in gene sizes, chromosome rearrangements that are specific to lineages, a repeated independent chromosome breakpoint in bat genomes, and a canonical GC-rich pattern in protein-coding genes and their regulatory regions. Adopting these lessons, we have embarked on the Vertebrate Genomes Project (VGP), an international effort to generate high-quality, complete reference genomes for all of the roughly 70,000 extant vertebrate species and to help to enable a new era of discovery across the life sciences.
Language: English
Citations: 2020
Bioinformatics, Journal Year: 2019, Volume and Issue: 36(1), P. 311 - 316
Published: July 10, 2019
Motivation: Most existing coverage-based (epi)genomic datasets are one-dimensional, but newer technologies probing interactions (physical, genetic, etc.) produce quantitative maps with two-dimensional genomic coordinate systems. Storage and computational costs mount sharply with data resolution when such maps are stored in dense form. Hence, there is a pressing need to develop data storage strategies that handle the full range of useful resolutions in multidimensional genomic datasets by taking advantage of their sparse nature, while supporting efficient compression and providing fast random access to facilitate development of scalable algorithms for data analysis. Results: We developed a file format called cooler, based on a sparse data model, that can support genomically labeled matrices at any resolution. It has the flexibility to accommodate various descriptions of the data axes (genomic coordinates, tracks and bin annotations), resolutions, data density patterns and metadata. Cooler is based on HDF5 and is supported by a Python library and command line suite to create, read, inspect and manipulate cooler data collections. The format has been adopted as a standard by the NIH 4D Nucleome Consortium. Availability and implementation: Cooler is cross-platform, BSD-licensed and can be installed from the Python package index or the bioconda repository. The source code is maintained on Github at https://github.com/mirnylab/cooler. Supplementary information: Supplementary data are available at Bioinformatics online.
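The sparse data model mentioned in the cooler abstract can be illustrated with a minimal sketch. It assumes a bin table of fixed-width genomic intervals and a pixel table that stores only the nonzero upper-triangle cells of a symmetric contact matrix as (bin1_id, bin2_id, count) records. This is plain Python with hypothetical helper names (`Bin`, `make_bins`, `to_pixels`, `fetch`). It is not the `cooler` library API or its HDF5 layout.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Bin:
    """One row of a hypothetical bin table: a genomic interval on one chromosome."""
    chrom: str
    start: int
    end: int


def make_bins(chromsizes: dict[str, int], binsize: int) -> list[Bin]:
    """Tile each chromosome into fixed-width bins; the last bin may be shorter."""
    bins: list[Bin] = []
    for chrom, length in chromsizes.items():
        for start in range(0, length, binsize):
            bins.append(Bin(chrom, start, min(start + binsize, length)))
    return bins


def to_pixels(dense: list[list[int]]) -> list[tuple[int, int, int]]:
    """Keep only nonzero upper-triangle cells as (bin1_id, bin2_id, count) records."""
    n = len(dense)
    return [(i, j, dense[i][j]) for i in range(n) for j in range(i, n) if dense[i][j]]


def fetch(pixels: list[tuple[int, int, int]], lo: int, hi: int) -> list[list[int]]:
    """Rebuild the dense symmetric submatrix for bins lo..hi-1 from the sparse records."""
    size = hi - lo
    out = [[0] * size for _ in range(size)]
    for i, j, count in pixels:
        if lo <= i < hi and lo <= j < hi:
            out[i - lo][j - lo] = count
            out[j - lo][i - lo] = count  # mirror the stored upper triangle
    return out


bins = make_bins({"chrA": 250, "chrB": 100}, binsize=100)  # 3 bins on chrA, 1 on chrB
dense = [
    [5, 1, 0, 0],
    [1, 7, 2, 0],
    [0, 2, 4, 0],
    [0, 0, 0, 3],
]
pixels = to_pixels(dense)
print(len(bins), len(pixels))  # 4 bins, 6 stored pixels instead of 16 cells
print(fetch(pixels, 1, 3))     # [[7, 2], [2, 4]]
```

Because the matrix is symmetric, storing one triangle loses nothing. Dropping zero cells then shrinks storage further as maps become sparser.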
Language: English
Citations: 1509
GigaScience, Journal Year: 2021, Volume and Issue: 10(1)
Published: Jan. 1, 2021
Genome sequence assemblies provide the basis for our understanding of biology. Generating error-free assemblies is therefore the ultimate, but sadly still unachieved goal of a multitude of research projects. Despite the ever-advancing improvements in data generation, assembly algorithms and pipelines, no automated approach has so far reliably generated near error-free genome assemblies for eukaryotes. Whilst working towards improved datasets and fully automated pipelines, assembly evaluation and curation is actively used to bridge this shortcoming and significantly reduce the number of assembly errors. In addition to this increase in product value, the insights gained from assembly curation are fed back into the automated assembly strategy and contribute to notable improvements in genome assembly quality. We describe our tried and tested approach for assembly curation using gEVAL, the genome evaluation browser. We outline the procedures applied to genome curation using gEVAL and also our recommendations for assembly curation in a gEVAL-independent context to facilitate the uptake of genome curation in the wider community.
Language: English
Citations: 1304
Nature, Journal Year: 2017, Volume and Issue: 551(7678), P. 51 - 56
Published: Sept. 27, 2017
Language: English
Citations: 1132
Nature, Journal Year: 2019, Volume and Issue: 570(7761), P. 395 - 399
Published: June 1, 2019
Language: English
Citations: 584
Molecular Cell, Journal Year: 2020, Volume and Issue: 78(3), P. 539 - 553.e8
Published: March 25, 2020
Language: English
Citations: 545
Nature, Journal Year: 2021, Volume and Issue: 593(7858), P. 238 - 243
Published: April 7, 2021
Language: English
Citations: 533
Genome Biology, Journal Year: 2022, Volume and Issue: 23(1)
Published: Dec. 15, 2022
Advancing crop genomics requires efficient genetic systems enabled by high-quality personalized genome assemblies. Here, we introduce RagTag, a toolset for automating assembly scaffolding and patching, and we establish chromosome-scale reference genomes for the widely used tomato genotype M82 along with Sweet-100, a new rapid-cycling genotype that we developed to accelerate functional genomics and genome editing in tomato. This work outlines strategies to rapidly expand genetic systems and genomic resources in other plant species.
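Reference-guided scaffolding, one of the tasks RagTag automates, can be sketched in simplified form. The sketch assumes that each contig has already been placed on a reference chromosome with a start coordinate and a strand. The `Placement` type, the `scaffold` function, the `<chrom>_scaffold` naming, and the fixed-length run of Ns between contigs are hypothetical choices for illustration. RagTag itself derives placements from contig-to-reference alignments and handles many cases this sketch ignores. Do not read it as the tool's algorithm or interface.

```python
from __future__ import annotations

from dataclasses import dataclass

# Complement table for DNA, including the gap character N (case preserved).
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp(seq: str) -> str:
    """Reverse-complement a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


@dataclass(frozen=True)
class Placement:
    """Hypothetical placement of one contig on a reference chromosome."""
    contig: str
    ref_chrom: str
    ref_start: int
    strand: str  # "+" keeps the contig as-is, "-" reverse-complements it


def scaffold(contigs: dict[str, str], placements: list[Placement], gap: int = 100) -> dict[str, str]:
    """Order and orient placed contigs per chromosome, joining them with runs of N.

    Contigs without a placement are passed through unchanged.
    """
    by_chrom: dict[str, list[Placement]] = {}
    for p in placements:
        by_chrom.setdefault(p.ref_chrom, []).append(p)

    result: dict[str, str] = {}
    for chrom, ps in by_chrom.items():
        ps.sort(key=lambda p: p.ref_start)  # order contigs along the reference
        pieces = [contigs[p.contig] if p.strand == "+" else revcomp(contigs[p.contig]) for p in ps]
        result[f"{chrom}_scaffold"] = ("N" * gap).join(pieces)

    placed = {p.contig for p in placements}
    for name, seq in contigs.items():
        if name not in placed:
            result[name] = seq
    return result


contigs = {"ctg1": "AAAC", "ctg2": "GGT", "ctg3": "TTTT"}
placements = [
    Placement("ctg2", "chr1", 500, "-"),
    Placement("ctg1", "chr1", 10, "+"),
]
print(scaffold(contigs, placements, gap=3))
# {'chr1_scaffold': 'AAACNNNACC', 'ctg3': 'TTTT'}
```

The gap length is a placeholder for unknown distances. A real scaffolder may instead estimate gap sizes from the reference coordinates.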
Language: English
Citations: 465
Genome Biology, Journal Year: 2018, Volume and Issue: 19(1)
Published: Oct. 4, 2018
Here, we introduce the 3D Genome Browser, http://3dgenome.org, which allows users to conveniently explore both their own and over 300 publicly available chromatin interaction data of different types. We design a new binary data format for Hi-C data that reduces the file size by at least a magnitude and allows users to visualize chromatin interactions over millions of base pairs within seconds. Our browser provides multiple methods linking distal cis-regulatory elements with their potential target genes. Users can seamlessly integrate thousands of other omics data to gain a comprehensive view of both regulatory landscape and 3D genome structure.
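The 3D Genome Browser abstract pairs a new binary format for Hi-C data with smaller files and fast visualization, but it does not describe the layout. As a hedged illustration of why a binary encoding helps, this sketch assumes contacts are sorted by row bin and packed as fixed-width 12-byte records: two `uint32` bin indices and one `float32` count. The record layout and the names `encode`, `first_at_or_after`, and `query_rows` are hypothetical. This is not the browser's actual format. Fixed-width records let a reader binary-search to the requested rows and decode only that slice, instead of parsing a whole text file.

```python
import struct

# Hypothetical fixed-width record: bin_i (uint32), bin_j (uint32), count (float32), little-endian.
RECORD = struct.Struct("<IIf")


def encode(contacts: list[tuple[int, int, float]]) -> bytes:
    """Serialize contacts sorted by (bin_i, bin_j) into consecutive 12-byte records."""
    return b"".join(RECORD.pack(i, j, c) for i, j, c in sorted(contacts))


def first_at_or_after(blob: bytes, row: int) -> int:
    """Binary-search the record index of the first contact with bin_i >= row."""
    lo, hi = 0, len(blob) // RECORD.size
    while lo < hi:
        mid = (lo + hi) // 2
        if RECORD.unpack_from(blob, mid * RECORD.size)[0] < row:
            lo = mid + 1
        else:
            hi = mid
    return lo


def query_rows(blob: bytes, row_lo: int, row_hi: int) -> list[tuple[int, int, float]]:
    """Return contacts with row_lo <= bin_i < row_hi without decoding the whole blob."""
    out = []
    n = len(blob) // RECORD.size
    k = first_at_or_after(blob, row_lo)
    while k < n:
        i, j, c = RECORD.unpack_from(blob, k * RECORD.size)
        if i >= row_hi:
            break
        out.append((i, j, c))
        k += 1
    return out


contacts = [(0, 0, 5.0), (0, 3, 1.0), (1, 1, 7.0), (2, 5, 2.5), (4, 4, 3.0)]
blob = encode(contacts)
print(len(blob))               # 60 bytes: 5 records x 12 bytes
print(query_rows(blob, 1, 3))  # [(1, 1, 7.0), (2, 5, 2.5)]
```

The counts used here are exactly representable in `float32`, so they round-trip unchanged. Arbitrary real-valued counts would be rounded to single precision.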
Language: English
Citations: 463