Database Resources of the National Genomics Data Center in 2020 DOI Creative Commons
Zhang Zhang, Wenming Zhao, Jingfa Xiao

et al.

Nucleic Acids Research, Journal Year: 2019, Volume and Issue: unknown

Published: Oct. 2, 2019

The National Genomics Data Center (NGDC) provides a suite of database resources to support worldwide research activities in both academia and industry. With the rapid advancements in higher-throughput and lower-cost sequencing technologies, and accordingly the huge volume of multi-omics data generated at exponential scales and rates, NGDC is continually expanding, updating and enriching its core database resources through big data integration and value-added curation. In the past year, efforts for update have been mainly devoted to BioProject, BioSample, GSA, GWH, GVM, NONCODE, LncBook, EWAS Atlas and IC4R. Newly released resources include three human genome databases (PGG.SNV, PGG.Han and CGVD), eLMSG, EWAS Data Hub, GWAS Atlas, iSheep and PADS Arsenal. In addition, four web services, namely eGPS Cloud, BIG Search, BIG Submission and BIG SSO, have been significantly improved and enhanced. All these resources, along with their services, are publicly accessible at https://bigd.big.ac.cn.

Language: English

Tobacco smoking and somatic mutations in human bronchial epithelium DOI
Kenichi Yoshida, Kate H.C. Gowers, Henry Lee-Six

et al.

Nature, Journal Year: 2020, Volume and Issue: 578(7794), P. 266 - 272

Published: Jan. 29, 2020

Language: English

Citations

437

The mutational landscape of normal human endometrial epithelium DOI
Luiza Moore, Daniel Leongamornlert, Tim H. H. Coorens

et al.

Nature, Journal Year: 2020, Volume and Issue: 580(7805), P. 640 - 646

Published: April 22, 2020

Language: English

Citations

428

FlyBase: updates to the Drosophila melanogaster knowledge base DOI Creative Commons
Aoife Larkin, Steven J Marygold, Giulia Antonazzo

et al.

Nucleic Acids Research, Journal Year: 2020, Volume and Issue: 49(D1), P. D899 - D907

Published: Oct. 22, 2020

FlyBase (flybase.org) is an essential online database for researchers using Drosophila melanogaster as a model organism, facilitating access to a diverse array of information that includes genetic, molecular, genomic and reagent resources. Here, we describe the introduction of several new features at FlyBase, including Pathway Reports, paralog information, disease models based on orthology, customizable tables within reports and overview displays ('ribbons') of expression and disease data. We also describe a variety of recent important updates, including incorporation of a developmental proteome, upgrades to the GAL4 search tab, additional Experimental Tool Reports, migration to JBrowse for genome browsing and improvements to batch queries/downloads and the Fast-Track Your Paper tool.

Language: English

Citations

420

The Candida Genome Database (CGD): incorporation of Assembly 22, systematic identifiers and visualization of high throughput sequencing data DOI Creative Commons
Marek S. Skrzypek, Jonathan Binkley, Gail Binkley

et al.

Nucleic Acids Research, Journal Year: 2016, Volume and Issue: 45(D1), P. D592 - D596

Published: Oct. 11, 2016

The Candida Genome Database (CGD, http://www.candidagenome.org/) is a freely available online resource that provides gene, protein and sequence information for multiple Candida species, along with web-based tools for accessing, analyzing and exploring these data. The mission of CGD is to facilitate and accelerate research into Candida pathogenesis and biology by curating the scientific literature in real time and connecting literature-derived annotations to the latest version of the genomic sequence and its annotations. Here, we report the incorporation into CGD of Assembly 22, the first chromosome-level, phased diploid assembly of the C. albicans genome, coupled with improvements that we have made to the assembly using additional available sequence data. We also report the creation of systematic identifiers for C. albicans genes and sequence features, using a system similar to that adopted by the yeast community over two decades ago. Finally, we describe the incorporation of JBrowse into CGD, which allows online browsing of mapped high throughput sequencing data, and its implementation for several RNA-Seq data sets, as well as the whole genome sequencing data that was used in the construction of Assembly 22.

Language: English

Citations

393

VEuPathDB: the eukaryotic pathogen, vector and host bioinformatics resource center DOI Creative Commons
B Kirtley Amos, Cristina Aurrecoechea, Matthieu Barba

et al.

Nucleic Acids Research, Journal Year: 2021, Volume and Issue: 50(D1), P. D898 - D911

Published: Oct. 6, 2021

The Eukaryotic Pathogen, Vector and Host Informatics Resource (VEuPathDB, https://veupathdb.org) represents the 2019 merger of VectorBase with the EuPathDB projects. As a Bioinformatics Resource Center funded by the National Institutes of Health, with additional support from the Wellcome Trust, VEuPathDB supports >500 organisms comprising invertebrate vectors, eukaryotic pathogens (protists and fungi) and relevant free-living or non-pathogenic species or hosts. Designed to empower researchers with access to Omics data and bioinformatic analyses, VEuPathDB projects integrate >1700 pre-analysed datasets (and associated metadata) with advanced search capabilities, visualizations, and analysis tools in a graphic interface. Diverse data types are analysed with standardized workflows, including an in-house OrthoMCL algorithm for predicting orthology. Comparisons are easily made across datasets, data types and organisms in this unique data mining platform. A new site-wide search facilitates access for both experienced and novice users. Upgraded infrastructure and workflows support numerous updates to the web interface, tools, searches and strategies, and a Galaxy workspace where users can privately analyse their own data. Forthcoming upgrades include cloud-ready application architecture, an expanded data workspace, tools for interrogating host-pathogen interactions, improved interactions with affiliated databases (ClinEpiDB, MicrobiomeDB) and other scientific resources, and increased interoperability with the Bacterial & Viral BRC.

Language: English

Citations

393

Petabase-scale sequence alignment catalyses viral discovery DOI Creative Commons
R. C. Edgar, Brie Taylor, Victor S.-Y. Lin

et al.

Nature, Journal Year: 2022, Volume and Issue: 602(7895), P. 142 - 147

Published: Jan. 26, 2022

Public databases contain a planetary collection of nucleic acid sequences, but their systematic exploration has been inhibited by a lack of efficient methods for searching this corpus, which (at the time of writing) exceeds 20 petabases and is growing exponentially. Here we developed a cloud computing infrastructure, Serratus, to enable ultra-high-throughput sequence alignment at the petabase scale. We searched 5.7 million biologically diverse samples (10.2 petabases) for the hallmark gene RNA-dependent RNA polymerase and identified well over 10^5 novel RNA viruses, thereby expanding the number of known species by roughly an order of magnitude. We characterized novel viruses related to coronaviruses, hepatitis delta virus and huge phages, respectively, and analysed their environmental reservoirs. To catalyse the ongoing revolution of viral discovery, we established a free and comprehensive database of these data and tools. Expanding the known sequence diversity of viruses can reveal the evolutionary origins of emerging pathogens and improve pathogen surveillance for the anticipation and mitigation of future pandemics.

Language: English

Citations

386

15 years of GDR: New data and functionality in the Genome Database for Rosaceae DOI Creative Commons
Sook Jung, Taein Lee, Chun-Huai Cheng

et al.

Nucleic Acids Research, Journal Year: 2018, Volume and Issue: 47(D1), P. D1137 - D1145

Published: Oct. 9, 2018

The Genome Database for Rosaceae (GDR, https://www.rosaceae.org) is an integrated web-based community database resource providing access to publicly available genomics, genetics and breeding data and data-mining tools to facilitate basic, translational and applied research in Rosaceae. The volume of data in GDR has increased greatly over the last 5 years. GDR now houses multiple versions of whole genome assembly and annotation data from 14 species, made possible by recent advances in sequencing technology. Annotated and searchable reference transcriptomes, RefTrans, combining peer-reviewed published RNA-Seq as well as EST datasets, are newly available for major crop species. Significantly more quantitative trait loci, genetic maps and markers are available in MapViewer, a new visualization tool that better integrates with other pages in GDR. Pathways can be accessed through the new GDR Cyc Pathways databases, and synteny among the newest genome assemblies from eight species can be viewed through the new synteny browser, SynView. Collated single-nucleotide polymorphism diversity data and phenotypic datasets are integrated with other relevant data. Also, the new Breeding Information Management System allows breeders to upload, manage and analyze their private breeding data within the secure GDR server, with an option to release data publicly.

Language: English

Citations

375

CottonFGD: an integrated functional genomics database for cotton DOI Creative Commons
Tao Zhu, Chengzhen Liang, Zhigang Meng

et al.

BMC Plant Biology, Journal Year: 2017, Volume and Issue: 17(1)

Published: June 8, 2017

Cotton (Gossypium spp.) is the most important fiber and oil crop in the world. With the emergence of huge -omics data sets, it is essential to have an integrated functional genomics database that allows worldwide users to quickly and easily fetch and visualize genomic information. Currently available cotton-related databases have some weakness in integrating multiple kinds of -omics data from multiple Gossypium species. Therefore, it is necessary to establish an integrated functional genomics database for cotton. We developed CottonFGD (Cotton Functional Genomic Database, https://cottonfgd.org), an integrated database that includes genomic sequences, gene structural and functional annotations, genetic marker data, transcriptome data, and population genome resequencing data for all four of the sequenced Gossypium species. It consists of three interconnected modules: search, profile, and analysis. These modules enable both single gene review and batch analysis with multiple kinds of -omics data and multiple species. CottonFGD also includes additional pages for data statistics, bulk data download, and a detailed user manual. Equipped with specialized functional modules and modernized visualization tools, and populated with multiple kinds of -omics data, CottonFGD provides a quick and easy-to-use data analysis platform for cotton researchers worldwide.

Language: English

Citations

343

REDIportal: a comprehensive database of A-to-I RNA editing events in humans DOI Creative Commons
Ernesto Picardi, Anna Maria D’Erchia, Claudio Lo Giudice

et al.

Nucleic Acids Research, Journal Year: 2016, Volume and Issue: 45(D1), P. D750 - D757

Published: Sept. 1, 2016

RNA editing by A-to-I deamination is the prominent co-/post-transcriptional modification in humans. It is carried out by ADAR enzymes and contributes to both transcriptomic and proteomic expansion. RNA editing has pivotal cellular effects, and its deregulation has been linked to a variety of human disorders including neurological and neurodegenerative diseases and cancer. Despite its biological relevance, many physiological and functional aspects of RNA editing are yet elusive. Here, we present REDIportal, available online at http://srv00.recas.ba.infn.it/atlas/, the largest and most comprehensive collection of RNA editing in humans, including more than 4.5 million A-to-I events detected in 55 body sites from thousands of RNAseq experiments. REDIportal embeds the RADAR database and represents the first editing resource designed to answer functional questions, enabling the inspection and browsing of editing levels in a variety of human samples, tissues and body sites. In contrast with previous RNA editing databases, REDIportal comprises its own browser (JBrowse) that allows users to explore A-to-I changes in their genomic context, emphasizing repetitive elements in which RNA editing is prominent.

Language: English

Citations

315

HTSlib: C library for reading/writing high-throughput sequencing data DOI Creative Commons
James Bonfield, John Marshall, Petr Danecek

et al.

GigaScience, Journal Year: 2021, Volume and Issue: 10(2)

Published: Jan. 29, 2021

Since the original publication of the VCF and SAM formats, an explosion of software tools has been created to process these data files. To facilitate this, a library was produced out of the original SAMtools implementation, with a focus on performance and robustness. The file formats themselves have become international standards under the jurisdiction of the Global Alliance for Genomics and Health.

Language: English

Citations

311