A teaching and training framework to promote findable, accessible, interoperable, and reusable data generation in agriculture DOI Creative Commons
Annarita Marrano, Leyla Cabugos, Alenka Hafner

et al.

Database, Journal Year: 2025, Volume and Issue: 2025

Published: Jan. 1, 2025

Advances in agricultural genetic, genomic, and breeding (GGB) technologies generate increasingly large and complex datasets that need to be adequately managed and shared. While several biological databases maintain and curate GGB data, not all scientists are aware of them and how they can be used to access and share data. In addition, there is the need to increase scientists' awareness that appropriate data archiving and curation increases data longevity and value and bolsters scientific discoveries' reproducibility and transparency. The AgBioData Education working group aims to address these unmet needs and developed a modular curriculum for educators teaching the basics of findable, accessible, interoperable, and reusable (FAIR) data principles to undergraduate and graduate students (https://www.agbiodata.org/). The present paper provides an overview of the topics covered within the curriculum, called 'AgBioData Curriculum for Ag FAIR Data,' its audience and modalities, and how it will positively impact different stakeholders of the database ecosystem. We hope the curriculum presented here will help to understand and support the use of FAIR data in aspects of improving our global food system. Database URL: https://zenodo.org/records/14278084

Language: English

Using MetaboAnalyst 5.0 for LC–HRMS spectra processing, multi-omics integration and covariate adjustment of global metabolomics data DOI Open Access
Zhiqiang Pang, Guangyan Zhou, Jessica Ewald

et al.

Nature Protocols, Journal Year: 2022, Volume and Issue: 17(8), P. 1735 - 1761

Published: June 17, 2022

Language: English

Citations

1064

Next generation sequencing of SARS-CoV-2 genomes: challenges, applications and opportunities DOI Creative Commons
Matteo Chiara, Anna Maria D’Erchia, Carmela Gissi

et al.

Briefings in Bioinformatics, Journal Year: 2020, Volume and Issue: 22(2), P. 616 - 630

Published: Oct. 8, 2020

Various next generation sequencing (NGS) based strategies have been successfully used in the recent past for tracing origins and understanding the evolution of infectious agents, investigating the spread and transmission chains of outbreaks, as well as facilitating the development of effective and rapid molecular diagnostic tests and contributing to the hunt for treatments and vaccines. The ongoing COVID-19 pandemic poses one of the greatest global threats in modern history and has already caused severe social and economic costs. The development of efficient and rapid sequencing methods to reconstruct the genomic sequence of SARS-CoV-2, the etiological agent of COVID-19, has been fundamental for the design of diagnostic molecular tests and to devise effective measures and strategies to mitigate the diffusion of the pandemic. Diverse approaches and sequencing methods can, as testified by the number of available sequences, be applied to SARS-CoV-2 genomes. However, each technology and sequencing approach has its own advantages and limitations. In the current review, we will provide a brief, but hopefully comprehensive, account of currently available platforms and methodological approaches for the sequencing of SARS-CoV-2 genomes. We also present an outline of current repositories and databases that provide access to SARS-CoV-2 genomic data and associated metadata. Finally, we offer general advice and guidelines for the appropriate sharing and deposition of SARS-CoV-2 data and metadata, and suggest that more efficient and standardized integration of current and future SARS-CoV-2-related data would greatly facilitate the struggle against this new pathogen. We hope that our 'vademecum' for the production and handling of SARS-CoV-2-related sequencing data will contribute to this objective.

Language: English

Citations

205

Genomes OnLine Database (GOLD) v.8: overview and updates DOI Creative Commons
Supratim Mukherjee, Dimitri Stamatis, Jon Bertsch

et al.

Nucleic Acids Research, Journal Year: 2020, Volume and Issue: 49(D1), P. D723 - D733

Published: Oct. 19, 2020

The Genomes OnLine Database (GOLD) (https://gold.jgi.doe.gov/) is a manually curated, daily updated collection of genome projects and their metadata accumulated from around the world. The current version of the database includes over 1.17 million entries organized broadly into Studies (45 770), Organisms (387 382) or Biosamples (101 207), Sequencing Projects (355 364) and Analysis Projects (283 481). These four levels contain over 600 metadata fields, out of which 76 are controlled vocabulary (CV) tables containing 3873 terms. GOLD provides an interactive web user interface for browsing and searching by a wide range of project and metadata fields. Users can enter details about their own projects in GOLD, which acts as a gatekeeper to ensure that metadata is accurately documented before submitting sequence information to the Integrated Microbial Genomes (IMG) system for analysis. In order to maintain a reference dataset for use by members of the scientific community, GOLD also imports projects from public repositories such as GenBank and SRA. The current status of the database, along with recent updates and improvements, is described in this manuscript.

Language: English

Citations

189

Packaging research artefacts with RO-Crate DOI Creative Commons
Stian Soiland‐Reyes, Peter Sefton, Mercè Crosas

et al.

Data Science, Journal Year: 2022, Volume and Issue: 5(2), P. 97 - 138

Published: Jan. 4, 2022

An increasing number of researchers support reproducibility by including pointers to and descriptions of datasets, software and methods in their publications. However, scientific articles may be ambiguous, incomplete and difficult to process by automated systems. In this paper we introduce RO-Crate, an open, community-driven, and lightweight approach to packaging research artefacts along with their metadata in a machine readable manner. RO-Crate is based on Schema.org annotations in JSON-LD, aiming to establish best practices to formally describe metadata in an accessible and practical way for their use in a wide variety of situations. An RO-Crate is a structured archive of all the items that contributed to a research outcome, including their identifiers, provenance, relations and annotations. As a general purpose packaging approach for data and their metadata, RO-Crate is used across multiple areas, including bioinformatics, digital humanities and regulatory sciences. By applying "just enough" Linked Data standards, RO-Crate simplifies the process of making research outputs FAIR while also enhancing research reproducibility. An RO-Crate for this article is available at https://w3id.org/ro/doi/10.5281/zenodo.5146227
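To make the packaging idea in this abstract concrete, the sketch below builds a minimal metadata document in the style of an RO-Crate: a JSON-LD file whose graph holds a descriptor for the metadata file itself, a root dataset, and one entry per packaged file. The abstract does not give these details; the layout, the `@context` URL and the `conformsTo` identifier are assumptions based on the RO-Crate 1.1 specification, and the function name, file path and descriptions are hypothetical.

```python
import json


def build_ro_crate_metadata(name, description, files):
    """Return a minimal RO-Crate-style JSON-LD document as a dict.

    `files` is a list of (relative_path, human_readable_name) pairs.
    Layout assumed from the RO-Crate 1.1 specification, not from the abstract.
    """
    graph = [
        {
            # Descriptor for the metadata file itself; it points at the root dataset.
            "@id": "ro-crate-metadata.json",
            "@type": "CreativeWork",
            "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
            "about": {"@id": "./"},
        },
        {
            # Root dataset: the crate as a whole, listing its parts by identifier.
            "@id": "./",
            "@type": "Dataset",
            "name": name,
            "description": description,
            "hasPart": [{"@id": path} for path, _ in files],
        },
    ]
    # One flat entity per packaged file, referenced from hasPart above.
    for path, label in files:
        graph.append({"@id": path, "@type": "File", "name": label})
    return {"@context": "https://w3id.org/ro/crate/1.1/context", "@graph": graph}


# Hypothetical example content, used only for illustration.
crate = build_ro_crate_metadata(
    "Example crate",
    "Hypothetical dataset used only for illustration.",
    [("data/measurements.csv", "Raw measurements")],
)
print(json.dumps(crate, indent=2))
```

In an actual crate this document would be saved as `ro-crate-metadata.json` in the root of the packaged directory. Entities are kept flat in `@graph` and cross-referenced by `@id`, which is what lets generic JSON-LD tooling process the metadata.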

Language: English

Citations

118

Data Integration Challenges for Machine Learning in Precision Medicine DOI Creative Commons
Mireya Martínez-García, Enrique Hernández-Lemus

Frontiers in Medicine, Journal Year: 2022, Volume and Issue: 8

Published: Jan. 25, 2022

A main goal of Precision Medicine is that of incorporating and integrating the vast corpora on different databases about the molecular and environmental origins of disease into analytic frameworks, allowing the development of individualized, context-dependent diagnostics and therapeutic approaches. In this regard, artificial intelligence and machine learning approaches can be used to build analytical models of complex disease aimed at prediction of personalized health conditions and outcomes. Such models must handle the wide heterogeneity of individuals in both their genetic predisposition and their social and environmental determinants. Computational approaches to medicine need to be able to efficiently manage, visualize and integrate large datasets combining structured and unstructured formats. This needs to be done while constrained by different levels of confidentiality, ideally doing so within a unified analytical architecture. Efficient data integration and management is key to the successful application of computational intelligence approaches to medicine. A number of challenges arise in the design of successful approaches to medical data analytics under currently demanding conditions of performance in personalized medicine, while also subject to time, computational power, and bioethical constraints. Here, we will review some of these constraints and discuss possible avenues to overcome current challenges.

Language: English

Citations

84

Twenty-five years of Genomes OnLine Database (GOLD): data updates and new features in v.9 DOI Creative Commons
Supratim Mukherjee, Dimitri Stamatis, Cindy Tianqing Li

et al.

Nucleic Acids Research, Journal Year: 2022, Volume and Issue: 51(D1), P. D957 - D963

Published: Oct. 16, 2022

The Genomes OnLine Database (GOLD) (https://gold.jgi.doe.gov/) at the Department of Energy Joint Genome Institute (DOE-JGI) continues to maintain its role as one of the flagship genomic metadata repositories of the world. The ever-increasing number of projects and metadata are freely available to the user community world-wide. GOLD's metadata is consumed by scientists and remains an important source for large-scale comparative genomics analysis initiatives. Encouraged by this active user engagement and growth, GOLD has continued to add new components and capabilities. The new features, such as a public Application Programming Interface (API) and an Ecosystem landing page, as well as the growth of different entities in the current GOLD v.9 edition, are described in detail in this manuscript.

Language: English

Citations

73

Computational strategies to combat COVID-19: useful tools to accelerate SARS-CoV-2 and coronavirus research DOI Creative Commons
Franziska Hufsky, Kevin Lamkiewicz, Alexandre Almeida

et al.

Briefings in Bioinformatics, Journal Year: 2020, Volume and Issue: 22(2), P. 642 - 663

Published: Oct. 30, 2020

SARS-CoV-2 (severe acute respiratory syndrome coronavirus 2) is a novel virus of the family Coronaviridae. The virus causes the infectious disease COVID-19. The biology of coronaviruses has been studied for many years. However, bioinformatics tools designed explicitly for SARS-CoV-2 have only recently been developed as a rapid reaction to the need for fast detection, understanding and treatment of COVID-19. To control the ongoing COVID-19 pandemic, it is of utmost importance to get insight into the evolution and pathogenesis of the virus. In this review, we cover bioinformatics workflows and tools for the routine detection of SARS-CoV-2 infection, the reliable analysis of sequencing data, the tracking of the COVID-19 pandemic and evaluation of containment measures, the study of coronavirus evolution, the discovery of potential drug targets and development of therapeutic strategies. For each tool, we briefly describe its use case and how it advances research specifically for SARS-CoV-2. All tools are free to use and available online, either through web applications or public code repositories.

Language: English

Citations

139

Importance of timely metadata curation to the global surveillance of genetic diversity DOI Creative Commons
Eric D. Crandall, Rachel H. Toczydlowski, Libby Liggins

et al.

Conservation Biology, Journal Year: 2023, Volume and Issue: 37(4)

Published: Jan. 27, 2023

Genetic diversity within species represents a fundamental yet underappreciated level of biodiversity. Because genetic diversity can indicate species resilience to changing climate, its measurement is relevant to many national and global conservation policy targets. Many studies produce large amounts of genome-scale genetic diversity data for wild populations, but most (87%) do not include the associated spatial and temporal metadata necessary for them to be reused in monitoring programs or for acknowledging the sovereignty of nations or Indigenous peoples. We undertook a distributed datathon to quantify the availability of these missing metadata and to test the hypothesis that their availability decays with time. We also worked to remediate missing metadata by extracting them from associated published papers, online repositories, and direct communication with authors. Starting with 848 candidate genomic data sets (reduced representation and whole genome) from the International Nucleotide Sequence Database Collaboration, we determined that 561 contained mostly samples from wild populations. We successfully restored spatiotemporal metadata for 78% of these 561 data sets (n = 440 data sets with data on 45,105 individuals from 762 species in 17 phyla). Examining papers and online repositories was much more fruitful than contacting 351 authors, who replied to our email requests 45% of the time. Overall, 23% of our queries to authors unearthed useful metadata. The probability of retrieving spatiotemporal metadata declined significantly as the age of the data set increased. There was a 13.5% yearly decrease in metadata associated with published papers or online repositories and up to a 22% yearly decrease in metadata that were only available from authors. This rapid decay in metadata availability, mirrored in studies of other types of biological data, should motivate swift updates to data-sharing policies and researcher practices to ensure that the valuable context provided by metadata is not lost to conservation science forever.

Language: English

Citations

30

Challenges and opportunities in sharing microbiome data and analyses DOI Open Access
Curtis Huttenhower, Robert Finn, Alice C. McHardy

et al.

Nature Microbiology, Journal Year: 2023, Volume and Issue: 8(11), P. 1960 - 1970

Published: Oct. 2, 2023

Language: English

Citations

25

The Infectious Disease Ontology in the age of COVID-19 DOI Creative Commons
Shane Babcock, John Beverley, Lindsay G. Cowell

et al.

Journal of Biomedical Semantics, Journal Year: 2021, Volume and Issue: 12(1)

Published: July 18, 2021

Background: Effective response to public health emergencies, such as we are now experiencing with COVID-19, requires data sharing across multiple disciplines and data systems. Ontologies offer a powerful data sharing tool, and this holds especially for those ontologies built on the design principles of the Open Biomedical Ontologies Foundry. These principles are exemplified by the Infectious Disease Ontology (IDO), a suite of interoperable ontology modules aiming to provide coverage of all aspects of the infectious disease domain. At its center is IDO Core, a disease- and pathogen-neutral ontology covering just those types of entities and relations that are relevant to infectious diseases generally. IDO Core is extended by disease- and pathogen-specific ontology modules. Results: To assist the integration and analysis of COVID-19 data, and viral infectious disease data more generally, we have recently developed three new IDO extensions: IDO Virus (VIDO); the Coronavirus Infectious Disease Ontology (CIDO); and an extension of CIDO focusing on COVID-19 (IDO-COVID-19). Reflecting the fact that viruses lack cellular parts, we have introduced into IDO Core the term acellular structure to cover viruses and other acellular entities studied by virologists. We now distinguish between infectious agents (organisms with an infectious disposition) and infectious structures (acellular structures with an infectious disposition). This in turn has led to various updates and refinements of IDO Core's content. We believe that our work on VIDO, CIDO, and IDO-COVID-19 can serve as a model for yielding greater conformance with ontology building best practices. Conclusions: IDO provides a simple recipe for building new pathogen-specific ontologies in a way that allows data about novel diseases to be easily compared, along multiple dimensions, with data represented by existing disease ontologies. The IDO strategy, moreover, supports ontology coordination, providing a powerful method of data integration and sharing that allows physicians, researchers, and public health organizations to respond rapidly and efficiently to current and future public health crises.

Language: English

Citations

55