The Ontology of Biological Attributes (OBA)—computational traits for the life sciences
Ray Stefancsik, James P. Balhoff, Meghan A. Balk

et al.

Mammalian Genome, Journal Year: 2023, Volume and Issue: 34(3), P. 364 - 378

Published: April 19, 2023

Abstract Existing phenotype ontologies were originally developed to represent phenotypes that manifest as a character state in relation to a wild-type or other reference. However, these do not include the phenotypic trait or attribute categories required for the annotation of genome-wide association studies (GWAS), Quantitative Trait Loci (QTL) mappings or any population-focussed measurable trait data. The integration of trait and biological attribute information with an ever increasing body of chemical, environmental and biological data greatly facilitates computational analyses and it is also highly relevant to biomedical and clinical applications. The Ontology of Biological Attributes (OBA) is a formalised, species-independent collection of interoperable phenotypic trait categories that is intended to fulfil a data integration role. OBA is a standardised representational framework for observable attributes that are characteristics of biological entities, organisms, or parts of organisms. OBA has a modular design which provides several benefits for users and data integrators, including an automated and meaningful classification of trait terms computed on the basis of logical inferences drawn from domain-specific ontologies for cells, anatomical and other relevant entities. The logical axioms in OBA also provide a previously missing bridge that can computationally link Mendelian phenotypes with GWAS and quantitative traits. The term components in OBA provide semantic links and enable knowledge and data integration across specialised research community boundaries, thereby breaking silos.

Language: English

The Galaxy platform for accessible, reproducible, and collaborative data analyses: 2024 update

Linelle Ann L Abueg, Enis Afgan, Olivier Allart

et al.

Nucleic Acids Research, Journal Year: 2024, Volume and Issue: 52(W1), P. W83 - W94

Published: May 20, 2024

Abstract Galaxy (https://galaxyproject.org) is deployed globally, predominantly through free-to-use services, supporting user-driven research that broadens in scope each year. Users are attracted to public Galaxy services by platform stability, tool and reference dataset diversity, training, support and integration, which enables complex, reproducible, shareable data analysis. Applying the principles of user experience design (UXD) has driven improvements in accessibility, tool discoverability through Galaxy Labs/subdomains, and a redesigned Galaxy ToolShed. Galaxy tool capabilities are progressing in two strategic directions: integrating general purpose graphical processing units (GPGPU) access for cutting-edge methods, and licensed tool support. Engagement with global research consortia is being increased by developing more workflows in Galaxy and by resourcing the public Galaxy services to run them. The Galaxy Training Network (GTN) portfolio has grown in both size and accessibility, through learning paths and direct integration with Galaxy tools that feature in training courses. Code development continues in line with the Galaxy Project roadmap, with improvements to job scheduling and the user interface. Environmental impact assessment is also helping engage users and developers, reminding them of their role in sustainability, by displaying estimated CO2 emissions generated by each Galaxy job.

Language: English

Citations: 177

Packaging research artefacts with RO-Crate
Stian Soiland‐Reyes, Peter Sefton, Mercè Crosas

et al.

Data Science, Journal Year: 2022, Volume and Issue: 5(2), P. 97 - 138

Published: Jan. 4, 2022

An increasing number of researchers support reproducibility by including pointers to and descriptions of datasets, software and methods in their publications. However, scientific articles may be ambiguous, incomplete and difficult to process by automated systems. In this paper we introduce RO-Crate, an open, community-driven, and lightweight approach to packaging research artefacts along with their metadata in a machine readable manner. RO-Crate is based on Schema.org annotations in JSON-LD, aiming to establish best practices to formally describe metadata in an accessible and practical way for their use in a wide variety of situations. An RO-Crate is a structured archive of all the items that contributed to a research outcome, including their identifiers, provenance, relations and annotations. As a general purpose packaging approach for data and their metadata, RO-Crate is used across multiple areas, including bioinformatics, digital humanities and regulatory sciences. By applying "just enough" Linked Data standards, RO-Crate simplifies the process of making research outputs FAIR while also enhancing research reproducibility. An RO-Crate for this article is available at https://w3id.org/ro/doi/10.5281/zenodo.5146227

Language: English

Citations: 122

Inverting the model of genomics data sharing with the NHGRI Genomic Data Science Analysis, Visualization, and Informatics Lab-space
Michael C. Schatz, Anthony Philippakis, Enis Afgan

et al.

Cell Genomics, Journal Year: 2022, Volume and Issue: 2(1), P. 100085 - 100085

Published: Jan. 1, 2022

Language: English

Citations: 104

Genomic newborn screening for rare diseases
Zornitza Stark, Richard H. Scott

Nature Reviews Genetics, Journal Year: 2023, Volume and Issue: 24(11), P. 755 - 766

Published: June 29, 2023

Language: English

Citations: 102

The GA4GH Phenopacket schema defines a computable representation of clinical data
Julius O.B. Jacobsen, Michael Baudis, Gareth Baynam

et al.

Nature Biotechnology, Journal Year: 2022, Volume and Issue: 40(6), P. 817 - 820

Published: June 1, 2022

Language: English

Citations: 77

Sequence modeling and design from molecular to genome scale with Evo

Eric Nguyen, Michael Poli, Matthew G. Durrant

et al.

Science, Journal Year: 2024, Volume and Issue: 386(6723)

Published: Nov. 14, 2024

The genome is a sequence that encodes the DNA, RNA, and proteins that orchestrate an organism’s function. We present Evo, a long-context genomic foundation model with a frontier architecture trained on millions of prokaryotic and phage genomes, and report scaling laws on DNA to complement observations in language and vision. Evo generalizes across DNA, RNA, and proteins, enabling zero-shot function prediction competitive with domain-specific language models and the generation of functional CRISPR-Cas and transposon systems, representing the first examples of protein-RNA and protein-DNA codesign with a language model. Evo also learns how small mutations affect whole-organism fitness and generates megabase-scale sequences with plausible genomic architecture. These prediction and generation capabilities span molecular to genomic scales of complexity, advancing our understanding and control of biology.

Language: English

Citations: 70

Sequence modeling and design from molecular to genome scale with Evo
Éric Nguyen, Michael Poli, Matthew G. Durrant

et al.

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown

Published: Feb. 27, 2024

The genome is a sequence that completely encodes the DNA, RNA, and proteins that orchestrate the function of a whole organism. Advances in machine learning combined with massive datasets of whole genomes could enable a biological foundation model that accelerates the mechanistic understanding and generative design of complex molecular interactions. We report Evo, a genomic foundation model that enables prediction and generation tasks from the molecular to genome scale. Using an architecture based on advances in deep signal processing, we scale Evo to 7 billion parameters with a context length of 131 kilobases (kb) at single-nucleotide, byte resolution. Trained on whole prokaryotic genomes, Evo can generalize across the three fundamental modalities of the central dogma of molecular biology to perform zero-shot function prediction that is competitive with, or outperforms, leading domain-specific language models. Evo also excels at multi-element generation tasks, which we demonstrate by generating synthetic CRISPR-Cas molecular complexes and entire transposable systems for the first time. Using information learned over whole genomes, Evo can also predict gene essentiality at nucleotide resolution and can generate coding-rich sequences up to 650 kb in length, orders of magnitude longer than previous methods. Multi-modal and multi-scale learning with Evo provides a promising path toward improving our understanding and control of biology across multiple levels of complexity.

Language: English

Citations: 53

Australian Genomics: Outcomes of a 5-year national program to accelerate the integration of genomics in healthcare
Zornitza Stark, Tiffany Boughtwood, Matilda Haas

et al.

The American Journal of Human Genetics, Journal Year: 2023, Volume and Issue: 110(3), P. 419 - 426

Published: March 1, 2023

Language: English

Citations: 45

PRECISION MEDICINE AND GENOMICS: A COMPREHENSIVE REVIEW OF IT-ENABLED APPROACHES
Francisca Chibugo Udegbe, Ogochukwu Roseline Ebulue, Charles Chukwudalu Ebulue

et al.

International Medical Science Research Journal, Journal Year: 2024, Volume and Issue: 4(4), P. 509 - 520

Published: April 20, 2024

This review delves into Information Technology's (IT) transformative impact on precision medicine and genomics, spotlighting the pivotal role of bioinformatics, data mining, machine learning, and blockchain technologies in advancing personalized healthcare. A comprehensive analysis outlines how these IT-enabled approaches facilitate the analysis, interpretation, and application of vast genomic data sets, thereby enhancing disease prediction, diagnosis, and treatment on an individual level. Despite promising advancements, the review also addresses significant challenges, including data complexity, interoperability, ethical considerations, and the digital divide, underscoring the necessity for multidisciplinary collaboration and innovation to overcome these hurdles. The paper concludes by emphasizing the potential of emerging technologies in enhancing patient-centred care and realizing the vision of precision medicine, which promises improved healthcare outcomes through personalized strategies. Keywords: Precision Medicine, Genomics, Bioinformatics, Machine Learning, Data Security.

Language: English

Citations: 27

Modeling methyl-sensitive transcription factor motifs with an expanded epigenetic alphabet
Coby Viner, Charles A. Ishak, James Johnson

et al.

Genome biology, Journal Year: 2024, Volume and Issue: 25(1)

Published: Jan. 8, 2024

Abstract Background: Transcription factors bind DNA in specific sequence contexts. In addition to distinguishing one nucleobase from another, some transcription factors can distinguish between unmodified and modified bases. Current models of transcription factor binding tend not to take DNA modifications into account, while the recent few that do often have limitations. This makes a comprehensive and accurate profiling of transcription factor affinities difficult. Results: Here, we develop methods to identify transcription factor binding sites in modified DNA. Our models expand the standard A/C/G/T DNA alphabet to include cytosine modifications. We develop Cytomod to create modified genomic sequences and we also enhance the MEME Suite, adding the capacity to handle custom alphabets. We adapt the well-established position weight matrix (PWM) model of transcription factor binding affinity to this expanded DNA alphabet. Using these methods, we identify modification-sensitive transcription factor binding motifs. We confirm established binding preferences, such as the preference of ZFP57 and C/EBPβ for methylated motifs and the preference of c-Myc for unmethylated E-box motifs. Conclusions: Using known binding preferences to tune model parameters, we discover novel modified motifs for a wide array of transcription factors. Finally, we validate our binding preference predictions for OCT4 using cleavage under targets and release using nuclease (CUT&RUN) experiments across conventional, methylation-, and hydroxymethylation-enriched sequences. Our approach readily extends to other DNA modifications. As more genome-wide single-base resolution modification data becomes available, we expect that our method will yield insights into altered transcription factor binding affinities across many different modifications.

Language: English

Citations: 19