The Unified Phenotype Ontology (uPheno): A framework for cross-species integrative phenomics
Nicolas Matentzoglu, Susan M. Bello, Ray Stefancsik et al.

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown

Published: Sept. 22, 2024

Phenotypic data are critical for understanding biological mechanisms and the consequences of genomic variation, and are pivotal for clinical use cases such as disease diagnostics and treatment development. For over a century, vast quantities of phenotype data have been collected in many different contexts covering a variety of organisms. The emerging field of phenomics focuses on integrating and interpreting these data to inform biological hypotheses. A major impediment in phenomics is the wide range of distinct and disconnected approaches to recording the observable characteristics of an organism. Phenotype data are curated using free text, single terms or combinations of terms, and multiple vocabularies, terminologies, or ontologies. Integrating these heterogeneous and often siloed data enables the application of biological knowledge both within and across species. Existing integration efforts are typically limited to mappings between pairs of terminologies; a generic representation that captures the full range of cross-species phenomics data is much needed. We developed the Unified Phenotype Ontology (uPheno) framework, a community effort to provide an integration layer over domain-specific phenotype ontologies as a single, unified, logical representation. uPheno comprises (1) a system for the consistent computational definition of phenotype terms using ontology design patterns, maintained as a library; (2) a hierarchical vocabulary of species-neutral phenotype terms under which their species-specific counterparts are grouped; and (3) mapping tables between species-specific ontologies. This harmonized representation supports use cases such as the integration of genotype-phenotype associations from different organisms and cross-species informed variant prioritization.

Language: English

Consistent Performance of GPT-4o in Rare Disease Diagnosis Across Nine Languages and 4967 Cases
Leonardo Chimirri, J. Harry Caufield, Yasemin Bridges et al.

medRxiv (Cold Spring Harbor Laboratory), Journal Year: 2025, Volume and Issue: unknown

Published: Feb. 28, 2025

Large language models (LLMs) are increasingly used in the medical field for diverse applications, including differential diagnostic support. The training data used to create LLMs such as the Generative Pretrained Transformer (GPT) are estimated to consist predominantly of English-language texts, but LLMs could be used across the globe to support diagnostics if language barriers are overcome. Initial pilot studies on the utility of LLMs for differential diagnosis in languages other than English have shown promise, but a large-scale assessment of their relative performance in a variety of European and non-European languages on a comprehensive corpus of challenging rare-disease cases is lacking. We created 4967 clinical vignettes using structured data captured with Human Phenotype Ontology (HPO) terms and the Global Alliance for Genomics and Health (GA4GH) Phenopacket Schema. These vignettes span a total of 378 distinct genetic diseases with 2618 associated phenotypic features. We used HPO translations together with language-specific templates to generate prompts in English, Chinese, Czech, Dutch, German, Italian, Japanese, Spanish, and Turkish. We applied GPT-4o, version gpt-4o-2024-08-06, to the task of delivering a ranked differential diagnosis from a zero-shot prompt. An ontology-based approach using the Mondo disease ontology was used to map synonyms and disease subtypes to diagnoses in order to automate the evaluation of LLM responses. For English, GPT-4o placed the correct diagnosis at the first rank 19·8% of the time and within the top-3 ranks 27·0% of the time. In comparison, for the eight non-English languages tested here, the correct diagnosis was placed at rank 1 in between 16·9% and 20·5% of cases and within the top-3 ranks in between 25·3% and 27·7% of cases. Performance was consistent across the nine languages tested. This suggests that LLMs such as GPT-4o may have utility in non-English clinical settings. This work was supported by NHGRI grants 5U24HG011449 and 5RM1HG010860. P.N.R. was supported by a Professorship of the Alexander von Humboldt Foundation; P.L. was supported by a National Grant (PMP21/00063 ONTOPREC-ISCIII, Fondos FEDER).

Language: English

Citations: 0
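The ontology-based evaluation described in the abstract above (normalizing free-text diagnoses, collapsing subtypes onto diagnoses, then counting top-k hits) can be illustrated with a minimal Python sketch. It is not the authors' pipeline: it assumes a toy synonym table and a subtype-to-parent map standing in for the Mondo disease ontology, and all disease names and IDs below are hypothetical placeholders rather than real Mondo terms. It also assumes that a predicted subtype counts as a hit for its parent disease.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical stand-in for Mondo synonym lookup: lower-cased names -> placeholder IDs.
SYNONYMS: Dict[str, str] = {
    "disease a": "MONDO:A",
    "syndrome a": "MONDO:A",         # synonym of the same disease
    "disease a type 1": "MONDO:A1",  # a subtype
    "disease b": "MONDO:B",
}

# Hypothetical subtype -> parent relation; a predicted subtype is credited to its parent.
PARENT: Dict[str, str] = {"MONDO:A1": "MONDO:A"}


def normalize(name: str) -> Optional[str]:
    """Map a free-text diagnosis to an ID, collapsing subtypes onto their parent."""
    term = SYNONYMS.get(name.strip().lower())
    if term is None:
        return None  # unrecognized response text never matches
    return PARENT.get(term, term)


def correct_rank(ranked: List[str], truth_id: str) -> Optional[int]:
    """Return the 1-based rank of the first response matching the true diagnosis, or None."""
    for rank, name in enumerate(ranked, start=1):
        if normalize(name) == truth_id:
            return rank
    return None


def top_k_rate(cases: List[Tuple[List[str], str]], k: int) -> float:
    """Fraction of cases whose correct diagnosis appears within the top k ranks."""
    if not cases:
        return 0.0
    hits = 0
    for ranked, truth_id in cases:
        rank = correct_rank(ranked, truth_id)
        if rank is not None and rank <= k:
            hits += 1
    return hits / len(cases)


# Toy cases: (ranked LLM response, true disease ID). Invented for illustration only.
CASES: List[Tuple[List[str], str]] = [
    (["Disease B", "Syndrome A"], "MONDO:A"),    # correct at rank 2 via synonym
    (["disease a type 1"], "MONDO:A"),           # correct at rank 1 via subtype
    (["unknown thing", "disease b"], "MONDO:A"), # miss
    (["Disease A"], "MONDO:A"),                  # correct at rank 1
]

if __name__ == "__main__":
    print(top_k_rate(CASES, 1))  # 0.5
    print(top_k_rate(CASES, 3))  # 0.75
```

Normalizing both synonyms and subtypes before comparing IDs is what lets free-text answers be scored automatically; without it, "Syndrome A" and "disease a type 1" above would be counted as misses.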

Evaluation of the Diagnostic Accuracy of GPT-4 in Five Thousand Rare Disease Cases
Justin Reese, Leonardo Chimirri, Yasemin Bridges et al.

medRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown

Published: July 22, 2024

Large language models (LLMs) show promise in supporting differential diagnosis, but their performance is challenging to evaluate due to the unstructured nature of their responses. To assess the current capabilities of LLMs to diagnose genetic diseases, we benchmarked these models on 5,213 case reports using the Phenopacket Schema, the Human Phenotype Ontology, and the Mondo disease ontology. Prompts generated from each phenopacket were sent to three generative pretrained transformer (GPT) models. The same phenopackets were used as input to a widely used diagnostic tool, Exomiser, in phenotype-only mode. The best LLM ranked the correct diagnosis first in 23.6% of cases, whereas Exomiser did so in 35.5% of cases. While the performance of LLMs for supporting differential diagnosis has been improving, it has not reached the level commonly achieved by traditional bioinformatics tools. Future research is needed to determine the best approach to incorporating LLMs into diagnostic pipelines.

Language: English

Citations: 2
