Consistent Performance of GPT-4o in Rare Disease Diagnosis Across Nine Languages and 4967 Cases
Leonardo Chimirri, J. Harry Caufield, Yasemin Bridges, et al.

medRxiv (Cold Spring Harbor Laboratory), Journal Year: 2025, Volume and Issue: unknown

Published: Feb. 28, 2025

Large language models (LLMs) are increasingly used in the medical field for diverse applications including differential diagnostic support. The estimated training data used to create LLMs such as the Generative Pretrained Transformer (GPT) predominantly consist of English-language texts, but LLMs could be used across the globe to support diagnostics if language barriers could be overcome. Initial pilot studies on the utility of LLMs for differential diagnosis in languages other than English have shown promise, but a large-scale assessment of the relative performance of these models in a variety of European and non-European languages on a comprehensive corpus of challenging rare-disease cases is lacking. We created 4967 clinical vignettes using structured data captured with Human Phenotype Ontology (HPO) terms with the Global Alliance for Genomics and Health (GA4GH) Phenopacket Schema. These clinical vignettes span a total of 378 distinct genetic diseases with 2618 associated phenotypic features. We used translations of the Human Phenotype Ontology together with language-specific templates to generate prompts in English, Chinese, Czech, Dutch, German, Italian, Japanese, Spanish, and Turkish. We applied GPT-4o, version gpt-4o-2024-08-06, to the task of delivering a ranked differential diagnosis using a zero-shot prompt. An ontology-based approach with the Mondo disease ontology was used to map synonyms and to map disease subtypes to clinical diagnoses in order to automate evaluation of LLM responses. For English, GPT-4o placed the correct diagnosis at the first rank 19·8% and within the top-3 ranks 27·0% of the time. In comparison, for the eight non-English languages tested here, the correct diagnosis was placed at rank 1 between 16·9% and 20·5%, and within the top-3 between 25·3% and 27·7% of cases. The differential diagnostic performance of GPT-4o across a comprehensive corpus of rare-disease cases was consistent across the nine languages tested. This suggests that LLMs such as GPT-4o may have utility in non-English clinical settings. Funding: NHGRI 5U24HG011449 and 5RM1HG010860. P.N.R. was supported by a Professorship of the Alexander von Humboldt Foundation; P.L. was supported by a National Grant (PMP21/00063 ONTOPREC-ISCIII, Fondos FEDER).

Language: English

Citations

0

Few shot learning for phenotype-driven diagnosis of patients with rare genetic diseases
Emily Alsentzer, Michelle M. Li, Shilpa N. Kobren, et al.

medRxiv (Cold Spring Harbor Laboratory), Journal Year: 2022, Volume and Issue: unknown

Published: Dec. 13, 2022

Abstract: There are more than 7,000 rare diseases, some affecting 3,500 or fewer patients in the US. Due to clinicians’ limited experience with such diseases and the heterogeneity of clinical presentations, approximately 70% of individuals seeking a diagnosis today remain undiagnosed. Deep learning has demonstrated success in aiding the diagnosis of common diseases. However, existing approaches require labeled datasets with thousands of diagnosed patients per disease. Here, we present SHEPHERD, a few shot learning approach for multi-faceted rare disease diagnosis. SHEPHERD performs deep learning over a biomedical knowledge graph enriched with rare disease information to perform phenotype-driven diagnosis. Once trained, we show that SHEPHERD can provide clinical insights about real-world patients. We evaluate SHEPHERD on a cohort of N = 465 patients representing 299 diseases (79% of genes and 83% of diseases are represented in only a single patient) in the Undiagnosed Diseases Network. SHEPHERD excels at several diagnostic facets: performing causal gene discovery (causal genes are predicted at rank = 3.56 on average), retrieving “patients-like-me” with the same causal gene or disease, and providing interpretable characterizations of novel disease presentations. We additionally examine SHEPHERD on two other real-world cohorts, MyGene2 (N = 146) and the Deciphering Developmental Disorders Study (N = 1,431). SHEPHERD demonstrates the potential of deep learning to accelerate rare disease diagnosis and has implications for using deep learning on medical datasets with very few labels.

Language: English

Citations

7