Consistent Performance of GPT-4o in Rare Disease Diagnosis Across Nine Languages and 4967 Cases
medRxiv (Cold Spring Harbor Laboratory)
Journal Year: 2025
Volume and Issue: unknown
Published: Feb. 28, 2025
Large language models (LLMs) are increasingly used in the medical field for diverse applications, including differential diagnostic support. The training data used to create LLMs such as the Generative Pretrained Transformer (GPT) are estimated to consist predominantly of English-language texts, but LLMs could be used across the globe to support diagnostics if language barriers could be overcome. Initial pilot studies on the utility of LLMs for diagnosis in languages other than English have shown promise, but a large-scale assessment of the relative performance of these models in a variety of European and non-European languages on a comprehensive corpus of challenging rare-disease cases is lacking. We created 4967 clinical vignettes using structured data captured with Human Phenotype Ontology (HPO) terms and the Global Alliance for Genomics and Health (GA4GH) Phenopacket Schema. These vignettes span a total of 378 distinct genetic diseases with 2618 associated phenotypic features. We used translations of the HPO together with language-specific templates to generate prompts in English, Chinese, Czech, Dutch, German, Italian, Japanese, Spanish, and Turkish. We applied GPT-4o, version gpt-4o-2024-08-06, to the task of delivering a ranked differential diagnosis using a zero-shot prompt.
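To make the prompt-generation step concrete, here is a minimal sketch of filling a language-specific template with translated HPO term labels. The study's actual templates and translation files are not given here, so `HPO_LABELS`, `TEMPLATES`, `label_for` and `build_prompt` are hypothetical, and the German and Spanish strings are illustrative rather than taken from the official HPO translations.

```python
# Hypothetical, illustrative label translations keyed by HPO ID and language code.
# These are NOT taken from the official HPO translation files.
HPO_LABELS = {
    "HP:0001250": {"en": "Seizure", "de": "Krampfanfall", "es": "Convulsión"},
    "HP:0001263": {
        "en": "Global developmental delay",
        "de": "Globale Entwicklungsverzögerung",
        "es": "Retraso global del desarrollo",
    },
}

# Hypothetical language-specific templates, each with a single {features} slot.
TEMPLATES = {
    "en": "A patient presents with the following features: {features}. "
          "Provide a ranked list of differential diagnoses.",
    "de": "Ein Patient zeigt die folgenden Merkmale: {features}. "
          "Nennen Sie eine nach Wahrscheinlichkeit geordnete Liste von Differenzialdiagnosen.",
    "es": "Un paciente presenta las siguientes características: {features}. "
          "Proporcione una lista ordenada de diagnósticos diferenciales.",
}


def label_for(hpo_id: str, lang: str) -> str:
    """Return the translated label, falling back to English if no translation exists."""
    labels = HPO_LABELS[hpo_id]
    return labels.get(lang, labels["en"])


def build_prompt(hpo_ids: list[str], lang: str) -> str:
    """Fill the language-specific template with the translated term labels."""
    features = ", ".join(label_for(h, lang) for h in hpo_ids)
    return TEMPLATES[lang].format(features=features)


print(build_prompt(["HP:0001250", "HP:0001263"], "de"))
```

Keeping the clinical content (HPO IDs) separate from the wording (templates) means the same vignette yields comparable prompts in every language, which is what makes a cross-language comparison meaningful.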
We used an ontology-based approach with the Mondo disease ontology to map synonyms, and to map disease subtypes to clinical diagnoses, in order to automate the evaluation of LLM responses.
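The grounding idea can be sketched as follows: normalize each free-text diagnosis to a canonical disease ID through a synonym table, then treat a predicted subtype of the correct disease as a hit by walking up the subclass hierarchy. In the study these relations come from Mondo; here two tiny dictionaries with hypothetical IDs (`D:0001`, `D:0002`) stand in, and the rule "a subtype of the gold disease counts as correct" is an assumption about how subtype mapping is applied.

```python
from typing import Optional

# Hypothetical synonym table: lower-cased label -> canonical disease ID.
SYNONYMS = {
    "disease a": "D:0001",
    "a syndrome": "D:0001",
    "disease a, type 2": "D:0002",
}

# Hypothetical subclass table: disease ID -> parent disease ID.
PARENT = {"D:0002": "D:0001"}


def ground(label: str) -> Optional[str]:
    """Map a free-text diagnosis to a canonical ID, or None if it is unknown."""
    return SYNONYMS.get(label.strip().lower())


def ancestors_or_self(disease_id: str) -> list[str]:
    """Return the ID followed by all of its ancestors (guards against cycles)."""
    chain: list[str] = []
    current: Optional[str] = disease_id
    while current is not None and current not in chain:
        chain.append(current)
        current = PARENT.get(current)
    return chain


def is_correct(predicted_label: str, gold_id: str) -> bool:
    """A prediction is correct if it grounds to the gold disease or one of its subtypes."""
    predicted_id = ground(predicted_label)
    return predicted_id is not None and gold_id in ancestors_or_self(predicted_id)
```

Note the asymmetry: naming a subtype of the gold disease is accepted, while naming only the parent of a gold subtype is not.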
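For English, GPT-4o placed the correct diagnosis at the first rank in 19·8% of cases and within the top-3 ranks in 27·0% of cases. In comparison, for the eight non-English languages tested here, the correct diagnosis was placed at rank 1 in between 16·9% and 20·5% of cases, and within the top-3 ranks in between 25·3% and 27·7% of cases.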
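Rank-1 and top-3 percentages like those above are top-k accuracies: the fraction of cases whose correct diagnosis appears at rank k or better. A minimal sketch, assuming each case has already been reduced to a ranked list of grounded IDs plus the gold ID (the toy cases and function names are hypothetical):

```python
from typing import Optional, Sequence


def rank_of_correct(ranked: Sequence[str], gold: str) -> Optional[int]:
    """1-based rank of the gold diagnosis in the ranked list, or None if absent."""
    for rank, diagnosis in enumerate(ranked, start=1):
        if diagnosis == gold:
            return rank
    return None


def top_k_accuracy(cases: Sequence[tuple[Sequence[str], str]], k: int) -> float:
    """Fraction of cases whose gold diagnosis appears within the first k ranks."""
    hits = 0
    for ranked, gold in cases:
        rank = rank_of_correct(ranked, gold)
        if rank is not None and rank <= k:
            hits += 1
    return hits / len(cases)


# Toy data: gold found at rank 1, rank 3, rank 4, and not at all.
cases = [
    (["D:1", "D:2", "D:3"], "D:1"),
    (["D:2", "D:3", "D:1"], "D:1"),
    (["D:2", "D:3", "D:4", "D:1"], "D:1"),
    (["D:5"], "D:1"),
]
print(top_k_accuracy(cases, 1), top_k_accuracy(cases, 3))  # 0.25 0.5
```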
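The differential diagnostic performance of GPT-4o was thus consistent across the nine languages tested. This suggests that LLMs such as GPT-4o may have utility in non-English clinical settings.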
Funding: NHGRI 5U24HG011449 and 5RM1HG010860. P.N.R. was supported by a Professorship of the Alexander von Humboldt Foundation; P.L. was supported by a National Grant (PMP21/00063 ONTOPREC-ISCIII, Fondos FEDER).
Language: English
Evaluation of the Diagnostic Accuracy of GPT-4 in Five Thousand Rare Disease Cases
medRxiv (Cold Spring Harbor Laboratory)
Journal Year: 2024
Volume and Issue: unknown
Published: July 22, 2024
Large language models (LLMs) show promise in supporting differential diagnosis, but their performance is challenging to evaluate due to the unstructured nature of their responses. To assess the current capabilities of LLMs to diagnose genetic diseases, we benchmarked these models on 5,213 case reports using the Phenopacket Schema, the Human Phenotype Ontology, and the Mondo disease ontology. Prompts generated from each phenopacket were sent to three generative pretrained transformer (GPT) models.
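As an illustration of generating a prompt from a phenopacket, the sketch below reads the `phenotypicFeatures` array of a Phenopacket Schema v2 JSON document, separates observed features from those marked `"excluded": true` (looked for but absent), and renders them as text. The prompt wording and the function names are hypothetical; the benchmark's actual prompt is not reproduced here.

```python
import json


def features_from_phenopacket(phenopacket_json: str) -> tuple[list[str], list[str]]:
    """Split phenotypic feature labels into observed and explicitly excluded ones."""
    packet = json.loads(phenopacket_json)
    observed: list[str] = []
    excluded: list[str] = []
    for feature in packet.get("phenotypicFeatures", []):
        label = feature["type"]["label"]
        # "excluded": true marks a feature that was looked for and found to be absent.
        if feature.get("excluded", False):
            excluded.append(label)
        else:
            observed.append(label)
    return observed, excluded


def phenopacket_to_prompt(phenopacket_json: str) -> str:
    """Render a hypothetical zero-shot prompt from the phenopacket's features."""
    observed, excluded = features_from_phenopacket(phenopacket_json)
    parts = [f"The patient presents with: {', '.join(observed)}."]
    if excluded:
        parts.append(f"The following features were excluded: {', '.join(excluded)}.")
    parts.append("Provide a ranked list of differential diagnoses.")
    return "\n".join(parts)


# A minimal, hand-written example phenopacket (not a case from the benchmark).
EXAMPLE = json.dumps({
    "id": "example-case-1",
    "phenotypicFeatures": [
        {"type": {"id": "HP:0001250", "label": "Seizure"}},
        {"type": {"id": "HP:0001263", "label": "Global developmental delay"}},
        {"type": {"id": "HP:0000365", "label": "Hearing impairment"}, "excluded": True},
    ],
})
print(phenopacket_to_prompt(EXAMPLE))
```

Because the same structured file feeds both the LLM prompt and a phenotype-driven tool, the two can be compared on identical inputs.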
The same phenopackets were used as input to a widely used diagnostic tool, Exomiser, in phenotype-only mode. The best LLM ranked the correct diagnosis first in 23.6% of cases, whereas Exomiser did so in 35.5% of cases. While the performance of LLMs in supporting differential diagnosis has been improving, it has not reached the level commonly achieved by traditional bioinformatics tools. Future research is needed to determine the best approach to incorporating LLMs into diagnostic pipelines.
Language: English
The Unified Phenotype Ontology (uPheno): A framework for cross-species integrative phenomics
bioRxiv (Cold Spring Harbor Laboratory)
Journal Year: 2024
Volume and Issue: unknown
Published: Sept. 22, 2024
Abstract
Phenotypic data are critical for understanding biological mechanisms and the consequences of genomic variation, and are pivotal for clinical use cases such as disease diagnostics and treatment development. For over a century, vast quantities of phenotype data have been collected in many different contexts covering a variety of organisms. The emerging field of phenomics focuses on integrating and interpreting these data to inform hypotheses. A major impediment is the wide range of distinct and disconnected approaches to recording the observable characteristics of an organism. Phenotype data are curated using free text, single terms or combinations of terms, and multiple vocabularies, terminologies, and ontologies. Integrating these heterogeneous and often siloed data enables the application of knowledge both within and across species. Existing integration efforts are typically limited to mappings between pairs of terminologies; a generic representation that captures the full range of cross-species phenotype data is much needed. We developed the Unified Phenotype Ontology (uPheno) framework, a community effort to provide an integration layer over domain-specific phenotype ontologies as a single, unified, logical representation. uPheno comprises (1) a system for the consistent computational definition of phenotype terms using ontology design patterns, maintained as a community library; (2) a hierarchical vocabulary of species-neutral phenotype terms under which their species-specific counterparts are grouped; and (3) mapping tables between species-specific ontologies.
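The idea behind component (1) can be sketched as a fill-in-the-blank template: one pattern fixes the label and the logical definition, and each new term supplies only the variable part, so all terms built from it share the same logical shape. The pattern below is hypothetical and simplified, loosely in the spirit of template-based design patterns; it is not copied from the uPheno pattern library, and the relation names in the Manchester-syntax-like string are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DesignPattern:
    """A minimal pattern with a single variable slot, {entity}."""

    name: str
    label_template: str
    definition_template: str  # OWL Manchester-syntax-like class expression

    def instantiate(self, entity: str) -> dict[str, str]:
        """Fill the slot to produce a term label and its logical definition."""
        return {
            "pattern": self.name,
            "label": self.label_template.format(entity=entity),
            "equivalent_to": self.definition_template.format(entity=entity),
        }


# Hypothetical, simplified pattern; not taken from the uPheno library.
ABNORMAL_MORPHOLOGY = DesignPattern(
    name="abnormalMorphologyOfEntity",
    label_template="abnormal {entity} morphology",
    definition_template=(
        "'has part' some ('morphology' and ('characteristic of' some '{entity}') "
        "and ('has modifier' some 'abnormal'))"
    ),
)

for entity in ["heart", "kidney"]:
    print(ABNORMAL_MORPHOLOGY.instantiate(entity)["label"])
```

Generating definitions from shared templates, rather than writing each by hand, is what lets a reasoner classify terms from different ontologies in a consistent way.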
This harmonized representation supports use cases such as the integration of genotype-phenotype associations from different organisms and cross-species informed variant prioritization.
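To show how a species-neutral layer and mapping tables enable cross-species integration of genotype-phenotype associations, here is a minimal sketch that groups human and mouse gene-phenotype associations under shared species-neutral terms. All term IDs, gene names, and associations are hypothetical placeholders, and `group_by_neutral_term` is an illustrative name rather than a uPheno API.

```python
from collections import defaultdict

# Hypothetical mapping table: species-specific term -> species-neutral term.
TO_NEUTRAL = {
    "HP:example_1": "UPHENO:example_1",
    "MP:example_1": "UPHENO:example_1",
}

# Hypothetical (species, gene, species-specific phenotype term) associations.
ASSOCIATIONS = [
    ("human", "GENE_A", "HP:example_1"),
    ("mouse", "Gene_b", "MP:example_1"),
    ("mouse", "Gene_c", "MP:example_2"),  # no mapping: left out of the grouping
]


def group_by_neutral_term(associations, mapping):
    """Collect (species, gene) pairs under the species-neutral term of each phenotype."""
    grouped = defaultdict(list)
    for species, gene, term in associations:
        neutral = mapping.get(term)
        if neutral is not None:
            grouped[neutral].append((species, gene))
    return dict(grouped)


print(group_by_neutral_term(ASSOCIATIONS, TO_NEUTRAL))
```

Once human and mouse findings sit under one term, evidence from a model organism can inform the ranking of candidate variants in a patient with the corresponding phenotype.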
Language: English