Empowering Radiologists with ChatGPT-4o: Comparative Evaluation of Large Language Models and Radiologists in Cardiac Cases
medRxiv (Cold Spring Harbor Laboratory),
Journal year: 2024, Issue: unknown
Published: June 25, 2024
ABSTRACT
Purpose
This study evaluated the diagnostic accuracy and differential diagnosis capabilities of 12 Large Language Models (LLMs), one cardiac radiologist, and three general radiologists in radiology. The impact of ChatGPT-4o assistance on radiologist performance was also investigated.
Materials and Methods
We collected 80 publicly available "Cardiac Case of the Month" cases from the Society of Thoracic Radiology website. The LLMs and General Radiologist-III were provided with text-based case information, whereas the other radiologists visually assessed the cases, initially without assistance. Diagnostic accuracy and differential diagnosis scores (DDx Score) were analyzed using chi-square, Kruskal-Wallis, Wilcoxon, McNemar, and Mann-Whitney U tests.
Results
Unassisted diagnostic accuracy was 72.5% for the Cardiac Radiologist, 53.8% for General Radiologist-I, and 51.3% for General Radiologist-II. With ChatGPT-4o assistance, their accuracy improved to 78.8%, 70.0%, and 63.8%, respectively. The improvements for General Radiologists I and II were statistically significant (P≤0.006). All radiologists' DDx Scores improved significantly (P≤0.05). Remarkably, General Radiologist-I's GPT-4o-assisted DDx Score was not significantly different from the Cardiac Radiologist's (P>0.05). Among the LLMs, Claude 3.5 Sonnet and Claude 3 Opus had the highest diagnostic accuracy (81.3%), with the next-best model at 70.0%. Regarding the DDx Score, the top model outperformed all other models (P<0.05). General Radiologist-III's accuracy improved from 48.8% to 63.8% with GPT-4o assistance (P<0.001).
Conclusion
ChatGPT-4o may enhance radiologist performance in cardiac imaging, suggesting its potential as a valuable diagnostic support tool. Further research is required to assess its clinical integration.
Language: English
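The statistical comparisons named in the abstract above (McNemar's test for the paired unassisted-vs-assisted accuracies, Mann-Whitney U for independent score distributions) can be sketched in a few lines of Python. This is an illustrative sketch only: the per-case results below are randomly generated placeholders, not the study's data.

```python
# Illustrative only: hypothetical per-case results, not the study data.
import numpy as np
from scipy.stats import binomtest, mannwhitneyu

rng = np.random.default_rng(0)

# Hypothetical 0/1 correctness for one reader on 80 cases, without/with assistance.
unassisted = rng.random(80) < 0.55
gain = rng.random(80) < 0.30   # cases the assistant turns correct
loss = rng.random(80) < 0.05   # cases the assistant turns incorrect
assisted = (unassisted & ~loss) | gain

# McNemar's exact test: a binomial test on the discordant pairs.
b = int(np.sum(~unassisted & assisted))  # correct only with assistance
c = int(np.sum(unassisted & ~assisted))  # correct only without
p_mcnemar = binomtest(b, n=b + c, p=0.5).pvalue

# Mann-Whitney U test on hypothetical DDx Scores of two independent readers.
p_mwu = mannwhitneyu(rng.integers(0, 3, 80), rng.integers(1, 4, 80)).pvalue

print(f"accuracy {unassisted.mean():.1%} -> {assisted.mean():.1%}")
print(f"McNemar P = {p_mcnemar:.4f}, Mann-Whitney U P = {p_mwu:.4f}")
```

McNemar's test is the appropriate choice here because the same reader answers the same 80 cases twice, so only the discordant cases carry information about the effect of assistance.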
Comparison of Performance of Large Language Models on Lung-RADS Related Questions
JCO Global Oncology,
Journal year: 2024, Issue: 10
Published: Aug. 1, 2024
This study evaluates LLM integration in interpreting Lung-RADS for lung cancer screening, highlighting their innovative role in enhancing radiological practice. Our findings reveal that Claude 3 Opus and Perplexity achieved a 96% accuracy rate, outperforming other models.
Language: English
Comparison of the performance of large language models and general radiologist on Ovarian-Adnexal Reporting and Data System (O-RADS)-related questions
Quantitative Imaging in Medicine and Surgery,
Journal year: 2024, Issue: 14(9), pp. 6990-6991
Published: July 24, 2024
Language: English
Comparison of the Knowledge of Large Language Models and General Radiologist on RECIST (Preprint)
Published: July 26, 2024
UNSTRUCTURED
This study aims to assess the potential of large language models (LLMs) to enhance reporting efficiency and accuracy in oncological imaging, specifically evaluating their knowledge of the RECIST 1.1 guidelines. While the capabilities of LLMs have been explored across various domains, their specific applications in radiology are of significant interest due to the intricate and time-consuming nature of image evaluation in oncology.
We conducted a comparative analysis involving seven different LLMs and a general radiologist (GR) to determine their proficiency in responding to RECIST 1.1-based multiple-choice questions. Our methodology involved the creation of 25 questions by a board-certified radiologist, ensuring alignment with RECIST 1.1. These were presented to the LLMs—Claude 3 Opus, ChatGPT 4, ChatGPT 4o, Gemini 1.5 Pro, Mistral Large, Meta Llama 70B, and Perplexity Pro—as well as to a GR with six years of experience. The models were prompted to answer as an experienced radiologist, and their responses were compared with those of the GR.
The results demonstrated that Claude 3 Opus achieved a perfect score of 100% (25/25), followed closely by ChatGPT 4o with 96% (24/25). ChatGPT 4 and Mistral Large both scored 92% (23/25), while Gemini 1.5 Pro, Meta Llama 70B, and Perplexity Pro each scored 88% (21/25). The GR also scored 92% (23/25).
These findings highlight the impressive capabilities of current LLMs in understanding and applying the RECIST 1.1 guidelines, suggesting that they may become valuable tools in radiology. Their outstanding performance raises the prospect of LLMs becoming integral to oncology practices, potentially enhancing reporting. However, variations among the models underscore the need for further refinement and evaluation. Additionally, this study focused on text-based responses; visual assessment and multimodal capabilities remain unexplored. Given the visual nature of radiology, future research should investigate multimodal integration to fully harness LLMs in clinical settings. In conclusion, our study underscores the high potential of LLMs to assist radiologists in reporting, providing a consistent and reliable approach to interpreting RECIST 1.1, and we advocate their continued development to improve diagnostic efficiency.
Language: English
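The accuracy figures quoted in the abstract above reduce to scoring each model's single-letter responses against the radiologist-prepared answer key. A minimal sketch, using an entirely hypothetical key and hypothetical responses (not the study's questions or answers):

```python
# Hypothetical answer key and responses, for illustration only.
key = list("ABCDA" * 5)  # 25 single-letter answers

responses = {
    "Model A": list("ABCDA" * 5),                   # all correct -> 25/25
    "Model B": list("ABCDA" * 4) + list("ABCDB"),   # one miss    -> 24/25
}

for model, ans in responses.items():
    correct = sum(a == k for a, k in zip(ans, key))
    print(f"{model}: {correct}/25 = {correct / 25:.0%}")
```

This mirrors how 24/25 maps to the 96% reported for the second-ranked model.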
Comparison of the Knowledge of Large Language Models and General Radiologist on RECIST (Preprint)
Published: Aug. 13, 2024
BACKGROUND
Large language models (LLMs) represent a remarkable breakthrough in natural language processing. What sets the current generation of LLMs apart is their ability to perform very specific tasks in radiology, as in many other fields, without the need for additional training. LLMs have the potential to usher in a new era of efficiency and excellence in radiology practice, both as a supportive diagnostic tool and as a facilitator of the reporting process. This is of great importance in oncology, and oncological imaging researchers have been conducting studies in these fields in order to demonstrate the position of LLMs.
OBJECTIVE
We aimed to provide a perspective on whether LLMs can improve oncological imaging by comparatively assessing the LLMs' knowledge of RECIST 1.1, both among themselves and against a general radiologist (GR).
METHODS
A radiologist (E.Ç.) prepared 25 multiple-choice questions for this study using publicly available information on RECIST 1.1, thus eliminating the need for ethics committee approval. Each question was initiated with an input prompt as follows: "Act like a professor who has 30 years of experience in radiology. Give just the letter of the most correct choice in multiple-choice questions. Each question has only one answer." This was tested in June 2024 on seven different LLMs using their default settings. The testing included models from various developers: Claude 3 Opus, ChatGPT 4 and 4o, Gemini 1.5 Pro, Mistral Large, Meta Llama 70B, and Perplexity Pro. In addition, a board-certified GR (T.C.) holding the EDiR, with 6 years of experience, answered the same questions.
RESULTS
The results revealed that Claude 3 Opus achieved the highest accuracy of 100% (25/25 questions), followed by OpenAI's newest model, ChatGPT 4o, with 96% (24/25 questions). ChatGPT 4 and Mistral Large scored 92% (23/25 questions), while Gemini 1.5 Pro, Meta Llama 70B, and Perplexity Pro had 88% (21/25 questions).
CONCLUSIONS
The outstanding success of LLMs in answering questions about RECIST 1.1 raises the question of whether they can be new stars of radiology. Our study reveals that the majority of LLMs exhibit a commendable level of proficiency, comparable to a GR, in answering RECIST 1.1-related questions. These findings show that LLMs have more than sufficient text-based knowledge about RECIST 1.1. Additionally, our results underscore the high potential of LLMs as tools to assist radiologists in reporting. However, to take full advantage of LLMs' abilities in reporting, their visual capabilities must also be evaluated, since visual evaluation forms the basis of radiology. Therefore, future studies should focus on evaluating their multimodal ability. In conclusion, studies in every field of radiology will reveal the position of LLMs and allow them to be easily integrated into radiological practice.
CLINICALTRIAL
There is no trial registration.
INTERNATIONAL REGISTERED REPORT
RR2-10.2196/preprints.64805
Language: English
Can large language models be new supportive tools in coronary computed tomography angiography reporting?
Clinical Imaging,
Journal year: 2024, Issue: 114, pp. 110271
Published: Aug. 31, 2024
Language: English