medRxiv (Cold Spring Harbor Laboratory),
Journal Year:
2024,
Volume and Issue:
unknown
Published: April 4, 2024
Abstract
Importance
Large language models (LLMs) possess a range of capabilities which may be applied to the clinical domain, including text summarization. As ambient artificial intelligence scribes and other LLM-based tools begin to be deployed within healthcare settings, rigorous evaluations of the accuracy of these technologies are urgently needed.
Objective
To investigate the performance of GPT-4 and GPT-3.5-turbo in generating Emergency Department (ED) discharge summaries and to evaluate the prevalence and type of errors across each section of the summary.
Design
Cross-sectional study.
Setting
University of California, San Francisco ED.
Participants
We identified all adult ED visits from 2012 to 2023 with an ED clinician note. We randomly selected a sample of 100 visits for GPT-summarization.
Exposure
We assessed the potential of two state-of-the-art LLMs, GPT-4 and GPT-3.5-turbo, to summarize the full ED clinician note into a discharge summary.
Main Outcomes and Measures
GPT-4-generated summaries were evaluated by independent Emergency Medicine physician reviewers across three evaluation criteria: 1) inaccuracy of GPT-summarized information; 2) hallucination of information; and 3) omission of relevant information. On identifying an error, reviewers were additionally asked to provide a brief explanation of their reasoning, which was manually classified into subgroups of errors.
Results
From 202,059 eligible ED visits, we sampled 100 for GPT-generated summarization and then expert-driven evaluation. In total, 33% of the summaries generated by GPT-4 and 10% of those generated by GPT-3.5-turbo were entirely error-free across all domains.
Summaries were mostly accurate, with inaccuracies found in only a small proportion of cases; however, 42% of summaries exhibited hallucinations and 47% omitted clinically relevant information. Inaccuracies were most commonly found in the Plan sections of the summaries, while omissions were concentrated in text describing patients' Physical Examination findings or History of Presenting Complaint.
Conclusions and Relevance
In this cross-sectional study of ED encounters, we found that LLMs could generate accurate summaries but were liable to hallucination and omission of clinically relevant information. A comprehensive understanding of the location and type of errors is important to facilitate clinical review of such content and prevent patient harm.
Radiology,
Journal Year:
2024,
Volume and Issue:
310(1)
Published: Jan. 1, 2024
Although chatbots have existed for decades, the emergence of transformer-based large language models (LLMs) has captivated the world through the most recent wave of artificial intelligence chatbots, including ChatGPT. Transformers are a type of neural network architecture that enables better contextual understanding and efficient training on massive amounts of unlabeled data, such as unstructured text from the internet. As LLMs have increased in size, their improved performance and emergent abilities have revolutionized natural language processing. Since language is integral to human thought, applications based on LLMs have transformative potential for many industries. In fact, LLM-based chatbots have demonstrated human-level performance on professional benchmarks, including in radiology. LLMs offer numerous potential clinical and research applications in radiology, several of which have been explored in the literature with encouraging results. Multimodal models can simultaneously interpret text and images and generate reports, closely mimicking current diagnostic pathways. Thus, from the requisition to the report, LLMs have the opportunity to positively impact nearly every step of the radiology journey. Yet, these impressive capabilities are not without limitations. This article reviews the limitations of LLMs and mitigation strategies, as well as current and potential uses of LLMs, including multimodal models. Also reviewed are existing applications that may enhance efficiency in supervised settings.
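The "contextual understanding" attributed to transformers comes from attention, in which each token's representation is recomputed as a weighted mix of every token's vector. As an illustration (not from the article, and with toy made-up vectors), a minimal sketch of scaled dot-product self-attention in pure Python:

```python
import math

def attention(queries, keys, values):
    """Scaled dot-product attention over small lists of vectors."""
    d = len(keys[0])
    outputs = []
    for q in queries:
        # Dot-product similarity of this query to every key, scaled by sqrt(d)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        # Softmax turns scores into attention weights that sum to 1
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Output is the attention-weighted mix of the value vectors
        outputs.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(len(values[0]))])
    return outputs

# Toy token embeddings (dimension 2); self-attention uses the same
# vectors as queries, keys, and values.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
contextualized = attention(tokens, tokens, tokens)
```

Each output row blends information from all tokens, which is what lets the model resolve meaning from context; production models add learned projections and many parallel heads on top of this same operation.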
JAMA Network Open,
Journal Year:
2024,
Volume and Issue:
7(5), P. e248895 - e248895
Published: May 7, 2024
The introduction of large language models (LLMs), such as Generative Pre-trained Transformer 4 (GPT-4; OpenAI), has generated significant interest in health care, yet studies evaluating their performance in a clinical setting are lacking. Determination of acuity, a measure of a patient's illness severity and the level of required medical attention, is one of the foundational elements of clinical reasoning in emergency medicine.
JMIR Human Factors,
Journal Year:
2024,
Volume and Issue:
11, P. e53559 - e53559
Published: Jan. 24, 2024
More clinicians and researchers are exploring uses for large language model chatbots, such as ChatGPT, for research, dissemination, and educational purposes. Therefore, it becomes increasingly relevant to consider the full potential of this tool, including special features that are currently available only through the application programming interface. One of these features is a variable called temperature, which changes the degree of randomness involved in the model's generated output. This is of particular interest to researchers. By lowering this variable, one can generate more consistent outputs; by increasing it, one can receive more creative responses. For those who use these tools for a variety of tasks, the ability to tailor outputs to be more or less random may be beneficial for work that demands consistency. Additionally, access to more creative text generation may enable scientific authors to describe their research to a general audience and potentially connect with a broader public via social media. In this viewpoint, we present the temperature feature, discuss potential uses, and provide some examples.
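Mechanically, temperature divides the model's raw next-token scores (logits) before the softmax that produces sampling probabilities: low values sharpen the distribution toward the top token, high values flatten it. A minimal sketch of that scaling, using made-up logits rather than an actual API call:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw next-token scores into sampling probabilities.

    temperature < 1 sharpens the distribution (more consistent output);
    temperature > 1 flattens it (more varied, "creative" output).
    """
    scaled = [score / temperature for score in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]  # hypothetical scores for three candidate tokens
low_t = softmax_with_temperature(logits, 0.2)   # near-deterministic
high_t = softmax_with_temperature(logits, 2.0)  # closer to uniform
```

At temperature 0.2 almost all probability mass lands on the highest-scoring token, so repeated sampling gives near-identical output; at 2.0 the lower-scoring tokens receive meaningful probability, so outputs vary more between runs.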
Korean Journal of Radiology,
Journal Year:
2024,
Volume and Issue:
25(2), P. 126 - 126
Published: Jan. 1, 2024
Large language models (LLMs) have revolutionized the global landscape of technology beyond natural language processing. Owing to their extensive pre-training on vast datasets, contemporary LLMs can handle tasks ranging from general functionalities to domain-specific areas, such as radiology, without additional fine-tuning. General-purpose chatbots based on LLMs can optimize the efficiency of radiologists in terms of their professional work and research endeavors. Importantly, these LLMs are on a trajectory of rapid evolution, wherein challenges such as "hallucination," high training cost, and efficiency issues are being addressed, along with the inclusion of multimodal inputs. In this review, we aim to offer conceptual knowledge and actionable guidance to radiologists interested in utilizing LLMs, through a succinct overview of the topic and a summary of radiology-specific aspects, from potential applications to future directions.
Diagnostic and Interventional Imaging,
Journal Year:
2024,
Volume and Issue:
105(7-8), P. 251 - 265
Published: April 27, 2024
The purpose of this study was to systematically review the reported performances of ChatGPT, identify potential limitations, and explore future directions for its integration, optimization, and ethical considerations in radiology applications.
Radiology Artificial Intelligence,
Journal Year:
2024,
Volume and Issue:
6(4)
Published: May 8, 2024
Purpose
To assess the performance of a local open-source large language model (LLM) in various information extraction tasks from real-life emergency brain MRI reports.
Materials and Methods
All consecutive emergency brain MRI reports written in 2022 at a French quaternary center were retrospectively reviewed. Two radiologists identified MRI scans that were performed in the emergency department for headaches. Four radiologists scored the reports' conclusions as either normal or abnormal. Abnormalities were labeled as either headache-causing or incidental. Vicuna (LMSYS Org), an open-source LLM, performed the same tasks. Vicuna's performance metrics were evaluated using the radiologists' consensus as the reference standard.
Results
Among the 2398 reports written during the study period, 595 were included because headaches were listed as an indication (median age of patients, 35 years [IQR, 26-51 years]; 68% [403 of 595] women). A positive finding was reported in 227 of 595 (38%) cases, 136 of which could explain the headache. The LLM had a sensitivity of 98.0% (95% CI: 96.5, 99.0) and specificity of 99.3% (95% CI: 98.8, 99.7) for detecting the presence of headache in the clinical context, a sensitivity of 99.4% (95% CI: 98.3, 99.9) and specificity of 98.6% (95% CI: 92.2, 100.0) for the use of contrast medium injection, a sensitivity of 96.0% (95% CI: 92.5, 98.2) and specificity of 98.9% (95% CI: 97.2, …) for the categorization of reports as normal or abnormal, and a sensitivity of 88.2% (95% CI: 81.6, 93.1) and specificity of 73% (95% CI: 62, 81) for causal inference between the findings and the headache.
Conclusion
An LLM was able to extract information from free-text radiology reports with excellent accuracy without requiring further training.
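The figures above are standard diagnostic-accuracy metrics computed against the radiologists' consensus. A brief sketch of how sensitivity, specificity, and a 95% CI for a proportion can be computed; the counts below are made up for illustration (not the study's confusion matrix), and the interval shown is a Wilson score interval, one common choice:

```python
import math

def sensitivity_specificity(tp, fn, tn, fp):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)

def wilson_ci(successes, n, z=1.96):
    """95% Wilson score confidence interval for a proportion."""
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2))
    return centre - half, centre + half

# Hypothetical counts for one binary task (e.g., normal vs. abnormal)
sens, spec = sensitivity_specificity(tp=133, fn=3, tn=450, fp=9)
sens_lo, sens_hi = wilson_ci(133, 136)
```

Because sensitivity and specificity are each proportions over a subset of cases (positives and negatives, respectively), each gets its own interval, which is why the reported CI widths differ across tasks with different prevalence.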