medRxiv (Cold Spring Harbor Laboratory),
Journal year: 2024, Issue: unknown
Published: Nov. 6, 2024
Abstract
Background
Large language models (LLMs) have the potential to objectively evaluate radiology resident reports; however, research on their use for feedback in training and assessment of skill development remains limited.
Purpose
This study aimed to assess the effectiveness of LLMs in revising resident reports, by comparing their revisions with those verified by board-certified radiologists, and to analyze the progression of residents' reporting skills over time.
Materials and methods
To identify the LLM that best aligned with human radiologists, 100 reports were randomly selected from a total of 7376 authored by nine first-year residents. The reports were evaluated based on six criteria: (1) addition of missing positive findings, (2) deletion of findings, (3) revision of negative findings, (4) correction of expression, (5) correction of diagnosis, and (6) proposal of additional examinations or treatments. Reports were segmented into four time-based terms, and 900 reports (450 CT and 450 MRI) were chosen from the initial and final terms of the residents' first year. The revised rates for each criterion were compared between the first and last terms using the Wilcoxon Signed-Rank test.
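The first-versus-last-term comparison described in the methods can be sketched in a few lines. The sketch below implements an exact two-sided Wilcoxon signed-rank test in pure Python; the paired revision rates are hypothetical values for nine residents, not the study's data.

```python
from itertools import product

def wilcoxon_signed_rank(first, last):
    """Exact two-sided Wilcoxon signed-rank test for small paired samples.
    Assumes no zero differences and no ties among absolute differences."""
    diffs = [a - b for a, b in zip(first, last)]
    # Rank the absolute differences (1 = smallest).
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0] * len(diffs)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
    w = min(w_plus, w_minus)
    # Enumerate all 2^n equally likely sign assignments under the null.
    n = len(diffs)
    total = n * (n + 1) // 2
    count = 0
    for signs in product((0, 1), repeat=n):
        ws = sum(r for s, r in zip(signs, range(1, n + 1)) if s)
        if min(ws, total - ws) <= w:
            count += 1
    return w, count / 2 ** n

# Hypothetical revision rates per resident, first term vs. last term.
first_term = [0.42, 0.38, 0.51, 0.45, 0.40, 0.48, 0.36, 0.44, 0.50]
last_term = [0.30, 0.29, 0.41, 0.32, 0.35, 0.37, 0.28, 0.30, 0.44]
w, p = wilcoxon_signed_rank(first_term, last_term)
print(f"W = {w}, p = {p:.4f}")  # every resident improved, so W = 0
```

Because the test is paired and non-parametric, it suits small per-resident rate comparisons without assuming normality; with nine pairs the exact enumeration over 2^9 sign patterns is cheap.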
Results
Among the LLMs tested, GPT-4o demonstrated the highest level of agreement with the radiologists. Significant improvements were noted in Criteria 1–3 when the first and last terms were compared (all P < 0.023) using GPT-4o. In contrast, no significant changes were observed in Criteria 4–6. Despite this, all criteria except Criterion 6 showed progressive enhancement over time.
Conclusion
LLMs can effectively provide feedback on commonly corrected areas in resident reports, enabling residents to improve their weaknesses and monitor their progress. Additionally, they may help reduce the workload of mentoring radiologists.
Journal of cardiovascular computed tomography,
Journal year: 2025, Issue: unknown
Published: April 1, 2025
The Coronary Artery Disease-Reporting and Data System (CAD-RADS) 2.0 offers standardized guidelines for interpreting coronary artery disease in cardiac CT. Accurate and consistent CAD-RADS scoring is crucial for comprehensive disease characterization and clinical decision-making. This study investigates the capability of large language models (LLMs) to autonomously generate CAD-RADS scores from cardiac CT reports. A dataset of reports was created to evaluate the performance of several state-of-the-art LLMs in generating CAD-RADS scores via in-context learning.
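In-context learning here means prepending a few worked report-to-score examples to the prompt and asking the model to complete the score for an unseen report. A minimal sketch of such a prompt builder follows; the example report excerpts, their scores, and the surrounding prompt wording are illustrative assumptions, not the study's actual prompt.

```python
# Few-shot examples pairing a report excerpt with a CAD-RADS 2.0 score.
# Both the excerpts and the scores are illustrative, not study data.
FEW_SHOT_EXAMPLES = [
    ("No coronary plaque or stenosis identified.", "CAD-RADS 0"),
    ("Mixed plaque in the proximal LAD with 60% stenosis.", "CAD-RADS 3"),
]

def build_cadrads_prompt(report: str) -> str:
    """Assemble a few-shot prompt that asks an LLM for a CAD-RADS score."""
    lines = ["Assign a CAD-RADS 2.0 score to each cardiac CT report.", ""]
    for example_report, score in FEW_SHOT_EXAMPLES:
        lines += [f"Report: {example_report}", f"Score: {score}", ""]
    # The unseen report goes last, with the score left for the model.
    lines += [f"Report: {report}", "Score:"]
    return "\n".join(lines)

prompt = build_cadrads_prompt("Calcified plaque with 25% stenosis in the mid RCA.")
print(prompt)
```

The returned string would then be sent to whichever model is under test, and the completion compared report by report against the radiologists' consensus score to compute accuracy.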
The models tested comprised GPT-3.5, GPT-4o, Mistral 7b, Mixtral 8 × 7b, Llama3 8b, Llama3 8b with a 64k context length, and Llama3 70b. The scores generated by each model were compared with the ground truth, which was provided in consensus by two board-certified cardiothoracic radiologists based on a final set of 200 reports. GPT-4o and Llama3 70b achieved the highest accuracy for the full score including all modifiers, with rates of 93% and 92.5%, respectively, followed by Mixtral 8 × 7b at 78%. In contrast, older LLMs such as Mistral 7b performed poorly (16%), and GPT-3.5 demonstrated intermediate results with an accuracy of 41.5%. LLMs enhanced by in-context learning are capable of excellent accuracy in CAD-RADS scoring, potentially enhancing both the efficiency and consistency of reporting. Open-source models not only deliver competitive accuracy but also present the benefit of local hosting, mitigating concerns around data security.
Journal of Medical Screening,
Journal year: 2025, Issue: unknown
Published: April 21, 2025
Some noteworthy studies have questioned the use of ChatGPT, a free artificial intelligence program that has become very popular and widespread in recent times, in different branches of medicine. In this study, the success of ChatGPT in detecting breast cancer on mammography (MMG) was evaluated. The pre-treatment mammographic images of patients with a histopathological diagnosis of invasive carcinoma and a prominent mass formation on MMG were read separately by two subprograms: Radiologist Report Writer (P1) and XrayGPT (P2). The programs were asked to determine breast density, tumor size, side, quadrant, the presence of microcalcification, distortion, skin or nipple changes, axillary lymphadenopathy (LAP), and BI-RADS score. The responses were evaluated in consensus by experienced radiologists. Although the mass detection rate of both programs was over 60%, their success in determining tumor size, localization, and LAP was low. BI-RADS category agreement with the readers was fair for P1 (κ: 28%, 0.20 < κ ≤ 0.40) and moderate for P2 (κ: 58%, 0.40 < κ ≤ 0.60).
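The κ bands cited for P1 and P2 (fair: 0.20 < κ ≤ 0.40; moderate: 0.40 < κ ≤ 0.60) follow the widely used Landis and Koch convention. A minimal pure-Python sketch of Cohen's kappa, applied to hypothetical BI-RADS labels rather than the study's data:

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: chance-corrected agreement between two raters."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Expected agreement if both raters assigned labels independently.
    expected = sum(freq_a[c] * freq_b.get(c, 0) for c in freq_a) / n ** 2
    return (observed - expected) / (1 - expected)

# Hypothetical BI-RADS categories: reference reader vs. program output.
reader = [4, 5, 4, 3, 5, 4, 5, 4, 3, 5]
program = [4, 5, 3, 3, 5, 4, 4, 4, 3, 4]
kappa = cohens_kappa(reader, program)
band = ("poor" if kappa <= 0.20 else
        "fair" if kappa <= 0.40 else
        "moderate" if kappa <= 0.60 else
        "substantial or better")
print(f"kappa = {kappa:.2f} ({band})")
```

Unlike raw percent agreement, kappa discounts the agreement two raters would reach by chance alone, which is why a 58% raw score and a κ of 0.58 are different quantities.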
In conclusion, while one application can detect mass appearance better than the other, success is low for all other related features. This casts doubt on the suitability of current large language models for image analysis in breast cancer screening.
Japanese Journal of Radiology,
Journal year: 2024, Issue: unknown
Published: Nov. 16, 2024
Abstract
In this narrative review, we review the applications of artificial intelligence (AI) in clinical magnetic resonance imaging (MRI) exams, with a particular focus on Japan's contributions to the field. In the first part, we introduce various AI applications in optimizing different aspects of the MRI process, including scan protocols, patient preparation, image acquisition, reconstruction, and postprocessing techniques. Additionally, we examine AI's growing influence on clinical decision-making, particularly in areas such as segmentation, radiation therapy planning, and reporting assistance. By emphasizing studies conducted in Japan, we highlight the nation's advancement in AI for MRI. The latter part discusses the characteristics that make Japan a unique environment for the development and implementation of AI in MRI examinations. Japan's healthcare landscape is distinguished by several key factors that collectively create fertile ground for AI research and development. Notably, Japan boasts one of the highest densities of MRI scanners per capita globally, ensuring widespread access to the exam. The national health insurance system plays a pivotal role in providing MRI scans to all citizens irrespective of socioeconomic status, which facilitates the collection of inclusive and unbiased data across a diverse population. Extensive screening programs, coupled with collaborative initiatives like the Japan Medical Imaging Database (J-MID), enable the aggregation and sharing of large, high-quality datasets. With its technological expertise and infrastructure, Japan is well-positioned to make meaningful contributions to the MRI–AI domain. The efforts of researchers, clinicians, and technology experts, including those in industry, will continue to advance the future of AI in MRI, potentially leading to improvements in patient care and efficiency.
Japanese Journal of Radiology,
Journal year: 2025, Issue: unknown
Published: March 8, 2025
Large language models (LLMs) have the potential to objectively evaluate radiology resident reports; however, research on their use for feedback in training and assessment of skill development remains limited. This study aimed to assess the effectiveness of LLMs in revising resident reports, by comparing their revisions with those verified by board-certified radiologists, and to analyze the progression of residents' reporting skills over time. To identify the LLM that best aligned with human radiologists, 100 reports were randomly selected from 7376 authored by nine first-year residents. The reports were evaluated based on six criteria: (1) addition of missing positive findings, (2) deletion of findings, (3) revision of negative findings, (4) correction of expression, (5) correction of diagnosis, and (6) proposal of additional examinations or treatments. Reports were segmented into four time-based terms, and 900 reports (450 CT and 450 MRI) were chosen from the initial and final terms of the residents' first year. The revised rates for each criterion were compared between the first and last terms using the Wilcoxon Signed-Rank test. Among the three LLMs (ChatGPT-4 Omni [GPT-4o], Claude-3.5 Sonnet, and Claude-3 Opus), GPT-4o demonstrated the highest level of agreement with the radiologists. Significant improvements were noted in Criteria 1-3 when the first and last terms were compared (Criteria 1, 2, and 3: P < 0.001, P = 0.023, and P = 0.004, respectively) using GPT-4o. No significant changes were observed in Criteria 4-6. Despite this, all criteria except Criterion 6 showed progressive enhancement over time. LLMs can effectively provide feedback on commonly corrected areas in resident reports, enabling residents to improve their weaknesses and monitor their progress. Additionally, they may help reduce the workload of mentoring radiologists.
npj Digital Medicine,
Journal year: 2025, Issue: 8(1)
Published: March 22, 2025
Abstract
While generative artificial intelligence (AI) has shown potential in medical diagnostics, comprehensive evaluation of its diagnostic performance and comparison with physicians has not been extensively explored. We conducted a systematic review and meta-analysis of studies validating generative AI models for diagnostic tasks published between June 2018 and 2024. Analysis of 83 studies revealed an overall diagnostic accuracy of 52.1%. No significant difference was found between AI models and physicians overall (p = 0.10) or between AI models and non-expert physicians (p = 0.93). However, AI models performed significantly worse than expert physicians (p = 0.007). Several models demonstrated slightly higher accuracy compared to non-experts, although the differences were not significant. Generative AI demonstrates promising diagnostic capabilities, with accuracy varying by model. Although it has not yet achieved expert-level reliability, these findings suggest potential for enhancing healthcare delivery and education when implemented with appropriate understanding of its limitations.
JMIR Medical Informatics,
Journal year: 2025, Issue: 13, pp. e64963 - e64963
Published: April 25, 2025
Abstract
Background
With the rapid development of artificial intelligence (AI) technology, especially generative AI, large language models (LLMs) have shown great potential in the medical field. Through massive data training, an LLM can understand complex medical texts, quickly analyze medical records, and provide health counseling and diagnostic advice directly, even for rare diseases. However, no study has yet compared and extensively discussed the diagnostic performance of LLMs with that of physicians.
Objective
This study systematically reviewed the accuracy of LLMs in clinical diagnosis and provides a reference for their further clinical application.
Methods
We conducted searches in CNKI (China National Knowledge Infrastructure), VIP Database, SinoMed, PubMed, Web of Science, Embase, and CINAHL (Cumulative Index to Nursing and Allied Health Literature) from January 1, 2017, to the present. A total of 2 reviewers independently screened the literature and extracted relevant information. The risk of bias was assessed using the Prediction Model Risk of Bias Assessment Tool (PROBAST), which evaluates both the risk of bias and the applicability of included studies.
Results
A total of 30 studies involving 19 LLMs and a total of 4762 cases were included. The quality assessment indicated a high risk of bias in the majority of studies, the primary cause being that the case diagnoses were already known. For the optimal model, diagnostic accuracy ranged from 25% to 97.8%, while triage accuracy ranged from 66.5% to 98%.
Conclusions
LLMs have demonstrated considerable diagnostic capabilities and significant application potential across various clinical cases. Although their accuracy still falls short of that of medical professionals, if used cautiously, they could become one of the best intelligent assistants in the field of human health care.