Digital health interventions offer promise for scalable and accessible healthcare, but access is still limited by some participatory challenges, especially for disadvantaged families facing limited literacy, language barriers, low income, or living in marginalized areas. These issues are particularly pronounced for colorectal cancer (CRC) patients, who often experience distressing symptoms and struggle with educational materials due to complex jargon, fatigue, or reading level mismatches. To address these issues, we developed and assessed the feasibility of a digital health platform, CRCWeb, to improve the accessibility of educational resources on symptom management for disadvantaged CRC patients and their caregivers facing limited literacy or low income. CRCWeb was developed through a stakeholder-centered participatory design approach. Two-phase semi-structured interviews with patients, caregivers, and oncology experts informed the iterative design process. From the interviews, we developed the following five key design principles: user-friendly navigation, multimedia integration, concise and clear content, enhanced accessibility for individuals with vision and reading disabilities, and scalability for future content expansion. Initial feedback from iterative stakeholder engagements confirmed high user satisfaction, with participants rating CRCWeb an average of 3.98 out of 5 on the post-intervention survey. Additionally, using GenAI tools, including large language models (LLMs) like ChatGPT and multimedia generation tools such as Pictory, complex healthcare guidelines were transformed into concise, easily comprehensible multimedia content and made accessible through CRCWeb. User engagement was notably higher among disadvantaged participants with limited literacy or low income, who logged into the platform 2.52 times more frequently than non-disadvantaged participants. The structured participatory development approach of CRCWeb demonstrates that GenAI-powered multimedia interventions can effectively address healthcare accessibility barriers faced by disadvantaged CRC patients and caregivers with limited literacy or low income. This highlights how digital innovations can enhance healthcare. RR2-10.2196/48499.

Abstract
Accurate CT protocol assignment is crucial for optimizing medical imaging procedures. The integration of large language models (LLMs) may be helpful, but its efficacy as a clinical decision support system for CT protocoling tasks remains unknown. This study aimed to develop and evaluate a fine-tuned LLM specifically designed for CT protocoling, as well as to assess its performance, both standalone and in concurrent use, in terms of clinical effectiveness and efficiency within radiological workflows. This retrospective study included radiology tests for contrast-enhanced chest and abdominal CT examinations (2829/498/941 for training/validation/testing). Inputs were the clinical indication section, age, and anatomic coverage. The LLM was fine-tuned for 15 epochs, selecting the best model by macro sensitivity in validation. Performance was then evaluated on 800 randomly selected cases from the test dataset. Two residents and two radiologists assigned protocols with and without referencing the output of the LLM, to evaluate its efficacy as a clinical decision support system. The LLM exhibited high accuracy metrics, with top-1 and top-2 accuracies of 0.923 and 0.963, respectively, and a macro sensitivity of 0.907. It processed each case in an average of 0.39 s. The LLM, as a clinical decision support tool, improved accuracy both for residents (0.913 vs. 0.936) and radiologists (0.920 vs. 0.926 without and with the LLM, respectively), with the improvement for residents being statistically significant (p = 0.02). Additionally, it reduced reading times by 14% for residents and 12% for radiologists. These results indicate the potential of LLMs to improve CT protocoling efficiency and diagnostic accuracy in radiological practice.

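The top-1 and top-2 accuracies reported above can be illustrated with a minimal sketch. It assumes the model returns, for each case, a list of candidate protocols ranked from most to least likely; the function name `top_k_accuracy` and the protocol names are hypothetical and do not come from the study.

```python
from typing import Sequence


def top_k_accuracy(ranked_predictions: Sequence[Sequence[str]],
                   labels: Sequence[str], k: int) -> float:
    """Fraction of cases whose true label is among the k highest-ranked predictions."""
    if len(ranked_predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("at least one case is required")
    hits = sum(1 for ranked, truth in zip(ranked_predictions, labels)
               if truth in ranked[:k])
    return hits / len(labels)


# Hypothetical protocol names and rankings, for illustration only.
ranked = [
    ["chest_routine", "chest_pe", "abdomen_routine"],
    ["abdomen_liver", "abdomen_routine", "chest_routine"],
    ["chest_pe", "chest_routine", "abdomen_routine"],
    ["abdomen_routine", "abdomen_liver", "chest_routine"],
]
truth = ["chest_routine", "abdomen_routine", "chest_routine", "chest_routine"]

top1 = top_k_accuracy(ranked, truth, k=1)  # only the first candidate counts
top2 = top_k_accuracy(ranked, truth, k=2)  # either of the first two counts
```

Top-2 accuracy can never be lower than top-1 accuracy, because every top-1 hit is also a top-2 hit.
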
Abstract
Early detection of patients with impending bone metastasis is crucial for prognosis improvement. This study aimed to investigate the feasibility of a fine-tuned, locally run large language model (LLM) in extracting patients with bone metastasis from unstructured Japanese radiology reports and to compare its performance with manual annotation. This retrospective study included patients with “metastasis” in radiological reports (April 2018–January 2019, August–May 2022, and April–December 2023 for training, validation, and test datasets of 9559, 1498, and 7399 patients, respectively). Radiologists reviewed the clinical indication and diagnosis sections of the radiological report (used as input data) and classified them into groups 0 (no bone metastasis), 1 (progressive bone metastasis), and 2 (stable or decreased bone metastasis). The data for group 0 was under-sampled in the training and test datasets due to group imbalance. The best-performing model from the validation set was subsequently tested using the testing dataset. Two additional radiologists (readers 1 and 2) were involved in classifying radiological reports within the test dataset for testing purposes. The fine-tuned LLM, reader 1, and reader 2 demonstrated an accuracy of 0.979, 0.996, and 0.993, sensitivity for groups 0/1/2 of 0.988/0.947/0.943, 1.000/1.000/0.966, and 1.000/0.982/0.954, and time required for classification (s) of 105, 2312, and 3094 in the under-sampled test dataset (n = 711), respectively. Fine-tuned LLM extracted patients with bone metastasis, demonstrating satisfactory performance that was comparable to or slightly lower than manual annotation by radiologists in a noticeably shorter time.

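Under-sampling a dominant group, as described above, can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: the abstract does not give the sampling ratio, so the `keep` parameter, the function name `undersample`, and the synthetic report counts are all hypothetical.

```python
import random
from collections import Counter


def undersample(records, target_label, keep, seed=0):
    """Randomly keep at most `keep` records of `target_label`; keep all other records."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    majority = [r for r in records if r[1] == target_label]
    others = [r for r in records if r[1] != target_label]
    if keep < len(majority):
        majority = rng.sample(majority, keep)
    sampled = majority + others
    rng.shuffle(sampled)
    return sampled


# Synthetic (text, group) pairs; the group sizes are made up for illustration.
reports = (
    [(f"report {i}", 0) for i in range(90)]
    + [(f"report {i}", 1) for i in range(90, 96)]
    + [(f"report {i}", 2) for i in range(96, 100)]
)
balanced = undersample(reports, target_label=0, keep=10)
counts = Counter(label for _, label in balanced)
```

Only the over-represented group is thinned; the minority groups are left untouched so that no rare positive report is discarded.
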
This study evaluates the performance of four large language models (LLMs) in classifying malignant lymphoma stages using the Lugano classification from free-text FDG-PET reports in Japanese. Specifically, we assess GPT-4o, Claude 3.5 Sonnet, Llama 3 70B, and Gemma 2 27B for their ability to interpret unstructured radiology texts. In a retrospective single-center study, 80 patients who underwent staging FDG-PET/CT for malignant lymphoma were included. The "Findings" sections of the reports were analyzed without pre-processing. Each LLM assigned a stage based on these reports. Performance was compared to the reference standard determined by expert radiologists. Statistical analyses involved overall accuracy and weighted kappa for agreement. GPT-4o achieved the highest accuracy at 75% (60/80 cases) with substantial agreement (weighted κ = 0.801). Claude 3.5 Sonnet had 61.3% accuracy (49/80, weighted κ = 0.763). Llama 3 70B and Gemma 2 27B showed accuracies of 58.8% and 57.5%, respectively, all indicating substantial agreement. GPT-4o outperformed the other LLMs in assigning malignant lymphoma stages and demonstrated the potential of advanced LLMs to interpret clinical reports. While the immediate utility of automatically predicting the stage from an existing report may be limited, the results highlight the value of LLMs in understanding and standardizing free-text radiology data.

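Weighted kappa, used above to measure agreement on an ordinal scale such as lymphoma stage, penalizes a disagreement more the further apart the two stages are. The sketch below is a plain-Python illustration. The abstract does not state whether linear or quadratic weights were used, so the `weighting` parameter is an assumption, and the stage encoding (0 to 3) and example labels are hypothetical.

```python
def weighted_kappa(rater_a, rater_b, n_categories, weighting="linear"):
    """Cohen's weighted kappa for ordinal labels coded 0..n_categories-1."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters need the same non-zero number of labels")
    if n_categories < 2:
        raise ValueError("at least two categories are required")
    k, n = n_categories, len(rater_a)
    power = 1 if weighting == "linear" else 2  # anything else is treated as quadratic

    # Observed joint proportions of (rater_a, rater_b) labels.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1.0 / n
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    weighted_observed = weighted_expected = 0.0
    for i in range(k):
        for j in range(k):
            w = (abs(i - j) / (k - 1)) ** power  # disagreement weight, 0 on the diagonal
            weighted_observed += w * observed[i][j]
            weighted_expected += w * row[i] * col[j]
    if weighted_expected == 0:
        raise ValueError("kappa is undefined when both raters use a single category")
    return 1.0 - weighted_observed / weighted_expected


# Hypothetical stages (0..3 standing for stages I..IV), for illustration only.
reference = [0, 1, 2, 3, 3, 1]
predicted = [0, 1, 2, 3, 2, 1]
kappa = weighted_kappa(reference, predicted, n_categories=4)
```

A value of 1 means perfect agreement and 0 means agreement no better than chance; a one-stage error lowers the score less than a three-stage error.
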
Emergency Radiology, Journal year: 2025, Issue: unknown
Published: June 2, 2025
Abstract
Purpose
This study aimed to develop an automated early warning system using a large language model (LLM) to identify acute to subacute brain infarction from free-text computed tomography (CT) or magnetic resonance imaging (MRI) radiology reports.
Methods
In this retrospective study, 5,573, 1,883, and 834 patients were included in the training (mean age, 67.5 ± 17.2 years; 2,831 males), validation (mean age, 61.5 ± 18.3 years; 994 males), and test (mean age, 66.5 ± 16.1 years; 488 males) datasets. An LLM (Japanese Bidirectional Encoder Representations from Transformers model) was fine-tuned to classify the CT and MRI reports into three groups (group 0, newly identified acute to subacute infarction; group 1, known acute to subacute infarction or old infarction; group 2, without infarction). The training and validation processes were repeated 15 times, and the best-performing model on the validation dataset was selected to further evaluate its performance on the test dataset.
Results
The best fine-tuned model exhibited sensitivities of 0.891, 0.905, and 0.959 for groups 0, 1, and 2, respectively, in the test dataset. The macrosensitivity (the average of sensitivity for all groups) and accuracy were 0.918 and 0.923, respectively. The model’s performance in extracting newly identified acute to subacute brain infarcts was high, with an area under the receiver operating characteristic curve of 0.979 (95% confidence interval, 0.956–1.000). The average prediction time was 0.115 ± 0.037 s per patient.
Conclusion
A fine-tuned LLM could extract newly identified acute to subacute brain infarcts based on CT or MRI findings with high performance.

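Macrosensitivity, defined above as the average of the per-group sensitivities, weights every group equally regardless of its size, which is why it can differ from accuracy. The following sketch shows the calculation on made-up labels and then checks the arithmetic against the three sensitivities reported in the abstract. The function names and the toy labels are illustrative assumptions.

```python
def per_class_sensitivity(y_true, y_pred, classes):
    """Sensitivity (recall) of each class: correctly predicted / actually in class."""
    result = {}
    for c in classes:
        total = sum(1 for t in y_true if t == c)
        if total == 0:
            raise ValueError(f"class {c} has no cases, sensitivity is undefined")
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        result[c] = hits / total
    return result


def macrosensitivity(y_true, y_pred, classes):
    """Unweighted mean of the per-class sensitivities."""
    values = per_class_sensitivity(y_true, y_pred, classes).values()
    return sum(values) / len(classes)


def accuracy(y_true, y_pred):
    """Fraction of all cases predicted correctly."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


# Toy labels (not study data): group 0 is small, so one error hurts its sensitivity a lot.
y_true = [0, 0, 1, 1, 2, 2, 2, 2, 2, 2]
y_pred = [0, 1, 1, 1, 2, 2, 2, 2, 2, 2]
macro = macrosensitivity(y_true, y_pred, classes=[0, 1, 2])
acc = accuracy(y_true, y_pred)

# Averaging the sensitivities reported in the abstract for groups 0, 1, and 2.
reported_macro = round((0.891 + 0.905 + 0.959) / 3, 3)
```

In the toy data, a single error in the smallest group pulls the macrosensitivity below the accuracy. The last line reproduces the reported macrosensitivity of 0.918 from the three reported group sensitivities.
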
The aim of this study is to develop a fine-tuned large language model that classifies interventional radiology reports into technique categories and to compare its performance with readers. This retrospective study included 3198 patients (1758 males and 1440 females; age, 62.8 ± 16.8 years) who underwent interventional radiology from January 2018 to July 2024. Training, validation, and test datasets involved 2292, 250, and 656 patients, respectively. Input data involved texts in the clinical indication, imaging diagnosis, and image-finding sections of interventional radiology reports. Manually classified technique categories (15 categories in total) were utilized as reference data. Fine-tuning of the Bidirectional Encoder Representations from Transformers model was performed using the training and validation datasets. This process was repeated 15 times due to the randomness of the learning process. The best-performing model, which showed the highest accuracy among the 15 trials, was selected to further evaluate its performance in the independent test dataset. The report classification involved one radiologist (reader 1) and two radiology residents (readers 2 and 3). The accuracy and macrosensitivity (average of each category's sensitivity) of the best-performing model in the validation dataset were 0.996 and 0.994, respectively. For the test dataset, the accuracy/macrosensitivity were 0.988/0.980, 0.986/0.977, 0.989/0.979, and 0.988/0.980 in the best model, reader 1, reader 2, and reader 3, respectively. The model required 0.178 s for classification per patient, which was 17.5–19.9 times faster than readers. In conclusion, the fine-tuned large language model classified interventional radiology reports into technique categories with high accuracy similar to readers within a remarkably shorter time.