Patient- and clinician-based evaluation of large language models for patient education in prostate cancer radiotherapy
Christian Trapp,
Nina Schmidt-Hegemann,
Michael Keilholz
et al.
Strahlentherapie und Onkologie,
Journal Year: 2025,
Volume and Issue: unknown
Published: Jan. 10, 2025
Abstract
Background
This study aims to evaluate the capabilities and limitations of large language models (LLMs) for providing patient education to men undergoing radiotherapy for localized prostate cancer, incorporating assessments from both clinicians and patients.
Methods
Six questions about definitive radiotherapy for prostate cancer were designed based on common patient inquiries. These were presented to different LLMs [ChatGPT‑4, ChatGPT-4o (both OpenAI Inc., San Francisco, CA, USA), Gemini (Google LLC, Mountain View, CA, USA), Copilot (Microsoft Corp., Redmond, WA, USA), and Claude (Anthropic PBC, USA)] via their respective web interfaces. Responses were evaluated for readability using the Flesch Reading Ease Index.
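(For reference, the Flesch Reading Ease Index used here and in the studies below is computed as FRE = 206.835 - 1.015 * (total words / total sentences) - 84.6 * (total syllables / total words); higher scores mean easier text, and 60-70 corresponds roughly to plain English.)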
Five radiation oncologists assessed the responses for relevance, correctness, and completeness on a five-point Likert scale. Additionally, 35 patients rated the ChatGPT‑4 responses for comprehensibility, accuracy, trustworthiness, and overall informativeness.
Results
The Flesch Reading Ease Index indicated that all responses were relatively difficult to understand. All LLMs provided answers that were found to be generally relevant and correct. The responses of ChatGPT‑4, ChatGPT-4o, and Claude AI were also rated as complete. However, we found significant differences between the models' performance regarding relevance and completeness. Some responses lacked detail or contained inaccuracies. Patients perceived the information as easy to understand and relevant, with most expressing confidence in the tool and a willingness to use it for future medical questions. ChatGPT-4's responses helped patients feel better informed, despite the standardized information initially provided to them.
Conclusion
Overall, LLMs show promise as a tool for patient education in prostate cancer radiotherapy. While improvements are needed in terms of accuracy and readability, the positive patient feedback suggests that LLMs can enhance patient understanding and engagement. Further research is essential to fully realize the potential of artificial intelligence in patient education.
Language: English
Evaluating an Artificially Intelligent Chatbot ‘Prostate Cancer Info’ for Providing Quality Prostate Cancer Screening Information: A Cross-Sectional Study (Preprint)
Otis L. Owens,
Michael Leonard
JMIR Cancer,
Journal Year: 2025,
Volume and Issue: unknown
Published: Feb. 16, 2025
BACKGROUND
Generative AI chatbots may be useful tools for supporting shared prostate cancer (PrCA) screening decisions, but the information produced by these chatbots sometimes lacks quality or credibility. 'Prostate Cancer Info' is a custom GPT chatbot developed to provide plain-language PrCA information drawn only from the websites of key authorities on PrCA and from peer-reviewed literature.
OBJECTIVE
To evaluate the accuracy, completeness, and readability of 'Prostate Cancer Info's' responses to frequently asked PrCA questions.
METHODS
Twenty-three questions were individually input into 'Prostate Cancer Info.' Responses were recorded in Microsoft Word and reviewed by two raters for their accuracy and completeness. Readability of the content was determined by pasting each response into an online Flesch Kincaid Reading Ease Scores calculator.
RESULTS
All responses were accurate and culturally appropriate. Seventeen of twenty-three questions (74%) had complete responses. The average Reading Ease Score was 64.5 (written at an 8th-grade level).
CONCLUSIONS
AI chatbots, such as Prostate Cancer Info, are great starting places for learning about PrCA screening and preparing to engage in shared decision making, but they should not be used as independent information sources because important details may be omitted. Men are encouraged to use such chatbots to complement information received from a healthcare provider.
Language: English
Performance of artificial intelligence chatbots in responding to the frequently asked questions of patients regarding dental prostheses
BMC Oral Health,
Journal Year: 2025,
Volume and Issue: 25(1)
Published: April 15, 2025
Artificial intelligence (AI) chatbots are increasingly used in healthcare to address patient questions by providing personalized responses. Evaluating their performance is essential to ensure their reliability. This study aimed to assess the performance of three AI chatbots in responding to the frequently asked questions (FAQs) of patients regarding dental prostheses. Thirty-one questions were collected from accredited organizations' websites and the "People Also Ask" feature of Google, focusing on removable and fixed prosthodontics. Two board-certified prosthodontists evaluated response quality using a modified Global Quality Score (GQS) on a 5-point Likert scale. Inter-examiner agreement was assessed with weighted kappa.
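As an illustration of how such agreement can be computed, here is a minimal Python sketch using scikit-learn's weighted Cohen's kappa; the rating lists are hypothetical placeholders, not the study's data:

    from sklearn.metrics import cohen_kappa_score

    # Hypothetical GQS ratings (5-point Likert scale) from two examiners on the same responses
    examiner_1 = [5, 4, 4, 3, 5, 4, 2, 5]
    examiner_2 = [5, 4, 3, 3, 5, 5, 2, 4]

    # Linear weights penalize disagreements in proportion to their distance on the scale
    kappa = cohen_kappa_score(examiner_1, examiner_2, weights="linear")
    print(f"Weighted kappa: {kappa:.2f}")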
Readability was measured using the Flesch-Kincaid Grade Level (FKGL) and Flesch Reading Ease (FRE) indices.
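(The FKGL maps the same word, sentence, and syllable counts onto a U.S. school-grade estimate: FKGL = 0.39 * (total words / total sentences) + 11.8 * (total syllables / total words) - 15.59.)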
Statistical analyses were performed using repeated measures ANOVA and the Friedman test, with Bonferroni correction for pairwise comparisons (α = 0.05).
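The abstract does not name the pairwise post-hoc test; one common choice after a significant Friedman test is the Wilcoxon signed-rank test with Bonferroni correction. A minimal Python sketch on placeholder scores (not the study's data):

    from itertools import combinations
    from scipy.stats import friedmanchisquare, wilcoxon

    # Hypothetical per-question quality scores for three chatbots (repeated measures)
    scores = {
        "ChatGPT": [4, 5, 3, 4, 4, 5, 3, 4],
        "Gemini":  [5, 5, 4, 5, 4, 5, 4, 5],
        "Copilot": [3, 4, 3, 4, 3, 4, 3, 4],
    }

    # Omnibus test across the three related samples
    stat, p = friedmanchisquare(*scores.values())
    print(f"Friedman: chi2={stat:.2f}, p={p:.4f}")

    # Pairwise comparisons with Bonferroni correction (3 comparisons)
    for (name_a, a), (name_b, b) in combinations(scores.items(), 2):
        _, p_pair = wilcoxon(a, b)
        print(f"{name_a} vs {name_b}: corrected p = {min(p_pair * 3, 1.0):.4f}")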
The inter-examiner agreement was good. Among the chatbots, Google Gemini had the highest score (4.58 ± 0.50), significantly outperforming Microsoft Copilot (3.87 ± 0.89) (P = .004). FKGL analysis showed that ChatGPT (10.45 ± 1.26) produced more complex responses compared with Gemini (7.82 ± 1.19) and Copilot (8.38 ± 1.59) (P < .001). FRE scores indicated that ChatGPT's responses were categorized as fairly difficult (53.05 ± 7.16), while Gemini's were rated as plain English (64.94 ± 7.29), with a significant difference between them. AI chatbots show great potential in answering patient inquiries about dental prostheses. However, improvements are needed to enhance their effectiveness as patient education tools.
Language: English
An Effective Methodology for Diabetes Prediction in the Case of Class Imbalance
Borislava Toleva,
I. Atanasov,
Ivan Ivanov
et al.
Bioengineering,
Journal Year: 2025,
Volume and Issue: 12(1), P. 35 - 35
Published: Jan. 6, 2025
Diabetes causes an increase in the level of blood sugar, which leads to damage in various parts of the human body. Diabetes data are used not only for providing a deeper understanding of treatment mechanisms but also for predicting the probability that one might become sick. This paper proposes a novel methodology to perform classification in the case of heavy class imbalance, as observed in the PIMA diabetes dataset. The proposed methodology uses two steps, namely resampling and random shuffling prior to defining the model. It is tested with versions of cross validation appropriate for cases of imbalance: k-fold and stratified k-fold cross validation.
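The abstract does not spell out the pipeline in code; here is a minimal Python sketch of the shuffle-then-stratified-k-fold idea, with synthetic data standing in for the PIMA dataset and logistic regression as an assumed baseline model:

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import StratifiedKFold, cross_val_score
    from sklearn.utils import shuffle

    # Synthetic stand-in for the PIMA dataset: 768 samples, 8 features, imbalanced labels
    rng = np.random.default_rng(0)
    X = rng.normal(size=(768, 8))
    y = (rng.random(768) < 0.35).astype(int)  # roughly 35% positive class

    # Step 1: random shuffling, so the folds do not depend on the original row order
    X, y = shuffle(X, y, random_state=42)

    # Step 2: stratified k-fold preserves the class ratio in every fold
    skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
    scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=skf, scoring="f1")
    print(f"F1 per fold: {np.round(scores, 3)}; mean: {scores.mean():.3f}")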
Our findings suggest that, when working with imbalanced data, randomly shuffling the train/test split can help improve the estimation metrics. The resulting models outperform existing machine learning algorithms and complex deep learning models. Applying our methodology is a simple and fast way to predict labels under class imbalance. It does not require additional techniques to balance the classes, nor does it involve preselecting important variables, which saves time and makes the model easy to analyze. This makes it effective as an initial step for further modeling. Moreover, our methodologies show how the effectiveness of models based on standard approaches can be improved to make them more reliable.
Language: English