Evaluating the Accuracy, Reliability, Consistency, and Readability of Different Large Language Models in Restorative Dentistry
Journal of Esthetic and Restorative Dentistry
Journal year: 2025
Issue: unknown
Published: March 2, 2025
This study aimed to evaluate the reliability, consistency, and readability of responses provided by various artificial intelligence (AI) programs to questions related to Restorative Dentistry. Forty-five knowledge-based information questions and 20 additional questions (10 patient-related and 10 dentistry-specific) were posed to the ChatGPT-3.5, ChatGPT-4, ChatGPT-4o, Chatsonic, Copilot, and Gemini Advanced chatbots. The DISCERN questionnaire was used to assess reliability; Flesch Reading Ease and Flesch-Kincaid Grade Level scores were utilized to assess readability. Accuracy and consistency were determined based on the chatbots' responses to the questions.
Copilot demonstrated "good" reliability, while ChatGPT-3.5 showed "fair" reliability. Chatsonic exhibited the highest "DISCERN total score" for patient-related questions, whereas ChatGPT-4o performed best for dentistry-specific questions. No statistically significant differences were found among the chatbots (p > 0.05).
The highest accuracy observed was 93.3% and the lowest was 68.9%; ChatGPT-4 gave the most consistent answers between repetitions.
The performance of the AIs varied in terms of accuracy when responding to Restorative Dentistry questions, showing promising results for academic and patient education applications. However, readability was generally above the recommended levels for patient education materials. The utilization of AI has an increasing impact on many aspects of dentistry. Moreover, if AI chatbots used in restorative dentistry prove to be reliable and comprehensible, this may yield positive outcomes in the future.
Language: English
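For reference, the readability metrics cited in this abstract (Flesch Reading Ease and Flesch-Kincaid Grade Level) are standard closed-form scores. A minimal Python sketch is given below; the word, sentence, and syllable counts in the example are hypothetical, since the abstract does not specify the tokenization used in the study.

    def flesch_reading_ease(words, sentences, syllables):
        # Higher = easier; scores below 30 correspond to college-graduate-level text.
        return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)

    def flesch_kincaid_grade(words, sentences, syllables):
        # Approximate U.S. school grade level needed to understand the text.
        return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59

    # Hypothetical chatbot answer: 120 words, 6 sentences, 210 syllables.
    print(flesch_reading_ease(120, 6, 210))   # ~38.5 -> "difficult" band
    print(flesch_kincaid_grade(120, 6, 210))  # ~12.9 -> roughly college entry level

Values in this range sit well above the sixth-to-eighth-grade reading level commonly recommended for patient education materials, which is consistent with the abstract's conclusion about readability.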
Evaluating the Accuracy and Readability of ChatGPT-4o’s Responses to Patient-Based Questions about Keratoconus
Ophthalmic Epidemiology
Journal year: 2025
Issue: unknown, pp. 1-6
Published: March 28, 2025
This study aimed to evaluate the accuracy and readability of responses generated by ChatGPT-4o, an advanced large language model, to frequently asked patient-centered questions about keratoconus. A cross-sectional, observational study was conducted in which ChatGPT-4o was used to answer 30 potential questions that could be asked by patients with keratoconus. The responses were evaluated by two board-certified ophthalmologists and scored on a scale of 1 to 5. Readability was assessed using the Simple Measure of Gobbledygook (SMOG), Flesch-Kincaid Grade Level (FKGL), and Flesch Reading Ease (FRE) scores. Descriptive, treatment-related, and follow-up-related questions were analyzed, and statistical comparisons between these categories were performed.
The mean accuracy score for the responses was 4.48 ± 0.57 on the 5-point Likert scale. The interrater reliability, with an intraclass correlation coefficient of 0.769, indicated a strong level of agreement. Readability scores revealed a SMOG of 15.49 ± 1.74, an FKGL of 14.95 ± 1.95, and an FRE of 27.41 ± 9.71, indicating that a high level of education is required to comprehend the responses.
There was no significant difference in accuracy among the different question categories (p = 0.161), but readability varied significantly, with treatment-related responses being the easiest to understand.
ChatGPT-4o provides highly accurate information about keratoconus, though the complexity of its responses may limit accessibility for the general population. Further development is needed to enhance the readability of AI-generated medical content.
Language: English
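For reference, the SMOG index used in this abstract is also a closed-form score based on sentence and polysyllable counts. The sketch below uses hypothetical counts chosen only to illustrate how a value near the reported mean of 15.49 can arise; it does not reproduce the study's text samples.

    import math

    def smog_grade(polysyllables, sentences):
        # SMOG index: estimated years of education needed to understand the text.
        # `polysyllables` is the count of words with three or more syllables.
        return 1.0430 * math.sqrt(polysyllables * (30 / sentences)) + 3.1291

    # Hypothetical answer sample: 42 polysyllabic words across 9 sentences.
    print(smog_grade(42, 9))  # ~15.5, i.e. college-level text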