Pediatric Transplantation, Journal Year: 2025, Volume and Issue: 29(3), Published: March 13, 2025
ABSTRACT
Background
Education and enhancing the knowledge of adolescents who will undergo kidney transplantation are among the primary objectives of their care. While there are specific interventions in place to achieve this, they require extensive resources. The rise of large language models like ChatGPT‐3.5 offers potential assistance for providing information to patients. This study aimed to evaluate the accuracy, relevance, and safety of ChatGPT‐3.5's responses to patient‐centered questions about pediatric kidney transplantation. A further objective was to assess whether it could be a supplementary educational tool for adolescents and their caregivers in this complex medical context.
Methods
A total of 37 questions were presented to ChatGPT‐3.5, which was prompted to respond as a health professional would to a layperson. Five nephrologists independently evaluated the outputs for accuracy, relevancy, comprehensiveness, understandability, readability, and safety.
Results
The mean accuracy, relevancy, and comprehensiveness scores across all outputs were 4.51, 4.56, and 4.55, respectively. Out of the 37 outputs, four were rated completely accurate and seven completely relevant and comprehensive. Only one output had an average score below 4. Twelve outputs were considered potentially risky, but only three carried a risk grade of moderate or higher. Outputs that were considered risky had accuracy and relevancy scores close to the average.
Conclusion
Our findings suggest that ChatGPT may be a useful educational resource for individuals waiting for kidney transplantation. However, the presence of potentially risky outputs underscores the necessity of human oversight and validation.
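For readers interested in reproducing this kind of query pipeline, the sketch below shows one way to pose a patient-centered question with a "respond as a health professional would to a layperson" instruction using the OpenAI Python client. The model name, prompt wording, and example question are illustrative assumptions, not the study's actual protocol.

```python
# Minimal sketch: posing a patient-centered question to a chat model with a
# "health professional explaining to a layperson" instruction.
# The prompt wording, model name, and question are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SYSTEM_PROMPT = (
    "You are a health professional. Answer the patient's question about "
    "pediatric kidney transplantation in plain language a layperson can understand."
)

def ask(question: str) -> str:
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content

print(ask("How long will my child need to take anti-rejection medicine?"))
```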
Journal of Neuro-Ophthalmology, Journal Year: 2024, Volume and Issue: unknown, Published: Jan. 4, 2024
Background: Patient education in ophthalmology poses a challenge for physicians because of time and resource limitations. ChatGPT (OpenAI, San Francisco) may assist with automating the production of patient handouts on common neuro-ophthalmic diseases.
Methods: We queried ChatGPT-3.5 to generate 51 patient handouts across 17 neuro-ophthalmic conditions. We devised the “Quality of Generated Language Outputs for Patients” (QGLOP) tool to assess the domains of accuracy/comprehensiveness, bias, currency, and tone, each scored out of 4 for a total of 16. A fellowship-trained neuro-ophthalmologist scored each passage. Handout readability was assessed using the Simple Measure of Gobbledygook (SMOG), which estimates the years of education required to understand a text.
Results: The QGLOP scores for accuracy/comprehensiveness, bias, currency, and tone were found to be 2.43, 3, 3.43, and 3.02, respectively. The mean total score was 11.9 [95% CI 8.98, 14.8] out of 16 points, indicating a performance of 74.4% [95% CI 56.1%, 92.5%]. The mean SMOG across responses was 10.9 [95% CI 9.36, 12.4] years of education.
Conclusions: This study suggests that an ophthalmologist may have an at least moderate level of satisfaction with the write-up quality conferred by ChatGPT, which still requires a final review and editing before dissemination. Comparatively, the rarer scores at either extreme (about 5% collectively) would require either very mild or extensive revision. Also, the readability exceeded the accepted upper limit of a grade 8 reading level for health-related handouts. In its current iteration, ChatGPT should be used as an efficiency tool to produce an initial draft for the neuro-ophthalmologist, who may then refine its accuracy and readability for a lay readership.
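For context, SMOG converts the density of polysyllabic (3+ syllable) words into an estimated years-of-education figure via grade = 1.0430 × sqrt(polysyllables × 30 / sentences) + 3.1291. The Python sketch below is only a rough approximation of that calculation: its syllable counter is a naive vowel-group heuristic rather than a validated procedure, and the sample handout text is invented.

```python
# Rough SMOG estimate: grade = 1.0430 * sqrt(polysyllables * 30 / sentences) + 3.1291
# The syllable counter is a naive vowel-group heuristic (an approximation,
# not the validated procedure used by readability tools).
import math
import re

def count_syllables(word: str) -> int:
    # Count groups of consecutive vowels as syllables; crude but serviceable.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def smog_grade(text: str) -> float:
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    polysyllables = sum(1 for w in words if count_syllables(w) >= 3)
    return 1.0430 * math.sqrt(polysyllables * 30 / len(sentences)) + 3.1291

handout = (
    "Optic neuritis is inflammation of the optic nerve. "
    "It can cause blurry vision and pain with eye movement. "
    "Most people recover much of their vision over several weeks."
)
print(round(smog_grade(handout), 1))
```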
Ophthalmology Science, Journal Year: 2024, Volume and Issue: 4(4), P. 100485 - 100485, Published: Feb. 6, 2024
Objective: To assess the quality, empathy, and safety of expert-edited large language model (LLM) responses, human expert-created responses, and LLM responses to common retina patient questions.
Design: Randomized, masked multicenter study.
Participants: Twenty-one common retina patient questions were randomly assigned among 13 specialists. Each expert wrote a response to their assigned question (Expert) and then edited an LLM (ChatGPT-4)-generated response to that question (Expert+AI), timing themselves for both tasks. Five LLMs (ChatGPT-3.5, ChatGPT-4, Claude 2, Bing, Bard) also generated a response to each question. The original questions, along with the anonymized and randomized Expert+AI, Expert, and LLM responses, were evaluated by the other experts who did not write an expert response to that question. Evaluators judged quality and empathy (very poor, poor, acceptable, good, or very good) as well as safety metrics (incorrect information, likelihood to cause harm, extent of harm, missing content).
Main Outcome: Mean quality and empathy score, and the proportion of responses with incorrect content, by response type.
Results: There were 4008 total grades collected (2608 for quality and empathy; 1400 for safety metrics), with significant differences in quality and empathy (p<0.001, p<0.001) between the LLM, Expert, and Expert+AI groups. For quality, Expert+AI (3.86 +/- 0.85) performed best overall, while GPT-3.5 (3.75 +/- 0.79) was the top-performing LLM. For empathy, the group with the highest mean score (+/- 0.69) was followed by one at 3.73 +/- 0.63. By quality, Expert responses placed fourth out of seven, and sixth for empathy. For both quality (p<0.001) and empathy (p<0.001), expert-edited responses performed better than expert-created responses. There were time savings for Expert+AI vs. Expert responses (p=0.02). ChatGPT-4 was similar to the expert groups for Inappropriate Content (p=0.35), Missing Content (p=0.001), Extent of Possible Harm (p=0.356), and Likelihood of Possible Harm (p=0.129).
Conclusions and Relevance: In this randomized, masked study, LLM responses were comparable to expert responses in terms of quality, empathy, and safety metrics, warranting further exploration of their potential benefits in clinical settings.
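The abstract reports mean ratings and p-values across response-type groups but does not name the statistical test used. Purely as an illustration of how such ordinal grades can be compared, the sketch below maps the five-point scale to numbers and applies a Kruskal-Wallis test with SciPy; the ratings are invented and the choice of test is an assumption, not the study's method.

```python
# Illustrative only: map ordinal quality grades to 1-5 and compare groups.
# Kruskal-Wallis is one reasonable choice for ordinal ratings; the ratings
# below are invented.
from scipy.stats import kruskal

SCALE = {"very poor": 1, "poor": 2, "acceptable": 3, "good": 4, "very good": 5}

grades = {
    "Expert":    ["good", "acceptable", "good", "very good", "acceptable"],
    "Expert+AI": ["very good", "good", "very good", "good", "very good"],
    "LLM":       ["good", "good", "acceptable", "very good", "good"],
}

samples = [[SCALE[g] for g in group] for group in grades.values()]
stat, p = kruskal(*samples)
print(f"H = {stat:.2f}, p = {p:.3f}")
```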
Current Opinion in Ophthalmology, Journal Year: 2024, Volume and Issue: 35(3), P. 205 - 209, Published: Feb. 7, 2024
Purpose of review
This review seeks to provide a summary of the most recent research findings regarding the utilization of ChatGPT, an artificial intelligence (AI)-powered chatbot, in the field of ophthalmology, in addition to exploring the limitations and ethical considerations associated with its application.
Recent findings
ChatGPT has gained widespread recognition and demonstrated potential in enhancing patient and physician education, boosting productivity, and streamlining administrative tasks. In various studies examining its utility in ophthalmology, ChatGPT has exhibited fair to good accuracy, with its most recent iteration showcasing superior performance in providing ophthalmic recommendations across disorders such as corneal diseases, orbital disorders, vitreoretinal diseases, uveitis, neuro-ophthalmology, and glaucoma. It proves beneficial for patients in accessing information and aids physicians in triaging as well as in formulating differential diagnoses. Despite these benefits, limitations require acknowledgment, including the risk of offering inaccurate or harmful information, dependence on outdated data, the necessity for a high level of education for data comprehension, and concerns regarding privacy within the medical domain.
Summary
ChatGPT is a promising new tool that could contribute to healthcare education and research, potentially reducing work burdens. However, its current limitations necessitate a complementary role alongside human expert oversight.
Medicine, Journal Year: 2024, Volume and Issue: 103(32), P. e39250 - e39250, Published: Aug. 9, 2024
ChatGPT, a powerful AI language model, has gained increasing prominence in medicine, offering potential applications in healthcare, clinical decision support, patient communication, and medical research. This systematic review aims to comprehensively assess the applications of ChatGPT in healthcare education, research, writing, and practice, while also delineating its limitations and areas for improvement.
JAMA Ophthalmology, Journal Year: 2024, Volume and Issue: 142(9), P. 798 - 798, Published: July 18, 2024
Although augmenting large language models (LLMs) with knowledge bases may improve medical domain-specific performance, practical methods are needed for the local implementation of LLMs that address privacy concerns and enhance accessibility for health care professionals.
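As a rough illustration of what "augmenting an LLM with a knowledge base" can look like while keeping data local, the sketch below embeds a few reference passages with a locally run sentence-transformers model and prepends the most relevant ones to a prompt. The embedding model name and the passages are assumptions for illustration, and the final generation step is left to whichever locally hosted LLM is used.

```python
# Minimal local retrieval-augmentation sketch (assumptions: sentence-transformers
# installed, an illustrative embedding model, and invented knowledge-base passages).
import numpy as np
from sentence_transformers import SentenceTransformer

passages = [
    "Intravitreal anti-VEGF injections are a first-line treatment for wet AMD.",
    "Diabetic retinopathy screening is recommended annually for patients with diabetes.",
    "Retinal detachment is a surgical emergency presenting with flashes and floaters.",
]

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # runs locally; no data leaves the machine
passage_vecs = embedder.encode(passages, normalize_embeddings=True)

def build_grounded_prompt(question: str, top_k: int = 2) -> str:
    q_vec = embedder.encode([question], normalize_embeddings=True)[0]
    scores = passage_vecs @ q_vec                   # cosine similarity (vectors are normalized)
    best = np.argsort(scores)[::-1][:top_k]
    context = "\n".join(passages[i] for i in best)
    return f"Use only the context below to answer.\n\nContext:\n{context}\n\nQuestion: {question}"

prompt = build_grounded_prompt("What is the treatment for wet macular degeneration?")
# The prompt would then be passed to a locally hosted LLM of choice.
print(prompt)
```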
NEJM AI, Journal Year: 2024, Volume and Issue: 1(7), Published: June 17, 2024
Large language models (LLMs) have shown significant promise related to their application in medical research, education, and clinical tasks. While acknowledging these capabilities, we face the challenge of striking a balance between defining and holding ethical boundaries and driving innovation in LLM technology for medicine. We herein propose a framework, grounded in four bioethical principles, to promote the responsible use of LLMs. This model requires engagement with LLMs by three parties (the patient, the clinician, and the systems that govern the technology itself) and suggests potential approaches for mitigating risks. This approach allows us to ethically, equitably, and effectively use LLMs in medicine.
Vascular, Journal Year: 2024, Volume and Issue: unknown, Published: March 18, 2024
Objectives
Generative artificial intelligence (AI) has emerged as a promising tool to engage with patients. The objective of this study was to assess the quality of AI responses to common patient questions regarding vascular surgery disease processes.
Methods
OpenAI's ChatGPT-3.5 and Google Bard were queried with 24 mock patient questions spanning seven domains. Six experienced faculty at a tertiary academic center independently graded the responses on their accuracy (rated 1–4 from completely inaccurate to completely accurate), completeness (rated 1–4 from totally incomplete to totally complete), and appropriateness (binary). Responses were also evaluated using three readability scales.
Results
ChatGPT responses were rated, on average, more accurate than Bard responses (3.08 ± 0.33 vs 2.82 ± 0.40, p < .01). ChatGPT responses were also scored, on average, as more complete than Bard responses (2.98 ± 0.34 vs 2.62 ± 0.36). Most ChatGPT responses (75.0%, n = 18) and almost half of Bard responses (45.8%, n = 11) were unanimously deemed appropriate. Almost one-third of Bard responses (29.2%, n = 7) were deemed inappropriate by at least two reviewers, and 8.4% were considered inappropriate by the majority. The mean Flesch Reading Ease, Flesch–Kincaid Grade Level, and Gunning Fog Index of ChatGPT responses were 29.4 ± 10.8, 14.5 ± 2.2, and 17.7 ± 3.1, respectively, indicating that the responses are readable only with post-secondary education. Bard's corresponding scores were 58.9 ± 10.5, 8.2 ± 1.7, and 11.0 ± 2.0, respectively, indicating readability at a high-school education level (p < .0001 for all metrics). ChatGPT's mean response length (332 ± 79 words) was higher than Bard's (183 ± 53 words, p < .001). There was no difference in accuracy, completeness, readability, or appropriateness between domains (p > .05 for all analyses).
Conclusions
Generative AI offers a novel means of educating patients that avoids the inundation of information from “Dr Google” and the time barriers of physician-patient encounters. ChatGPT provides largely valid, though imperfect, responses to a myriad of patient questions at the expense of readability. While Bard's responses are more concise and readable, their quality is poorer. Further research is warranted to better understand the failure points of these large language models.
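The three readability scales reported above are standard formulas that can be computed with off-the-shelf tooling. The snippet below uses the open-source textstat package as one possible implementation (the study does not state which tool was used) on an invented sample response.

```python
# Compute the three readability metrics reported in the study with the
# textstat package (one possible implementation; the study's exact tooling
# is not stated). The sample text is invented.
import textstat

response = (
    "An abdominal aortic aneurysm is a bulge in the main blood vessel that "
    "carries blood from the heart to the body. Small aneurysms are usually "
    "watched with regular ultrasound scans, while larger ones may need repair."
)

print("Flesch Reading Ease:       ", textstat.flesch_reading_ease(response))
print("Flesch-Kincaid Grade Level:", textstat.flesch_kincaid_grade(response))
print("Gunning Fog Index:         ", textstat.gunning_fog(response))
```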
Journal of Medical Internet Research, Journal Year: 2024, Volume and Issue: 26, P. e54706 - e54706, Published: April 2, 2024
Background
There is a dearth of feasibility assessments regarding using large language models (LLMs) for responding to inquiries from autistic patients within a Chinese-language context. Despite Chinese being one of the most widely spoken languages globally, the predominant research focus on applying these models in the medical field has been on English-speaking populations.
Objective
This study aims to assess the effectiveness of LLM chatbots, specifically ChatGPT-4 (OpenAI) and ERNIE Bot (version 2.2.3; Baidu, Inc), two advanced LLMs available in China, in addressing inquiries from autistic individuals in a Chinese-language setting.
Methods
For this study, we gathered data from DXY—a widely acknowledged, web-based, medical consultation platform in China with a user base of over 100 million individuals. Patient consultation samples were rigorously selected from January 2018 to August 2023, amounting to 239 questions extracted from publicly available autism-related documents on the platform. To maintain objectivity, both the original responses and the chatbot-generated responses were anonymized and randomized. An evaluation team of 3 chief physicians assessed the responses across 4 dimensions: relevance, accuracy, usefulness, and empathy. The team completed 717 evaluations. The assessors initially identified the best response and then used a Likert scale with 5 categories to gauge the responses, each category representing a distinct level of quality. Finally, we compared the scores collected from the different sources.
Results
Among the evaluations conducted, assessors displayed varying preferences: 46.86% (95% CI 43.21%-50.51%) preferred the physicians' responses, 34.87% (95% CI 31.38%-38.36%) favored ChatGPT, and 18.27% (95% CI 15.44%-21.10%) favored ERNIE Bot. The average relevance scores for physicians, ChatGPT, and ERNIE Bot were 3.75 (95% CI 3.69-3.82), 3.69 (95% CI 3.63-3.74), and 3.41 (95% CI 3.35-3.46), respectively. Physicians (3.66, 95% CI 3.60-3.73) and ChatGPT (3.73, 95% CI 3.69-3.77) demonstrated higher accuracy ratings than ERNIE Bot (3.52, 95% CI 3.47-3.57). In terms of usefulness scores, ChatGPT (3.54, 95% CI 3.47-3.62) received higher ratings than physicians (3.40, 95% CI 3.34-3.47) and ERNIE Bot (3.05, 95% CI 2.99-3.12). Finally, concerning the empathy dimension, ChatGPT (3.64, 95% CI 3.57-3.71) outperformed physicians (3.13, 95% CI 3.04-3.21) and ERNIE Bot (3.11, 95% CI 3.04-3.18).
Conclusions
In this cross-sectional study, physicians' responses exhibited overall superiority in the present Chinese-language context. Nonetheless, LLMs can provide valuable guidance and may even surpass physicians in demonstrating empathy. However, it is crucial to acknowledge that further optimization and research are imperative prerequisites before effective integration into clinical settings across diverse linguistic environments can be realized.
Trial Registration
Chinese Clinical Trial Registry ChiCTR2300074655; https://www.chictr.org.cn/bin/project/edit?pid=199432
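As a quick sanity check on the reported intervals, the 95% CI for the 46.86% physician-preference share over the 717 evaluations matches a standard normal-approximation confidence interval for a proportion:

```python
# Normal-approximation 95% CI for a proportion, reproducing the reported
# interval for the 46.86% physician-preference share over 717 evaluations.
import math

def proportion_ci(p: float, n: int, z: float = 1.96) -> tuple[float, float]:
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width

low, high = proportion_ci(0.4686, 717)
print(f"{low:.2%} - {high:.2%}")  # ~43.21% - 50.51%, matching the abstract
```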
Journal of Medical Internet Research, Journal Year: 2024, Volume and Issue: 26, P. e57721 - e57721, Published: July 4, 2024
Discharge letters are a critical component in the continuity of care between specialists and primary care providers. However, these letters are time-consuming to write, underprioritized in comparison with direct clinical care, and often tasked to junior doctors. Prior studies assessing the quality of discharge summaries written for inpatient hospital admissions show inadequacies across many domains. Large language models such as GPT have the ability to summarize large volumes of unstructured free text such as electronic medical records and have the potential to automate such tasks, providing time savings and consistency in quality.
International Journal of Nursing Studies Advances, Journal Year: 2024, Volume and Issue: 6, P. 100181 - 100181, Published: Jan. 28, 2024
The release of ChatGPT for general use in 2023 by OpenAI has significantly expanded the possible applications of generative artificial intelligence in the healthcare sector, particularly in terms of information retrieval by patients, medical and nursing students, and healthcare personnel. This study aimed to compare the performance of ChatGPT-3.5 and ChatGPT-4.0 to that of clinical nurses in answering questions about tracheostomy care, as well as to determine whether using different prompts to pre-define the scope affects the accuracy of their responses.

This was a cross-sectional study. Data collected from ChatGPT-3.5 and ChatGPT-4.0 were obtained through access provided by the University of Hong Kong. Nurses working in mainland China were surveyed through the Qualtrics survey program. No participants were needed for collecting the chatbot data. A total of 272 nurses, with 98.5% of them working in tertiary care hospitals in mainland China, were recruited using a snowball sampling approach. We used 43 tracheostomy care-related questions in multiple-choice format to evaluate ChatGPT-3.5, ChatGPT-4.0, and the nurses. ChatGPT-3.5 and GPT-4.0 were both queried three times with the same questions, using three different prompts: no prompt, a patient-friendly prompt, and an act-as-nurse prompt. All responses were independently graded by two qualified otorhinolaryngology nurses on a 3-point scale (correct, partially correct, and incorrect). The chi-squared test and Fisher exact test with post-hoc Bonferroni adjustment were used to assess differences between the groups and between prompts.

ChatGPT-4.0 showed higher accuracy, with 64.3% of responses rated 'correct', compared with 60.5% for ChatGPT-3.5 and 36.7% for the nurses (X2 = 74.192, p < 0.001). Except for the 'care of the stoma and surrounding skin' domain (X2 = 6.227, p = 0.156), the scores of ChatGPT-4.0 were better than the nurses' scores in the domains related to airway humidification, cuff management, tube suction techniques, and management of complications. Overall, ChatGPT-4.0 consistently performed well across all domains, achieving over 50% of responses rated 'correct' in each domain. Alterations to the prompt had no notable impact on the accuracy of ChatGPT-3.5 or -4.0.

ChatGPT may serve as a complementary tool for patients and physicians to improve knowledge of tracheostomy care.
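The group comparison reported above (64.3% vs 60.5% vs 36.7% rated 'correct'; X2 = 74.192) is a chi-squared test over a rating-by-group contingency table. The sketch below shows the shape of such a test with SciPy; the counts are invented for illustration, since the abstract reports only percentages.

```python
# Chi-squared test on a (group x rating) contingency table, as used to compare
# ChatGPT-4.0, ChatGPT-3.5, and nurses. Counts below are invented for
# illustration; the abstract reports only percentages.
from scipy.stats import chi2_contingency

#                correct  partially  incorrect
table = [
    [83, 30, 16],   # ChatGPT-4.0 (43 questions x 3 query rounds, illustrative)
    [78, 32, 19],   # ChatGPT-3.5 (illustrative)
    [47, 41, 41],   # nurses (illustrative aggregate)
]

chi2, p, dof, expected = chi2_contingency(table)
print(f"X2 = {chi2:.3f}, dof = {dof}, p = {p:.4f}")
```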