Pediatric Transplantation,
Journal Year:
2025,
Volume and Issue:
29(3)
Published: March 13, 2025
ABSTRACT
Background
Education and enhancing the knowledge of adolescents who will undergo kidney transplantation are among the primary objectives of their care. While there are specific interventions in place to achieve this, they require extensive resources. The rise of large language models like ChatGPT‐3.5 offers potential assistance for providing information to patients. This study aimed to evaluate the accuracy, relevance, and safety of ChatGPT‐3.5's responses to patient‐centered questions about pediatric kidney transplantation. The objective was to assess whether ChatGPT‐3.5 could be a supplementary educational tool for adolescents and their caregivers in a complex medical context.
Methods
A total of 37 questions were presented to ChatGPT‐3.5, which was prompted to respond as a health professional would to a layperson. Five nephrologists independently evaluated the outputs for accuracy, relevance, comprehensiveness, understandability, readability, and safety.
Results
The mean accuracy, relevancy, and comprehensiveness scores of all outputs were 4.51, 4.56, and 4.55, respectively. Out of 37 outputs, four were rated completely accurate, and seven were completely relevant and comprehensive. Only one output had an accuracy score below 4. Twelve outputs were considered potentially risky, but only three had a risk grade of moderate or higher. Outputs that were considered risky had accuracy and relevancy below average.
Conclusion
Our findings suggest that ChatGPT could be a useful tool for individuals waiting for kidney transplantation. However, the presence of potentially risky outputs underscores the necessity of human oversight and validation.
iScience,
Journal Year:
2024,
Volume and Issue:
27(5), P. 109713 - 109713
Published: April 23, 2024
This study systematically reviewed the application of large language models (LLMs) in medicine, analyzing 550 selected studies from a vast literature search. LLMs like ChatGPT transformed healthcare by enhancing diagnostics, medical writing, education, and project management. They assisted in drafting medical documents, creating training simulations, and streamlining research processes. Despite their growing utility in diagnosis and improving doctor-patient communication, challenges persisted, including limitations in contextual understanding and the risk of over-reliance. The surge in LLM-related studies indicated a focus on patient communication, but highlighted the need for careful integration, considering validation, ethical concerns, and the balance with traditional medical practice. Future directions suggested a focus on multimodal LLMs, deeper algorithmic understanding, and ensuring responsible, effective use in healthcare.
JAMA Ophthalmology,
Journal Year:
2024,
Volume and Issue:
142(4), P. 371 - 371
Published: Feb. 22, 2024
Large language models (LLMs) are revolutionizing medical diagnosis and treatment, offering unprecedented accuracy and ease surpassing conventional search engines. Their integration into medical assistance programs will become pivotal for ophthalmologists as an adjunct for practicing evidence-based medicine. Therefore, the diagnostic and treatment accuracy of LLM-generated responses compared with fellowship-trained ophthalmologists can help assess their accuracy and validate their potential utility in ophthalmic subspecialties.
JAMA Ophthalmology,
Journal Year:
2024,
Volume and Issue:
142(4), P. 321 - 321
Published: Feb. 29, 2024
Ophthalmology is reliant on effective interpretation of multimodal imaging to ensure diagnostic accuracy. The new ability of ChatGPT-4 (OpenAI) to interpret ophthalmic images has not yet been explored.
Ophthalmic and Physiological Optics,
Journal Year:
2024,
Volume and Issue:
44(3), P. 641 - 671
Published: Feb. 25, 2024
With the introduction of ChatGPT, artificial intelligence (AI)-based large language models (LLMs) are rapidly becoming popular within the scientific community. They use natural language processing to generate human-like responses to queries. However, the application of LLMs and comparison of the abilities among different LLMs with their human counterparts in ophthalmic care remain under-reported.
Seminars in Ophthalmology,
Journal Year:
2024,
Volume and Issue:
39(6), P. 472 - 479
Published: March 22, 2024
Purpose
Patients are using online search modalities to learn about their eye health. While Google remains the most popular search engine, the use of large language models (LLMs) like ChatGPT has increased. Cataract surgery is the most common surgical procedure in the US, and there is limited data on the quality of online information that populates after searches related to cataract surgery on search engines such as Google and LLM platforms such as ChatGPT. We identified patient frequently asked questions (FAQs) about cataracts and evaluated the accuracy, safety, and readability of the answers to these questions provided by both Google and ChatGPT. We demonstrated the utility of ChatGPT in writing notes and creating patient education materials.
medRxiv (Cold Spring Harbor Laboratory),
Journal Year:
2024,
Volume and Issue:
unknown
Published: March 5, 2024
Abstract
The introduction of large language models (LLMs) into clinical practice promises to improve patient education and empowerment, thereby personalizing medical care and broadening access to medical knowledge. Despite the popularity of LLMs, there is a significant gap in systematized information on their use in patient care. Therefore, this systematic review aims to synthesize current applications and limitations of LLMs in patient care using a data-driven convergent synthesis approach. We searched 5 databases for qualitative, quantitative, and mixed methods articles on LLMs in patient care published between 2022 and 2023. From 4,349 initial records, 89 studies across 29 medical specialties were included, primarily examining models based on the GPT-3.5 (53.2%, n=66 of 124 different LLMs examined per study) and GPT-4 (26.6%, n=33/124) architectures in medical question answering, followed by patient information generation, including medical text summarization or translation, and clinical documentation. Our analysis delineates two primary domains of LLM limitations: design and output. Design limitations included 6 second-order and 12 third-order codes, such as lack of medical domain optimization, data transparency, and accessibility issues, while output limitations included 9 second-order and 32 third-order codes, for example, non-reproducibility, non-comprehensiveness, incorrectness, unsafety, and bias. In conclusion, this study is the first review to systematically map LLM applications and limitations in patient care, providing a foundational framework and taxonomy for the implementation and evaluation of LLMs in healthcare settings.
npj Digital Medicine,
Journal Year:
2024,
Volume and Issue:
7(1)
Published: Sept. 28, 2024
Abstract
With generative artificial intelligence (GenAI), particularly large language models (LLMs), continuing to make inroads in healthcare, assessing LLMs with human evaluations is essential to assuring safety and effectiveness. This study reviews existing literature on human evaluation methodologies for LLMs in healthcare across various medical specialties and addresses factors such as evaluation dimensions, sample types and sizes, selection and recruitment of evaluators, frameworks and metrics, evaluation process, and statistical analysis type. Our literature review of 142 studies shows gaps in the reliability, generalizability, and applicability of current human evaluation practices. To overcome such significant obstacles to healthcare LLM developments and deployments, we propose QUEST, a comprehensive and practical framework for human evaluation of LLMs covering three phases of workflow: Planning, Implementation and Adjudication, and Scoring and Review. QUEST is designed with five proposed evaluation principles: Quality of Information, Understanding and Reasoning, Expression Style and Persona, Safety and Harm, and Trust and Confidence.
Communications Medicine,
Journal Year:
2025,
Volume and Issue:
5(1)
Published: Jan. 21, 2025
Abstract
Background
The introduction of large language models (LLMs) into clinical practice promises to improve patient education and empowerment, thereby personalizing medical care and broadening access to medical knowledge. Despite the popularity of LLMs, there is a significant gap in systematized information on their use in patient care. Therefore, this systematic review aims to synthesize current applications and limitations of LLMs in patient care.
Methods
We systematically searched 5 databases for qualitative, quantitative, and mixed methods articles on LLMs in patient care published between 2022 and 2023. From 4349 initial records, 89 studies across 29 medical specialties were included. Quality assessment was performed using the Mixed Methods Appraisal Tool 2018. A data-driven convergent synthesis approach was applied for thematic syntheses of LLM applications and limitations using free line-by-line coding in Dedoose.
Results
We show that most studies investigate Generative Pre-trained Transformers (GPT)-3.5 (53.2%, n = 66 of 124 different LLMs examined) and GPT-4 (26.6%, n = 33/124) in answering medical questions, followed by patient information generation, including medical text summarization or translation, and clinical documentation. Our analysis delineates two primary domains of LLM limitations: design and output. Design limitations include 6 second-order and 12 third-order codes, such as lack of medical domain optimization, data transparency, and accessibility issues, while output limitations include 9 second-order and 32 third-order codes, for example, non-reproducibility, non-comprehensiveness, incorrectness, unsafety, and bias.
Conclusions
This review systematically maps LLM applications and limitations in patient care, providing a foundational framework and taxonomy for the implementation and evaluation of LLMs in healthcare settings.
JAMA Network Open,
Journal Year:
2025,
Volume and Issue:
8(2), P. e2457879 - e2457879
Published: Feb. 4, 2025
Importance
There is much interest in the clinical integration of large language models (LLMs) in health care. Many studies have assessed the ability of LLMs to provide health advice, but the quality of their reporting is uncertain.
Objective
To perform a systematic review to examine the reporting variability among peer-reviewed studies evaluating the performance of generative artificial intelligence (AI)–driven chatbots for summarizing evidence and providing health advice to inform the development of the Chatbot Assessment Reporting Tool (CHART).
Evidence Review
A search of MEDLINE via Ovid, Embase via Elsevier, and Web of Science from inception to October 27, 2023, was conducted with the help of a health sciences librarian to yield 7752 articles. Two reviewers screened articles by title and abstract followed by full-text review to identify primary studies evaluating the clinical accuracy of generative AI-driven chatbots in providing health advice (chatbot health advice studies). Two reviewers then performed data extraction for 137 eligible studies.
Findings
A total of 137 studies were included. Studies examined topics in surgery (55 [40.1%]), medicine (51 [37.2%]), and primary care (13 [9.5%]). Many studies focused on treatment (91 [66.4%]), diagnosis (60 [43.8%]), or disease prevention (29 [21.2%]). Most studies (136 [99.3%]) evaluated inaccessible, closed-source LLMs and did not provide enough information to identify the version of the LLM under evaluation. All studies lacked a sufficient description of LLM characteristics, including temperature, token length, fine-tuning availability, layers, and other details. Most studies did not describe a prompt engineering phase in their study. The date of LLM querying was reported in 54 (39.4%) studies. Most studies (89 [65.0%]) used subjective means to define the successful performance of the chatbot, while less than one-third addressed the ethical, regulatory, and patient safety implications of the clinical integration of LLMs.
Conclusions and Relevance
In this systematic review of 137 chatbot health advice studies, the reporting quality was heterogeneous and may inform the development of the CHART reporting standards. Ethical, regulatory, and patient safety considerations are crucial as interest grows in the clinical integration of LLMs.
Global Spine Journal,
Journal Year:
2025,
Volume and Issue:
unknown
Published: Feb. 17, 2025
Study Design
Comparative Analysis.
Objectives
The American College of Surgeons developed the 2022 Best Practice Guidelines to provide evidence-based recommendations for managing spinal injuries. This study aims to assess the concordance of ChatGPT-4o and Gemini Advanced with the 2022 ACS Best Practice Guidelines, offering the first expert evaluation of these models in managing spinal cord injuries.
Methods
The 2022 ACS Trauma Quality Program Best Practices Guidelines for Spine Injury were used to create 52 questions based on key clinical recommendations. These were grouped into informational (8), diagnostic (14), and treatment (30) categories and posed to ChatGPT-4o and Google Gemini Advanced. Responses were graded for concordance with the guidelines and validated by a board-certified spine surgeon.
Results
ChatGPT was concordant with the ACS guidelines on 38 questions (73.07%) and Gemini on 36 (69.23%). Most non-concordant answers were due to insufficient information. The models disagreed on 8 questions, with ChatGPT concordant on 5 and Gemini on 3. Both achieved 75% concordance in clinical information; Gemini outperformed in diagnostics (78.57% vs 71.43%), while ChatGPT had higher concordance in treatment (73.33% vs 63.33%).
Conclusions
ChatGPT and Gemini Advanced demonstrate potential as valuable assets in spinal injury management by providing responses aligned with current best practices. The marginal differences in concordance rates suggest that neither model exhibits a superior ability to deliver recommendations concordant with validated guidelines. Despite LLMs' increasing sophistication and utility, existing limitations currently prevent them from being clinically safe and practical in trauma-based settings.
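The concordance figures in the Global Spine Journal abstract can be cross-checked arithmetically. The sketch below is illustrative only and is not part of the study. It assumes each reported percentage is a simple proportion of its category size (8, 14, 30), and that the 75% figure refers to the informational category. The function names `pct_to_count` and `consistent_assignments` are hypothetical. It tries every way of assigning the two percentages in each category to the two models and keeps only the assignments whose counts add up to the reported totals of 38 and 36 out of 52.

```python
from itertools import product

# Figures as reported in the abstract.
CATEGORY_SIZES = {"informational": 8, "diagnostic": 14, "treatment": 30}
TOTALS = {"ChatGPT": 38, "Gemini": 36}  # concordant answers out of 52
# The two percentages reported per category, without assuming which model
# each one belongs to (assumption: 75% is the informational category).
REPORTED_PCT = {
    "informational": (75.0, 75.0),
    "diagnostic": (78.57, 71.43),
    "treatment": (73.33, 63.33),
}


def pct_to_count(pct: float, size: int) -> int:
    """Recover the whole-number count behind a reported percentage."""
    count = round(pct / 100 * size)
    # Guard: the recovered count must reproduce the percentage to 2 decimals.
    assert abs(100 * count / size - pct) < 0.01, (pct, size)
    return count


def consistent_assignments() -> list[dict[str, dict[str, int]]]:
    """Try every per-category pairing and keep those matching both totals."""
    categories = list(CATEGORY_SIZES)
    counts = {
        cat: tuple(pct_to_count(p, CATEGORY_SIZES[cat]) for p in REPORTED_PCT[cat])
        for cat in categories
    }
    found = []
    for flips in product((False, True), repeat=len(categories)):
        chatgpt, gemini = {}, {}
        for cat, flip in zip(categories, flips):
            first, second = counts[cat]
            chatgpt[cat], gemini[cat] = (second, first) if flip else (first, second)
        candidate = {"ChatGPT": chatgpt, "Gemini": gemini}
        if (
            sum(chatgpt.values()) == TOTALS["ChatGPT"]
            and sum(gemini.values()) == TOTALS["Gemini"]
            and candidate not in found
        ):
            found.append(candidate)
    return found


if __name__ == "__main__":
    for solution in consistent_assignments():
        print(solution)
```

Under these assumptions only one assignment fits the totals: Gemini has the higher diagnostic count (11 of 14 vs 10 of 14) and ChatGPT the higher treatment count (22 of 30 vs 19 of 30). Note also that 38/52 is 73.077%, so the reported 73.07% appears to be truncated rather than rounded.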