Ophthalmic and Physiological Optics, Journal Year: 2024, Volume and Issue: 44(3), P. 641 - 671, Published: Feb. 25, 2024

With the introduction of ChatGPT, artificial intelligence (AI)-based large language models (LLMs) are rapidly becoming popular within the scientific community. They use natural language processing to generate human-like responses to queries. However, the application of LLMs and the comparison of their abilities among different LLMs and with their human counterparts in ophthalmic care remain under-reported.
Interactive Journal of Medical Research, Journal Year: 2024, Volume and Issue: 13, P. e54704 - e54704, Published: Jan. 26, 2024

Background: Adherence to evidence-based practice is indispensable in health care. Recently, the utility of generative artificial intelligence (AI) models in health care has been evaluated extensively. However, the lack of consensus guidelines on the design and reporting of findings of these studies poses a challenge for the interpretation and synthesis of evidence.

Objective: This study aimed to develop a preliminary checklist to standardize the reporting of generative AI-based studies in health care education and practice.

Methods: A literature review was conducted in Scopus, PubMed, and Google Scholar. Published records with "ChatGPT," "Bing," or "Bard" in the title were retrieved. Careful examination of the methodologies employed in the included records was conducted to identify common pertinent themes and possible gaps in reporting. A panel discussion was held to establish a unified and thorough checklist for the reporting of AI studies in health care. The finalized checklist was used to evaluate the included records by 2 independent raters. Cohen κ was used as the method to evaluate the interrater reliability.

Results: The final data set that formed the basis for theme identification and analysis comprised a total of 34 records. The 9 identified themes were collectively referred to as METRICS (Model, Evaluation, Timing, Range/Randomization, Individual factors, Count, Specificity of prompts and language). Their details are as follows: (1) Model used and its exact settings; (2) Evaluation approach for the generated content; (3) Timing of testing the model; (4) Transparency of the data source; (5) Range of tested topics; (6) Randomization of selecting the queries; (7) Individual factors in selecting the queries and interrater reliability; (8) Count of queries executed to test the model; and (9) Specificity of the prompts and language used. The overall mean METRICS score was 3.0 (SD 0.58). The interrater reliability was acceptable, with a range of 0.558 to 0.962 (P<.001 for all items). With classification per item, the highest average scores were recorded for the "Model" item, followed by the "Specificity" item, while the lowest scores were recorded for the "Randomization" item (classified as suboptimal) and the "Individual factors" item (classified as satisfactory).

Conclusions: The METRICS checklist can facilitate the design of studies by guiding researchers toward best practices in reporting results. The findings highlight the need for standardized reporting algorithms for generative AI-based studies in health care, considering the variability observed in methodologies, and the proposed METRICS checklist could be a helpful base for a universally accepted reporting tool for what is a swiftly evolving research topic.
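For readers unfamiliar with the statistic, Cohen κ compares the observed agreement between two raters with the agreement expected by chance from their label frequencies; the study above reports κ values between 0.558 and 0.962 for its two independent raters. Below is a minimal sketch of the computation in Python; the two raters' scores are hypothetical illustrations, not the study's data.

```python
from collections import Counter

def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters scoring the same items."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    # Observed agreement: fraction of items given identical labels.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: derived from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a.keys() | freq_b.keys()) / n**2
    return (p_o - p_e) / (1 - p_e)

# Hypothetical per-record checklist scores from two independent raters.
rater_1 = [4, 3, 3, 2, 4, 1, 3, 2, 4, 3]
rater_2 = [4, 3, 2, 2, 4, 1, 3, 3, 4, 3]
print(f"Cohen kappa = {cohen_kappa(rater_1, rater_2):.3f}")  # 0.714 here
```

Values near 1 indicate near-perfect agreement beyond chance; the 0.558 lower bound reported above would conventionally be read as moderate agreement.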
medRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown, Published: March 5, 2024

Abstract

The introduction of large language models (LLMs) into clinical practice promises to improve patient education and empowerment, thereby personalizing medical care and broadening access to medical knowledge. Despite the popularity of LLMs, there is a significant gap in systematized information on their use in patient care. Therefore, this systematic review aims to synthesize the current applications and limitations of LLMs in patient care using a data-driven convergent synthesis approach. We searched 5 databases for qualitative, quantitative, and mixed methods articles on LLMs in patient care published between 2022 and 2023. From 4,349 initial records, 89 studies across 29 medical specialties were included, primarily examining models based on the GPT-3.5 (53.2%, n=66 of 124 different LLMs examined per study) and GPT-4 (26.6%, n=33/124) architectures in question answering, followed by content generation, including text summarization or translation, and documentation. Our analysis delineates two primary domains of LLM limitations: design and output. Design limitations included 6 second-order and 12 third-order codes, such as lack of medical domain optimization, data transparency, and accessibility issues, while output limitations included 9 second-order and 32 third-order codes, for example, non-reproducibility, non-comprehensiveness, incorrectness, unsafety, and bias. In conclusion, this study is the first to systematically map LLM applications and limitations in patient care, providing a foundational framework and taxonomy for their implementation and evaluation in healthcare settings.
Communications Medicine, Journal Year: 2025, Volume and Issue: 5(1), Published: Jan. 21, 2025

Abstract

Background: The introduction of large language models (LLMs) into clinical practice promises to improve patient education and empowerment, thereby personalizing medical care and broadening access to medical knowledge. Despite the popularity of LLMs, there is a significant gap in systematized information on their use in patient care. Therefore, this systematic review aims to synthesize the current applications and limitations of LLMs in patient care.

Methods: We systematically searched 5 databases for qualitative, quantitative, and mixed methods articles on LLMs in patient care published between 2022 and 2023. From 4349 initial records, 89 studies across 29 medical specialties were included. Quality assessment was performed using the Mixed Methods Appraisal Tool 2018. A data-driven convergent synthesis approach was applied for thematic syntheses of LLM applications and limitations using free line-by-line coding in Dedoose.

Results: We show that most studies investigate Generative Pre-trained Transformers (GPT)-3.5 (53.2%, n = 66 of 124 different LLMs examined) and GPT-4 (26.6%, n = 33/124) in answering medical questions, followed by content generation, including text summarization or translation, and documentation. Our analysis delineates two primary domains of LLM limitations: design and output. Design limitations include 6 second-order and 12 third-order codes, such as lack of medical domain optimization, data transparency, and accessibility issues, while output limitations include 9 second-order and 32 third-order codes, for example, non-reproducibility, non-comprehensiveness, incorrectness, unsafety, and bias.

Conclusions: This systematic review maps the applications and limitations of LLMs in patient care, providing a foundational framework and taxonomy for their implementation and evaluation in healthcare settings.
JAMA Network Open, Journal Year: 2025, Volume and Issue: 8(2), P. e2457879 - e2457879, Published: Feb. 4, 2025

Importance: There is much interest in the clinical integration of large language models (LLMs) in health care. Many studies have assessed the ability of LLMs to provide health advice, but the quality of their reporting is uncertain.

Objective: To perform a systematic review to examine the reporting variability among peer-reviewed studies evaluating the performance of generative artificial intelligence (AI)–driven chatbots for summarizing evidence and providing health advice, to inform the development of the Chatbot Assessment Reporting Tool (CHART).

Evidence Review: A search of MEDLINE via Ovid, Embase via Elsevier, and Web of Science from inception to October 27, 2023, was conducted with the help of a health sciences librarian to yield 7752 articles. Two reviewers screened articles by title and abstract followed by full-text review to identify primary studies evaluating the accuracy of AI-driven chatbots (chatbot evaluation studies). Two reviewers then performed data extraction for 137 eligible studies.

Findings: A total of 137 studies were included. Studies examined topics in surgery (55 [40.1%]), medicine (51 [37.2%]), and primary care (13 [9.5%]). Many studies focused on treatment (91 [66.4%]), diagnosis (60 [43.8%]), or disease prevention (29 [21.2%]). Most studies (136 [99.3%]) evaluated inaccessible, closed-source LLMs and did not provide enough information to identify the version of the LLM under evaluation. All studies lacked a sufficient description of LLM characteristics, including temperature, token length, fine-tuning availability, layers, and other details. Most studies did not describe a prompt engineering phase in their study. The date of LLM querying was reported in 54 (39.4%) of the studies. Most studies (89 [65.0%]) used subjective means to define the successful performance of the chatbot, while less than one-third addressed the ethical, regulatory, and patient safety implications of the clinical integration of LLMs.

Conclusions and Relevance: In this systematic review of 137 chatbot evaluation studies, the reporting quality was heterogeneous and may inform the development of the CHART reporting standards. Ethical, regulatory, and patient safety considerations are crucial as interest in the clinical integration of LLMs grows.
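The reporting items the review found missing (exact model version, decoding settings such as temperature and token limits, and the querying date) are straightforward to capture at query time. Below is a minimal sketch of one way to log them, assuming the OpenAI Python SDK purely as an example client; the pinned model string, prompt, and settings are illustrative placeholders, not recommendations from the review.

```python
import json
from datetime import datetime, timezone

from openai import OpenAI  # assumes the OpenAI Python SDK is installed

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Fix and record the decoding settings rather than leaving them at defaults.
settings = {
    "model": "gpt-4-0613",  # placeholder: pin an exact, dated model version
    "temperature": 0.0,     # as deterministic as the API allows
    "max_tokens": 512,      # explicit cap on response length
}

response = client.chat.completions.create(
    messages=[{"role": "user", "content": "What are the symptoms of glaucoma?"}],
    **settings,
)

# Persist everything a reader would need to interpret or repeat the query.
record = {
    **settings,
    "queried_on": datetime.now(timezone.utc).isoformat(),
    "reported_model": response.model,  # version string echoed back by the API
    "answer": response.choices[0].message.content,
}
print(json.dumps(record, indent=2))
```

Logging a record like this for every query would directly satisfy the model-version, parameter, and querying-date items that most of the reviewed studies omitted.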
Journal of Neuro-Ophthalmology, Journal Year: 2024, Volume and Issue: unknown, Published: Jan. 4, 2024

Background: Patient education in ophthalmology poses a challenge for physicians because of time and resource limitations. ChatGPT (OpenAI, San Francisco) may assist with automating the production of patient handouts on common neuro-ophthalmic diseases.

Methods: We queried ChatGPT-3.5 to generate 51 patient handouts across 17 neuro-ophthalmic conditions. We devised the "Quality of Generated Language Outputs for Patients" (QGLOP) tool to assess the domains of accuracy/comprehensiveness, bias, currency, and tone, each scored out of 4 for a total of 16. A fellowship-trained neuro-ophthalmologist scored each passage. Handout readability was assessed using the Simple Measure of Gobbledygook (SMOG), which estimates the years of education required to understand a text.

Results: The mean QGLOP scores for accuracy, bias, currency, and tone were found to be 2.43, 3, 3.43, and 3.02 respectively. The mean total score was 11.9 [95% CI 8.98, 14.8] out of 16 points, indicating a performance of 74.4% [95% CI 56.1%, 92.5%]. The mean SMOG across responses was 10.9 [95% CI 9.36, 12.4] years of education.

Conclusions: This study suggests that a neuro-ophthalmologist may have an at-least moderate level of satisfaction with the write-up quality conferred by ChatGPT, although the output still requires a final review and editing before dissemination. Comparatively, the rarer responses at either extreme, collectively 5%, would require either very mild or extensive revision. Also, the mean SMOG score exceeded the accepted upper limit of a grade 8 reading level for health-related patient handouts. In its current iteration, ChatGPT should be used as an efficiency tool that produces an initial draft for the neuro-ophthalmologist, who may then refine its accuracy and readability for a lay readership.
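SMOG converts a count of polysyllabic words into a school-grade estimate via the standard formula grade = 1.0430 · sqrt(polysyllables × 30 / sentences) + 3.1291. The following Python sketch shows the computation; the vowel-run syllable heuristic is a crude stand-in for the dictionary-based counters used by readability software, and the sample handout text is invented for illustration.

```python
import math
import re

def naive_syllables(word: str) -> int:
    """Rough syllable count: contiguous vowel runs (a crude heuristic)."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def smog_grade(text: str) -> float:
    """SMOG grade: 1.0430 * sqrt(polysyllables * 30 / sentences) + 3.1291."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    # Polysyllables: words of three or more syllables.
    polysyllables = sum(1 for w in words if naive_syllables(w) >= 3)
    return 1.0430 * math.sqrt(polysyllables * 30 / len(sentences)) + 3.1291

handout = ("Glaucoma is a disease of the optic nerve. It is often associated "
           "with elevated intraocular pressure. Early detection through "
           "regular examination helps preserve vision.")
print(f"SMOG grade = {smog_grade(handout):.1f}")
```

A result above 8 on text like this would illustrate the study's concern: the handout demands more years of schooling than the accepted grade 8 ceiling for patient materials.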
Journal of Bone and Joint Surgery, Journal Year: 2024, Volume and Issue: 106(12), P. 1136 - 1142, Published: Feb. 9, 2024

Background: In today's digital age, patients increasingly rely on online search engines for medical information. The integration of large language models such as GPT-4 into Bing raises concerns over the potential transmission of misinformation when patients search for information regarding spine surgery.

Methods: SearchResponse.io, a database that archives People Also Ask (PAA) data from Google, was utilized to determine the most popular patient questions regarding 4 specific spine surgery topics: anterior cervical discectomy and fusion, lumbar laminectomy, spinal fusion, and spinal deformity. Bing's responses to these questions, along with the cited sources, were recorded for analysis. Two fellowship-trained spine surgeons assessed the accuracy of the answers on a 6-point scale and the completeness on a 3-point scale. Inaccurate answers were re-queried 2 weeks later. Cited sources were categorized and evaluated against the Journal of the American Medical Association (JAMA) benchmark criteria. Interrater reliability was measured with use of the kappa statistic. A linear regression analysis was utilized to explore the relationship between answer accuracy and the type of source, number of sources, and mean JAMA benchmark score.

Results: Bing's responses to 71 PAA questions were analyzed. The average completeness score was 2.03 (standard deviation [SD], 0.36), and the average accuracy score was 4.49 (SD, 1.10). Among the question topics, spinal deformity had the lowest scores. Re-querying the answers that initially had low accuracy scores resulted in improved accuracy. Among the cited sources, commercial sources were the most prevalent. The JAMA benchmark scores across all sources averaged 2.63. Government sources had the highest mean JAMA benchmark score (3.30), whereas social media sources had the lowest (1.75).

Conclusions: Bing's responses to popular patient questions regarding spine surgery were generally accurate and adequately complete, with incorrect answers rectified upon re-querying. A plurality of the answers were sourced from commercial websites. The type of source, number of sources, and mean JAMA benchmark score were not significantly correlated with answer accuracy. These findings underscore the importance of ongoing evaluation and improvement of large language models to ensure that they deliver reliable and informative results for patients seeking health information amid the evolving digital search experience.
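The regression step above relates per-question answer accuracy to characteristics of the cited sources. As a minimal sketch of that kind of analysis in Python (with invented illustrative numbers, not the study's data):

```python
from scipy.stats import linregress

# Hypothetical per-question values: answer accuracy on the 6-point scale
# and the mean JAMA benchmark score (0-4) of the sources cited in the answer.
accuracy   = [5.0, 4.0, 4.5, 5.5, 3.5, 4.0, 5.0, 4.5]
jama_score = [2.5, 3.0, 2.0, 3.5, 2.5, 1.5, 3.0, 2.0]

# Simple linear fit of accuracy on source quality.
fit = linregress(jama_score, accuracy)
print(f"slope = {fit.slope:.3f}, p = {fit.pvalue:.3f}")
# A p-value above the chosen significance threshold (e.g., 0.05) would mirror
# the study's finding that source quality did not significantly predict accuracy.
```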
Current Opinion in Ophthalmology, Journal Year: 2024, Volume and Issue: 35(3), P. 205 - 209, Published: Feb. 7, 2024

Purpose of review: This review seeks to provide a summary of the most recent research findings regarding the utilization of ChatGPT, an artificial intelligence (AI)-powered chatbot, in the field of ophthalmology, in addition to exploring the limitations and ethical considerations associated with its application.

Recent findings: ChatGPT has gained widespread recognition and demonstrated potential in enhancing patient and physician education, boosting productivity, and streamlining administrative tasks. In various studies examining its utility in ophthalmology, ChatGPT has exhibited fair to good accuracy, with its most recent iteration showcasing superior performance in providing ophthalmic recommendations across various ophthalmic disorders such as corneal diseases, orbital disorders, vitreoretinal diseases, uveitis, neuro-ophthalmology, and glaucoma. It proves beneficial for patients in accessing information and aids physicians in triaging as well as formulating differential diagnoses. Despite these benefits, there are limitations that require acknowledgment, including the risk of offering inaccurate or harmful information, dependence on outdated data, the necessity for a high level of education for adequate data comprehension, and concerns regarding patient privacy and ethical considerations within the medical domain.

Summary: ChatGPT is a promising new tool that could contribute to ophthalmic healthcare and research, potentially reducing work burdens. However, its current limitations necessitate a complementary role with human expert oversight.
Current Opinion in Ophthalmology, Journal Year: 2024, Volume and Issue: 35(3), P. 238 - 243, Published: Jan. 22, 2024

Purpose of review: Recent advances in artificial intelligence (AI), robotics, and chatbots have brought these technologies to the forefront of medicine, particularly ophthalmology. These technologies have been applied in diagnosis, prognosis, surgical operations, and patient-specific care in ophthalmology. It is thus both timely and pertinent to assess the existing landscape, recent advances, and trajectory of trends of AI, AI-enabled robots, and chatbots in ophthalmology.

Recent findings: Some recent developments have integrated AI-enabled robotics with diagnostic and surgical procedures in ophthalmology. More recently, large language models (LLMs) like ChatGPT have shown promise in augmenting research capabilities and diagnosing ophthalmic diseases. These developments may portend a new era of doctor-patient-machine collaboration.

Summary: Ophthalmology is undergoing a revolutionary change in research, clinical practice, and interventions. Ophthalmic AI-enabled robotics and chatbot technologies based on LLMs are converging to create a new era of digital ophthalmology. Collectively, these advances portend a future in which conventional ophthalmic knowledge will be seamlessly integrated with AI to improve the patient experience and enhance therapeutic outcomes.