medRxiv (Cold Spring Harbor Laboratory),
Journal Year:
2024,
Volume and Issue:
unknown
Published: Sept. 1, 2024
Abstract
Purpose
This review analyzes the application of large language models (LLMs) in the field of cardiology, with a focus on evaluating their performance across various clinical tasks.
Methods
We conducted a systematic literature search on PubMed for studies published up to April 14, 2024. Our search used a wide range of keywords related to LLMs and cardiology to capture relevant terms. The risk of bias was evaluated using the QUADAS-2 tool.
Results
Fifteen studies met the inclusion criteria, categorized into four domains: chronic and progressive cardiac conditions, acute events, education, and monitoring. Six studies addressing chronic conditions demonstrated variability in the accuracy and depth of LLM-generated responses. In acute scenarios, three articles showed that LLMs provided medical advice with mixed effectiveness, particularly in delivering CPR instructions. Two educational studies revealed high accuracy in answering assessment questions and interpreting cases. Finally, in diagnostics, multimodal LLMs displayed capabilities in ECG interpretation, with some performing at or exceeding the level of human specialists.
Conclusion
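LLMs demonstrate considerable potential in cardiology, particularly in educational applications and routine diagnostics. However, their performance remains inconsistent, particularly in acute care settings where precision is critical. Enhancing their accuracy in interpreting real-world complex medical data and in emergency response guidance is imperative before integration into clinical practice.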
Musculoskeletal Science and Practice,
Journal Year:
2025,
Volume and Issue:
76, P. 103275 - 103275
Published: Jan. 31, 2025
Generative artificial intelligence tools, such as ChatGPT, are becoming increasingly integrated into daily life, and patients might turn to this tool to seek medical information.
To evaluate the performance of ChatGPT-4 in responding to patient-centered queries for patellar tendinopathy (PT).
Forty-eight patient-centered queries were collected from online sources, PT patients, and experts, and then submitted to ChatGPT-4. Three board-certified experts independently assessed the accuracy and comprehensiveness of the responses. Readability was measured using the Flesch-Kincaid Grade Level (FKGL: higher scores indicate a higher grade reading level). The Patient Education Materials Assessment Tool (PEMAT) evaluated understandability and actionability (0-100%, higher scores indicate information with clearer messages and more identifiable actions). Semantic Textual Similarity (STS score, 0-1; higher scores indicate higher similarity) assessed variation in the meaning of texts over two months (including ChatGPT-4o) and for different terminologies related to PT.
Sixteen (33%) of the 48 responses were rated accurate, while 36 (75%) were rated comprehensive. Only 17% of treatment-related questions received accurate responses. Most responses were written at a college reading level (median and interquartile range [IQR] of FKGL score: 15.4 [14.4-16.6]). The median PEMAT score for understandability was 83% (IQR: 70%-92%), and for actionability, it was 60% (IQR: 40%-60%). The medians of STS were all ≥ 0.9.
ChatGPT-4 provided generally comprehensive responses but lacked accuracy and was difficult to read for individuals below a college reading level.
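The FKGL scores reported in this abstract come from a standard readability formula. As a minimal sketch, assuming the word, sentence, and syllable counts are already available (the function name and the example counts are illustrative and are not taken from the study), the grade level can be computed as follows:

```python
def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid Grade Level from raw text counts (standard formula)."""
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    # Longer sentences and more syllables per word both raise the grade level.
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


# Hypothetical counts, chosen only to illustrate the arithmetic.
example_grade = flesch_kincaid_grade(words=100, sentences=5, syllables=150)
print(f"FKGL: {example_grade:.2f}")
```

Counting syllables reliably is the hard part in practice and is deliberately left out here; published studies typically rely on a dedicated readability tool for that step.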
Cureus,
Journal Year:
2024,
Volume and Issue:
unknown
Published: July 4, 2024
Artificial intelligence (AI) is a burgeoning new field that has increased in popularity over the past couple of years, coinciding with the public release of large language model (LLM)-driven chatbots. These chatbots, such as ChatGPT, can be engaged directly in conversation, allowing users to ask them questions or issue other commands. Since LLMs are trained on large amounts of text data, they can also answer questions reliably and factually, an ability that has allowed them to serve as a source for medical inquiries. This study seeks to assess the readability of patient education materials on cardiac catheterization across the four most common chatbots: ChatGPT, Microsoft Copilot, Google Gemini, and Meta AI.
Frontiers in Medicine,
Journal Year:
2024,
Volume and Issue:
11
Published: Oct. 29, 2024
Large Language Models (LLMs) are sophisticated algorithms that analyze and generate vast amounts of textual data, mimicking human communication. Notable LLMs include GPT-4o by OpenAI, Claude 3.5 Sonnet by Anthropic, and Gemini by Google. This scoping review aims to synthesize the current applications and potential uses of LLMs in patient education and engagement.
Information Development,
Journal Year:
2025,
Volume and Issue:
unknown
Published: Feb. 25, 2025
Purpose – Artificial Intelligence (AI) is increasingly becoming a popular source of information, including health information. It is essential to explore the adoption of AI to achieve Health Information Literacy (HIL) and ensure that users maximise the use of AI. This study explores AI's use in advancing HIL. It identifies gaps, concerns, and challenges, and suggests areas where AI use could be improved.
Approach – The retrieved papers were initially assessed based on title, abstract, and the inclusion criteria. The full text of relevant papers was verified following the inclusion and exclusion criteria. Additionally, a comprehensive assessment of the reference lists of the included papers was performed. Information was extracted from the selected articles, and bibliometric and thematic analysis was applied for a thorough examination.
Methodology – Key details about the author, publication year, publication type, purpose, and key findings were collected using a standardised format. As themes emerged, information was extracted from the publications to address the main research questions. All articles reviewed were in English and published between 2019 and 2024.
Findings – The growing use of AI in HIL can be accounted for by the growth of 128.13% in publications. However, concerns and challenges must be addressed, as the continuous use of AI is guaranteed.
Originality – This study is likely the first to assess the current use of AI in HIL. The findings will provide a clear landscape for investing, identifying partners, and providing the gap.
Journal of Hand Surgery Global Online,
Journal Year:
2024,
Volume and Issue:
6(3), P. 441 - 443
Published: April 6, 2024
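The American Society for Surgery of the Hand and the British Society for Surgery of the Hand produce patient-focused information above the sixth-grade readability recommended by the American Medical Association. To promote health equity, patient-focused content should be aimed at an appropriate level of health literacy. Artificial intelligence-driven large language models may be able to assist hand surgery societies in improving the readability of the information provided to hand surgery patients.
The readability was calculated for all articles written in English on the American Society for Surgery of the Hand and British Society for Surgery of the Hand websites, in terms of seven of the commonest readability formulas. Chat Generative Pre-Trained Transformer version 4 (ChatGPT-4) was then asked to rewrite each article at a sixth-grade readability level. The readability of each response was calculated and compared with the unedited articles.
ChatGPT-4 was able to improve the readability across all chosen readability formulas and was successful in achieving a mean sixth-grade readability level in terms of the Flesch Kincaid Grade Level and Simple Measure of Gobbledygook calculations. It increased the mean Flesch Reading Ease score, with higher scores representing more readable material.
This study demonstrated that ChatGPT-4 can be used to improve the readability of patient-focused material in hand surgery. However, ChatGPT-4 is interested primarily in sounding natural, and not in seeking truth, and hence, each response must be evaluated by the surgeon to ensure that information accuracy is not being sacrificed for the sake of readability by this powerful tool.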
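The Flesch Reading Ease score mentioned in this abstract is also a closed-form formula. A minimal sketch follows, assuming the text counts are supplied by the caller (the function name and the example counts are illustrative and do not come from the study):

```python
def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Flesch Reading Ease from raw text counts; higher means easier to read."""
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    # Both penalty terms grow with sentence length and word complexity.
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


# Hypothetical counts for an original and a simplified rewrite of one passage.
original_score = flesch_reading_ease(words=100, sentences=5, syllables=150)
rewritten_score = flesch_reading_ease(words=100, sentences=8, syllables=130)
print(f"original: {original_score:.1f}, rewritten: {rewritten_score:.1f}")
```

Shorter sentences and fewer syllables per word raise the score, which is the direction of change the abstract describes after rewriting.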
International Journal of Medical Informatics,
Journal Year:
2024,
Volume and Issue:
190, P. 105562 - 105562
Published: Oct. 1, 2024
Chatbots using the Large Language Model (LLM) generate human responses to questions from all categories. Due to staff shortages in healthcare systems, patients waiting for an appointment increasingly use chatbots to get information about their condition. Given the number of chatbots currently available, assessing the responses they generate is essential.
Blood Purification,
Journal Year:
2024,
Volume and Issue:
53(9), P. 725 - 731
Published: April 26, 2024
Introduction: Acute kidney injury (AKI) and continuous renal replacement therapy (CRRT) are critical areas in nephrology. The effectiveness of ChatGPT in simpler, patient education-oriented questions has not been thoroughly assessed. This study evaluates the proficiency of ChatGPT 4.0 in responding to such questions, subjected to various linguistic alterations.
Methods: Eighty-nine questions were sourced from the Mayo Clinic Handbook for educating patients on AKI and CRRT. These questions were categorized as original, paraphrased with different interrogative adverbs, paraphrased resulting in incomplete sentences, and paraphrased containing misspelled words. Two nephrologists verified the responses for medical accuracy. A χ2 test was conducted to ascertain notable discrepancies in ChatGPT 4.0’s performance across these formats.
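Results: ChatGPT 4.0 accurately handled a variety of question formats for patient education. Across all question types, ChatGPT 4.0 demonstrated an accuracy of 97% for both original and adverb-altered questions and 98% for questions with incomplete sentences or misspellings. Specifically for AKI-related questions, accuracy was consistently maintained across all versions. In the subset of CRRT-related questions, the tool achieved 96% accuracy, which increased for altered question versions. The statistical analysis revealed no significant difference in performance across the varied question types (p value: 1.00 for both AKI and CRRT), and there was no disparity between the artificial intelligence (AI)’s responses to AKI and CRRT questions (p value: 0.71).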
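Conclusion: ChatGPT 4.0 demonstrates consistent, high accuracy in interpreting and responding to queries related to AKI and CRRT, irrespective of linguistic modifications. These findings suggest that ChatGPT 4.0 has the potential to be a reliable support tool in the delivery of patient education, by accurately providing information across a range of question formats. Further research is needed to explore the direct impact of AI-generated responses on patient understanding and outcomes.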
British and Irish Orthoptic Journal,
Journal Year:
2024,
Volume and Issue:
20(1), P. 183 - 192
Published: Jan. 1, 2024
Eye surgeries often evoke strong negative emotions in patients, including fear and anxiety. Patient education material plays a crucial role in informing and empowering individuals. Traditional sources of medical information may not effectively address individual patient concerns or cater to varying levels of understanding. This study aims to conduct a comparative analysis of the accuracy, completeness, readability, tone, and understandability of patient education material generated by AI chatbots versus traditional Patient Information Leaflets (PILs), focusing on local anesthesia in eye surgery.
OTO Open,
Journal Year:
2024,
Volume and Issue:
8(3)
Published: July 1, 2024
Abstract
Objective
Evaluate the quality of responses from Chat Generative Pre‐Trained Transformer (ChatGPT) models compared to the answers for “Frequently Asked Questions” (FAQs) from the American Academy of Otolaryngology–Head and Neck Surgery (AAO‐HNS) Clinical Practice Guidelines (CPG) for Ménière's disease (MD).
Study Design
Comparative analysis.
Setting
The AAO‐HNS CPG for MD includes FAQs that clinicians can give to patients for MD‐related questions. The ability of ChatGPT to properly educate patients regarding MD is unknown.
Methods
ChatGPT‐3.5 and 4.0 were each prompted with 16 questions from the MD FAQs. Each response was rated in terms of (1) comprehensiveness, (2) extensiveness, (3) presence of misleading information, and (4) resources. Readability was assessed using Flesch‐Kincaid Grade Level (FKGL) and Flesch Reading Ease Score (FRES).
Results
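ChatGPT‐3.5 was comprehensive in 5 responses whereas ChatGPT‐4.0 was comprehensive in 9 (31.3% vs 56.3%, P = .2852). Both models were extensive in all responses (P = 1.0000). Misleading information was present in 3 responses (18.75%) from one model (P = .6851). ChatGPT‐3.5 had resources in 10 responses whereas ChatGPT‐4.0 had resources in all 16 (62.5% vs 100%, P = .0177). The AAO‐HNS CPG FRES (62.4 ± 16.6) demonstrated an appropriate readability score of at least 60, while both ChatGPT models (39.1 ± 7.3 and 42.8 ± 8.5) failed to meet this standard. All platforms had FKGL means that exceeded the recommended level of 6 or lower.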
Conclusion
While ChatGPT‐4.0 had significantly better resource reporting, both models have room for improvement in being more comprehensive, more readable, and less misleading for patients.
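The abstract reports P values for comparisons of counts out of 16 without naming the test. As a hedged sketch, assuming a two-sided Fisher's exact test on a 2x2 table (an assumption, not something the abstract states; the function name is illustrative), the comprehensiveness (5/16 vs 9/16) and resources (10/16 vs 16/16) comparisons can be checked as follows:

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def weight(k: int) -> int:
        # Unnormalised hypergeometric probability of k "yes" counts in row 1.
        return comb(row1, k) * comb(row2, col1 - k)

    observed = weight(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more likely than the observed one.
    # Integer weights avoid floating-point ties in the comparison.
    tail = sum(weight(k) for k in range(lo, hi + 1) if weight(k) <= observed)
    return tail / denom


# Rows are (yes, no) counts out of 16 responses per model, from the abstract.
p_comprehensive = fisher_exact_two_sided(5, 11, 9, 7)
p_resources = fisher_exact_two_sided(10, 6, 16, 0)
print(f"{p_comprehensive:.4f} {p_resources:.4f}")  # prints "0.2852 0.0177"
```

Under this assumption the computed values match the reported P = .2852 and P = .0177, which suggests, but does not prove, that an exact test of this kind was used.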