BMC Medical Informatics and Decision Making, Journal Year: 2024, Volume and Issue: 24(1), Published: Nov. 29, 2024
Abstract
Background
Owing to the rapid growth in popularity of Large Language Models (LLMs), various performance evaluation studies have been conducted to confirm their applicability in the medical field. However, there is still no clear framework for evaluating LLMs.
Objective
This study reviews studies on LLM evaluations in the medical field and analyzes the research methods used in these studies. It aims to provide a reference for future researchers designing LLM evaluation studies.
Methods & materials
We conducted a scoping review of three databases (PubMed, Embase, and MEDLINE) to identify LLM-related articles published between January 1, 2023, and September 30, 2023. We analyzed the types of evaluation methods, the number of questions (queries), the number of evaluators, repeated measurements, additional analyses, the use of prompt engineering, and metrics other than accuracy.
Results
A total of 142 articles met the inclusion criteria. LLM evaluation was primarily categorized as either providing test examinations (n = 53, 37.3%) or being evaluated by a professional (n = 80, 56.3%), with some hybrid cases (n = 5, 3.5%) or a combination of the two (n = 4, 2.8%). Among the studies using test examinations, most had 100 or fewer questions (n = 18, 29.0%), 15 (24.2%) performed repeated measurements, 18 (29.0%) conducted additional analyses, and 8 (12.9%) used prompt engineering. For assessment by professionals, most studies used 50 or fewer queries (n = 54, 64.3%), 43 (48.3%) involved multiple evaluators, and 14 (14.7%) used prompt engineering.
Conclusions
More research is required regarding the application of LLMs in healthcare. Although previous studies have focused on evaluating performance, future studies will likely focus on improving performance. A well-structured methodology should be established so that LLMs can be evaluated systematically.
Medical Teacher, Journal Year: 2024, Volume and Issue: 46(4), P. 446 - 470, Published: Feb. 29, 2024
Background
Artificial Intelligence (AI) is rapidly transforming healthcare, and there is a critical need for a nuanced understanding of how AI is reshaping teaching, learning, and educational practice in medical education. This review aimed to map the literature regarding AI applications in medical education, core areas of findings, potential candidates for formal systematic review, and gaps for future research.
International Journal of General Medicine, Journal Year: 2024, Volume and Issue: Volume 17, P. 817 - 826, Published: March 1, 2024
ChatGPT, an AI-driven conversational large language model (LLM), has garnered significant scholarly attention since its inception, owing to its manifold applications in the realm of medical science. This study primarily examines the merits, limitations, anticipated developments, and practical applications of ChatGPT in clinical practice, healthcare, medical education, and research. It underscores the necessity for further research and development to enhance its performance and deployment. Moreover, future avenues encompass ongoing enhancements and standardization, mitigating its limitations, and exploring its integration and applicability in translational and personalized medicine. Reflecting the narrative nature of this review, a focused literature search was performed to identify relevant publications on ChatGPT's use in these fields. The search process was aimed at gathering a broad spectrum of insights to provide a comprehensive overview of the current state and prospects of ChatGPT in this domain. The objective is to aid healthcare professionals in understanding the groundbreaking advancements associated with the latest artificial intelligence tools, while also acknowledging the opportunities and challenges presented by ChatGPT.
Journal of Diabetes Science and Technology, Journal Year: 2023, Volume and Issue: unknown, Published: Oct. 5, 2023
The present study aimed to investigate the knowledge level of Bard and ChatGPT in the areas of endocrinology, diabetes, and diabetes technology through a multiple-choice question (MCQ) examination format.

Initially, a 100-MCQ bank was established based on MCQs covering endocrinology, diabetes, and diabetes technology. The MCQs were created from physiology and medical textbooks, academic examination pools, and question pools. The team members analyzed the MCQ contents to ensure that they related to these fields; the number of endocrinology and diabetes MCQs was 50, and the number of diabetes technology/science MCQs was also 50. Google's Bard and ChatGPT were then assessed with the MCQ-based examination.

In the endocrinology and diabetes section, ChatGPT obtained 29 marks (correct responses) out of 50 (58%), and Bard achieved a similar score (58%). However, in the diabetes technology section, ChatGPT obtained 23 marks (46%) and Bard 20 (40%). Overall, in the entire three-part examination, ChatGPT obtained 52 marks out of 100 (52%) and Bard 49 (49%); ChatGPT thus scored slightly more than Bard. Nevertheless, both models did not achieve satisfactory scores (at least 60%) in endocrinology/diabetes or diabetes technology.

The overall performance of ChatGPT was better than that of Bard. The results in the diabetes/diabetes technology areas indicate that these tools have the potential to facilitate students and faculty in appropriate education settings, but these artificial intelligence tools need updated information in these fields.
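The section and overall percentages above follow directly from the reported correct-answer counts. Below is a minimal Python sketch of that arithmetic; the per-section split for Bard is inferred from the phrase "a similar score (58%)" and is therefore an assumption.

# Worked example: converting the reported correct-answer counts into the
# percentages quoted in the abstract. Bard's per-section counts are inferred
# from "a similar score (58%)" and are an assumption, not reported data.

def pct(correct: int, total: int) -> float:
    """Return the score as a percentage, rounded to one decimal place."""
    return round(100 * correct / total, 1)

scores = {
    "ChatGPT": {"endocrinology/diabetes": (29, 50), "diabetes technology": (23, 50)},
    "Bard":    {"endocrinology/diabetes": (29, 50), "diabetes technology": (20, 50)},
}

for model, sections in scores.items():
    total_correct = sum(c for c, _ in sections.values())
    total_items = sum(t for _, t in sections.values())
    per_section = ", ".join(f"{name}: {pct(c, t)}%" for name, (c, t) in sections.items())
    # ChatGPT -> 58.0% and 46.0%, overall 52.0%; Bard -> 58.0% and 40.0%, overall 49.0%
    print(f"{model}: {per_section}, overall: {pct(total_correct, total_items)}%")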
Cureus, Journal Year: 2024, Volume and Issue: unknown, Published: Jan. 16, 2024
Background: ChatGPT is an artificial intelligence-powered chatbot that has demonstrated capabilities in numerous fields, including the medical and healthcare sciences. This study evaluates its potential for application in telepharmacy, the delivery of pharmaceutical care via means of telecommunications, through assessing its interactions, adherence to instructions, and ability to role-play as a pharmacist while handling a series of life-like scenario questions.
Methods: Two versions (ChatGPT 3.5 and 4.0, OpenAI) were assessed using two independent trials each. ChatGPT was instructed to act as a pharmacist and answer patient inquiries, followed by a set of 20 assessment questions. Then, it was instructed to stop the act, provide feedback, and list the sources of drug information. The responses to the questions were evaluated in terms of accuracy, precision, and clarity on a 4-point Likert-like scale.
Results: Both versions could follow the detailed instructions, act as a pharmacist, and appropriately handle all the questions. They were able to understand the case details, recognize generic and brand names, identify side effects, prescription requirements, and precautions, and give proper point-by-point instructions regarding administration, dosing, storage, and disposal. The overall pooled scores were 3.425 (0.712) and 3.7 (0.61) for ChatGPT 3.5 and 4.0, respectively, and the rank distributions were not significantly different (P>0.05). None of the answers could be considered directly harmful or labeled entirely or mostly incorrect, and most point deductions were due to other factors such as indecisiveness, adding immaterial information, missing certain considerations, or partial unclarity. Responses were of similar length across the two versions and generally concise. ChatGPT 4.0 showed superior performance, higher consistency, better character adherence, and reported various reliable information sources. However, it only allowed the input of 40 messages every three hours and provided an inaccurate number of patients, compared with ChatGPT 3.5, which allowed unlimited input but was unable to provide feedback.
Conclusions: Integrating ChatGPT into telepharmacy holds promising potential; however, its drawbacks need to be overcome in order for it to function effectively.
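The pooled means, standard deviations, and rank comparison reported above can be reproduced from per-question Likert ratings. The Python sketch below pools 4-point ratings for the two versions and compares their rank distributions; the abstract does not name the test used, so the Mann-Whitney U test here is an assumption, and the rating arrays are placeholders rather than the study's data.

# Hedged sketch: pooling 4-point Likert-like ratings for two ChatGPT versions
# and comparing their rank distributions. The abstract reports pooled means of
# 3.425 (SD 0.712) and 3.7 (SD 0.61) with P > 0.05; it does not name the test,
# so the Mann-Whitney U test is an assumption, and the ratings are placeholders.
import numpy as np
from scipy.stats import mannwhitneyu

ratings_gpt35 = np.array([4, 3, 4, 2, 4, 3, 4, 4, 3, 4, 4, 2, 4, 3, 4, 4, 3, 4, 4, 3])
ratings_gpt40 = np.array([4, 4, 4, 3, 4, 4, 4, 3, 4, 4, 4, 3, 4, 4, 3, 4, 4, 4, 4, 3])

for name, r in [("ChatGPT 3.5", ratings_gpt35), ("ChatGPT 4.0", ratings_gpt40)]:
    # Pooled mean and sample standard deviation of the Likert scores.
    print(f"{name}: mean {r.mean():.3f}, SD {r.std(ddof=1):.3f}")

u_stat, p_value = mannwhitneyu(ratings_gpt35, ratings_gpt40, alternative="two-sided")
print(f"Mann-Whitney U = {u_stat:.1f}, P = {p_value:.3f}")  # P > 0.05 -> no significant difference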
Scientific Reports, Journal Year: 2025, Volume and Issue: 15(1), Published: Jan. 7, 2025
To explore the attitudes of healthcare professionals and the public on applying ChatGPT in clinical practice. The successful application of ChatGPT in clinical practice depends not only on its technical performance but also, critically, on the perceptions of both healthcare professionals and the non-healthcare public. This study has a qualitative design based on artificial intelligence. The analysis was divided into five steps: data collection, data cleaning, validation of relevance, sentiment analysis, and content analysis using the K-means algorithm. The data comprised 3130 comments amounting to 1,593,650 words. The dictionary method showed positive and negative emotions such as anger, disgust, fear, sadness, surprise, good, and happy emotions. Healthcare professionals prioritized ChatGPT's efficiency but raised ethical and accountability concerns, while the public valued its accessibility and emotional support yet expressed worries about privacy and misinformation. Bridging these perspectives by improving reliability, safeguarding privacy, and clearly defining ChatGPT's role is essential for its practical integration into healthcare.
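The abstract's five-step pipeline ends with dictionary-based sentiment analysis and K-means content analysis. A minimal Python sketch of those two stages follows; the toy emotion lexicon, example comments, and cluster count are illustrative assumptions, not the study's actual dictionary, corpus, or settings.

# Hedged sketch of the last two analysis steps: a dictionary-based
# sentiment/emotion count followed by K-means clustering of comment content.
# The tiny lexicon, example comments, and k=2 are placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

comments = [
    "ChatGPT makes clinical documentation so much more efficient",
    "I fear it will leak private patient data",
    "It gave me emotional support when no doctor was available",
    "Who is accountable if the advice is wrong and harms someone?",
]

# Minimal emotion dictionary (dictionary method): word -> emotion label.
lexicon = {"efficient": "good", "fear": "fear", "leak": "fear",
           "support": "happy", "wrong": "sadness", "harms": "anger"}

emotion_counts = {}
for text in comments:
    for word in text.lower().split():
        if word in lexicon:
            emotion_counts[lexicon[word]] = emotion_counts.get(lexicon[word], 0) + 1
print("emotion counts:", emotion_counts)

# Content analysis: cluster comments into themes with K-means on TF-IDF vectors.
vectors = TfidfVectorizer(stop_words="english").fit_transform(comments)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(vectors)
print("cluster labels:", labels)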
Journal of Personalized Medicine, Journal Year: 2025, Volume and Issue: 15(1), P. 21 - 21, Published: Jan. 9, 2025
Total hip arthroplasty (THA) is a widely performed surgical procedure that has evolved significantly due to advancements in artificial intelligence (AI) and robotics. As demand for THA grows, reliable tools are essential to enhance diagnosis, preoperative planning, surgical precision, and postoperative rehabilitation. AI applications in orthopedic surgery offer innovative solutions, including automated osteoarthritis (OA) detection, precise implant positioning, and personalized risk stratification, thereby improving patient outcomes. Deep learning models have transformed OA severity grading and identification by automating traditionally manual processes with high accuracy. Additionally, AI-powered systems optimize preoperative planning by predicting the joint center and identifying complications using multimodal data. Robotic-assisted THA enhances precision with real-time feedback, reducing complications such as dislocations and leg length discrepancies while accelerating recovery. Despite these advancements, barriers such as cost, accessibility, and a steep learning curve for surgeons hinder widespread adoption. Postoperative rehabilitation benefits from technologies like virtual and augmented reality and telemedicine, which enhance engagement and adherence. However, limitations, particularly among elderly populations with lower adaptability to technology, underscore the need for user-friendly platforms. To ensure comprehensiveness, a structured literature search was conducted in PubMed, Scopus, and Web of Science. Keywords included "artificial intelligence", "machine learning", "robotics", and "total hip arthroplasty". Inclusion criteria emphasized peer-reviewed studies published in English within the last decade focusing on technological advancements and clinical applications. This review evaluates AI and robotics' role in THA, highlighting opportunities and challenges and emphasizing the need for further research and real-world validation to integrate these technologies into practice effectively.
Cureus, Journal Year: 2023, Volume and Issue: unknown, Published: Sept. 29, 2023
Background
Generative artificial intelligence (AI) systems such as ChatGPT-3.5 and Claude-2 may assist in explaining complex medical science topics. A few studies have shown that AI can solve complicated physiology problems that require critical thinking and analysis. However, further studies are required to validate their effectiveness in answering conceptual multiple-choice questions (MCQs) in human physiology.
Objective
This study aimed to evaluate and compare the proficiency of ChatGPT-3.5 and Claude-2 in answering a curated set of MCQs in human physiology.
Methods
In this cross-sectional study, a set of 55 MCQs from 10 competencies was purposefully constructed to require comprehension, problem-solving, and analytical skills to answer them. The MCQs and a structured prompt for response generation were presented to ChatGPT-3.5 and Claude-2. The answers and explanations provided by both models were documented in an Excel spreadsheet. All three authors subjected these responses to a rating process using a scale of 0 to 3: 0 was assigned for an incorrect explanation, 1 for a partially correct explanation, 2 for a correct explanation with some aspects missing, and 3 for a perfectly correct explanation. Both models were thus evaluated on their ability to choose the correct answer (option) and provide a clear and comprehensive explanation of the MCQs. The Mann-Whitney U test was used to compare the responses, and Fleiss multi-rater kappa (κ) was used to determine the score agreement among the raters. The statistical significance level was decided at P ≤ 0.05.
Results
Claude-2 answered 40 MCQs correctly, which was significantly higher than the 26 correct responses of ChatGPT-3.5. The distribution of ratings generated κ values of 0.804 and 0.818 for Claude-2 and ChatGPT-3.5, respectively.
Conclusion
In terms of elucidating conceptual MCQs in human physiology, Claude-2 surpassed ChatGPT-3.5. However, accessing Claude-2 in India requires the use of a virtual private network, which may raise security concerns.
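The Methods name two statistical procedures: a Mann-Whitney U test to compare the two models' ratings and Fleiss multi-rater kappa to quantify agreement among the three raters. A minimal Python sketch of both follows; the rating matrices are placeholders, not the study's data, and summarizing each question by its median rating before the Mann-Whitney test is an assumption.

# Hedged sketch of the two statistical procedures named in the Methods:
# a Mann-Whitney U test comparing the two models' per-question ratings and
# Fleiss' multi-rater kappa for agreement among the three raters.
# The rating matrices below are illustrative placeholders, not the study data.
import numpy as np
from scipy.stats import mannwhitneyu
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

rng = np.random.default_rng(0)
# ratings[i, j] = score (0-3) given by rater j to question i, for each model.
claude2_ratings = rng.choice([2, 3], size=(55, 3), p=[0.3, 0.7])
gpt35_ratings = rng.choice([0, 1, 2, 3], size=(55, 3), p=[0.2, 0.2, 0.3, 0.3])

# Compare the per-question median ratings of the two models (nonparametric).
u_stat, p_value = mannwhitneyu(np.median(claude2_ratings, axis=1),
                               np.median(gpt35_ratings, axis=1),
                               alternative="two-sided")
print(f"Mann-Whitney U = {u_stat:.1f}, P = {p_value:.4f}")

# Fleiss' kappa: convert (questions x raters) scores into per-category counts.
for name, ratings in [("Claude-2", claude2_ratings), ("ChatGPT-3.5", gpt35_ratings)]:
    table, _ = aggregate_raters(ratings)
    print(f"{name}: Fleiss kappa = {fleiss_kappa(table):.3f}")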
Advances in Medical Education and Practice, Journal Year: 2024, Volume and Issue: Volume 15, P. 393 - 400, Published: May 1, 2024
Introduction: This research investigated the capabilities of ChatGPT-4 compared to medical students in answering MCQs, using the revised Bloom's Taxonomy as a benchmark.
Methods: A cross-sectional study was conducted at The University of the West Indies, Barbados. ChatGPT-4 and medical students were assessed on MCQs from various courses using computer-based testing.
Results: The assessment included 304 MCQs. Students demonstrated good knowledge, with 78% correctly answering at least 90% of the questions. However, ChatGPT-4 achieved a higher overall score (73.7%) than the students (66.7%). Course type significantly affected ChatGPT-4's performance, but Bloom's taxonomy levels did not. A detailed association check between program type and taxonomy level for the questions ChatGPT-4 answered correctly showed a highly significant correlation (p < 0.001), reflecting a concentration of "remember-level" questions in preclinical courses and "evaluate-level" questions in clinical courses.
Discussion: This study highlights ChatGPT-4's proficiency in standardized tests but indicates limitations in reasoning and practical skills. The performance discrepancy suggests that the effectiveness of artificial intelligence (AI) varies based on course content.
Conclusion: While ChatGPT-4 shows promise as an educational tool, its role should be supplementary, with strategic integration into medical education to leverage its strengths and address its limitations. Further research is needed to explore AI's impact on student learning across courses.
Keywords: artificial intelligence, ChatGPT-4, medical students, interpretation abilities, multiple choice questions
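The "association check" between program type and taxonomy level reported above (p < 0.001) is the kind of contingency-table analysis commonly performed with a chi-square test of independence. The abstract does not name the test, so the procedure and the counts in the Python sketch below are assumptions for illustration only.

# Hedged sketch: testing for an association between program type (preclinical
# vs clinical) and the Bloom's taxonomy level of correctly answered MCQs with
# a chi-square test of independence. The counts are placeholders, not the
# study's data; they are shaped to mirror the reported pattern of
# "remember-level" questions in preclinical and "evaluate-level" in clinical.
import numpy as np
from scipy.stats import chi2_contingency

# Rows: program type; columns: remember, understand, apply, evaluate.
counts = np.array([
    [48, 30, 12, 4],   # preclinical: concentrated at "remember"-level questions
    [10, 22, 28, 36],  # clinical: concentrated at "evaluate"-level questions
])

chi2, p, dof, _ = chi2_contingency(counts)
print(f"chi2 = {chi2:.1f}, dof = {dof}, p = {p:.2e}")  # a small p indicates a significant association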