Large language model (LLM) chatbots have many applications in medical settings. However, these tools can potentially perpetuate racial and gender biases through their responses, worsening disparities in healthcare. With the ongoing discussion of LLM chatbots in oncology and the widespread goal of addressing cancer disparities, this study focuses on biases propagated by LLM chatbots in oncology.
medRxiv (Cold Spring Harbor Laboratory), Journal year: 2023, Volume: unknown
Published: April 28, 2023
The United States Medical Licensing Examination (USMLE) is a critical step in assessing the competence of future physicians, yet the process of creating exam questions and study materials is both time-consuming and costly. While Large Language Models (LLMs), such as OpenAI’s GPT-4, have demonstrated proficiency in answering medical exam questions, their potential in generating such questions remains underexplored. This study presents QUEST-AI, a novel system that utilizes LLMs to (1) generate USMLE-style questions, (2) identify and flag incorrect questions, and (3) correct errors in the flagged questions. We evaluated this system’s output by constructing a test set of 50 LLM-generated questions mixed with human-generated questions and conducting a two-part assessment with three physicians and two students. The assessors attempted to distinguish between LLM-generated and human-generated questions and evaluated the validity of the LLM-generated content. A majority of the questions generated by QUEST-AI were deemed valid by a panel of clinicians, with strong correlations between performance on LLM-generated and human-generated questions. This pioneering application of LLMs in medical education could significantly increase the ease and efficiency of developing USMLE-style content, offering a cost-effective and accessible alternative for exam preparation.
Frontiers in Medicine, Journal year: 2025, Volume: 11
Published: Jan. 30, 2025
Large Language Models (LLMs) like ChatGPT, Gemini, and Claude gain traction in healthcare simulation; this paper offers simulationists a practical guide to effective prompt design. Grounded in a structured literature review and iterative prompt testing, this paper proposes best practices for developing calibrated prompts, explores various prompt types and techniques with use cases, and addresses the challenges, including ethical considerations for using LLMs in healthcare simulation. This guide helps bridge the knowledge gap for simulationists on LLM use in simulation-based education, offering tailored guidance on prompt design. Examples were created through iterative testing to ensure alignment with simulation objectives, covering use cases such as clinical scenario development, OSCE station creation, simulated person scripting, and debriefing facilitation. These use cases provide easy-to-apply methods to enhance realism, engagement, and educational alignment in simulations. Key challenges associated with LLM integration, including bias, privacy concerns, hallucinations, lack of transparency, and the need for robust oversight and evaluation, are discussed alongside ethical considerations unique to healthcare education. Recommendations are provided to help simulationists craft prompts that align with simulation objectives while mitigating these challenges. By offering these insights, this paper contributes valuable, timely knowledge for simulationists seeking to leverage generative AI’s capabilities in healthcare education responsibly.
JMIR Formative Research, Journal year: 2025, Volume: 9, pp. e66478-e66478
Published: Jan. 31, 2025
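Abstract
Background
Case studies have shown ChatGPT can run clinical simulations at the medical student level. However, no data have assessed ChatGPT’s reliability in meeting desired simulation criteria such as medical accuracy, simulation formatting, and robust feedback mechanisms.
Objective
This study aims to quantify ChatGPT’s ability to consistently follow formatting instructions and create simulations for preclinical medical student learners according to principles of medical simulation and multimedia educational technology.
Methods
Using ChatGPT-4 and a prevalidated starting prompt, the authors ran 360 separate simulations of an acute asthma exacerbation. A total of 180 simulations were given correct answers and 180 simulations were given incorrect answers. ChatGPT-4 was evaluated for its ability to adhere to basic parameters (stepwise progression, free response, interactivity), advanced parameters (autonomous conclusion, delayed feedback, comprehensive feedback), and medical accuracy (vignette, treatment updates, feedback). Significance was determined with χ² analyses using 95% CIs and odds ratios.
Results
In total, 100% (n=360) of simulations met basic parameters and were medically accurate. For advanced parameters, 55% (200/360) of all simulations delayed feedback, while the Correct arm (157/180, 87%) delayed feedback significantly more than the Incorrect arm (43/180, 24%; P<.001). A total of 79% (285/360) of simulations concluded autonomously, and there was no difference between the Correct and Incorrect arms in autonomous conclusion (146/180, 81% and 139/180, 77%; P=.36). Overall, 78% (282/360) of simulations gave comprehensive feedback, and there was no difference between the Correct and Incorrect arms in comprehensive feedback (137/180, 76% and 145/180, 81%; P=.31). ChatGPT-4 was not significantly more likely to conclude simulations autonomously (P=.34) and provide comprehensive feedback (P=.27) when feedback was delayed compared to when feedback was not delayed.
Conclusions
These simulations have the potential to be a reliable educational tool for simple simulations and can be evaluated by a novel 9-part metric. Per this metric, ChatGPT simulations performed perfectly on medical accuracy and basic parameters. It performed well on comprehensive feedback and autonomous conclusion. Delayed feedback depended on the accuracy of user inputs. A simulation meeting one advanced parameter was not more likely to meet all advanced parameters. Further work must be done to ensure consistent performance across a broader range of simulation scenarios.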
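As a rough illustration of the χ² and odds-ratio analysis named in the Methods above, the following sketch recomputes the comparison between the Correct arm (157/180) and the Incorrect arm (43/180) from the reported counts. It assumes a Pearson χ² test without continuity correction and a Wald confidence interval on the log odds ratio; the abstract does not say which variants the authors used, and the function names are hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def odds_ratio_with_ci(a, b, c, d, z=1.96):
    """Odds ratio with a Wald 95% CI computed on the log scale (an assumed method)."""
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    low = math.exp(math.log(odds_ratio) - z * se_log)
    high = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, low, high


# Counts taken from the reported results: 157/180 (Correct arm), 43/180 (Incorrect arm).
a, b = 157, 180 - 157  # Correct arm: met / did not meet the criterion
c, d = 43, 180 - 43    # Incorrect arm: met / did not meet the criterion

chi2 = chi_square_2x2(a, b, c, d)
# For 1 degree of freedom, the chi-square upper tail equals erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(chi2 / 2))
odds, ci_low, ci_high = odds_ratio_with_ci(a, b, c, d)

print(f"chi2 = {chi2:.2f}, P = {p_value:.3g}")
print(f"OR = {odds:.2f} (95% CI {ci_low:.2f} to {ci_high:.2f})")
```

Under these assumptions the P value falls far below .001, which agrees with the reported P<.001; the exact statistic would differ slightly if a continuity correction were applied.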
Healthcare, Journal year: 2025, Volume: 13(6), pp. 603-603
Published: March 10, 2025
Background/Objectives: Large language models (LLMs) have shown significant potential to transform various aspects of healthcare. This review aims to explore the current applications, challenges, and future prospects of LLMs in medical education, clinical decision support, and healthcare administration.
Methods: A comprehensive literature review was conducted, examining the applications of LLMs across the three key domains. The analysis included their performance, challenges, and advancements, with a focus on techniques like retrieval-augmented generation (RAG).
Results: In medical education, LLMs show promise as virtual patients, personalized tutors, and tools for generating study materials. Some models have outperformed junior trainees in specific medical knowledge assessments. Concerning clinical decision support, LLMs exhibit potential in diagnostic assistance, treatment recommendations, and medical knowledge retrieval, though performance varies across specialties and tasks. In healthcare administration, LLMs effectively automate tasks like clinical note summarization, data extraction, and report generation, potentially reducing administrative burdens on healthcare professionals. Despite their promise, challenges persist, including hallucination mitigation, addressing biases, and ensuring patient privacy and data security.
Conclusions: LLMs have transformative potential in medicine but require careful integration into clinical settings. Ethical considerations, regulatory challenges, and interdisciplinary collaboration between AI developers and healthcare professionals are essential. Future advancements in LLM performance and reliability through techniques such as RAG, fine-tuning, and reinforcement learning will be critical to ensuring patient safety and improving healthcare delivery.
Journal of Medical Internet Research, Journal year: 2024, Volume: 26, pp. e57037-e57037
Published: Aug. 20, 2024
Background
ChatGPT is a natural language processing model developed by OpenAI, which can be iteratively updated and optimized to accommodate the changing and complex requirements of human verbal communication.
Objective
The study aimed to evaluate ChatGPT’s accuracy in answering orthopedics-related multiple-choice questions (MCQs) and assess its short-term effects as a learning aid through a randomized controlled trial. In addition, long-term effects on student performance in other subjects were measured using final examination results.
Methods
We first evaluated ChatGPT’s accuracy in answering MCQs pertaining to orthopedics across various question formats. Then, 129 undergraduate medical students participated in a randomized controlled study in which the ChatGPT group used ChatGPT as a learning tool, while the control group was prohibited from using artificial intelligence software to support learning. Following a 2-week intervention, the 2 groups’ understanding of orthopedics was assessed by an orthopedics test, and variations in the 2 groups’ performance in other disciplines were noted through a follow-up at the end of the semester.
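Results
ChatGPT-4.0 answered 1051 orthopedics-related MCQs with a 70.60% (742/1051) accuracy rate, including 71.8% (237/330) accuracy for A1 MCQs, 73.7% (330/448) accuracy for A2 MCQs, 70.2% (92/131) accuracy for A3/4 MCQs, and 58.5% (83/142) accuracy for case analysis MCQs. As of April 7, 2023, a total of 129 individuals participated in the experiment. However, 19 individuals withdrew from the experiment at various phases; thus, as of July 1, 2023, a total of 110 individuals accomplished the trial and completed all follow-up work. After we intervened in the learning style of the students in the short term, the ChatGPT group answered more questions correctly than the control group (ChatGPT group: mean 141.20, SD 26.68; control group: mean 130.80, SD 25.56; P=.04) in the orthopedics test, particularly on A1 (ChatGPT group: mean 46.57, SD 8.52; control group: mean 42.18, SD 9.43; P=.01), A2 (ChatGPT group: mean 60.59, SD 10.58; control group: mean 56.66, SD 9.91; P=.047), and A3/4 MCQs (ChatGPT group: mean 19.57, SD 5.48; control group: mean 16.46, SD 4.58; P=.002). At the end of the semester, we found that the ChatGPT group performed better on final examinations in surgery (ChatGPT group: mean 76.54, SD 9.79; control group: mean 72.54, SD 8.11; P=.02) and obstetrics and gynecology (ChatGPT group: mean 75.98, SD 8.94; control group: mean […], SD 8.66; P […]) than the control group.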
Conclusions
ChatGPT answers orthopedics-related MCQs accurately, and students using it excel in both short-term and long-term assessments. Our findings strongly support ChatGPT’s integration into medical education, enhancing contemporary instructional methods.
Trial Registration
Chinese Clinical Trial Registry Chictr2300071774; https://www.chictr.org.cn/hvshowproject.html?id=225740&v=1.0
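The per-format accuracy figures reported in the Results above can be cross-checked against the overall rate with a few lines of arithmetic. This is only a consistency check on the published counts, not part of the study's analysis; the dictionary layout and variable names are illustrative.

```python
# (correct, total) MCQ counts per question format, as reported in the Results.
results = {
    "A1": (237, 330),
    "A2": (330, 448),
    "A3/4": (92, 131),
    "case analysis": (83, 142),
}

# Sum the per-format counts and compare with the reported 742/1051 overall.
total_correct = sum(correct for correct, _ in results.values())
total_questions = sum(total for _, total in results.values())
overall_pct = 100 * total_correct / total_questions

# Accuracy per format, rounded to one decimal place as in the abstract.
per_format_pct = {
    name: round(100 * correct / total, 1)
    for name, (correct, total) in results.items()
}

print(f"overall: {total_correct}/{total_questions} = {overall_pct:.2f}%")
for name, pct in per_format_pct.items():
    print(f"{name}: {pct}%")
```

The four formats sum to the reported 742 correct answers out of 1051 questions, so the per-format and overall figures are mutually consistent.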
Abstract
Objectives
Artificial intelligence tools such as Chat Generative Pre-trained Transformer (ChatGPT) have been used for many health care-related applications; however, there is a lack of research on their capabilities for evaluating morally and/or ethically complex medical decisions. The objective of this study was to assess the moral competence of ChatGPT.
Materials and methods
This cross-sectional study was performed between May 2023 and July 2023 using scenarios from the Moral Competence Test (MCT). Numerical responses were collected from ChatGPT 3.5 and 4.0 to assess individual and overall stage scores, including C-index and overall moral stage preference. Descriptive analysis and 2-sided Student’s t-test were used for all continuous data.
Results
A total of 100 iterations of the MCT were performed, and moral preference was found to be higher in the latter Kohlberg-derived arguments. ChatGPT 4.0 was found to have a higher overall moral stage preference (2.325 versus 1.755) when compared to ChatGPT 3.5. ChatGPT 4.0 was also found to have a statistically higher C-index score in comparison to ChatGPT 3.5 (29.03 ± 11.10 versus 19.32 ± 10.95, P=.0000275).
Discussion
ChatGPT 3.5 and 4.0 trended towards higher moral preference for the latter stages of Kohlberg’s theory for both dilemmas, with C-indices suggesting medium moral competence. However, both models showed moderate variation in C-index scores, indicating inconsistency, and further training is recommended.
Conclusion
ChatGPT demonstrates medium moral competence and can evaluate arguments based on Kohlberg’s theory of moral development. These findings suggest that future revisions of ChatGPT and other large language models could assist physicians in the decision-making process when encountering complex ethical scenarios.
Future Microbiology, Journal year: 2024, Volume: 19(15), pp. 1283-1292
Published: July 29, 2024
Aim: Assessing the visual accuracy of two large language models (LLMs) in microbial classification.
Materials & methods: GPT-4o and Gemini 1.5 Pro were evaluated in distinguishing Gram-positive from Gram-negative bacteria and classifying them as cocci or bacilli using 80 Gram stain images from a labeled database.
Results: GPT-4o achieved 100% accuracy in identifying simultaneously Gram stain and shape for Clostridium perfringens, Pseudomonas aeruginosa and Staphylococcus aureus. Gemini 1.5 Pro showed more variability for similar bacteria (45, 100 and 95%, respectively). Both LLMs failed to identify both Gram stain and bacterial shape for Neisseria gonorrhoeae. Cumulative accuracy plots indicated that GPT-4o consistently performed equally or better in every identification, except for Neisseria gonorrhoeae's shape.
Conclusion: These results suggest that these LLMs in their unprimed state are not ready to be implemented in clinical practice and highlight the need for more research with larger datasets to improve LLMs' effectiveness in clinical microbiology.