medRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown
Published: Sept. 14, 2024
Abstract
Background
Medication-related harm has a significant impact on global healthcare costs and patient outcomes, accounting for deaths in 4.3 per 1000 patients. Generative artificial intelligence (GenAI) has emerged as a promising tool in mitigating risks of medication-related harm. In particular, large language models (LLMs) and well-developed generative adversarial networks (GANs) are showing promise for medication-related tasks. This review aims to explore the scope and effectiveness of generative AI in reducing medication-related harm, identifying existing development and challenges in research.
Methods
We searched for peer reviewed articles in PubMed, Web of Science, Embase, and Scopus for literature published from January 2012 to February 2024. We included studies focusing on the development or application of generative AI in mitigating risk for medication-related harm during the entire medication use process. We excluded studies using traditional AI methods only, those unrelated to healthcare settings, or concerning non-prescribed medication uses such as supplements. Extracted variables included study characteristics, AI model specifics and performance, application settings, and any patient outcome evaluated.
Findings
A total of 2203 articles were identified, and 14 met the criteria for inclusion into the final review. We found that generative AI and large language models were used in a few key applications: drug-drug interaction identification and prediction; clinical decision support and pharmacovigilance. While the performance and utility of these models varied, they generally showed promise in areas like early identification and classification of adverse drug events and support in decision-making for medication management. However, no studies tested these models prospectively, suggesting a need for further investigation into the integration and real-world application of generative AI tools to improve patient safety and healthcare outcomes effectively.
Interpretation
Generative AI shows promise in mitigating medication-related harms, but there are gaps in research rigor and ethical considerations. Future research should focus on the creation of high-quality, task-specific benchmarking datasets for medication safety and real-world implementation outcomes.
medRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown
Published: March 24, 2024
Abstract
The purpose of this study was to compare the performance of ChatGPT (GPT-3.5), ChatGPT (GPT-4), Claude2, Llama2-7b, and Llama2-13b on 219 multiple-choice questions focusing on critical care pharmacotherapy. To further assess the ability of engineering LLMs to improve reasoning abilities and performance, we examined responses with a zero-shot Chain-of-Thought (CoT) approach, CoT prompting, and a custom built GPT (PharmacyGPT). A [...] focused on pharmacotherapy topics used in Doctor of Pharmacy curricula from two accredited colleges of pharmacy was compiled for this study. A total of five LLMs were evaluated: ChatGPT (GPT-3.5), ChatGPT (GPT-4), Claude2, Llama2-7b, and Llama2-13b. The primary outcome was response accuracy. Of the LLMs tested, GPT-4 showed the highest average accuracy rate at 71.6%. A larger variance indicates lower consistency and reduced confidence in its answers. Llama2-13b had the lowest variance (0.070) of all the LLMs, but performed with an accuracy of 41.5%. Following analysis of overall accuracy, performance on knowledge- vs. skill-based questions was assessed. All LLMs demonstrated higher accuracy on knowledge-based questions compared to skill-based questions. For [...] questions, [...] 87% and 67%, respectively. Response accuracy in the domain of clinical pharmacotherapy can be improved by using prompt engineering techniques.
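The abstract reports an average accuracy and a variance per model and reads a larger variance as lower consistency. The following is a minimal Python sketch of that idea, not the study's actual analysis: it assumes each model answered the same question set over several repeated runs and that variance is taken across the per-run accuracy scores. The function names, the four-question answer key, and the run data are all hypothetical.

```python
from statistics import mean, pvariance


def accuracy(responses, answer_key):
    """Fraction of questions answered correctly in a single run."""
    if len(responses) != len(answer_key):
        raise ValueError("responses and answer_key must have the same length")
    correct = sum(r == k for r, k in zip(responses, answer_key))
    return correct / len(answer_key)


def summarize_runs(runs, answer_key):
    """Return (mean accuracy, population variance of accuracy) across runs."""
    scores = [accuracy(run, answer_key) for run in runs]
    return mean(scores), pvariance(scores)


# Hypothetical data: two models with the same mean accuracy but
# different run-to-run consistency.
answer_key = ["A", "C", "B", "D"]
consistent_runs = [
    ["A", "C", "B", "A"],
    ["A", "C", "B", "A"],
    ["A", "C", "B", "A"],
]
erratic_runs = [
    ["A", "C", "B", "D"],
    ["A", "C", "A", "A"],
    ["A", "C", "B", "A"],
]

consistent_mean, consistent_var = summarize_runs(consistent_runs, answer_key)
erratic_mean, erratic_var = summarize_runs(erratic_runs, answer_key)
```

In this toy data both models average the same accuracy, yet only the second has nonzero variance. That is why a low variance on its own says nothing about how accurate a model is.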
medRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown
Published: June 30, 2024
Abstract
Background
Large language models (LLMs) such as ChatGPT have emerged as promising artificial intelligence tools to support clinical decision making. The ability of LLMs to evaluate medication regimens, identify drug-drug interactions (DDIs), and provide clinical recommendations is unknown. The purpose of this study is to examine the performance of GPT-4 to identify clinically relevant DDIs and assess the accuracy of recommendations provided.
Methods
A total of 15 medication regimens were created containing commonly encountered DDIs that were considered either clinically significant or clinically unimportant. Two separate prompts were developed for medication regimen evaluation. The primary outcome was if GPT-4 identified the most clinically relevant DDI within the medication regimen. Secondary outcomes included rating GPT-4’s interaction rationale, clinical relevance ranking, and overall clinical recommendations. Interrater reliability was determined using the kappa statistic.
Results
GPT-4 identified the intended DDI in 90% of the responses provided (27/30). GPT-4 categorized 86% as highly clinically relevant compared to 53% being categorized as highly clinically relevant by expert opinion. Inappropriate clinical recommendations potentially causing patient harm were identified in 14% of responses (2/14), and 63% of responses contained accurate information but incomplete recommendations (19/30).
Conclusions
While GPT-4 demonstrated promise in its ability to identify clinically relevant DDIs, application to clinical cases remains an area of investigation. Findings from this study may assist in future development and refinement of LLMs for queries to support clinical decision-making.
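The methods mention a kappa statistic for interrater reliability without giving detail. As an illustration only, here is a minimal Python sketch of Cohen's kappa, which assumes exactly two raters assigning one categorical label per item; the abstract does not say which kappa variant or how many raters were used. The function name and the example ratings are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e is the agreement expected by chance from each rater's
    marginal label frequencies.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty item set")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        # Both raters used a single identical label for every item;
        # kappa is undefined here, so report full agreement by convention.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical ratings of six responses for clinical relevance.
rater_a = ["high", "high", "low", "low", "high", "low"]
rater_b = ["high", "low", "low", "low", "high", "low"]
kappa = cohens_kappa(rater_a, rater_b)
```

Here the raters agree on five of six items, but half of that agreement is expected by chance, so kappa is lower than the raw agreement rate.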
medRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown
Published: July 8, 2024
Abstract
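Background
Large language models (LLMs) have shown capability in diagnosing complex medical cases and passing medical licensing exams, but to date, only limited evaluations have studied how LLMs interpret, analyze, and optimize complex medication regimens. The purpose of this evaluation was to test four LLMs' ability to identify medication errors and appropriate medication interventions on complex patient cases from the intensive care unit (ICU).
Methods
A series of eight patient cases were developed by critical care pharmacists including history of present illness, laboratory values, vital signs, and medication regimens. Then, four LLMs (ChatGPT (GPT-3.5), ChatGPT (GPT-4), Claude2, and Llama2-7b) were prompted to develop a medication regimen for the patient. LLM generated medication regimens were then reviewed by a panel of seven critical care pharmacists to assess for the presence of medication errors and clinical relevance. For each medication regimen recommended by the LLM, clinicians were asked to assess if they would continue the medication, identify perceived medication errors in the medications recommended, identify the presence of life-threatening medication choices, and rank overall agreement on a 5-point Likert scale.
Results
The clinician panel would continue the therapies recommended by the LLMs between 55.8-67.9% of the time. Clinicians perceived between 1.57-4.29 medication errors per recommended regimen, and recommendations were life-threatening 15.0-55.3% of the time. Level of agreement was 1.85-2.67 for the four LLMs.
Conclusions
LLMs demonstrated potential to serve as clinical decision support for the management of complex medication regimens with further domain specific training; however, caution should be used when employing LLMs for medication management given present capabilities.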