medRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown
Published: Sept. 14, 2024
Abstract
Background
Medication-related harm has a significant impact on global healthcare costs and patient outcomes, accounting for deaths in 4.3 per 1000 patients. Generative artificial intelligence (GenAI) has emerged as a promising tool in mitigating risks of medication-related harm. In particular, large language models (LLMs) and well-developed generative adversarial networks (GANs) are showing promise in related tasks. This review aims to explore the scope and effectiveness of generative AI in reducing medication-related harm, identifying existing development and challenges in research.
Methods
We searched for peer-reviewed articles in PubMed, Web of Science, Embase, and Scopus, covering literature published from January 2012 to February 2024. We included studies focusing on the development or application of generative AI in mitigating risk during the entire medication use process. We excluded studies using traditional AI methods only, those unrelated to healthcare settings, and those concerning non-prescribed medication uses such as supplements. Extracted variables included study characteristics, AI model specifics and performance, and any patient outcome evaluated.
Findings
A total of 2203 articles were identified, and 14 met the criteria for inclusion into the final review. We found that generative AI was used in a few key applications: drug-drug interaction identification and prediction; clinical decision support; and pharmacovigilance. While the performance and utility of these models varied, they generally showed promise in areas like early identification and classification of adverse drug events and support of decision-making for medication management. However, no studies tested these models prospectively, suggesting a need for further investigation into the integration and real-world application of generative AI tools to improve patient safety and outcomes effectively.
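Interpretation
Generative AI shows promise in mitigating medication-related harms, but there are gaps in research rigor and ethical considerations. Future research should focus on the creation of high-quality, task-specific benchmarking datasets and on real-world implementation and outcomes.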
medRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown
Published: Aug. 12, 2024
Background: Generative large language models (LLMs) represent a significant advancement in natural language processing, achieving state-of-the-art performance across various tasks. However, their application in clinical settings using real electronic health records (EHRs) is still rare and presents numerous challenges.
Objective: This study aims to systematically review the use of generative LLMs and the effectiveness of relevant techniques in patient care-related topics involving EHRs, summarize the challenges faced, and suggest future directions.
Methods: A Boolean search for peer-reviewed articles was conducted on May 19th, 2024 using PubMed and Web of Science to include research articles published since 2023, which was one month after the release of ChatGPT. The search results were deduplicated. Multiple reviewers, including biomedical informaticians, computer scientists, and a physician, screened the publications for eligibility and conducted data extraction. Only the studies utilizing generative LLMs to analyze real EHR data were included. We summarized the use of prompt engineering, fine-tuning, multimodal EHR data, and evaluation matrices. Additionally, we identified current challenges in applying LLMs in clinical settings as reported by the included studies and proposed future directions.
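Results: The initial search identified 6,328 unique studies, with 76 studies included after eligibility screening. Of these, 67 studies (88.2%) employed zero-shot prompting, and five of them reported 100% accuracy on specific tasks. Nine studies used advanced prompting strategies; four tested these strategies experimentally, finding that prompt engineering improved performance, with one study noting a non-linear relationship between the number of examples in a prompt and performance improvement. Eight studies explored fine-tuning generative LLMs; all reported performance improvements on specific tasks, but three of them noted potential performance degradation after fine-tuning on certain tasks. Only two studies utilized multimodal data, which improved LLM-based decision-making and enabled accurate disease diagnosis and prognosis. The studies employed 55 different evaluation metrics for 22 purposes, such as correctness, completeness, and conciseness. Two studies investigated LLM bias, with one detecting no bias and the other finding that male patients received more appropriate suggestions. Six studies identified hallucinations, such as fabricating patient names in structured thyroid ultrasound reports. Additional challenges included, but were not limited to, the impersonal tone of LLM consultations, which made patients uncomfortable, and the difficulty patients had in understanding LLM responses.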
Conclusion: Our review indicates that few studies have employed advanced computational techniques to enhance LLM performance. The diverse evaluation metrics used highlight the need for standardization. LLMs currently cannot replace physicians due
Frontiers in Artificial Intelligence, Journal Year: 2025, Volume and Issue: 7
Published: Jan. 9, 2025
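Large language models (LLMs) have demonstrated impressive performance on medical licensing and diagnosis-related exams. However, comparative evaluations to optimize LLM performance and ability in the domain of comprehensive medication management (CMM) are lacking. The purpose of this evaluation was to test various LLM performance optimization strategies and performance on critical care pharmacotherapy questions used in the assessment of Doctor of Pharmacy students. In a comparative analysis using 219 multiple-choice pharmacotherapy questions, five LLMs (GPT-3.5, GPT-4, Claude 2, Llama2-7b and Llama2-13b) were evaluated. Each LLM was queried five times to evaluate the primary outcome of accuracy (i.e., correctness). Secondary outcomes included variance, the impact of prompt engineering techniques (e.g., chain-of-thought, CoT) and training of a customized GPT on performance, and comparison to third year doctor of pharmacy students on knowledge recall vs. knowledge application questions. Accuracy and variance were compared with Student's t-test to compare performance under different model settings. ChatGPT-4 exhibited the highest accuracy (71.6%), while Llama2-13b had the lowest variance (0.070). All LLMs performed more accurately on knowledge recall vs. knowledge application questions (e.g., ChatGPT-4: 87% vs. 67%). When applied to ChatGPT-4, few-shot CoT across five runs improved accuracy (77.4% vs. 71.5%) with no effect on variance. Self-consistency and the custom-trained GPT demonstrated similar accuracy to ChatGPT-4 with few-shot CoT. Overall pharmacy student accuracy was 81%, compared to an optimal overall LLM accuracy of 73%. Comparing question types, six LLMs demonstrated equivalent or higher accuracy than pharmacy students on knowledge recall questions (e.g., self-consistency vs. students: 93% vs. 84%), but pharmacy students achieved higher accuracy than all LLMs on knowledge application questions (e.g., self-consistency vs. students: 68% vs. 80%). ChatGPT-4 was the most accurate LLM, and few-shot CoT improved accuracy the most. Average student accuracy was similar to LLMs overall, and higher on knowledge application questions. These findings support the need for future assessment of customized training for the type of output needed. Reliance on LLMs is only supported with recall-based questions.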
World Neurosurgery, Journal Year: 2025, Volume and Issue: 196, P. 123753 - 123753
Published: March 6, 2025
Artificial intelligence (AI) continues to advance in healthcare, offering innovative approaches to enhance clinical decision-making and patient management. Peripheral nerve surgery poses unique challenges due to the complexity of cases and the need for precise diagnostic and therapeutic strategies. This study investigates the application of OpenAI's generative AI model, o1, in assisting with the intricate decision-making processes in peripheral nerve surgery. Utilizing advanced prompt engineering techniques, o1 was configured as a virtual medical assistant (GPT-NS) to process five simulated clinical scenarios modeled after real-world cases. The AI guided surgeons through patient history, diagnostics, and treatment planning, culminating in case summaries. A panel of specialists and residents evaluated the AI's performance using a Likert scale across seven criteria. GPT-NS demonstrated strong capabilities, achieving an average score of 4.3. High ratings were observed in understanding clinical issues and presentation clarity. However, areas for improvement were noted in the sequencing of diagnostic and therapeutic recommendations. Despite lower ratings indicating the human evaluators' perception of their superiority over the AI in handling cases, GPT-NS showed promise as a supportive tool in clinical practice. As LLMs (large language models) improve, it is becoming increasingly important that absolute experts assess the accuracy of AI answers to ensure reliable and clinically sound integration into healthcare practices. This study underscores the potential of AI in augmenting clinical decision-making in highly specialized fields like peripheral nerve surgery while demonstrating the ongoing importance of human expertise. Future research should explore ways to further refine the AI's capabilities and its integration into routine surgical workflows.
npj Digital Medicine, Journal Year: 2025, Volume and Issue: 8(1)
Published: March 28, 2025
Abstract
Medication-related harm has a significant impact on global healthcare costs and patient outcomes. Generative artificial intelligence (GenAI) and large language models (LLMs) have emerged as promising tools in mitigating risks of medication-related harm. This review evaluates the scope and effectiveness of GenAI and LLMs in reducing medication-related harm. We screened 4 databases for literature published from 1st January 2012 to 15th October 2024. A total of 3988 articles were identified, and 30 met the criteria for inclusion into the final review. Generative AI and LLMs were applied in three key applications: drug-drug interaction identification and prediction, clinical decision support, and pharmacovigilance. While the performance and utility of these models varied, they generally showed promise in early identification, classification of adverse drug events, and supporting decision-making for medication management. However, no studies tested these models prospectively, suggesting a need for further investigation into the integration and real-world application.
Life, Journal Year: 2024, Volume and Issue: 14(5), P. 646 - 646
Published: May 20, 2024
The role of artificial intelligence (AI) in healthcare is evolving, offering promising avenues for enhancing clinical decision making and patient management. Limited knowledge about lipedema often leads to patients being frequently misdiagnosed with conditions like lymphedema or obesity rather than correctly identifying lipedema. Furthermore, patients with lipedema often present with intricate and extensive medical histories, resulting in significant time consumption during consultations. AI could, therefore, improve the management of these patients. This research investigates the utilization of OpenAI’s Generative Pre-Trained Transformer 4 (GPT-4), a sophisticated large language model (LLM), as an assistant in medical consultations for lipedema. Six simulated scenarios were designed to mirror typical patient consultations commonly encountered in a lipedema clinic. GPT-4 was tasked with conducting patient interviews to gather medical histories, presenting its findings, making preliminary diagnoses, and recommending further diagnostic and therapeutic actions. Advanced prompt engineering techniques were employed to refine the efficacy, relevance, and accuracy of GPT-4’s responses. A panel of experts in lipedema treatment, using a Likert Scale, evaluated GPT-4’s responses across six key criteria. Scoring ranged from 1 (lowest) to 5 (highest), with GPT-4 achieving an average score of 4.24, indicating good reliability and applicability in a clinical setting. This study is one of the initial forays into applying large language models in specific clinical scenarios, such as lipedema consultations. It demonstrates the potential of AI in supporting clinical practices and emphasizes the continuing importance of human expertise in the medical field, despite ongoing technological advancements.
BMC Medical Informatics and Decision Making, Journal Year: 2024, Volume and Issue: 24(1)
Published: Nov. 26, 2024
Large language models (LLMs) released since November 30, 2022, most notably ChatGPT, have prompted a shift of attention to their use in medicine, particularly for supporting clinical decision-making. However, there is little consensus in the medical community on how LLM performance in clinical contexts should be evaluated. We performed a literature review of PubMed to identify publications between December 1, 2022, and April 2024, that discussed assessments of LLM-generated diagnoses or treatment plans. We selected 108 relevant articles from PubMed for analysis. The most frequently used LLMs were GPT-3.5, GPT-4, Bard, LLaMa/Alpaca-based models, and Bing Chat. The five most frequently used criteria for scoring LLM outputs were "accuracy", "completeness", "appropriateness", "insight", and "consistency". The criteria for defining high-quality LLMs have been consistently selected by researchers over the past 1.5 years. We identified a high degree of variation in how studies reported their findings and assessed LLM performance. Standardized reporting of qualitative evaluation metrics that assess the quality of LLM outputs can be developed to facilitate research on LLMs in healthcare.
British Journal of Clinical Pharmacology, Journal Year: 2024, Volume and Issue: 90(3), P. 618 - 619
Published: Feb. 5, 2024
The proper use of medicines, ensuring the right medicine is used at the right time and in the right way, is a fundamental principle for clinical pharmacologists, physicians, nurses and other healthcare professionals globally. Despite this being an apparent rule, real-life practice often reveals that errors in medical service provision are more common than exceptions. There are multiple stages of treatment where errors can lead to adverse outcomes. A recent large retrospective cohort study among hospitalized adults transferred to an intensive care unit or who died reported that 23% of patients experienced a diagnostic error, and in 17.8% of these cases, the errors contributed to temporary harm, permanent harm or death.1 Previously, medical errors were considered the third leading cause of death, following cancer and heart disease.2 However, this claim itself turned out to be erroneous,3 highlighting the irony and fragility of our understanding of medicine.
In this context, the emergence of artificial intelligence (AI) offers a promising avenue to enhance clinical pharmacology. AI provides a potential safeguard against inadvertent medication errors, surpassing what human education, professionalism and continuous attention in prescribing can achieve alone. This inspired us to organize a special 'holidAI' themed issue, focusing on the exciting intersection of AI, machine learning (ML), and pharmacotherapy. We have received numerous high-quality papers demonstrating the strengths and weaknesses of AI and ML applied to clinical pharmacology, along with future areas of their application.
Delving into specific applications, Rubinic et al4 present a thought-provoking concept in this thematic issue. They explore the vulnerability of large language models (LLMs) to misuse in bioweapon development. Their study includes a literature review, an examination of regulatory documents concerning ethical AI, and a case study illustrating the manipulation of LLMs in creating harmful substances. The authors conclude that the current regulatory landscape is ill-equipped to address the challenges posed by LLMs and suggest a dual role of LLMs: not just as risks but also as tools for developing countermeasures against novel hazardous substances. History teaches us that such threats are real but often overlooked due to a lack of contemporary education on the topic.5
Ryan et al6 highlight the necessity for clinical pharmacologists to understand AI and its implementation in clinical practice. They explore model development and the issues surrounding evaluation and deployment. Bakkum et al7 investigate the use of AI in creating diverse and inclusive case vignettes for education. Using ChatGPT (OpenAI, GPT 3.5), they generated cases for various assignments and shared them as open educational resources, balancing the trade-offs of this approach. Their findings will be further evaluated through scientific research.
Exploring the integration of AI in pharmacy practice, an international survey conducted by Busch et al8 spanned 12 countries and revealed predominantly positive attitudes towards AI among undergraduate pharmacy students. A notable finding was that students with prior AI coursework felt more prepared for its professional application, underscoring the need for enhanced AI education within pharmacy curricula. A comparison between AI and healthcare providers in the decision-making process of benzodiazepine deprescribing was the focus of Buzancic et al.9 They found a high agreement rate (95%) but variations in different criteria. They identified important limitations of AI, including ambiguities and inaccuracies, indicating its role as a supportive tool rather than a decision-maker in clinical practices.
In the field of nephrotoxicity prediction, Noda et al10 developed an individualized prediction model for patients administered tacrolimus. The model showed improved predictive ability over traditional concentration thresholds, indicating higher accuracy in identifying high-risk patients before treatment initiation. An approach to predicting and preventing complications in maxillofacial surgery using an algorithm was proposed by Prazetina et al.11 in a protocol for a randomized controlled trial. The methodology aimed to optimize hemodynamic parameters during free flap surgery in patients. Furthermore, Pavlov et al12 provided an analysis of outcomes in patients with heart failure treated with sodium-glucose co-transporter-2 inhibitors. Their work contrasts the results obtained with traditional methods with those from ML algorithms, calling for scrutiny and critical appraisal of the findings. Finally, modelling of illicit substance abuse patterns in different age groups was undertaken by Tummala et al.13 They presented a detailed analysis that could inform clinical trial design and pharmacometrics of substance use disorder treatments in the future.
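These studies collectively underscore the transformative potential of AI in reshaping clinical pharmacology. As we embrace these innovations, we must remember that haste in adopting these technologies should not lead us astray. Like the iconic scene in Malcolm in the Middle, we should avoid finding ourselves unprepared for the unforeseen consequences of these powerful tools. As Dewey's famous line from the show reminds us, 'The future is now, old man', we already live in a once distant future.
Both authors contributed equally to writing the manuscript.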
None.