BMC Medical Informatics and Decision Making,
Journal Year:
2025,
Volume and Issue:
25(1)
Published: April 14, 2025
The integration of artificial intelligence (AI) in healthcare has rapidly expanded, particularly in clinical decision-making. Large language models (LLMs) such as GPT-4 and GPT-3.5 have shown potential in various medical applications, including diagnostics and treatment planning. However, their efficacy in specialized fields like sports surgery and physiotherapy remains underexplored. This study aims to compare the performance of GPT-4 and GPT-3.5 in clinical decision-making within these domains using a structured assessment approach.
This cross-sectional study included 56 professionals specializing in sports surgery and physiotherapy. Participants evaluated 10 standardized scenarios generated by GPT-4 and GPT-3.5 using a 5-point Likert scale. The scenarios encompassed common musculoskeletal conditions, and assessments focused on diagnostic accuracy, treatment appropriateness, surgical technique detailing, and rehabilitation plan suitability. Data were collected anonymously via Google Forms. Statistical analysis included paired t-tests for direct model comparisons, one-way ANOVA to assess performance across multiple criteria, and Cronbach's alpha to evaluate inter-rater reliability.
GPT-4 significantly outperformed GPT-3.5 across all criteria. Paired t-test results (t(55) = 10.45, p < 0.001) demonstrated that GPT-4 provided more accurate diagnoses, superior treatment plans, and more detailed surgical recommendations. ANOVA results confirmed the higher suitability of GPT-4 in treatment planning (F(1, 55) = 35.22, p < 0.001) and rehabilitation protocols (F(1, 55) = 32.10, p < 0.001). Cronbach's alpha values indicated higher internal consistency for GPT-4 (α = 0.478) compared to GPT-3.5 (α = 0.234), reflecting more reliable performance.
GPT-4 demonstrates superior performance compared to GPT-3.5 in these domains. These findings suggest that advanced AI models can aid in diagnostic accuracy, treatment planning, and rehabilitation strategies. However, AI should function as a decision-support tool rather than a substitute for expert clinical judgment. Future studies should explore AI integration into real-world clinical workflows, validate findings using larger datasets, and compare additional models beyond the GPT series.
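The reliability statistic named in the abstract above, Cronbach's alpha, is conventionally computed for k items as α = k/(k - 1) · (1 - Σ item variances / variance of the row totals). The following is a minimal Python sketch of that textbook formula, assuming a ratings matrix with one row per rater and one column per rated item. The function name `cronbach_alpha` and the toy ratings are illustrative only; they are not the study's code or data.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a matrix of ratings.

    scores: list of rows, one row per rater, one column per item.
    Uses sample variances (n - 1 denominator) throughout.
    """
    k = len(scores[0])
    if k < 2 or len(scores) < 2:
        raise ValueError("need at least two items and two raters")
    # Variance of each item (column) across raters.
    item_vars = [variance(col) for col in zip(*scores)]
    # Variance of each rater's total score across raters.
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical 5-point Likert ratings: 3 raters x 2 items.
toy_ratings = [
    [1, 2],
    [2, 1],
    [3, 3],
]
alpha = cronbach_alpha(toy_ratings)
```

When every rater gives identical scores to all items, the items are perfectly consistent and the function returns 1; weaker agreement between items pushes the value toward 0.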
Cureus,
Journal Year:
2025,
Volume and Issue:
unknown
Published: March 17, 2025
Background
The integration of large language models (LLMs) such as GPT-4 into healthcare presents potential benefits and challenges. While LLMs show promise in applications ranging from scientific writing to personalized medicine, their practical utility and safety in clinical settings remain under scrutiny. Concerns about accuracy, ethical considerations, and bias necessitate rigorous evaluation of these technologies against established medical standards.
Methods
This study involved a comparative analysis using anonymized patient records from a healthcare setting in the state of West Bengal, India. Management plans for 50 patients with type 2 diabetes mellitus were generated by GPT-4 and three physicians, who were blinded to each other's responses. These plans were evaluated against a reference management plan based on American Diabetes Society guidelines. Completeness, necessity, and dosage accuracy were quantified, and a Prescribing Error Score was devised to assess the quality of the plans. The safety of the plans was also assessed.
Results
Results indicated that physicians' plans had fewer missing medications compared to those of GPT-4 (p=0.008). However, GPT-4-generated plans included fewer unnecessary medications (p=0.003). No significant difference was observed in drug dosages (p=0.975). The overall error scores were comparable between physicians and GPT-4 (p=0.301). Safety issues were noted in 16% of the plans generated by GPT-4, highlighting the risks associated with AI-generated management plans.
Conclusion
The study demonstrates that while GPT-4 can effectively reduce unnecessary drug prescriptions, it does not yet match the performance of physicians in terms of plan completeness. The findings support the use of LLMs as supplementary tools in healthcare, highlighting the need for enhanced algorithms and continuous human oversight to ensure the safety and efficacy of artificial intelligence in clinical settings.
Deleted Journal,
Journal Year:
2025,
Volume and Issue:
unknown
Published: March 21, 2025
Abstract
This study aims to investigate the feasibility, usability, and effectiveness of a Retrieval-Augmented Generation (RAG)-powered Patient Information Assistant (PIA) chatbot for pre-CT information counseling compared to the standard physician consultation and informed consent process.
This prospective comparative study included 86 patients scheduled for CT imaging between November and December 2024. Patients were randomly assigned to either the PIA group (n = 43), who received pre-CT information via a chat app, or the control group, with a doctor-led consultation. Patient satisfaction, information clarity and comprehension, and concerns were assessed using six ten-point Likert-scale questions after information counseling with the PIA or the doctor’s consultation. Additionally, consultation duration was measured, and patients were asked about their preference for PIA or physician consultation, while two radiologists rated each consultation in five categories.
Both groups reported similarly high ratings for information clarity (PIA: 8.64 ± 1.69; control: 8.86 ± 1.28; p = 0.82) and overall comprehension (PIA: 8.81 ± 1.40; control: 8.93 ± 1.61; p = 0.35). However, the doctor consultation showed greater effectiveness in alleviating patient concerns (8.30 ± 2.63 versus 6.46 ± 3.29; p = 0.003). The PIA group demonstrated significantly shorter subsequent consultation times (median: 120 s [interquartile range (IQR): 100–140] versus 195 s [IQR: 170–220]; p = 0.04). Ratings of quality, scientific and clinical evidence, usefulness and relevance, consistency, and up-to-dateness were high.
The RAG-powered PIA effectively provided pre-CT information while reducing physician consultation time. While both methods achieved comparable patient satisfaction and comprehension, physicians were more effective at addressing worries or concerns regarding the examination.