medRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown
Published: May 22, 2024
Abstract
Background
The integration of large language models (LLMs) such as GPT-4 into healthcare presents potential benefits and challenges. While LLMs have shown promise in applications ranging from scientific writing to personalized medicine, their practical utility and safety in clinical settings remain under scrutiny. Concerns about accuracy, ethical considerations, and bias necessitate rigorous evaluation of these technologies against established medical standards.
Objective
To compare the completeness, necessity, dosage accuracy, and overall safety of type 2 diabetes management plans created by GPT-4 with those devised by experts.
Methods
This study involved a comparative analysis using anonymized patient records from a healthcare setting in West Bengal, India. Management plans for 50 Type 2 diabetes patients were generated by GPT-4 and three blinded experts. These plans were evaluated against a reference management plan based on American Diabetes Society guidelines. Completeness, necessity, and dosage accuracy were quantified, and an error score was used to assess the quality of the plans. The safety of the plans was also assessed.
Results
Results indicated that the experts' management plans had fewer missing medications compared to those of GPT-4 (p=0.008). However, the GPT-4 plans included fewer unnecessary medications (p=0.003). No significant difference was observed in drug dosages (p=0.975). The error scores were comparable between human experts and GPT-4 (p=0.301). Safety issues were noted in 16% of the plans generated by GPT-4, highlighting the risks associated with AI-generated management plans.
Conclusion
This study demonstrates that while GPT-4 can effectively reduce unnecessary drug prescriptions, it does not yet match the performance of human experts in terms of completeness and safety. The findings support the use of LLMs as supplementary tools in healthcare, underscoring the need for enhanced algorithms and continuous human oversight to ensure the safety and efficacy of AI in clinical settings. Further research is necessary to improve the integration of LLMs into complex healthcare environments.
Radiology, Journal Year: 2025, Volume and Issue: 314(1)
Published: Jan. 1, 2025
Artificial intelligence (AI) offers promising solutions for many steps of the cardiac imaging workflow, from patient and test selection through image acquisition, reconstruction, and interpretation, extending to prognostication and reporting. Despite the development of many AI algorithms, AI tools are at various stages of development and face challenges for clinical implementation. This scientific statement, endorsed by several societies in the field, provides an overview of the current landscape and challenges of AI applications in cardiac CT and MRI. Each section is organized into questions and statements that address key challenges, including ethical, legal, and environmental sustainability considerations. A technology readiness level range of 1 to 9 summarizes the maturity of the AI tools and reflects the progression from preliminary research to clinical implementation. This document aims to bridge the gap between burgeoning research developments and limited clinical implementation.
JMIRx Med, Journal Year: 2025, Volume and Issue: 6, P. e65263 - e65263
Published: March 19, 2025
Rural health care providers face unique challenges such as limited specialist access and high patient volumes, making accurate diagnostic support tools essential. Large language models like GPT-3 have demonstrated potential in clinical decision support but remain understudied in pediatric differential diagnosis. This study aims to evaluate the diagnostic accuracy and reliability of a fine-tuned GPT-3 model compared to board-certified pediatricians in rural health care settings. This multicenter retrospective cohort study analyzed 500 pediatric encounters (ages 0-18 years; n=261, 52.2% female) from rural health care organizations in Central Louisiana between January 2020 and December 2021. The GPT-3 model (DaVinci version) was fine-tuned using the OpenAI application programming interface and trained on 350 encounters, with 150 reserved for testing. Five board-certified pediatricians (mean experience: 12, SD 5.8 years) provided the reference standard diagnoses. Model performance was assessed using accuracy, sensitivity, specificity, and subgroup analyses. The fine-tuned GPT-3 model achieved an accuracy of 87.3% (131/150 cases), sensitivity of 85% (95% CI 82%-88%), and specificity of 90% (95% CI 87%-93%), comparable to the pediatricians' accuracy of 91.3% (137/150 cases; P=.47). Performance was consistent across age groups (0-5 years: 54/62, 87%; 6-12 years: 47/53, 89%; 13-18 years: 30/35, 86%) and common complaints (fever: 36/39, 92%; abdominal pain: 20/23, 87%). For rare diagnoses (n=20), accuracy was slightly lower (16/20, 80%) but comparable to that of the pediatricians (17/20, 85%; P=.62). This study demonstrates that a fine-tuned GPT-3 model can provide diagnostic support comparable to pediatricians, particularly for common presentations, in rural health care. Further validation in diverse populations is necessary before clinical implementation.
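The headline accuracy figures in this abstract can be checked against the reported counts with a few lines of arithmetic. The sketch below is only a consistency check on numbers quoted in the abstract. The helper name `pct` and the dictionary layout are illustrative choices, not part of the study.

```python
# Counts are copied from the abstract; names and structure are illustrative.

def pct(correct: int, total: int) -> float:
    """Return accuracy as a percentage rounded to one decimal place."""
    return round(100 * correct / total, 1)

# Model results on the 150-encounter test set, split by age group.
model_by_age = {
    "0-5 years": (54, 62),
    "6-12 years": (47, 53),
    "13-18 years": (30, 35),
}

# The age subgroups should add up to the overall test-set counts.
model_correct = sum(correct for correct, _ in model_by_age.values())
model_total = sum(total for _, total in model_by_age.values())

model_accuracy = pct(model_correct, model_total)
pediatrician_accuracy = pct(137, 150)

for group, (correct, total) in model_by_age.items():
    print(f"{group}: {correct}/{total} = {pct(correct, total)}%")
print(f"model overall: {model_correct}/{model_total} = {model_accuracy}%")
print(f"pediatricians: 137/150 = {pediatrician_accuracy}%")
```

The subgroup counts sum to 131 of 150, which matches the overall accuracy quoted for the model.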
Machine Learning and Knowledge Extraction, Journal Year: 2024, Volume and Issue: 6(4), P. 2355 - 2374
Published: Oct. 18, 2024
Artificial Intelligence (AI) has the potential to revolutionise the medical and healthcare sectors. AI and related technologies could significantly address some supply-and-demand challenges in the healthcare system, such as medical AI assistants, chatbots and robots. This paper focuses on tailoring LLMs to medical data utilising a Retrieval-Augmented Generation (RAG) database and on evaluating their performance in a computationally resource-constrained environment. Existing studies primarily focus on fine-tuning LLMs on medical data, but this paper combines RAG and fine-tuned models and compares them against base models using RAG or only fine-tuning. Open-source LLMs (Flan-T5-Large, LLaMA-2-7B, Mistral-7B) are fine-tuned using the medical datasets Meadow-MedQA and MedMCQA. Experiments are reported for response generation and multiple-choice question answering. The latter uses two distinct methodologies: Type A, standard question answering via direct choice selection; and Type B, based on the language probability confidence score of the choices available. Results in the medical domain revealed that Fine-tuning and RAG are crucial for improved performance, and that methodology A outperforms B.
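The two multiple-choice methodologies can be illustrated with a minimal Python sketch. It assumes one plausible reading of the abstract: Type A parses a choice label out of the model's generated text, while Type B picks the choice with the highest model-assigned probability and reports that probability as a confidence score. The function names, the regular expression, and the example log-probabilities are all hypothetical and are not taken from the paper.

```python
import math
import re
from typing import Dict, Optional, Sequence, Tuple


def select_type_a(generated_text: str, labels: Sequence[str]) -> Optional[str]:
    """Type A (assumed): return the first standalone choice label in the text."""
    pattern = r"\b(" + "|".join(re.escape(label) for label in labels) + r")\b"
    match = re.search(pattern, generated_text)
    return match.group(1) if match else None


def select_type_b(choice_logprobs: Dict[str, float]) -> Tuple[str, float]:
    """Type B (assumed): softmax over per-choice log-probabilities.

    Returns the most probable label and its normalised probability,
    which serves as the confidence score.
    """
    max_lp = max(choice_logprobs.values())
    # Subtract the maximum before exponentiating for numerical stability.
    weights = {k: math.exp(v - max_lp) for k, v in choice_logprobs.items()}
    total = sum(weights.values())
    probs = {k: w / total for k, w in weights.items()}
    best = max(probs, key=probs.get)
    return best, probs[best]


# Hypothetical inputs standing in for real model output.
answer_a = select_type_a("The answer is B.", ["A", "B", "C", "D"])
answer_b, confidence_b = select_type_b({"A": -2.0, "B": -0.5, "C": -3.0, "D": -4.0})
print(answer_a, answer_b, round(confidence_b, 3))
```

In practice the log-probabilities for Type B would come from scoring each answer option with the language model. Here they are hard-coded so the sketch runs without any model.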
medRxiv (Cold Spring Harbor Laboratory), Journal Year: 2023, Volume and Issue: unknown
Published: April 28, 2023
The United States Medical Licensing Examination (USMLE) is a critical step in assessing the competence of future physicians, yet the process of creating exam questions and study materials is both time-consuming and costly. While Large Language Models (LLMs), such as OpenAI’s GPT-4, have demonstrated proficiency in answering medical exam questions, their potential in generating such questions remains underexplored. This study presents QUEST-AI, a novel system that utilizes LLMs to (1) generate USMLE-style questions, (2) identify and flag incorrect questions, and (3) correct errors in the flagged questions. We evaluated this system’s output by constructing a test set of 50 LLM-generated questions mixed with human-generated questions and conducting a two-part assessment with three physicians and two medical students. The assessors attempted to distinguish between LLM and human-generated questions and to evaluate the validity of the LLM-generated content. A majority of the exam questions generated by QUEST-AI were deemed valid by a panel of three clinicians, with strong correlations between performance on LLM-generated and human-generated questions. This pioneering application of LLMs in medical education could significantly increase the ease and efficiency of developing USMLE-style medical exam content, offering a cost-effective and accessible alternative for exam preparation.
JACC Advances, Journal Year: 2024, Volume and Issue: 3(9), P. 101202 - 101202
Published: Aug. 28, 2024
Despite the potential of artificial intelligence (AI) in enhancing cardiovascular care, its integration into clinical practice is limited by a lack of evidence on its effectiveness with respect to human experts or gold standard practices in real-world settings.
Applied Sciences, Journal Year: 2025, Volume and Issue: 15(2), P. 524 - 524
Published: Jan. 8, 2025
A conversational system is an artificial intelligence application designed to interact with users in natural language, providing accurate and contextually relevant responses. Building such systems for low-resource languages like Swahili presents significant challenges due to the limited availability of large-scale training datasets. This paper proposes a Retrieval-Augmented Generation-based system to address these challenges and improve the quality of Swahili conversational AI. The system leverages fine-tuning, where models are trained on available Swahili data, combined with external knowledge retrieval to enhance response accuracy and fluency. Four models (mT5, GPT-2, mBART, and GPT-Neo) were evaluated using metrics such as BLEU, METEOR, Query Performance, and inference time. Results show that Retrieval-Augmented Generation consistently outperforms fine-tuning alone, particularly in generating detailed and contextually appropriate responses. Among the tested models, mT5 demonstrated the best performance, achieving a BLEU score of 56.88%, a METEOR score of 72.72%, and a Query Performance score of 84.34%, while maintaining relevance and fluency. Although Retrieval-Augmented Generation introduces slightly longer inference times, its ability to significantly improve response quality makes it an effective approach for Swahili conversational systems. This study highlights the potential of Retrieval-Augmented Generation to advance conversational AI for Swahili and other low-resource languages, with future work focusing on optimizing efficiency and exploring multilingual applications.
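The retrieve-then-generate pattern described in this abstract can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the paper's pipeline: retrieval here is plain word overlap rather than a learned retriever, the passages are placeholder English sentences, and the names `tokenize`, `retrieve`, and `build_prompt` are hypothetical. The resulting prompt would be passed to a fine-tuned generator such as mT5, which is omitted.

```python
from typing import List, Set


def tokenize(text: str) -> Set[str]:
    """Lowercase whitespace tokenisation; a stand-in for a real tokenizer."""
    return set(text.lower().split())


def retrieve(query: str, passages: List[str], k: int = 1) -> List[str]:
    """Rank passages by the number of words shared with the query, keep top k."""
    query_tokens = tokenize(query)
    ranked = sorted(
        passages,
        key=lambda passage: len(query_tokens & tokenize(passage)),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query: str, passages: List[str], k: int = 1) -> str:
    """Prepend the retrieved context to the question for the generator."""
    context = "\n".join(retrieve(query, passages, k))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


# Placeholder knowledge base; a real system would index Swahili documents.
knowledge_base = [
    "Nairobi is the capital of Kenya",
    "Mount Kilimanjaro is in Tanzania",
]
prompt = build_prompt("What is the capital of Kenya", knowledge_base)
print(prompt)
```

The extra retrieval step is what accounts for the longer inference time that the abstract mentions, since every query is matched against the knowledge base before generation.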
Advances in computational intelligence and robotics book series, Journal Year: 2025, Volume and Issue: unknown, P. 277 - 298
Published: April 24, 2025
Large Language Models like transformers and ChatGPT have brought about a considerable shift in the healthcare system in areas like support for clinical decisions, education of patients, and diagnosis. These models have been used in different applications such as Natural Language Processing (NLP), medical image analysis, and Electronic Health Records (EHR). However, the involvement of LLMs in healthcare has some challenges because of the importance of medical information, which makes any error critical; hence rigorous evaluation is required to prevent errors. This Chapter provides an extensive literature review of LLMs' use in healthcare to demonstrate how these models may contribute to profound changes and improvements in healthcare processes and studies. Besides highlighting many beneficial uses of LLMs, this paper further presents ethical issues such as sustainability, data privacy, bias, and the necessity of adequate validation.
UNSTRUCTURED
This Viewpoint proposes a robust framework for developing a medical chatbot dedicated to radiotherapy education, emphasizing accuracy, reliability, privacy, ethics, and future innovations. By analyzing existing research, the framework evaluates chatbot performance and identifies challenges such as content accuracy, bias, and system integration. The findings highlight opportunities for advancements in natural language processing, personalized learning, and immersive technologies. When designed with a focus on ethical standards and reliability, large language model–based chatbots could significantly impact education and health care delivery, positioning them as valuable tools for future developments in radiotherapy education globally.