Purpose
The COVID-19 pandemic has intensified the demand for and use of healthcare resources, prompting the search for efficient solutions under budgetary constraints. In this context, the increasing use of artificial intelligence and telemedicine has emerged as a key strategy to optimize the delivery of healthcare resources. Consequently, chatbots have emerged as innovative tools in various fields, such as mental health and patient monitoring, offering therapeutic conversations and early interventions. This systematic review aims to explore the current state of chatbots in the healthcare sector, evaluating their effectiveness, practical applications, and potential benefits.
Methods
A systematic review was conducted following PRISMA guidelines, using three databases (PubMed, Web of Science, and Scopus) to identify relevant studies on the cost of chatbots in healthcare published over the past 5 years.
Results
A total of 31 articles were identified through the database search. The chatbot interventions were grouped into categories of similar type. The reviewed articles highlight the diverse applications of chatbots in healthcare, such as mental health support, medical information, appointment management, education, and lifestyle changes, and they demonstrate significant potential across these areas.
Conclusion
However, challenges remain regarding the implementation of chatbots, their compatibility with other systems, and the ethical considerations that may arise in different settings. Addressing these issues will be essential to maximize the benefits of chatbots, mitigate risks, and ensure equitable access to these innovations.
Applied Sciences, Year: 2025, Issue: 15(4), p. 1796. Published: Feb. 10, 2025
The rapid advancement of large language models (LLMs) and vision-language models (VLMs) holds enormous promise across industries, including healthcare. Hospitals, however, face unique barriers, such as stringent privacy regulations, heterogeneous IT infrastructures, and limited customization. To address these challenges, we present the joint AI versatile implementation system chat (JAVIS chat), an open-source framework for deploying LLMs and VLMs within secure hospital networks. JAVIS features a modular architecture, real-time feedback mechanisms, customizable components, and scalable containerized workflows. It integrates Ray for distributed computing and vLLM for optimized model inference, scaling smoothly from single workstations to hospital-wide systems.
JAVIS consistently demonstrates robust scalability and significantly reduces response times on legacy servers through Ray-managed multiple-instance models. It operates seamlessly across diverse hardware configurations and enables departmental customization. By complying with global data protection laws and operating solely within closed networks, JAVIS safeguards patient privacy while facilitating adoption in clinical workflows. This paradigm shift supports patient care and operational efficiency by bridging the potential of AI with clinical utility. Future developments, such as speech-to-text integration, will further enhance its versatility.
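The scalability result depends on running several model instances side by side. The minimal sketch below illustrates only the general idea: it distributes prompts across replicas in round-robin order. `RoundRobinPool` and `make_replica` are hypothetical names, and the replicas are echo functions rather than real models. None of this is the Ray, vLLM, or JAVIS API; in JAVIS, the instances are managed by Ray.

```python
from itertools import cycle
from typing import Callable, List

# Hypothetical replica type: takes a prompt, returns generated text.
Replica = Callable[[str], str]


class RoundRobinPool:
    """Send each request to the next replica in turn (illustrative only)."""

    def __init__(self, replicas: List[Replica]) -> None:
        if not replicas:
            raise ValueError("need at least one replica")
        self._replicas = cycle(replicas)

    def submit(self, prompt: str) -> str:
        # Pick the next replica in rotation and forward the prompt to it.
        replica = next(self._replicas)
        return replica(prompt)


def make_replica(name: str) -> Replica:
    """Stand-in for a model server; echoes the prompt tagged with its name."""
    def generate(prompt: str) -> str:
        return f"{name}: {prompt}"
    return generate


if __name__ == "__main__":
    pool = RoundRobinPool([make_replica("replica-0"), make_replica("replica-1")])
    for prompt in ["q1", "q2", "q3"]:
        print(pool.submit(prompt))
```

The loop is sequential and only shows how requests are spread. A real serving layer would run the replicas concurrently, and that concurrency is where response-time savings would come from.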
npj Digital Medicine, Year: 2025, Issue: 8(1). Published: Feb. 11, 2025
Clinical notes recorded during a patient's perioperative journey hold immense informational value, and advances in large language models (LLMs) offer opportunities for harnessing it. Using 84,875 preoperative notes and their associated surgical cases from 2018 to 2021, we examine the performance of LLMs in predicting six postoperative risks under various fine-tuning strategies. Pretrained LLMs outperformed traditional word embeddings by an absolute 38.3% in AUROC and 33.2% in AUPRC. Self-supervised fine-tuning improved performance by a further 3.2% (AUROC) and 1.5% (AUPRC). Incorporating labels into training increased these metrics by another 1.8% and 2%, respectively. The highest performance was achieved with a unified foundation model, which improved AUROC by 3.6% and AUPRC by 2.6% compared with self-supervision. These results highlight the foundational capabilities of LLMs in predicting postoperative risks, which could be beneficial when deployed in perioperative care.
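The gains above are reported as AUROC and AUPRC. To make the first metric concrete, the minimal sketch below computes AUROC as the probability that a randomly chosen positive case receives a higher risk score than a randomly chosen negative case, with ties counting half. The scores and labels are toy values, not study data.

```python
from typing import Sequence


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUROC as the probability that a random positive case outscores a
    random negative case (Mann-Whitney formulation; ties count as 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy risk scores and outcomes (1 = complication occurred); not study data.
scores = [0.9, 0.8, 0.4, 0.3, 0.2]
labels = [1, 0, 1, 0, 0]
print(round(auroc(scores, labels), 3))  # 0.833: 5 of 6 positive-negative pairs ranked correctly
```

An absolute improvement such as 3.6% is a difference between two such values, not a ratio. The pairwise loop is O(P*N) and is written for readability. For real cohorts, scikit-learn's `sklearn.metrics.roc_auc_score` computes the same quantity.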
Frontiers in Public Health, Year: 2025, Issue: 13. Published: Feb. 20, 2025
This paper introduces an intelligent question-answering system designed to deliver personalized medical information to diabetic patients. By integrating large language models with knowledge graphs, the system aims to provide more accurate and contextually relevant guidance. This addresses the limitations of traditional healthcare systems in handling complex queries. The system combines a Neo4j-based knowledge graph with the Baichuan2-13B and Qwen2.5-7B large language models. To enhance performance, Low-Rank Adaptation (LoRA) and prompt-based learning techniques are applied. These methods improve the system's semantic understanding and its ability to generate high-quality responses.
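One way to picture how the knowledge graph grounds the language model is to retrieve facts about a recognized entity and place them in the prompt. The sketch below assumes that retrieve-then-prompt pattern. The in-memory `Triple` list stands in for the Neo4j graph. `facts_about`, `build_prompt`, the relation names, and the prompt wording are all hypothetical and do not come from the paper's pipeline. The sketch only builds the prompt and does not call a model.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Triple:
    head: str
    relation: str
    tail: str


# Hypothetical in-memory stand-in for the Neo4j knowledge graph.
GRAPH: List[Triple] = [
    Triple("type 2 diabetes", "has_symptom", "increased thirst"),
    Triple("type 2 diabetes", "monitored_by", "HbA1c test"),
    Triple("metformin", "used_for", "type 2 diabetes"),
]


def facts_about(entity: str, graph: List[Triple]) -> List[Triple]:
    """Return triples in which the entity appears as head or tail."""
    key = entity.lower()
    return [t for t in graph if key in (t.head.lower(), t.tail.lower())]


def build_prompt(question: str, entity: str, graph: List[Triple]) -> str:
    """Assemble a grounded prompt; the wording is an illustrative assumption."""
    facts = facts_about(entity, graph)
    lines = [f"- {t.head} {t.relation.replace('_', ' ')} {t.tail}" for t in facts]
    context = "\n".join(lines) if lines else "- (no matching facts)"
    return (
        "Answer using only the facts below.\n"
        f"Facts:\n{context}\n"
        f"Question: {question}"
    )


print(build_prompt("How is type 2 diabetes monitored?", "type 2 diabetes", GRAPH))
```

In a system like the one described, the entity would presumably come from the entity-recognition step. The resulting prompt would go to the fine-tuned Baichuan2-13B or Qwen2.5-7B model. A Neo4j deployment would replace the list scan with a graph query.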
The system's performance is evaluated on entity recognition and intent classification tasks. It achieves 85.91% precision in entity recognition and 88.55% precision in intent classification.
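Entity-recognition precision is the share of predicted entities that are correct. The minimal sketch below assumes that entities are (start, end, type) spans and that a prediction counts only on an exact match. The span format and the examples are hypothetical, and the paper does not state its matching criterion.

```python
from typing import Iterable, Tuple

# An entity mention as (start offset, end offset, entity type); hypothetical format.
Span = Tuple[int, int, str]


def entity_precision(predicted: Iterable[Span], gold: Iterable[Span]) -> float:
    """Share of predicted entities that exactly match a gold entity."""
    pred = set(predicted)
    if not pred:
        return 0.0
    return len(pred & set(gold)) / len(pred)


gold = [(0, 8, "DISEASE"), (15, 24, "DRUG"), (30, 38, "SYMPTOM")]
pred = [(0, 8, "DISEASE"), (15, 24, "DRUG"), (30, 38, "SYMPTOM"), (40, 45, "DRUG")]
print(entity_precision(pred, gold))  # 0.75
```

Exact matching is the strictest choice. Some evaluations give partial credit for overlapping spans, which would yield a higher figure on the same predictions.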
Integrating structured knowledge significantly improves the accuracy and clinical relevance of the system's responses for diabetes management. This study demonstrates the effectiveness of integrating large language models with knowledge graphs in medical question-answering systems. The proposed approach offers a promising framework for advancing diabetes management and other healthcare applications, providing a solid foundation for future interventions.
The study evaluates the appropriateness and reliability of thyroid nodule cancer risk assessment recommendations provided by the large language models (LLMs) ChatGPT, Gemini, and Claude. It also assesses how well these recommendations align with clinical guidelines from the American Thyroid Association (ATA) and the National Comprehensive Cancer Network (NCCN). A team comprising a medical imaging informatics specialist and two radiologists developed 24 clinically relevant questions based on the ATA and NCCN guidelines. The readability of the AI-generated responses was evaluated using the Readability Scoring System. A total of 322 radiologists in training or practice in the United States, recruited via Amazon Mechanical Turk, assessed the AI responses. Quantitative analysis in SPSS measured the appropriateness and reliability of the recommendations, while qualitative feedback was analyzed in Dedoose. The study compared the performance of the three LLMs in providing appropriate recommendations.
Paired samples t-tests showed no statistically significant differences in overall performance among the models. Claude achieved the highest mean score (21.84), followed closely by ChatGPT (21.83) and Gemini (21.47). Inappropriate response rates did not differ significantly, though one model showed a trend toward higher rates. Response accuracy across the three models was 92.5%, 92.1%, and 90.4%. Qualitative feedback highlighted ChatGPT's clarity and structure, Gemini's accessibility but shallowness, and Claude's organization, along with its occasional divergence from the focus of the question.
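The model comparison uses paired samples t-tests, which test whether the mean of the within-pair differences is zero. The minimal sketch below computes the t statistic, assuming each model's scores are paired on the same items. The scores are toy values. Converting t to a p-value requires the t distribution with n - 1 degrees of freedom, and `scipy.stats.ttest_rel` handles that step if SciPy is available.

```python
import math
from statistics import mean, stdev
from typing import Sequence, Tuple


def paired_t(x: Sequence[float], y: Sequence[float]) -> Tuple[float, int]:
    """Paired-samples t statistic and degrees of freedom for matched scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [a - b for a, b in zip(x, y)]
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("differences have zero variance; t is undefined")
    # Mean difference divided by its standard error.
    t = mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


# Toy per-question scores for two models on the same questions (not study data).
model_a = [3, 4, 5, 6]
model_b = [2, 4, 4, 5]
t, df = paired_t(model_a, model_b)
print(f"t = {t:.2f}, df = {df}")  # t = 3.00, df = 3
```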
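LLMs such as ChatGPT, Gemini, and Claude show potential for supporting thyroid nodule cancer risk assessment but require oversight to ensure accuracy. Claude and ChatGPT performed nearly identically overall; Claude had the highest mean score, but the difference was marginal. Further development is necessary to enhance their reliability for clinical use.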
This study aims to provide an overview of the application of large language models (LLMs) in healthcare using a bibliometric analysis methodology. We performed a comprehensive search for peer-reviewed English-language articles in PubMed and Web of Science. The selected articles were then clustered and analyzed textually. The analysis focused on lexical co-occurrences, country-level and inter-author collaborations, and other relevant factors. This textual analysis produced high-level concept maps that illustrate specific terms and their interconnections. Our final sample comprised 371 journal articles.
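Lexical co-occurrence analysis, one input to the concept maps described above, starts from counts of how often terms appear together. The minimal sketch below assumes document-level co-occurrence over a fixed term list. The abstracts and terms are toy examples. Bibliometric tools typically also apply stemming, synonym merging, and frequency thresholds, which are omitted here.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable


def cooccurrence(docs: Iterable[str], vocabulary: Iterable[str]) -> Counter:
    """Count how many documents contain each pair of vocabulary terms."""
    vocab = {term.lower() for term in vocabulary}
    counts: Counter = Counter()
    for doc in docs:
        # Naive tokenization: split on whitespace, strip punctuation, lowercase.
        tokens = {w.strip(".,;:()").lower() for w in doc.split()}
        present = sorted(tokens & vocab)
        # Each unordered pair of terms found in this document counts once.
        for pair in combinations(present, 2):
            counts[pair] += 1
    return counts


abstracts = [
    "LLMs support medical education.",
    "Bias in LLMs affects diagnosis.",
    "LLMs aid diagnosis and education.",
]
terms = ["LLMs", "education", "diagnosis", "bias"]
for (a, b), n in cooccurrence(abstracts, terms).most_common(2):
    print(a, b, n)
```

The resulting pair counts can serve as edge weights in a term network, which is one common way to draw concept maps.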
The study revealed a sharp rise in the number of publications related to LLMs in healthcare. However, development is geographically imbalanced. Publications are concentrated in developed countries such as the United States, Italy, and Germany, which also exhibit strong inter-country collaboration. LLMs are applied across various specialties, with researchers investigating their use in medical education, diagnosis, treatment, and administrative reporting, and in enhancing doctor-patient communication. Nonetheless, significant concerns persist regarding the risks and ethical implications of LLMs. These include potential gender and racial bias, as well as a lack of transparency in training datasets, which can lead to inaccurate or misleading responses. While promising, the widespread adoption of LLMs in medical practice requires further improvements in standardization and accuracy. It is critical to establish clear accountability guidelines and to develop a robust regulatory framework. Training datasets should also be based on evidence-based sources, to minimize risk and ensure the reliable use of LLMs in healthcare.
Pediatric Nephrology, Year: 2025, Issue: unknown. Published: Mar. 5, 2025
Artificial intelligence (AI) has emerged as a transformative tool in healthcare, offering significant advances in providing accurate clinical information. However, the performance and applicability of AI models in specialized fields such as pediatric nephrology remain underexplored. This study aimed to evaluate the ability of two AI-based language models, GPT-3.5 and GPT-4, to provide reliable information in pediatric nephrology. The models were evaluated on four criteria: accuracy, scope, patient friendliness, and applicability. Forty specialists with ≥ 5 years of experience rated GPT-3.5 and GPT-4 responses to 10 questions on a 1-5 scale via Google Forms. Ethical approval was obtained, and informed consent was secured from all participants.
Both models demonstrated comparable performance across all criteria, with no statistically significant differences (p > 0.05). GPT-4 had slightly higher mean scores on all parameters, but the differences were negligible (Cohen's d < 0.1 for all criteria).
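The effect sizes are reported as Cohen's d, the difference in means divided by a pooled standard deviation. The minimal sketch below covers two independent groups. The study may have used a different variant (for example, one for paired ratings), and the ratings are toy values. The reported d < 0.1 is well under the conventional 0.2 threshold for a small effect.

```python
import math
from statistics import mean, variance
from typing import Sequence


def cohens_d(a: Sequence[float], b: Sequence[float]) -> float:
    """Cohen's d for two independent groups, using the pooled standard deviation."""
    na, nb = len(a), len(b)
    if na < 2 or nb < 2:
        raise ValueError("each group needs at least 2 observations")
    # Pooled variance weights each group's sample variance by its degrees of freedom.
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var)


# Toy 1-5 ratings for two models; not data from the study.
ratings_model_1 = [4, 4, 5, 3, 4]
ratings_model_2 = [4, 3, 5, 3, 4]
print(round(cohens_d(ratings_model_1, ratings_model_2), 2))  # 0.26
```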
Reliability analysis revealed low internal consistency for both models (Cronbach's alpha ranged from 0.019 to 0.162). Correlation analysis indicated no meaningful relationship between participants' professional experience and their evaluations (correlation coefficients ranged from -0.026 to 0.074).
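Internal consistency is summarized with Cronbach's alpha. The formula is alpha = k/(k - 1) * (1 - sum of item variances / variance of total scores), where k is the number of items. The minimal sketch below assumes a matrix with one row per respondent and one column per item. The study does not state which dimension it treated as items, and the matrix is a toy example. Values near zero, like the reported 0.019 to 0.162, indicate that the items barely vary together.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(rows: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha; rows are respondents, columns are items."""
    k = len(rows[0])
    if k < 2 or any(len(r) != k for r in rows):
        raise ValueError("need a rectangular matrix with at least 2 items")
    # Sample variance of each item (column) and of each respondent's total score.
    item_variances = [variance(col) for col in zip(*rows)]
    total_variance = variance([sum(r) for r in rows])
    if total_variance == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Toy matrix: 3 respondents x 2 items (not study data).
print(round(cronbach_alpha([[1, 2], [2, 1], [3, 3]]), 3))  # 0.667
```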
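While both models provided a foundational level of support, neither was superior in addressing the unique challenges of pediatric nephrology. The findings highlight the need for domain-specific training and the integration of updated guidelines to enhance reliability in specialized fields. The study underscores the potential of AI in pediatric nephrology while emphasizing the importance of human oversight and the need for further refinement of AI applications.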