Purpose
The COVID-19 pandemic has intensified the demand for and use of healthcare resources, prompting a search for efficient solutions under budgetary constraints. In this context, the increasing use of artificial intelligence and telemedicine has emerged as a key strategy to optimize the delivery of resources. Consequently, chatbots have become innovative tools in various fields, such as mental health and patient monitoring, offering therapeutic conversations and early interventions. This systematic review aims to explore the current state of chatbots in the healthcare sector, meticulously evaluating their effectiveness, practical applications, and potential benefits.
Methods
The review was conducted following PRISMA guidelines, utilizing three databases (PubMed, Web of Science, and Scopus) to identify relevant studies on cost-effectiveness over the past 5 years.
Results
Several articles were identified through the database search (n = 31). The chatbot interventions were categorized by similar types. The reviewed studies highlight diverse applications of chatbots in healthcare, including mental health support, medical information, appointment management, education, and lifestyle changes, demonstrating significant benefits across these areas.
Conclusion
Furthermore, there are challenges regarding the implementation of chatbots, their compatibility with other systems, and ethical considerations that may arise in different settings. Addressing these issues will be essential to maximize benefits, mitigate risks, and ensure equitable access to these innovations.
iScience, Journal year: 2024, Issue 27(5), pp. 109713-109713
Published: April 23, 2024
This study systematically reviewed the application of large language models (LLMs) in medicine, analyzing 550 selected studies from a vast literature search. LLMs like ChatGPT transformed healthcare by enhancing diagnostics, medical writing, education, and project management. They assisted in drafting documents, creating training simulations, and streamlining research processes. Despite their growing utility in diagnosis and in improving doctor-patient communication, challenges persisted, including limitations in contextual understanding and the risk of over-reliance. The surge in LLM-related studies indicated a focus on patient care but highlighted the need for careful integration, considering validation, ethical concerns, and balance with traditional practice. Future directions suggested multimodal LLMs, deeper algorithmic understanding, and ensuring responsible, effective use in healthcare.
PLOS Digital Health, Journal year: 2024, Issue 3(11), pp. e0000651-e0000651
Published: Nov. 7, 2024
Biases in medical artificial intelligence (AI) arise and compound throughout the AI lifecycle. These biases can have significant clinical consequences, especially in applications that involve decision-making. Left unaddressed, biased medical AI can lead to substandard decisions and the perpetuation and exacerbation of longstanding healthcare disparities. We discuss potential biases at different stages of the development pipeline and how they affect algorithms. Bias can occur in data features and labels, model evaluation, deployment, and publication. Insufficient sample sizes for certain patient groups can result in suboptimal performance, algorithm underestimation, and clinically unmeaningful predictions. Missing findings can also produce biased behavior, including capturable but nonrandomly missing data, such as diagnosis codes, and data that is not usually or easily captured, such as social determinants of health. Expertly annotated labels used to train supervised learning models may reflect implicit cognitive biases or care practices. Overreliance on performance metrics during evaluation may obscure bias and diminish a model's utility. When applied outside the training cohort, model performance can deteriorate from previous validation and can do so differentially across subgroups. How end users interact with deployed solutions can introduce bias. Finally, where models are developed and published, and by whom, impacts the trajectories and priorities of future development. Solutions to mitigate bias must be implemented with care; these include the collection of large and diverse data sets, statistical debiasing methods, thorough evaluation, an emphasis on interpretability, and standardized reporting and transparency requirements. Prior to real-world implementation in clinical settings, rigorous validation through trials is critical to demonstrate unbiased application. Addressing these biases is crucial to ensuring that all patients benefit equitably from AI.
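The subgroup point above lends itself to a concrete illustration. The following Python sketch is not from the paper; it assumes a simple table of binary labels, binary predictions, and a subgroup column, and reports per-subgroup accuracy and false-negative rate so that differential performance deterioration becomes visible.

```python
# Illustrative per-subgroup bias audit (assumed data layout, not the
# paper's method): report accuracy and false-negative rate per group.
import pandas as pd

def subgroup_report(df: pd.DataFrame, group_col: str) -> pd.DataFrame:
    """Accuracy and false-negative rate for each patient subgroup."""
    rows = []
    for group, sub in df.groupby(group_col):
        accuracy = (sub["y_true"] == sub["y_pred"]).mean()
        positives = sub[sub["y_true"] == 1]
        # False-negative rate: share of true positives the model missed.
        fnr = (positives["y_pred"] == 0).mean() if len(positives) else float("nan")
        rows.append({group_col: group, "n": len(sub),
                     "accuracy": round(accuracy, 3), "fnr": round(fnr, 3)})
    return pd.DataFrame(rows)

# Toy data in which the model under-serves group "B".
df = pd.DataFrame({
    "group":  ["A"] * 6 + ["B"] * 6,
    "y_true": [1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1],
    "y_pred": [1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0],
})
print(subgroup_report(df, "group"))
```

In this toy table the aggregate accuracy looks acceptable, but the per-group report exposes a much higher false-negative rate for group "B", which is exactly the kind of gap an overall metric can hide.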
Frontiers in Artificial Intelligence, Journal year: 2025, Issue 7
Published: Jan. 13, 2025
In this article, we introduce a sociolinguistic perspective on language modeling. We claim that language models in general are inherently modeling varieties of language, and consider how this insight can inform the development and deployment of language models. We begin by presenting a technical definition of the concept of a variety as developed in sociolinguistics. We then discuss how this insight could help us better understand five basic challenges in language modeling: social bias, domain adaptation, alignment, language change, and scale. We argue that, to maximize the performance and societal value of language models, it is important to carefully compile training corpora that accurately represent the specific varieties of language being modeled, drawing on theories, methods, and descriptions from the field of sociolinguistics.
Evaluating the hallucination tendencies of state-of-the-art language models is crucial for improving their reliability and applicability across various domains. This article presents a comprehensive evaluation of Google Gemini and Kimi using the HaluEval benchmark, focusing on key performance metrics such as accuracy, relevance, coherence, and hallucination rate. Gemini demonstrated superior performance, particularly in maintaining low hallucination rates and high contextual relevance, while Kimi, though robust, showed areas needing further refinement. The study highlights the importance of advanced training techniques and optimization in enhancing model efficiency and accuracy. Practical recommendations for future development are provided, emphasizing the need for continuous improvement and rigorous evaluation to achieve reliable and efficient models.
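As a rough picture of how a benchmark comparison of this kind can be scored, here is a minimal Python sketch. It is not the authors' pipeline: the record format and the model labels are assumptions, and it simply computes per-model accuracy and hallucination rate from pre-judged samples.

```python
# Illustrative scoring of HaluEval-style judgments (assumed record
# format; not the authors' evaluation code).
from dataclasses import dataclass

@dataclass
class Judgment:
    model: str
    correct: bool        # answer judged factually correct
    hallucinated: bool   # answer judged to contain hallucinated content

def summarize(judgments):
    """Per-model accuracy and hallucination rate."""
    results = {}
    for model in {j.model for j in judgments}:
        subset = [j for j in judgments if j.model == model]
        results[model] = {
            "accuracy": sum(j.correct for j in subset) / len(subset),
            "hallucination_rate": sum(j.hallucinated for j in subset) / len(subset),
        }
    return results

# Toy judgments for two hypothetical runs.
data = [
    Judgment("gemini", True, False), Judgment("gemini", True, False),
    Judgment("gemini", False, True),
    Judgment("kimi", True, False), Judgment("kimi", False, True),
    Judgment("kimi", False, True),
]
print(summarize(data))
```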
The evaluation of visual hallucinations in multimodal AI models is novel and significant because it addresses a critical gap in understanding how these systems interpret deceptive inputs. The study systematically assessed ChatGPT's performance on a synthetic dataset of visually deceptive and non-deceptive images, employing both quantitative and qualitative analysis. Results revealed that while ChatGPT achieved high accuracy on standard recognition tasks, its performance diminished when faced with deceptive images, highlighting areas for further improvement. The analysis provided insights into the model's underlying mechanisms, such as extensive pretraining and sophisticated integration capabilities, which contribute to its robustness against visual deceptions. The study's findings have important implications for the development of more reliable and robust multimodal AI technologies, offering a benchmark for future evaluations and practical guidelines for enhancing these systems.
European Journal of Investigation in Health Psychology and Education, Journal year: 2025, Issue 15(1), pp. 9-9
Published: Jan. 18, 2025
Large language models (LLMs) offer promising possibilities in mental health, yet their ability to assess disorders and recommend treatments remains underexplored. This quantitative cross-sectional study evaluated four LLMs (Gemini (Gemini 2.0 Flash Experimental), Claude (Claude 3.5 Sonnet), ChatGPT-3.5, and ChatGPT-4) using text vignettes representing conditions such as depression, suicidal ideation, early chronic schizophrenia, social phobia, and PTSD. Each model's diagnostic accuracy, treatment recommendations, and predicted outcomes were compared with norms established by mental health professionals. Findings indicated that for certain conditions, including depression and PTSD, models like ChatGPT-4 achieved higher diagnostic accuracy than human professionals. However, in more complex cases, LLM performance varied, with some models achieving only 55% accuracy while other models and the professionals performed better. LLMs tended to suggest a broader range of proactive treatments, whereas professionals recommended targeted psychiatric consultations and specific medications. In terms of outcome predictions, LLMs were generally optimistic regarding full recovery, especially with treatment, whereas professionals predicted lower recovery rates and partial recovery rates, particularly in untreated cases. While LLMs offered a broad range of suggestions, the professionals' more conservative estimates highlight the need for professional oversight. LLMs can provide valuable support in diagnostics and treatment planning but cannot replace professional discretion.
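One way to picture this vignette-based comparison is the scoring sketch below. It is purely illustrative: the vignette IDs, diagnoses, and model answers are invented, and it computes only the simplest metric, exact agreement with the professionals' consensus diagnosis.

```python
# Illustrative scoring of LLM vignette diagnoses against a professional
# consensus. Vignettes, diagnoses, and model answers are invented.
consensus = {  # vignette id -> consensus diagnosis
    "v1": "depression", "v2": "ptsd", "v3": "social phobia",
}
model_answers = {  # model -> vignette id -> model's diagnosis
    "chatgpt-4":   {"v1": "depression", "v2": "ptsd", "v3": "social phobia"},
    "chatgpt-3.5": {"v1": "depression", "v2": "anxiety", "v3": "agoraphobia"},
}

scores = {}
for model, answers in model_answers.items():
    hits = sum(answers[v] == dx for v, dx in consensus.items())
    scores[model] = hits / len(consensus)

print(scores)  # e.g. {'chatgpt-4': 1.0, 'chatgpt-3.5': 0.33...}
```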
Journal of Personalized Medicine, Journal year: 2025, Issue 15(2), pp. 45-45
Published: Jan. 24, 2025
Background: Large language models (LLMs) have seen a significant boost recently in the field of natural language processing (NLP) due to their capabilities in analyzing words. These autoregressive models prove robust in classification tasks where texts need to be analyzed and classified. Objectives: In this paper, we explore the power of base LLMs such as Generative Pre-trained Transformer 2 (GPT-2), Bidirectional Encoder Representations from Transformers (BERT), Distill-BERT, and TinyBERT in diagnosing acute inflammations of the urinary bladder and nephritis of the renal pelvis. Materials and Methods: The models were trained and tested using supervised fine-tuning (SFT) on a dataset of 120 examples that include symptoms that may indicate the occurrence of these two conditions. Results: By employing a method of carefully crafted prompts to present the data, we demonstrate the feasibility of using minimal training data to achieve reasonable diagnostic performance, with overall testing accuracies of 100%, 94%, and 79% for GPT-2, BERT, and TinyBERT, respectively.
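For readers unfamiliar with the setup, the following is a minimal sketch of supervised fine-tuning for symptom-text classification using the Hugging Face transformers library. The toy texts, labels, hyperparameters, and the prajjwal1/bert-tiny checkpoint are stand-ins, not the study's actual dataset or configuration.

```python
# Minimal SFT sketch for symptom-text classification (toy data and a
# tiny public checkpoint; not the study's dataset or hyperparameters).
import torch
from transformers import (AutoModelForSequenceClassification,
                          AutoTokenizer, Trainer, TrainingArguments)

texts = [
    "temperature 39.1 C, lumbar pain, nausea",           # nephritis-like
    "burning urination, urethral discomfort, no fever",  # bladder-like
] * 8  # tiny toy set standing in for the 120 examples
labels = [1, 0] * 8

tokenizer = AutoTokenizer.from_pretrained("prajjwal1/bert-tiny")
encodings = tokenizer(texts, truncation=True, padding=True, return_tensors="pt")

class SymptomDataset(torch.utils.data.Dataset):
    def __init__(self, encodings, labels):
        self.encodings, self.labels = encodings, labels
    def __len__(self):
        return len(self.labels)
    def __getitem__(self, i):
        item = {k: v[i] for k, v in self.encodings.items()}
        item["labels"] = torch.tensor(self.labels[i])
        return item

model = AutoModelForSequenceClassification.from_pretrained(
    "prajjwal1/bert-tiny", num_labels=2)
trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="sft-out", num_train_epochs=3,
                           per_device_train_batch_size=4),
    train_dataset=SymptomDataset(encodings, labels),
)
trainer.train()
```

With a dataset this small, the whole run completes in seconds on a CPU; the interesting question, as the paper's accuracy spread shows, is how much capacity each base model needs to generalize from so few examples.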
Diagnostics, Journal year: 2025, Issue 15(5), pp. 587-587
Published: Feb. 28, 2025
Background: Dupuytren's disease is a fibroproliferative disease affecting the hand's palmar fascia that leads to progressive finger contractures and functional limitations. Management of this condition relies heavily on the expertise of hand surgeons, who tailor interventions based on clinical assessment. With growing interest in artificial intelligence (AI) in medical decision-making, this study aims to evaluate the feasibility of integrating AI into management by comparing AI-generated recommendations with those of expert surgeons. Methods: This multicentric comparative study involved three experienced hand surgeons and five AI systems (ChatGPT, Gemini, Perplexity, DeepSeek, and Copilot). Twenty-two standardized prompts representing various clinical scenarios were used to assess decision-making. Surgeons and AI systems provided recommendations, which were analyzed for concordance, rationale, and predicted outcomes. Key metrics included union accuracy, surgeon agreement, precision, recall, and F1 scores. The study also evaluated performance in unanimous versus non-unanimous cases and inter-AI agreements. Results: Gemini and ChatGPT demonstrated the highest union accuracy (86.4% and 81.8%, respectively), while Copilot showed the lowest (40.9%). Surgeon agreement was highest for Gemini (45.5%) and ChatGPT (42.4%). The AI systems performed better in unanimous cases (accuracy up to 92.0%) than in non-unanimous cases (as low as 35.0%). Inter-AI agreements ranged from 75.0% (ChatGPT-Gemini) to 48.0% (DeepSeek-Copilot). Precision, recall, and F1 scores were consistently higher for Gemini and ChatGPT than for the other systems. Conclusions: AI systems, particularly Gemini and ChatGPT, show promise in aligning with expert surgical recommendations, especially in straightforward cases. However, significant variability exists, particularly in complex scenarios. AI should be viewed as complementary to expert judgment, requiring further refinement and validation before integration into practice.
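The precision/recall/F1 comparison described above can be reproduced in outline with scikit-learn. The sketch below uses invented binary intervene-versus-conservative decisions, not the study's twenty-two scenarios.

```python
# Illustrative precision/recall/F1 for AI recommendations scored against
# a surgeon consensus; the binary decisions below are invented.
from sklearn.metrics import precision_recall_fscore_support

surgeon_consensus = [1, 1, 0, 1, 0, 0, 1, 0]  # 1 = intervene
ai_recommendation = [1, 0, 0, 1, 0, 1, 1, 0]

precision, recall, f1, _ = precision_recall_fscore_support(
    surgeon_consensus, ai_recommendation, average="binary")
matches = sum(a == b for a, b in zip(surgeon_consensus, ai_recommendation))
agreement = matches / len(surgeon_consensus)

print(f"precision={precision:.2f} recall={recall:.2f} "
      f"f1={f1:.2f} agreement={agreement:.2f}")
```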
BMC Medical Informatics and Decision Making, Journal year: 2025, Issue 25(1)
Published: March 7, 2025
Large Language Models (LLMs), advanced AI tools based on transformer architectures, demonstrate significant potential in clinical medicine by enhancing decision support, diagnostics, and medical education. However, their integration into clinical workflows requires rigorous evaluation to ensure reliability, safety, and ethical alignment. This systematic review examines the evaluation parameters and methodologies applied to LLMs in clinical medicine, highlighting their capabilities, limitations, and application trends. A comprehensive review of the literature was conducted across the PubMed, Scopus, Web of Science, IEEE Xplore, and arXiv databases, encompassing both peer-reviewed and preprint studies. Studies were screened against predefined inclusion and exclusion criteria to identify original research evaluating LLM performance in clinical contexts. The results reveal a growing interest in leveraging LLMs in clinical settings, with 761 studies meeting the inclusion criteria. While general-domain LLMs, particularly ChatGPT and GPT-4, dominated evaluations (93.55%), medical-domain LLMs accounted for only 6.45%. Accuracy emerged as the most commonly assessed parameter (21.78%). Despite these advancements, the evidence base highlights certain limitations and biases in the included studies, emphasizing the need for careful interpretation and robust evaluation frameworks. The exponential growth of studies underscores the transformative potential of LLMs in healthcare. However, addressing challenges such as risks, variability, and the underrepresentation of critical specialties will be essential. Future efforts should prioritize standardized evaluation frameworks for the safe, effective, and equitable use of LLMs in clinical practice.