APOS Trends in Orthodontics,
Journal Year:
2025,
Volume and Issue:
0, P. 1 - 6
Published: Jan. 6, 2025
Objectives:
The objective of this study was to conduct a comprehensive and patient-centered evaluation of chatbot responses within the field of orthodontics, comparing three prominent platforms: ChatGPT-4, Microsoft Copilot, and Google Gemini.
Material and Methods:
Twenty orthodontic-related queries were presented to ChatGPT-4, Copilot, and Gemini by ten orthodontic experts. To assess the accuracy and completeness of the responses, a Likert scale (LS) was employed, while clarity was evaluated using the Global Quality Scale (GQS). Statistical analyses included one-way analysis of variance and post-hoc Tukey tests for the data, and the Pearson correlation test was used to determine the relationship between variables.
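As a rough illustration of this statistical workflow (and not the authors' actual analysis code), the Python sketch below compares hypothetical per-query Likert scores across the three chatbots with a one-way ANOVA and post-hoc Tukey tests, and relates LS to GQS ratings with a Pearson correlation; all score arrays are made-up placeholders.

```python
# Sketch only: one-way ANOVA, Tukey HSD, and Pearson correlation on placeholder ratings.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical mean expert LS ratings for the 20 queries (not the study's data).
ls_chatgpt4 = rng.normal(1.69, 0.10, 20)
ls_copilot = rng.normal(1.68, 0.15, 20)
ls_gemini = rng.normal(2.27, 0.53, 20)

# One-way analysis of variance across the three chatbots.
f_stat, p_anova = stats.f_oneway(ls_chatgpt4, ls_copilot, ls_gemini)
print(f"ANOVA: F = {f_stat:.2f}, p = {p_anova:.4f}")

# Post-hoc Tukey HSD for pairwise comparisons (SciPy >= 1.11).
print(stats.tukey_hsd(ls_chatgpt4, ls_copilot, ls_gemini))

# Pearson correlation between LS and GQS ratings for one chatbot (placeholder GQS).
gqs_chatgpt4 = rng.normal(4.01, 0.31, 20)
r, p_corr = stats.pearsonr(ls_chatgpt4, gqs_chatgpt4)
print(f"Pearson r = {r:.2f}, p = {p_corr:.4f}")
```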
Results:
The results indicated that ChatGPT-4 (1.69 ± 0.10) and Copilot (1.68 ± …) achieved significantly higher LS scores compared with Gemini (2.27 ± 0.53) (P < 0.05). However, the GQS scores, which were 4.01 ± 0.31, 3.92 ± 0.60 for Gemini, and 4.09 ± 0.15, showed no significant differences among the chatbots (P > 0.05).
Conclusion:
While these chatbots generally handle basic queries well, they show limitations in complex scenarios. ChatGPT-4 and Copilot outperform Gemini in accurately addressing scenario-based questions, highlighting the importance of strong language comprehension, knowledge access, and advanced algorithms. This underscores the need for continued improvements in the technology.
Journal of the American Pharmacists Association,
Journal Year:
2023,
Volume and Issue:
64(2), P. 422 - 428.e8
Published: Dec. 2, 2023
Abstract
Background
The use of artificial intelligence (AI) to optimize medication therapy management (MTM) by identifying drug interactions may potentially improve MTM efficiency. ChatGPT, an AI language model, may be applied to identify interventions by integrating patient data and drug databases. ChatGPT has been shown to be effective in other areas of clinical medicine, from diagnosis to management. However, ChatGPT's ability to manage MTM-related activities is little known.
Objectives
To evaluate the effectiveness of ChatGPT in MTM services across simple, complex, and very complex cases and to understand its contributions to MTM.
Methods
Two pharmacists rated and validated the difficulty of the cases as simple, complex, or very complex. Each response was assessed based on 3 criteria: identification of drug interactions, precision in recommending alternatives, and appropriateness in devising care plans. The accuracy of the responses was evaluated by comparing them with the actual answers for each complexity level.
Results
ChatGPT 4.0 accurately solved 39 out of 39 (100%) cases. It successfully identified interactions, provided recommendations, and formulated general care plans, but it did not recommend specific dosages. The results suggest that ChatGPT can assist in formulating care plans and in overall MTM.
Conclusion
The application of ChatGPT has the potential to enhance medication safety and patient involvement, lower healthcare costs, and support providers in identifying drug interactions. Future work may utilize AI models such as ChatGPT in patient care. The future of the pharmacy profession will depend on how the field responds to the changing need for care optimized by automation.
Interactive Journal of Medical Research,
Journal Year:
2024,
Volume and Issue:
13, P. e54704 - e54704
Published: Jan. 26, 2024
Background
Adherence to evidence-based practice is indispensable in health care. Recently, the utility of generative artificial intelligence (AI) models in health care has been evaluated extensively. However, the lack of consensus guidelines on the design and reporting of findings of these studies poses a challenge for the interpretation and synthesis of evidence.
Objective
This study aimed to develop a preliminary checklist to standardize the reporting of AI-based studies in health care education and practice.
Methods
A literature review was conducted in Scopus, PubMed, and Google Scholar. Published records with “ChatGPT,” “Bing,” or “Bard” in the title were retrieved. Careful examination of the methodologies employed in the included records was conducted to identify common pertinent themes and possible gaps in reporting. A panel discussion was held to establish a unified and thorough checklist for the reporting of AI-based studies in health care. The finalized checklist was used to evaluate the included records by 2 independent raters. Cohen κ was used as the method to assess interrater reliability.
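For context on the interrater reliability analysis, the minimal sketch below computes Cohen κ between two raters from hypothetical item ratings; it is not the study's code.

```python
# Sketch only: Cohen's kappa between two independent raters on hypothetical ratings.
from sklearn.metrics import cohen_kappa_score

rater_1 = [5, 4, 3, 4, 2, 5, 4, 3, 3, 4]
rater_2 = [5, 4, 3, 3, 2, 5, 4, 3, 4, 4]

print(f"Cohen kappa = {cohen_kappa_score(rater_1, rater_2):.3f}")
```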
Results
The final data set that formed the basis of theme identification and analysis comprised a total of 34 records. The 9 identified themes are collectively referred to as METRICS (Model, Evaluation, Timing, Range/Randomization, Individual factors, Count, and Specificity of prompts and language). Their details are as follows: (1) Model used and its exact settings; (2) Evaluation approach for the generated content; (3) Timing of testing the model; (4) Transparency of the data source; (5) Range of tested topics; (6) Randomization of selecting the queries; (7) Individual factors in selecting the queries and interrater reliability; (8) Count of queries executed to test the model; and (9) Specificity of the prompts and language used. The overall mean METRICS score was 3.0 (SD 0.58). The Cohen κ values were acceptable, with a range of 0.558 to 0.962 (P<.001 for all items). With the classification of scores per item, the highest average scores were recorded for the “Model” item, followed by the “Specificity” item, while the lowest scores were recorded for the “Randomization” item (classified as suboptimal) and the “Individual factors” item (classified as satisfactory).
Conclusions
The METRICS checklist can facilitate guiding researchers toward best practices in reporting the results of generative AI-based studies in health care. The findings highlight the need for standardized reporting of such studies, considering the variability observed in the included records. The proposed checklist could be a helpful base for universally accepted guidelines in this swiftly evolving research topic.
Journal of Educational Computing Research,
Journal Year:
2024,
Volume and Issue:
62(6), P. 1509 - 1537
Published: April 17, 2024
The field of computer-assisted language learning has recently brought about a notable change in English as a Foreign Language (EFL) writing. Starting from October 2022, students across different academic fields have increasingly depended on ChatGPT-4 as a helpful resource for addressing particular challenges in EFL writing. This study aimed to investigate the use and acceptance of ChatGPT-4 in students' EFL writing. To this end, an experiment was conducted with 76 undergraduate students at a private school in Algeria. The participants were randomly allocated into two groups: an experimental group (n = 37) and a control group (n = 39). Additionally, a questionnaire was administered. The results showed that the experimental group (EG) outperformed the control group (CG). Besides, the findings revealed that the EG's post-test scores exceeded their pre-test scores. There were also substantial improvements in the EG's views of perceived usefulness, ease of use, attitudes, and behavioral intention. According to the results, ChatGPT-4 helped boost students' writing skills, which ultimately led to its acceptance. Students appear particularly interested in ChatGPT-4 because of its potential usefulness in putting what they learn into practice. Some suggestions and recommendations are provided.
The Oncologist,
Journal Year:
2024,
Volume and Issue:
29(5), P. 407 - 414
Published: Feb. 3, 2024
Abstract
Background
The capability of large language models (LLMs) to understand and generate human-readable text has prompted the investigation of their potential as educational and management tools for patients with cancer and healthcare providers.
Materials and Methods
We conducted a cross-sectional study aimed at evaluating the ability of ChatGPT-4, ChatGPT-3.5, and Google Bard to answer questions related to 4 domains of immuno-oncology (Mechanisms, Indications, Toxicities, and Prognosis). We generated 60 open-ended questions (15 for each section). Questions were manually submitted to the LLMs, and responses were collected on June 30, 2023. Two reviewers evaluated the answers independently.
Results
ChatGPT-4 and ChatGPT-3.5 answered all questions, whereas Google Bard answered only 53.3% of them (P < .0001). The number of reproducible answers was higher for ChatGPT-4 (95%) and ChatGPT-3.5 (88.3%) than for Google Bard (50%). In terms of accuracy, answers were deemed fully correct in 75.4%, 58.5%, and 43.8% of cases for ChatGPT-4, ChatGPT-3.5, and Google Bard, respectively (P = .03). Furthermore, answers were deemed highly relevant in 71.9%, 77.4%, and …% of cases, respectively (P = .04). Regarding readability, answers were readable for ChatGPT-4 (98.1%) and ChatGPT-3.5 (100%) compared with Google Bard (87.5%) (P = .02).
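As a generic illustration of how such proportions can be compared (not the authors' analysis), the sketch below runs a chi-squared test on the answer rates, with counts reconstructed from the abstract (60 questions per model; 53.3% of 60 ≈ 32).

```python
# Sketch only: chi-squared test on answer rates reconstructed from the abstract.
from scipy.stats import chi2_contingency

#        answered, not answered
table = [
    [60, 0],   # ChatGPT-4
    [60, 0],   # ChatGPT-3.5
    [32, 28],  # Google Bard (53.3% of 60)
]
chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.1f}, dof = {dof}, p = {p:.2e}")
```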
Conclusion
ChatGPT-4 and ChatGPT-3.5 are potentially powerful tools in immuno-oncology, whereas Google Bard demonstrated relatively poorer performance. However, the risk of inaccuracy or incompleteness was evident in all 3 LLMs, highlighting the importance of expert-driven verification of the outputs returned by these technologies.
Journal of Personalized Medicine,
Journal Year:
2024,
Volume and Issue:
14(1), P. 107 - 107
Published: Jan. 18, 2024
Accurate information regarding oxalate levels in foods is essential for managing patients with hyperoxaluria, oxalate nephropathy, or those susceptible to calcium oxalate stones. This study aimed to assess the reliability of AI chatbots in categorizing foods based on their oxalate content. We assessed the accuracy of ChatGPT-3.5, ChatGPT-4, Bard AI, and Bing Chat in classifying dietary oxalate content per serving into low (<5 mg), moderate (5–8 mg), and high (>8 mg) categories.
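These cutoffs amount to a simple classification rule; the snippet below is only an illustration of that mapping (the boundary handling at exactly 5 and 8 mg is an assumption, since the abstract lists the ranges as <5, 5–8, and >8 mg).

```python
# Illustrative mapping of oxalate content per serving (mg) to the study's categories.
def oxalate_category(mg_per_serving: float) -> str:
    if mg_per_serving < 5:
        return "low"       # <5 mg
    if mg_per_serving <= 8:
        return "moderate"  # 5-8 mg
    return "high"          # >8 mg

for mg in (3.2, 6.0, 12.5):
    print(mg, oxalate_category(mg))
```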
A total of 539 food items were processed through each chatbot. The accuracy was compared between chatbots and stratified by oxalate category. Bard AI had the highest accuracy of 84%, followed by Bing Chat (60%), GPT-4 (52%), and GPT-3.5 (49%) (p < 0.001). There was a significant pairwise difference between the chatbots, except for one comparison (p = 0.30). The accuracy of all chatbots decreased for the higher oxalate categories, but Bard AI remained the chatbot with the highest accuracy regardless of category. Despite considerable variation in classifying individual food items, Bard AI consistently showed higher accuracy than Bing Chat, GPT-4, and GPT-3.5. These results underline the potential of AI chatbots in the management of at-risk patient groups and the need for enhancements in chatbot algorithms to improve their clinical accuracy.
JAMA Network Open,
Journal Year:
2025,
Volume and Issue:
8(2), P. e2457879 - e2457879
Published: Feb. 4, 2025
Importance
There is much interest in the clinical integration of large language models (LLMs) in health care. Many studies have assessed the ability of LLMs to provide health advice, but the quality of their reporting is uncertain.
Objective
To perform a systematic review to examine the reporting variability among peer-reviewed studies evaluating the performance of generative artificial intelligence (AI)–driven chatbots for summarizing evidence and providing health advice, to inform the development of the Chatbot Assessment Reporting Tool (CHART).
Evidence Review
A search of MEDLINE via Ovid, Embase via Elsevier, and Web of Science from inception to October 27, 2023, was conducted with the help of a health sciences librarian to yield 7752 articles. Two reviewers screened articles by title and abstract followed by full-text review to identify primary studies evaluating the accuracy of AI-driven chatbots (chatbot studies). We then performed data extraction for the 137 eligible studies.
Findings
A total of 137 studies were included. Studies examined topics in surgery (55 [40.1%]), medicine (51 [37.2%]), and primary care (13 [9.5%]). Most studies focused on treatment (91 [66.4%]), diagnosis (60 [43.8%]), or disease prevention (29 [21.2%]). Most studies (136 [99.3%]) evaluated inaccessible, closed-source LLMs and did not provide enough information on the version of the LLM under evaluation. All studies lacked a sufficient description of LLM characteristics, including temperature, token length, fine-tuning availability, layers, and other details. Most studies did not describe a prompt engineering phase in their study.
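To make concrete what reporting these characteristics could look like, the sketch below records the query metadata that the review found was usually missing; the field names and values are illustrative assumptions, not a schema mandated by CHART.

```python
# Sketch only: one way to record LLM query metadata that chatbot studies often omit.
from dataclasses import dataclass, asdict
from datetime import date

@dataclass
class ChatbotQueryRecord:
    model_name: str      # e.g., "gpt-4" (hypothetical)
    model_version: str   # exact version or snapshot identifier
    temperature: float   # sampling temperature
    max_tokens: int      # token length limit
    fine_tuned: bool     # whether a fine-tuned variant was used
    prompt: str          # verbatim prompt, including any prompt engineering
    query_date: date     # date the model was queried

record = ChatbotQueryRecord(
    model_name="gpt-4",
    model_version="hypothetical-snapshot",
    temperature=0.7,
    max_tokens=512,
    fine_tuned=False,
    prompt="What are the treatment options for ...?",
    query_date=date(2023, 10, 27),
)
print(asdict(record))
```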
The date of querying was reported in 54 (39.4%) studies. Most studies (89 [65.0%]) used subjective means to define a successful performance of the chatbot, while less than one-third addressed the ethical, regulatory, and patient safety implications of the clinical integration of LLMs.
Conclusions and Relevance
In this systematic review of chatbot studies, the reporting of methods was heterogeneous and may inform the development of the CHART reporting standards. Ethical, regulatory, and patient safety considerations are crucial as interest in the clinical integration of LLMs grows.
The current standard method for the analysis of potential drug–drug interactions (pDDIs) is time-consuming and includes the use of multiple clinical decision support systems (CDSSs) and interpretation by healthcare professionals. With the emergence of large language models developed with artificial intelligence, an interesting alternative arose. This retrospective study included 30 patients with polypharmacy, who underwent a pDDI analysis between October 2022 and August 2023, and compared the performance of Chat GPT with established CDSSs (MediQ®, Lexicomp®, Micromedex®) in identifying pDDIs. A multidisciplinary team interpreted the obtained results, decided upon their clinical relevance, and assigned severity grades using three categories: (i) contraindicated, (ii) severe, and (iii) moderate. The expert review identified a total of 280 clinically relevant pDDIs (3 contraindicated, 13 severe, 264 moderate) with the CDSSs and 80 (2 contraindicated, 5 severe, 73 moderate) with Chat GPT. Chat GPT almost entirely neglected the risk of QTc prolongation (85 vs. 8), which could also not be sufficiently improved with a specific prompt. To assess the consistency of the answers provided by Chat GPT, we repeated each query and found inconsistent results in 90% of cases.
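The consistency check described here can be pictured as re-issuing each query and comparing the returned interaction sets; the sketch below simulates this with a stand-in `query_chatbot` function (a hypothetical placeholder, not a real API).

```python
# Sketch only: repeat each pDDI query and check whether the answers match.
import random

def query_chatbot(prompt: str) -> frozenset:
    """Hypothetical stand-in for a chatbot call; returns a random subset of
    candidate interactions to mimic non-deterministic answers."""
    candidates = ["drug A + drug B", "drug B + drug C", "drug A + drug C"]
    return frozenset(c for c in candidates if random.random() > 0.3)

def is_consistent(prompt: str) -> bool:
    return query_chatbot(prompt) == query_chatbot(prompt)

prompts = [f"List clinically relevant pDDIs for patient {i}" for i in range(30)]
share = sum(is_consistent(p) for p in prompts) / len(prompts)
print(f"Consistent on repetition: {share:.0%}")
```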
In contrast, Chat GPT provided acceptable and comprehensible recommendations to questions on side effects. The identification of pDDIs with Chat GPT cannot be recommended currently, because fewer pDDIs were detected, there were obvious errors, and the results were inconsistent. However, if these limitations are addressed accordingly, it could become a promising platform in the future.
Cureus,
Journal Year:
2023,
Volume and Issue:
unknown
Published: Nov. 24, 2023
Background
Artificial intelligence (AI)-based conversational models, such as Chat Generative Pre-trained Transformer (ChatGPT), Microsoft Bing, and Google Bard, have emerged as valuable sources of health information for lay individuals. However, the accuracy of the information provided by these AI models remains a significant concern. This pilot study aimed to test a new tool with the following key themes for inclusion: Completeness of content, Lack of false information in the content, Evidence supporting the content, Appropriateness of the content, and Relevance, referred to as "CLEAR", designed to assess the quality of health information delivered by AI-based models.
Methods
Tool development involved a literature review on health information quality, followed by the initial establishment of the CLEAR tool, which comprised five items that assessed the following: completeness, lack of false information, evidence support, appropriateness, and relevance. Each item was scored on a five-point Likert scale from excellent to poor. Content validity was checked by expert review. Pilot testing involved 32 healthcare professionals using content on eight different health topics of deliberately varying qualities. The internal consistency was checked using Cronbach's alpha (α). Feedback resulted in language modifications to improve the clarity of the items. The final CLEAR tool was used to assess AI-generated content on four distinct health topics. The contents were generated by ChatGPT 3.5, ChatGPT 4, Microsoft Bing, and Google Bard, and rated by two independent raters, with Cohen's kappa (κ) used to assess inter-rater agreement.
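As a minimal sketch of the internal-consistency analysis (not the study's code), Cronbach's alpha for the five CLEAR items can be computed as below; the rating matrix is hypothetical.

```python
# Sketch only: Cronbach's alpha for five Likert-scored items (hypothetical ratings).
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """scores: raters in rows, items in columns."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]
    item_var = scores.var(axis=0, ddof=1).sum()
    total_var = scores.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_var / total_var)

ratings = np.array([
    [5, 4, 4, 5, 4],
    [4, 4, 3, 4, 4],
    [5, 5, 4, 5, 5],
    [3, 3, 3, 4, 3],
])
print(f"Cronbach's alpha = {cronbach_alpha(ratings):.3f}")
```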
Results
The final CLEAR items were: (1) Is the content sufficient?; (2) Is the content accurate?; (3) Is the content evidence-based?; (4) Is the content clear, concise, and easy to understand?; and (5) Is the content free from irrelevant information? Pilot testing revealed an acceptable internal consistency, with a Cronbach α range of 0.669-0.981. The use of the CLEAR tool yielded the following average scores: Microsoft Bing (mean=24.4±0.42), ChatGPT-4 (mean=23.6±0.96), Google Bard (mean=21.2±1.79), and ChatGPT-3.5 (mean=20.6±5.20). The inter-rater agreement showed the following Cohen κ values per model: κ=0.875 (P<.001), κ=0.780 (P<.001), κ=0.348 (P=.037), and κ=0.749 (P<.001).
Conclusions
The CLEAR tool is a brief yet helpful instrument that can aid in standardizing the assessment of the quality of health information generated by AI-based models. Future studies are recommended to validate its utility in the assessment of AI-generated health-related content using larger samples across various complex health topics.
Audiology and Neurotology,
Journal Year:
2024,
Volume and Issue:
unknown, P. 1 - 7
Published: May 6, 2024
Introduction: The purpose of this study was to evaluate three chatbots – OpenAI ChatGPT, Microsoft Bing Chat (currently Copilot), and Google Bard (currently Gemini) – in terms of their responses to a defined set of audiological questions.
Methods: Each chatbot was presented with the same 10 questions, and the authors rated the responses on a Likert scale ranging from 1 to 5. Additional features, such as the number of inaccuracies or errors and the provision of references, were also examined.
Results: Most responses given by all three chatbots were satisfactory or better. However, all of them generated at least a few inaccuracies or errors. ChatGPT achieved the highest overall score, while one of the other chatbots scored the worst; it was also the only chatbot unable to provide a response to one of the questions and the only one that did not give information about its sources.
Conclusions: Chatbots are an intriguing tool that can be used to access basic information in a specialized area like audiology. Nevertheless, their use needs to be careful, as correct information is not infrequently mixed with errors that are hard to pick up unless the user is well versed in the field.
JMIR Public Health and Surveillance,
Journal Year:
2024,
Volume and Issue:
10, P. e53086 - e53086
Published: Jan. 4, 2024
The online pharmacy market is growing, with legitimate online pharmacies offering advantages such as convenience and accessibility. However, this increased demand has attracted malicious actors into the space, leading to the proliferation of illegal vendors that use deceptive techniques to rank higher in search results and pose serious public health risks by dispensing substandard or falsified medicines. Search engine providers have started integrating generative artificial intelligence (AI) into their search interfaces, which could revolutionize search by delivering more personalized information through a user-friendly conversational experience. However, the improper integration of these new technologies carries the potential to further exacerbate the risks posed by illicit online pharmacies by inadvertently directing users to such vendors.