Advances in Medical Education and Practice,
Journal year: 2024, Volume 15, pp. 857-871
Published: Sep. 1, 2024
Artificial intelligence (AI) chatbots excel in language understanding and generation. These models can transform healthcare education and practice. However, it is important to assess the performance of such AI models in various topics to highlight their strengths and possible limitations. This study aimed to evaluate the performance of ChatGPT (GPT-3.5 and GPT-4), Bing, and Bard compared to human students at a postgraduate master's level in Medical Laboratory Sciences.
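Introduction: ChatGPT has been tested in many disciplines, but only a few tests have involved hearing diagnosis and none related to physiology or audiology more generally. The consistency of the chatbot's responses to the same question posed multiple times has not been well investigated either. This study aimed to assess the accuracy and repeatability of ChatGPT 3.5 and 4 on test questions concerning objective measures of hearing. Of particular interest was the short-term repeatability of responses, which was tested here on four separate days extended over one week.
Methods: We used 30 single-answer, multiple-choice exam questions from a one-year course on objective methods of testing hearing. The questions were posed five times to both ChatGPT 3.5 (the free version) and ChatGPT 4 (the paid version) on each of four days (two days in one week and two days in the following week). The accuracy of the responses was evaluated in terms of a response key. To evaluate the repeatability of the responses over time, percent agreement and Cohen's Kappa were calculated.
Results: The overall accuracy of ChatGPT 3.5 was 48-49%, while that of ChatGPT 4 was 65-69%. ChatGPT 3.5 consistently failed to pass the threshold of 50% correct responses. Within a single day, the percent agreement was 76-79% for ChatGPT 3.5 and 87-88% for ChatGPT 4 (Cohen's Kappa 0.67-0.71 and 0.81-0.84, respectively). The percent agreement between responses from different days was 75-79% for ChatGPT 3.5 and 85-88% for ChatGPT 4 (Cohen's Kappa 0.65-0.69 and 0.80-0.85, respectively).
Conclusion: ChatGPT 4 outperforms ChatGPT 3.5, with higher accuracy and higher repeatability over time. However, the great variability of the responses casts doubt on possible professional applications of both versions.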
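The two repeatability measures named in this abstract, percent agreement and Cohen's Kappa, can be illustrated with a minimal sketch. The function names and the two answer sequences below are hypothetical and are not taken from the study; the sketch only shows how the two statistics are conventionally computed for a pair of runs over the same questions.

```python
from collections import Counter


def percent_agreement(run_a, run_b):
    """Fraction of questions on which two runs gave the same answer."""
    if len(run_a) != len(run_b) or not run_a:
        raise ValueError("runs must be non-empty and of equal length")
    return sum(a == b for a, b in zip(run_a, run_b)) / len(run_a)


def cohens_kappa(run_a, run_b):
    """Cohen's Kappa: observed agreement corrected for chance agreement."""
    n = len(run_a)
    p_observed = percent_agreement(run_a, run_b)
    counts_a, counts_b = Counter(run_a), Counter(run_b)
    # Chance agreement: probability that both runs pick the same option
    # if each run chose options independently at its own marginal rates.
    labels = counts_a.keys() | counts_b.keys()
    p_expected = sum(counts_a[k] * counts_b[k] for k in labels) / (n * n)
    if p_expected == 1.0:
        return 1.0  # both runs always give one identical answer
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical answers (options A-D) from two runs over ten questions.
run_1 = list("ABCDABCDAB")
run_2 = list("ABCDABCDBA")
print(f"percent agreement: {percent_agreement(run_1, run_2):.2f}")
print(f"Cohen's Kappa:     {cohens_kappa(run_1, run_2):.2f}")
```

Kappa is lower than raw agreement whenever some matches would be expected by chance, which is why the abstract reports both figures side by side.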
Scientific Reports,
Journal year: 2024, Issue: 14(1)
Published: April 12, 2024
Abstract
Health equity and access to Spanish kidney transplant information continue to be a substantial challenge facing the Hispanic community. This study evaluated ChatGPT's capabilities in translating 54 English kidney transplant frequently asked questions (FAQs) into Spanish using two versions of the AI model, GPT-3.5 and GPT-4.0. The FAQs included 19 from the Organ Procurement and Transplantation Network (OPTN), 15 from the National Health Service (NHS), and 20 from the National Kidney Foundation (NKF). Two native Spanish-speaking nephrologists, both of whom are of Mexican heritage, scored the translations for linguistic accuracy and cultural sensitivity tailored to Hispanics using a 1–5 rubric. The inter-rater reliability of the evaluators, measured by Cohen's Kappa, was 0.85. Overall linguistic accuracy was 4.89 ± 0.31 for GPT-3.5 versus 4.94 ± 0.23 for GPT-4.0 (non-significant, p = 0.23). Both versions scored 4.96 ± 0.19 in cultural sensitivity (p = 1.00). By source, […] 4.84 ± 0.37 […] 4.93 ± 0.26 […] 4.90 […] 4.95 […] 0.22 […]. For cultural sensitivity, […] 5.00 ± 0.00 […] (NKF), while […]. These high scores demonstrate that ChatGPT effectively translated the English FAQs into Spanish across systems. The findings suggest ChatGPT's potential to promote health equity by improving access to essential kidney transplant information. Additional research should evaluate its medical translation capabilities across diverse contexts/languages. English-to-Spanish translations may increase access to vital transplant information for underserved patients.
BMC Medical Education,
Journal year: 2024, Issue: 24(1)
Published: Sep. 16, 2024
ChatGPT, a recently developed artificial intelligence (AI) chatbot, has demonstrated improved performance in examinations in the medical field. However, thus far, an overall evaluation of the potential of ChatGPT models (ChatGPT-3.5 and GPT-4) in a variety of national health licensing examinations is lacking. This study aimed to provide a comprehensive assessment of the ChatGPT models' performance in national licensing examinations for medical, pharmacy, dentistry, and nursing research through a meta-analysis.
Clinical Kidney Journal,
Journal year: 2024, Issue: 17(8)
Published: June 21, 2024
In November 2022, OpenAI released a chatbot named ChatGPT, a product capable of processing natural language to create human-like conversational dialogue. It has generated a lot of interest, including from the scientific community and the medical science community. Recent publications have shown that ChatGPT can correctly answer questions from medical exams such as the United States Medical Licensing Examination and other specialty exams. To date, there have been no studies in which ChatGPT has been tested on questions in the field of nephrology anywhere in the world.
Introduction
The Chat Generative Pretrained Transformer (ChatGPT) has developed rapidly and is used in many fields, including healthcare informatics. This study evaluated the performance of ChatGPT (GPT-4V) on the Healthcare Information Technologist (HCIT) certification exam in Japan, which assesses certified professionals who work with electronic health records to improve patient care.
Methodology
Four hundred seventy-six questions from HCIT exams held over three years were targeted. ChatGPT (GPT-4V) was tested on its ability to answer the questions and to determine if it could perform as well as or better than aspirants taking the exam. Moreover, the correct answer rate was compared for each academic category, question format, the presence or absence of images, and calculations.
Results
The mean correct rate for all questions was 84%. ChatGPT (GPT-4V) achieved the passing criteria. The correct rate for simple-choice (A-type) questions was higher than that for multiple-choice (X2-type) questions (P < 0.05). The success rate for questions with images was lower than that for text-only questions (P < 0.01), and the rate for questions requiring calculations was lower than for those without calculations […].
Conclusions
ChatGPT (GPT-4V) met the passing criteria for the 19th to 21st exams, suggesting that it is effective and may possess the minimum required knowledge, understanding, and application skills for HCIT certification.
JMIR Medical Education,
Journal year: 2025, Volume 11, e65108
Published: March 5, 2025
Advancements in ChatGPT are transforming medical education by providing new tools for assessment and learning, potentially enhancing evaluations for doctors and improving instructional effectiveness. This study evaluates the performance and consistency of ChatGPT-3.5 Turbo and ChatGPT-4o mini in solving European Portuguese medical examination questions (2023 National Examination for Access to Specialized Training; Prova Nacional de Acesso à Formação Especializada [PNA]) and compares their performance to that of human candidates. […] was tested on the first part of the examination (74 questions) on July 18, 2024, and […] on the second part on July 19, 2024. Each model generated an answer using its natural language processing capabilities. To test consistency, each model was asked, "Are you sure?" after providing an answer. Differences between the responses were analyzed using the McNemar test with continuity correction. A single-parameter t test compared the models' performance to that of human candidates. Frequencies and percentages were used for categorical variables, and means and CIs for numerical variables. Statistical significance was set at P<.05. ChatGPT-4o mini achieved an accuracy rate of 65% (48/74) on the 2023 PNA examination, surpassing ChatGPT-3.5 Turbo. ChatGPT-4o mini outperformed human candidates, while ChatGPT-3.5 Turbo had a more moderate performance. This study highlights the advancements and potential of ChatGPT models in medical education, emphasizing the need for careful implementation with teacher oversight and further research.
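The McNemar test with continuity correction mentioned in this abstract compares paired outcomes, such as whether each answer was correct before and after the "Are you sure?" prompt. Below is a minimal sketch of the standard formula, assuming boolean correctness flags per question; the function name and the paired data are hypothetical and do not reproduce the study's results.

```python
import math


def mcnemar_continuity(before, after):
    """McNemar chi-square statistic (continuity-corrected) and p-value.

    `before` and `after` are equal-length sequences of booleans marking
    whether each paired answer was correct.
    """
    if len(before) != len(after):
        raise ValueError("paired sequences must have equal length")
    # Only discordant pairs carry information about a change.
    b = sum(1 for x, y in zip(before, after) if x and not y)  # correct -> wrong
    c = sum(1 for x, y in zip(before, after) if not x and y)  # wrong -> correct
    if b + c == 0:
        return 0.0, 1.0  # no answer changed, so no evidence of a difference
    statistic = (abs(b - c) - 1) ** 2 / (b + c)
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical data for 74 questions: 8 answers flipped from correct to
# wrong, 2 from wrong to correct, and 64 stayed the same.
first = [True] * 8 + [False] * 2 + [True] * 50 + [False] * 14
second = [False] * 8 + [True] * 2 + [True] * 50 + [False] * 14
stat, p = mcnemar_continuity(first, second)
print(f"chi-square = {stat:.2f}, p = {p:.3f}")
```

With few discordant pairs, an exact binomial version of the test is often preferred over the chi-square approximation used here.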
Objectives: ChatGPT is an advanced chatbot based on a Large Language Model that has the ability to answer questions. Undoubtedly, ChatGPT is capable of transforming communication, education, and customer support; however, can it play the role of a doctor? In Poland, prior to obtaining a medical diploma, candidates must successfully pass the Medical Final Examination.
Methods: The purpose of this research was to determine how well ChatGPT performed in the Polish Medical Final Examination, passing which is required to become a doctor in Poland (an exam is considered passed if at least 56% of the tasks are answered correctly). A total of 2138 categorized Medical Final Examination questions (from 11 examination sessions held between 2013–2015 and 2021–2023) were presented to ChatGPT-3.5 from 19 to 26 May 2023. For further analysis, the questions were divided into quintiles based on difficulty and duration, as well as question types (simple A-type or complex K-type). The answers provided by ChatGPT were compared to the official answer key, reviewed for any changes resulting from the advancement of medical knowledge.
Results: ChatGPT correctly answered 53.4%–64.9% of questions. In 8 out of 11 sessions, ChatGPT achieved passing scores (60%). The correlation between the efficacy of artificial intelligence and the level of complexity, difficulty, and length of the questions was found to be negative. AI outperformed humans in one category: psychiatry (77.18% vs. 70.25%, p = 0.081).
Conclusions: The performance of ChatGPT was deemed satisfactory; however, it was observed to be markedly inferior to that of human graduates in the majority of instances. Despite its potential utility in many medical areas, ChatGPT is constrained by inherent limitations that prevent it from entirely supplanting human expertise.