OSMANGAZİ JOURNAL OF MEDICINE, Journal Year: 2024, Volume and Issue: 46(5), Published: Sept. 3, 2024
The aim is to investigate how posing the same oculofacial plastic and orbital surgery questions in different languages affects the performance of the freely accessible artificial intelligence chatbots ChatGPT-3.5, Copilot, and Gemini. English and Turkish versions of 30 oculofacial questions were administered. The chatbots' answers were compared against the answer key at the back of the source book and grouped as correct or incorrect, and the chatbots' performance relative to one another was compared statistically. ChatGPT-3.5 answered 43.3% of the English questions correctly versus 23.3% of the Turkish versions (p=0.07); the corresponding rates were 73.3% versus 63.3% for Copilot (p=0.375) and 46.7% versus 33.3% for Gemini (p=0.344). Copilot showed higher performance than the other programs in answering the questions (p<0.05). Beyond improving the chatbots' knowledge levels, their performance across languages also needs to be examined and improved. Correcting these disadvantages of the chatbots will lay the groundwork for their widespread and reliable use.
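Since the same 30 questions were graded correct or incorrect in each language, the per-chatbot language comparison above is a paired binary test. The abstract does not name its statistical method; the sketch below uses McNemar's test purely as one plausible choice, with invented answer vectors standing in for the study's grading records.

```python
# Hypothetical sketch: paired comparison of one chatbot's accuracy on the
# same 30 questions asked in English vs. Turkish. McNemar's test is an
# assumed (not source-confirmed) choice for paired binary outcomes.
from statsmodels.stats.contingency_tables import mcnemar

# 1 = correct, 0 = incorrect; illustrative data, not the study's records.
english = [1, 0, 1, 1, 0, 1] * 5   # 30 graded answers (English versions)
turkish = [0, 0, 1, 0, 0, 1] * 5   # 30 graded answers (Turkish versions)

# Build the 2x2 table McNemar's test needs; discordant pairs sit
# off-diagonal (correct in one language, wrong in the other).
both_right = sum(e == 1 and t == 1 for e, t in zip(english, turkish))
eng_only   = sum(e == 1 and t == 0 for e, t in zip(english, turkish))
tur_only   = sum(e == 0 and t == 1 for e, t in zip(english, turkish))
both_wrong = sum(e == 0 and t == 0 for e, t in zip(english, turkish))

table = [[both_right, eng_only],
         [tur_only, both_wrong]]

result = mcnemar(table, exact=True)  # exact binomial version for small n
print(f"English accuracy: {sum(english)/len(english):.1%}")
print(f"Turkish accuracy: {sum(turkish)/len(turkish):.1%}")
print(f"McNemar p-value:  {result.pvalue:.3f}")
```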
Education Sciences, Journal Year: 2024, Volume and Issue: 14(6), P. 636 - 636, Published: June 13, 2024
The use of generative artificial intelligence (GenAI) in academia is a subjective and hotly debated topic. Currently, there are no agreed guidelines towards the usage of GenAI systems in higher education (HE) and, thus, it is still unclear how to make effective use of the technology for teaching and learning practice. This paper provides an overview of the current state of research on GenAI in HE. To this end, the study conducted a systematic review of relevant studies indexed by Scopus, using the preferred reporting items for systematic reviews and meta-analyses (PRISMA) guidelines. The search criteria revealed a total of 625 papers, of which 355 met the final inclusion criteria. The findings from the review showed current and future trends in documents, citations, document sources/authors, keywords, and co-authorship. The research gaps identified suggest that while some authors have looked at understanding the detection of AI-generated text, it may be beneficial to understand how GenAI can be incorporated into supporting the educational curriculum for assessments, teaching, and delivery. Furthermore, there is a need for additional interdisciplinary, multidimensional research in HE through collaboration. This will strengthen the awareness of students, tutors, and other stakeholders and will be instrumental in formulating guidelines, frameworks, and policies for GenAI usage.
Cureus, Journal Year: 2024, Volume and Issue: unknown, Published: March 31, 2024
Purpose: To assess the performance of "Bard," one of ChatGPT's competitors, in answering practice questions for the ophthalmology board certification exam. Methods: In December 2023, 250 multiple-choice questions from the "BoardVitals" exam question bank were randomly selected and entered into Bard to assess the artificial intelligence chatbot's ability to comprehend, process, and answer complex scientific and clinical ophthalmic questions. A random mix of text-only and image-and-text questions was drawn from 10 subsections, with each subsection contributing 25 questions. The percentage of correct responses was calculated per section, and an overall assessment score was determined. Results: On average, Bard answered 62.4% (156/250) of the questions correctly. It performed worst, at 24% (6/25), on the topic "Retina and Vitreous," and best on "Oculoplastics," with a score of 84% (21/25). While the majority of questions posed minimal difficulty, not all could be processed by Bard. This was particularly an issue for questions that included human images or multiple visual files. Some vignette-style questions were also not understood and were therefore omitted. Future investigations will focus on having more questions to increase the available data points. Conclusions: While Bard answered many questions correctly and is capable of analyzing vast amounts of medical data, it ultimately lacks the holistic understanding and experience-informed knowledge of an ophthalmologist. An ophthalmologist's ability to synthesize diverse pieces of information and draw on experience when answering standardized questions is at present irreplaceable, but artificial intelligence, in its current form, can be employed as a valuable tool for supplementing clinicians' study methods.
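The scoring described here is simple proportion arithmetic: correct counts per 25-question subsection, then a pooled overall percentage. A minimal sketch follows, using only the two subsection tallies the abstract actually reports; the other eight subsections would be filled in the same way.

```python
# Minimal sketch of the scoring described above: per-subsection and pooled
# percentage of correct responses. Only the two tallies stated in the
# abstract are filled in; completing all ten subsections would reproduce
# the overall 156/250 (62.4%).
correct_per_section = {
    "Retina and Vitreous": 6,   # reported worst section: 6/25 (24%)
    "Oculoplastics": 21,        # reported best section: 21/25 (84%)
}
QUESTIONS_PER_SECTION = 25

for section, n_correct in correct_per_section.items():
    print(f"{section}: {n_correct}/{QUESTIONS_PER_SECTION} "
          f"({n_correct / QUESTIONS_PER_SECTION:.0%})")

total_correct = sum(correct_per_section.values())
total_questions = QUESTIONS_PER_SECTION * len(correct_per_section)
print(f"Pooled over listed sections: {total_correct}/{total_questions} "
      f"({total_correct / total_questions:.0%})")
```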
JMIR Medical Education, Journal Year: 2024, Volume and Issue: 10, P. e57054 - e57054, Published: March 9, 2024
Artificial intelligence models can learn from medical literature and clinical cases and generate answers that rival human experts. However, challenges remain in the analysis of complex data containing images and diagrams.
Asia-Pacific Journal of Ophthalmology, Journal Year: 2024, Volume and Issue: 13(4), P. 100085 - 100085, Published: July 1, 2024
Large language models (LLMs), a natural language processing technology based on deep learning, are currently in the spotlight. These models closely mimic natural language comprehension and generation. Their evolution has undergone several waves of innovation similar to convolutional neural networks. The transformer architecture and the advancement of generative artificial intelligence mark a monumental leap beyond early-stage pattern recognition via supervised learning. With the expansion of model parameters and training data (terabytes), LLMs unveil remarkable human interactivity, encompassing capabilities such as memory retention and comprehension. These advances make LLMs particularly well-suited for roles in healthcare communication between medical practitioners and patients. In this comprehensive review, we discuss the trajectory of LLMs and their potential implications for clinicians and patients. For clinicians, LLMs can be used for automated documentation and, given better inputs and extensive validation, may be able to autonomously diagnose and treat in the future. For patient care, LLMs can provide triage suggestions, summarize medical documents, explain a patient's condition, and customize education materials tailored to the patient's level. The limitations of LLMs and possible solutions for real-world use are also presented. Given the rapid advancements in this area, the review attempts to briefly cover the many roles LLMs may play in the ophthalmic space, with a focus on improving the quality of healthcare delivery.
Clinical Anatomy, Journal Year: 2024, Volume and Issue: unknown, Published: Nov. 21, 2024
The increasing application of generative artificial intelligence and large language models (LLMs) in various fields, including medical education, raises questions about their accuracy. The primary aim of our study was to undertake a detailed comparative analysis of the proficiencies and accuracies of six different LLMs (ChatGPT-4, ChatGPT-3.5-turbo, ChatGPT-3.5, Copilot, PaLM, and Bard (Gemini)) in responding to multiple-choice questions (MCQs) and in generating clinical scenarios and MCQs for upper limb topics in a Gross Anatomy course for medical students. The selected chatbots were tested by answering 50 USMLE-style MCQs, randomly selected from an exam database for medical students and reviewed by three independent experts. The results of five successive attempts to answer each question set were evaluated in terms of accuracy, relevance, and comprehensiveness. The best result was provided by ChatGPT-4, which answered 60.5% ± 1.9% of the questions accurately, then Copilot (42.0% ± 0.0%) and ChatGPT-3.5 (41.0% ± 5.3%), followed by ChatGPT-3.5-turbo (38.5% ± 5.7%). Google PaLM 2 (34.5% ± 4.4%) and Bard (33.5% ± 3.0%) gave the poorest results. The overall performance of GPT-4 was statistically superior (p < 0.05) to those of Copilot, GPT-3.5, GPT-Turbo, PaLM 2, and Bard, by 18.6%, 19.5%, 22%, 26%, and 27%, respectively. Each chatbot was then asked to generate a clinical scenario for three topics (the anatomical snuffbox, supracondylar fracture of the humerus, and the cubital fossa) and related anatomical MCQs with answer options for each, and to indicate the correct answers. Two experts analyzed and graded the 216 records received (0-5 scale). The best results were recorded for Gemini, while PaLM 2 had the lowest grade. Technological progress notwithstanding, LLMs have yet to mature sufficiently to take over the role of a teacher or facilitator completely within such a course; however, they can be valuable tools for medical educators.
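Accuracy here is reported as mean ± SD over five successive attempts at the same 50 MCQs. Below is a minimal sketch of that aggregation with invented per-attempt scores shaped like the reported figures; the pairwise test shown is an illustrative assumption, since the abstract does not state the exact statistical procedure.

```python
# Sketch of the repeated-attempts scoring scheme described above: each
# model answers the same 50 MCQs five times, and accuracy is reported as
# mean +/- SD across attempts. Attempt data are invented placeholders.
import statistics
from scipy import stats

attempts = {
    "ChatGPT-4": [62, 58, 60, 62, 60],  # accuracy (%) per attempt
    "Copilot":   [42, 42, 42, 42, 42],  # SD of 0.0% implies identical runs
}

for model, scores in attempts.items():
    print(f"{model}: {statistics.mean(scores):.1f}% "
          f"± {statistics.stdev(scores):.1f}%")

# One plausible (assumed, not source-confirmed) pairwise comparison of
# two models' attempt scores.
t, p = stats.ttest_ind(attempts["ChatGPT-4"], attempts["Copilot"])
print(f"t = {t:.2f}, p = {p:.4f}")
```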
Harran Üniversitesi Tıp Fakültesi Dergisi, Journal Year: 2025, Volume and Issue: 22(1), P. 61 - 64, Published: March 11, 2025
Aim: To investigate the effect of language differences on the success of the ChatGPT-3.5, Copilot, and Gemini artificial intelligence chatbots on multiple-choice questions related to ophthalmic pathologies and intraocular tumors. Materials and Methods: Thirty-six English questions testing knowledge of ophthalmic pathologies and intraocular tumors were included in the study. After Turkish translations were produced by a certified translator (native speaker), the questions were posed to the chatbots in both English and Turkish. The answers given were compared with the answer key and grouped as correct or incorrect. Results: ChatGPT-3.5, Copilot, and Gemini answered 75%, 66.7%, and 63.9% of the English questions correctly, respectively. On the Turkish versions, these programs achieved correct-response rates of 63.9% and 69.4%. Although the chatbots answered the English and Turkish versions of the questions correctly at differing rates, no statistically significant difference was detected (p>0.05). Conclusion: Beyond expanding their knowledge bases, artificial intelligence chatbots' abilities to understand different languages, translate between them, and generate ideas need further development, so that the same question produces the same interpretation in every language and converges on a single correct answer.
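For a cross-chatbot comparison of correct-answer rates like the one reported here (p>0.05), a chi-square test of independence on a chatbots × correct/incorrect table is one conventional choice; the abstract does not name its test, so the sketch below is illustrative. The English-question counts follow directly from the reported percentages of 36 questions.

```python
# Assumed-method sketch: chi-square test of independence comparing three
# chatbots' correct/incorrect counts on the 36 English questions.
from scipy.stats import chi2_contingency

N = 36
correct_en = {"ChatGPT-3.5": 27,  # 75%
              "Copilot": 24,      # 66.7%
              "Gemini": 23}       # 63.9%

# Rows: chatbots; columns: correct, incorrect.
table = [[c, N - c] for c in correct_en.values()]
chi2, p, dof, _ = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p:.3f}")
```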
Research Square (Research Square), Journal Year: 2025, Volume and Issue: unknown, Published: March 13, 2025
Abstract: Large language models (LLMs) show potential for medical education, but their domain-specific capabilities need systematic evaluation. This study presents a comparative assessment of thirteen LLMs in urinary system histology education. Using a multi-dimensional framework, we evaluated the models across two tasks: answering 65 validated multiple-choice questions (MCQs) and generating clinical scenarios with assessment items. For MCQ performance, we assessed accuracy along with explanation quality through relevance and comprehensiveness metrics. For scenario generation, we evaluated Quality, Complexity, Relevance, Correctness, and Variety dimensions. Performance varied substantially across tasks, with ChatGPT-o1 achieving the highest MCQ accuracy (96.31 ± 17.85%) and Claude-3.5 demonstrating superior scenario generation (91.4% of the maximum possible score). All models significantly outperformed random guessing, with large effect sizes. Statistical analyses revealed significant differences in consistency across multiple attempts and in dimensional scores, with most models showing higher Correctness than Quality scores in scenario generation. Term frequency analysis revealed content imbalances across all models, with overemphasis of certain anatomical structures and complete omission of others. Our findings demonstrate that while LLMs show considerable promise, reliable implementation requires matching specific models to appropriate educational tasks, implementing verification mechanisms, and recognizing current limitations in producing pedagogically balanced content.
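The term frequency analysis mentioned above can be approximated by counting mentions of a fixed vocabulary of structures across the generated scenarios. A minimal sketch, with an invented term list and toy scenario texts rather than the study's materials:

```python
# Illustrative term-frequency check: count how often each anatomical
# structure appears across generated scenarios, to surface overemphasis
# of some structures and complete omission of others.
from collections import Counter
import re

structures = ["glomerulus", "proximal tubule", "distal tubule",
              "loop of henle", "collecting duct", "ureter"]

scenarios = [
    "The biopsy shows a damaged glomerulus; the proximal tubule is intact.",
    "Filtrate concentration depends on the loop of Henle and the glomerulus.",
]

corpus = " ".join(scenarios).lower()
counts = Counter()
for term in structures:
    counts[term] = len(re.findall(re.escape(term), corpus))

for term, n in counts.most_common():
    flag = "  <- never mentioned" if n == 0 else ""
    print(f"{term}: {n}{flag}")
```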
BMJ Open Ophthalmology, Journal Year: 2025, Volume and Issue: 10(1), P. e002076 - e002076, Published: April 1, 2025
Background: The advent of generative artificial intelligence has led to the emergence of multiple vision large language models (VLLMs). This study aimed to evaluate the capabilities of commonly available VLLMs, such as OpenAI’s GPT-4V and Google’s Gemini, in detecting and diagnosing ocular diseases from retinal images. Methods and analysis: From the Singapore Epidemiology of Eye Diseases (SEED) study, we selected 44 representative retinal photographs, including 10 of healthy retinas and 34 representing six eye diseases (age-related macular degeneration, diabetic retinopathy, glaucoma, visually significant cataract, myopic degeneration and retinal vein occlusion). GPT-4V (in both default and data analyst modes) and Google Gemini were prompted with each image to determine whether the retina was normal or abnormal and to provide diagnostic descriptions for images deemed abnormal. The outputs of the VLLMs were evaluated for accuracy by three attending-level ophthalmologists using a three-point scale (poor, borderline, good). Results: GPT-4V's data analyst mode demonstrated the highest detection rate, correctly identifying 33 out of the 34 abnormal images (97.1%) and outperforming its default mode (61.8%) and Gemini (41.2%). Despite the relatively high detection rates, the quality of the diagnostic descriptions was generally suboptimal, with only 21.2% of GPT-4V's (default) responses, 4.8% of its (data analyst) responses and 28.6% of Gemini's responses rated as good. Conclusions: Although the VLLMs showed sensitivity in abnormality detection, all were inadequate in providing accurate diagnoses of ocular diseases. These findings emphasise the need for domain-customised VLLMs and suggest continued human oversight in clinical ophthalmology.
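The two headline metrics above, abnormality detection rate and the share of descriptions graded "good", are straightforward proportions. A minimal sketch follows, with placeholder counts chosen to match the reported percentages; the per-image grades themselves are not given in the abstract.

```python
# Sketch of the two metrics above: abnormality detection rate (abnormal
# images flagged abnormal out of 34) and the share of diagnostic
# descriptions graded "good" on the three-point scale. Counts are
# placeholders shaped to match the reported figures, not SEED records.
N_ABNORMAL = 34

results = {
    # model: (abnormal images flagged abnormal, grades for its descriptions)
    "GPT-4V (data analyst)": (33, ["poor"] * 20 + ["good"] * 1),
    "GPT-4V (default)":      (21, ["poor"] * 26 + ["good"] * 7),
    "Gemini":                (14, ["poor"] * 10 + ["good"] * 4),
}

for model, (n_detected, grades) in results.items():
    detection_rate = n_detected / N_ABNORMAL
    good_share = grades.count("good") / len(grades)
    print(f"{model}: detection {detection_rate:.1%}, "
          f"'good' descriptions {good_share:.1%}")
```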