AI, Journal Year: 2024, Volume and Issue: 5(4), P. 1942 - 1954, Published: Oct. 16, 2024
Background: Nonsurgical treatment of uncomplicated appendicitis is a reasonable option in many cases, despite the sparsity of robust, easily accessible, externally validated, and multimodally informed clinical decision support systems (CDSSs). Developed by OpenAI, the Generative Pre-trained Transformer 3.5 model (GPT-3.5) may provide enhanced decision support for surgeons in less certain cases or for patients posing a higher risk of (relative) operative contra-indications. Our objective was to determine whether GPT-3.5, when provided with high-throughput clinical, laboratory, and radiological text-based information, would come to decisions similar to those of a machine learning model and a board-certified surgeon (reference standard) when deciding between appendectomy and conservative treatment.
Methods: In this cohort study, we randomly collected data from patients presenting with right abdominal pain at the emergency departments (EDs) of two German hospitals (GFO, Troisdorf, and University Hospital Cologne) between October 2022 and October 2023. Statistical analysis was performed using R, version 3.6.2, on RStudio, version 2023.03.0+386. Overall agreement between the GPT-3.5 output and the reference standard was assessed by means of inter-observer kappa values as well as accuracy, sensitivity, specificity, and positive and negative predictive values, using the "Caret" and "irr" packages. Statistical significance was defined as p < 0.05.
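For readers who want to reproduce the same agreement statistics outside R, here is a minimal Python sketch; scikit-learn stands in for the "Caret" and "irr" packages, and the labels are invented placeholders rather than study data.

```python
# Minimal sketch of the agreement metrics the abstract describes (the study
# itself used R's "Caret" and "irr" packages); the toy labels below are
# illustrative assumptions, not study data.
from sklearn.metrics import cohen_kappa_score, confusion_matrix

# 1 = appendectomy, 0 = conservative treatment (hypothetical labels)
surgeon = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]   # reference standard
gpt     = [1, 0, 1, 0, 0, 1, 0, 1, 1, 1]   # model output

kappa = cohen_kappa_score(surgeon, gpt)     # inter-observer agreement
tn, fp, fn, tp = confusion_matrix(surgeon, gpt).ravel()

accuracy    = (tp + tn) / (tp + tn + fp + fn)
sensitivity = tp / (tp + fn)                # true positive rate
specificity = tn / (tn + fp)                # true negative rate
ppv         = tp / (tp + fp)                # positive predictive value
npv         = tn / (tn + fn)                # negative predictive value

print(f"kappa={kappa:.2f} acc={accuracy:.2f} sens={sensitivity:.2f} "
      f"spec={specificity:.2f} ppv={ppv:.2f} npv={npv:.2f}")
```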
Results: There was agreement between the surgeon's decision and the GPT-3.5 output in 102 of 113 cases, and all cases where conservative treatment was decided upon were correctly classified by GPT-3.5. The estimated training accuracy was 83.3% (95% CI: 74.0, 90.4), while the validation accuracy was 87.0% (95% CI: 66.4, 97.2). This compared with a machine learning model accuracy of 90.3% (95% CI: 83.2, 95.0), which did not perform significantly better (p = 0.21).
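The abstract does not state which procedures produced the confidence intervals or the p-value; as a hedged illustration, an exact binomial CI and a McNemar test on paired decisions are common choices for this kind of comparison. The counts below are invented except for the 102/113 agreement figure quoted above.

```python
# Hedged illustration only: the abstract does not name the exact tests, so
# a Clopper-Pearson binomial CI and a McNemar test are plausible stand-ins.
import numpy as np
from statsmodels.stats.proportion import proportion_confint
from statsmodels.stats.contingency_tables import mcnemar

# exact (Clopper-Pearson) 95% CI for, e.g., 102 correct out of 113
low, high = proportion_confint(count=102, nobs=113, alpha=0.05,
                               method="beta")
print(f"accuracy CI: ({low:.3f}, {high:.3f})")

# McNemar test comparing two classifiers on the same cases:
# rows = classifier A correct/incorrect, columns = classifier B
# correct/incorrect -- hypothetical counts, not study data
table = np.array([[95, 7],
                  [5, 6]])
print(mcnemar(table, exact=True).pvalue)
```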
Conclusions:
first
study
“intended
use”
surgical
our
knowledge,
comparing
an
algorithm
found
high
degree
lower
pain.
Ophthalmic Epidemiology, Journal Year: 2025, Volume and Issue: unknown, P. 1 - 8, Published: April 2, 2025
It is difficult to explain the complications of surgery to patients. Care has to be taken to convey the facts clearly and objectively while expressing concern for their wellbeing. This study compared responses from surgeons with those of a large language model (LLM)-based chatbot. We presented 10 common scenarios of cataract surgery complications to seven senior surgeons and the chatbot. The responses were graded by two independent graders for comprehension, readability, and complexity using previously validated indices, and were analyzed for accuracy and completeness. Honesty and empathy were assessed for both groups. Scores were averaged and tabulated.
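The abstract does not name the specific readability indices used; purely as an illustration, a sketch with the Flesch-Kincaid family of indices (via the textstat package, with made-up reply texts) is shown below.

```python
# Sketch of readability scoring; Flesch-Kincaid grade level is an assumed
# choice of "previously validated index", and both replies are invented.
import textstat

surgeon_reply = ("The lens capsule tore during surgery, so we placed the "
                 "artificial lens in a slightly different position.")
chatbot_reply = ("A posterior capsular rupture occurred intraoperatively, "
                 "necessitating sulcus placement of the intraocular lens "
                 "rather than capsular-bag fixation.")

for label, text in [("surgeon", surgeon_reply), ("chatbot", chatbot_reply)]:
    print(label,
          textstat.flesch_kincaid_grade(text),   # grade-level readability
          textstat.flesch_reading_ease(text))    # higher = easier to read
```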
The surgeons' readability scores (10.64) were significantly less complex than the chatbot's (12.54) (p < 0.001). The surgeons' responses were shorter, whereas the chatbot tended to give more detailed answers. The average completeness score of the chatbot-generated conversations was 2.36 (0.55), which was similar to the surgeons' 2.58 (0.36) (p = 0.164). The chatbot's responses were generalized, lacking specific alternative measures. While the chatbot's empathy scores were higher (1.81 vs. 1.20, p = 0.041), honesty showed no significant difference.
The LLM-based chatbot gave an accurate description of the complication, but it is uncertain whether it had an in-depth understanding of the situation. Its responses were complete and scored higher on empathy. With training on real-world specialized ophthalmologic data, chatbots could be used to assist in counselling patients about postoperative complications.
BMC Medical Education, Journal Year: 2025, Volume and Issue: 25(1), Published: April 25, 2025
Abstract
Objective
To evaluate the performance of advanced large language models (LLMs)—OpenAI-ChatGPT 4, Google AI-Gemini 1.5 Pro, Cohere-Command R+, and Meta AI-Llama 3 70B—on questions from the Turkish Medical Specialty Training Entrance Exam (2021, 1st semester) and to analyze their answers for user interpretability in languages other than English.
Methods
The study used the Basic Sciences and Clinical Sciences exams held on March 21, 2021. The 240 questions were presented to the LLMs in Turkish, and the responses were evaluated based on the official answer keys published by the Student Selection and Placement Centre.
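The evaluation loop the abstract describes (present each multiple-choice question, score against the official key) can be sketched as follows; the question file, prompt wording, and helper are hypothetical, and only the OpenAI chat-completions call is a real API.

```python
# Hedged sketch of presenting MCQs to an LLM and scoring against a key;
# file name, prompt, and JSON layout are assumptions for illustration.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def ask(question: str, options: dict[str, str]) -> str:
    opts = "\n".join(f"{k}) {v}" for k, v in options.items())
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user",
                   "content": f"{question}\n{opts}\n"
                              "Answer with a single letter (A-E) only."}],
    )
    return resp.choices[0].message.content.strip()[0].upper()

questions = json.load(open("tus_2021_questions.json"))  # hypothetical file
correct = sum(ask(q["stem"], q["options"]) == q["key"] for q in questions)
print(f"accuracy: {correct / len(questions):.2%}")
```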
Results
ChatGPT 4 was the best-performing model, with an overall accuracy of 88.75%. Llama 3 70B followed closely with 79.17% accuracy. Gemini 1.5 Pro achieved 78.13% accuracy, while Command R+ lagged at 50%. ChatGPT 4 demonstrated strengths in both basic and clinical medical science questions. Performance varied across question difficulties, with ChatGPT 4 maintaining high accuracy even on the most challenging questions.
Conclusions
GPT-4 achieved satisfactory results on the Exam, demonstrating its potential as a safe source of basic and clinical medical sciences knowledge. These models could be valuable resources for medical education and decision support in non-English speaking areas. However, while the other models show promise, they need significant improvement to compete with the best-performing models.
Proceedings of the ACM on Human-Computer Interaction, Journal Year: 2025, Volume and Issue: 9(2), P. 1 - 22, Published: May 2, 2025
Software engineers have historically relied on human-powered Q&A platforms like Stack Overflow (SO) as coding aids. With the rise of generative AI, developers have started to adopt AI chatbots, such as ChatGPT, in their software development process. Recognizing potential parallels between human-powered and AI-powered question-based assistance, we investigate and compare how developers integrate this assistance into their real-world coding experiences by conducting a thematic analysis of 1700+ Reddit posts.
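The paper does not say here how the 1700+ posts were gathered; purely as an assumed sketch, the standard PRAW wrapper could collect candidate posts for manual coding like so (subreddits, query, and credentials are placeholders).

```python
# Assumed data-collection sketch for a thematic analysis of Reddit posts;
# the study's actual tooling and subreddit choices are not stated.
import praw

reddit = praw.Reddit(client_id="...",       # your Reddit app credentials
                     client_secret="...",
                     user_agent="so-vs-chatgpt-study/0.1")

posts = []
for sub in ["ChatGPT", "stackoverflow", "learnprogramming"]:
    for submission in reddit.subreddit(sub).search(
            "ChatGPT OR Stack Overflow", limit=1000):
        posts.append({"id": submission.id,
                      "title": submission.title,
                      "body": submission.selftext,
                      "score": submission.score})

print(len(posts), "posts collected for coding and thematic analysis")
```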
Through a comparative study of SO and ChatGPT, we identified each platform's strengths, use cases, and barriers. Our findings suggest that ChatGPT offers fast, clear, and comprehensive responses and fosters a more respectful environment than SO. However, concerns about ChatGPT's reliability stem from its overly confident tone and the absence of validation mechanisms like SO's voting system. Based on these findings, we synthesized design implications for future GenAI code assistants and recommend a workflow leveraging the unique features of both platforms to improve developer experiences.
Cureus, Journal Year: 2024, Volume and Issue: unknown, Published: Nov. 16, 2024
The emergence of large language models (LLMs) has led to significant interest in their potential use as medical assistive tools. Prior investigations have analyzed the overall comparative performance of LLM versions within different ophthalmology subspecialties. However, limited work has characterized performance on image-based questions, a recent advance in LLM capabilities. The purpose of this study was to evaluate Chat Generative Pre-Trained Transformers (ChatGPT) 3.5 and 4.0 on text-only and image-based questions using oculoplastic subspecialty questions from the StatPearls and OphthoQuestions question banks.
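Image-based questioning is the newer capability under test; a sketch of how such a question can be posed to a vision-capable ChatGPT model follows, with placeholder model choice, image URL, and answer options (none of these are the study's materials).

```python
# Sketch of an image-based MCQ sent to a vision-capable model via the
# OpenAI chat-completions API; model, URL, and options are placeholders.
from openai import OpenAI

client = OpenAI()

resp = client.chat.completions.create(
    model="gpt-4o",  # any vision-capable model stands in here
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "Which oculoplastic condition does this image show? "
                     "Answer with one of: A) ptosis B) entropion "
                     "C) ectropion D) dermatochalasis."},
            {"type": "image_url",
             "image_url": {"url": "https://example.com/question_image.png"}},
        ],
    }],
)
print(resp.choices[0].message.content)
```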
Annals of Medicine and Surgery, Journal Year: 2024, Volume and Issue: unknown, Published: May 3, 2024
Background: The integration of artificial intelligence (AI) chatbots like Google’s Bard, OpenAI’s ChatGPT, and Microsoft’s Bing Chatbot into academic and professional domains, including cardiology, has been rapidly evolving. Their application in educational and research frameworks, however, raises questions about their efficacy, particularly in specialized fields such as cardiology. This study aims to evaluate the knowledge depth and accuracy of these AI chatbots in cardiology using a multiple-choice question (MCQ) format.
Methods: The study was conducted as an exploratory, cross-sectional study in November 2023 on a bank of 100 MCQs covering various cardiology topics, created from authoritative textbooks and question banks. These MCQs were then used to assess the knowledge level of Google’s Bard, Microsoft Bing, and ChatGPT 4.0. Each question was entered manually into the chatbots, ensuring no memory retention bias.
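As a hedged illustration of how per-chatbot and per-subtopic accuracy of the kind reported below can be tabulated (the abstract gives no code, and the CSV layout is an assumption), a short pandas sketch follows.

```python
# Illustrative tabulation of per-chatbot, per-subtopic accuracy; the file
# name and column layout are assumed, and the abstract's numbers were not
# produced by this script.
import pandas as pd

# expected columns: question_id, subtopic, chatbot, correct (0/1)
df = pd.read_csv("cardiology_mcq_results.csv")  # hypothetical file

overall = df.groupby("chatbot")["correct"].mean().sort_values(ascending=False)
by_topic = (df.pivot_table(index="subtopic", columns="chatbot",
                           values="correct", aggfunc="mean")
              .round(2))

print(overall)    # overall ranking of the chatbots
print(by_topic)   # where each chatbot is strong or weak
```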
Results: The study found that ChatGPT 4.0 demonstrated the highest score with 87% accuracy, followed by Bing at 60% and Bard at 46%. Performance varied across different subtopics, with ChatGPT consistently outperforming the others. Notably, the study revealed significant differences in the chatbots' proficiency across specific cardiology domains.
Conclusion: This study highlights a spectrum of efficacy among AI chatbots in disseminating cardiology knowledge. ChatGPT 4.0 emerged as a potential auxiliary educational resource, surpassing traditional learning methods in some aspects. However, the variability among these systems underscores the need for cautious evaluation and continuous improvement, especially to ensure reliability in medical knowledge dissemination.
Frontiers in Artificial Intelligence, Journal Year: 2023, Volume and Issue: 6, Published: Dec. 14, 2023
This paper presents a study on the use of AI models for the classification of case reports on assisted suicide procedures. The database of the five Dutch regional bioethics committees was scraped to collect the 72 case reports available in English. We trained several models for classification according to the categories defined by the Termination of Life on Request and Assisted Suicide (Review Procedures) Act.
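The abstract does not name the classifiers used; a minimal sketch of one plausible setup, assuming a TF-IDF bag-of-words with logistic regression and invented toy examples, looks like this.

```python
# Assumed classifier sketch for the case-report categorization task; the
# study's actual models are unspecified, and the texts/labels are invented.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# hypothetical data: report texts and their category under the Act
reports = ["Patient with terminal illness requested ...",
           "Psychiatric suffering without prospect of improvement ..."]
labels = ["due_care_met", "due_care_not_met"]

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2), min_df=1),
                    LogisticRegression(max_iter=1000))
clf.fit(reports, labels)
print(clf.predict(["New case report text ..."]))
```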
We also conducted a related project to fine-tune an OpenAI GPT-3.5-turbo large language model for generating new fictional but plausible cases.
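The fine-tuning step maps onto OpenAI's documented fine-tuning workflow; the sketch below assumes a hypothetical JSONL training file and shows only the job submission, not the study's actual data.

```python
# Sketch of the GPT-3.5-turbo fine-tuning flow via OpenAI's fine-tuning
# API; the JSONL contents and file name are invented placeholders.
from openai import OpenAI

client = OpenAI()

# training file: one JSON object per line, chat format, e.g.
# {"messages": [{"role": "user", "content": "Generate a fictional case ..."},
#               {"role": "assistant", "content": "A 78-year-old ..."}]}
uploaded = client.files.create(file=open("cases_train.jsonl", "rb"),
                               purpose="fine-tune")

job = client.fine_tuning.jobs.create(training_file=uploaded.id,
                                     model="gpt-3.5-turbo")
print(job.id, job.status)  # poll until the fine-tuned model is ready
```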
As AI is increasingly being used to support judgement, it is possible to imagine its application in decision-making regarding assisted suicide. Here we explore the two arising questions of feasibility and ethics, with the aim of contributing to a critical assessment of the potential role of AI in highly sensitive areas.