Journal of Science Humanities and Arts - JOSHA,
Journal year: 2024, Issue 11(2)
Published: Jan. 1, 2024
The last few years have seen incredibly rapid progress in the field of generative artificial intelligence. Talking to machines and getting answers in natural language is part of our new, elusive normal. Driven by exponential growth in both computing power and internet-scale data, new digital assistants are trained by estimating the most likely next element in a given context. Recent work has clearly shown that this general objective can lead to the ability to develop complex and diverse capabilities from simple principles. At the same time, however, it also produces interesting structures from the compression of the training data, and sometimes unpredictable artefacts. The aim of this article is to shed light on the mechanisms behind current large language models and to provide guidance on how to get the best answer to a question.
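The training objective sketched in this abstract, estimating the most likely next element given a context, can be illustrated with a toy estimator. The bigram table below is only an illustrative stand-in for a real neural language model; the corpus and function names are invented for this sketch.

```python
from collections import Counter, defaultdict

# Toy illustration of next-element prediction: count which word follows
# which in a tiny corpus, then pick the highest-count continuation.
corpus = "the cat sat on the mat and the cat slept".split()

next_counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    next_counts[prev][nxt] += 1

def most_likely_next(context_word):
    """Return the most frequent continuation seen after `context_word`."""
    counts = next_counts[context_word]
    return counts.most_common(1)[0][0] if counts else None

print(most_likely_next("the"))  # "cat": it follows "the" twice, "mat" once
```

A real model replaces the count table with a learned conditional distribution over a long context, but the objective is the same shape.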
Journal of the American Medical Informatics Association,
Journal year: 2024, Issue 31(10), pp. 2315 - 2327
Published: June 20, 2024
Although supervised machine learning is popular for information extraction from clinical notes, creating large annotated datasets requires extensive domain expertise and is time-consuming. Meanwhile, large language models (LLMs) have demonstrated promising transfer learning capability. In this study, we explored whether recent LLMs could reduce the need for large-scale data annotations.
Research Square (Research Square),
Journal year: 2024, Issue unknown
Published: Feb. 6, 2024
Although supervised machine learning is popular for information extraction from clinical notes, creating large, annotated datasets requires extensive domain expertise and is time-consuming. Meanwhile, large language models (LLMs) have demonstrated promising transfer learning capability. In this study, we explored whether recent LLMs can reduce the need for large-scale data annotations. We curated a manually labeled dataset of 769 breast cancer pathology reports, with 13 categories, to compare the zero-shot classification capability of the GPT-4 model with GPT-3.5 and the performance of three supervised model architectures: a random forests classifier, long short-term memory networks with attention (LSTM-Att), and the UCSF-BERT model. Across all tasks, GPT-4 performed either significantly better than or as well as the best supervised model, LSTM-Att (average macro F1 score 0.83 vs. 0.75). On tasks with high imbalance between labels, the differences were more prominent. Frequent sources of errors included inferences from multiple samples and complex task design. On tasks where large annotated datasets cannot be easily collected, LLMs can reduce the burden of labeling. However, if their use is prohibitive, simpler models trained on large annotated datasets can provide comparable results. LLMs have the potential to speed up the execution of clinical NLP studies by reducing the need for curating large datasets. This may increase the utilization of NLP-based variables and outcomes in observational studies.
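The zero-shot setup this abstract describes amounts to a closed-label prompt plus a parser that maps a free-text reply back onto the label set. A minimal sketch, assuming invented placeholder categories and report text (not the study's actual 13 categories), with `call_llm` standing in for whatever chat API is used:

```python
# Hypothetical label set for one classification task; the study's real
# categories and prompts are not reproduced here.
CATEGORIES = ["positive", "negative", "not stated"]

def build_prompt(report_text, task, categories):
    """Compose a zero-shot classification prompt: task, options, report."""
    options = ", ".join(categories)
    return (
        f"Classify the following pathology report for: {task}.\n"
        f"Answer with exactly one of: {options}.\n\n"
        f"Report:\n{report_text}\nAnswer:"
    )

def parse_label(response, categories):
    """Map a free-text model response back onto the closed label set."""
    text = response.strip().lower()
    for label in categories:
        if label in text:
            return label
    return None  # no recognizable label: flag for manual review

prompt = build_prompt("Tumor is ER positive.", "estrogen receptor status", CATEGORIES)
print(parse_label("The report is Positive.", CATEGORIES))  # positive
```

Keeping the label set closed in the prompt and the parser is what makes zero-shot outputs comparable with the supervised baselines.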
American Journal of Medical Genetics Part A,
Journal year: 2024, Issue unknown
Published: Sep. 13, 2024
ABSTRACT
Accurately diagnosing rare pediatric diseases frequently represents a clinical challenge due to their complex and unusual presentations. Here, we explore the capabilities of three large language models (LLMs), GPT-4, Gemini Pro, and a custom-built LLM (GPT-4 integrated with the Human Phenotype Ontology [GPT-4 HPO]), by evaluating their diagnostic performance on 61 rare disease case reports. The LLMs were assessed for accuracy in identifying the specific diagnoses, listing the correct diagnosis among a differential list, and identifying broad diagnostic categories. In addition, GPT-4 HPO was tested on 100 general pediatrics case reports previously evaluated with other tools to further validate its performance. The results indicated that GPT-4 was able to predict the specific diagnosis with 13.1% accuracy, whereas both GPT-4 HPO and Gemini Pro had accuracies of 8.2%. Further, GPT-4 HPO showed an improved performance compared with the other two models in listing the correct diagnosis within the differential list and identifying the broad disease category. Although these findings underscore the potential of LLMs for diagnostic support, particularly when enhanced with domain-specific ontologies, they also stress the need for improvement prior to integration into clinical practice.
Research Questions: (1) Is there a pattern of racial bias in student advising recommendations made by generative AI? (2) What safeguards can promote equity when using AI in high-stakes decision-making? Methodology: Using lists of names associated with various ethnic/racial groups, we asked ChatGPT and Claude to recommend colleges and majors for each student. Results: One model was more likely to recommend STEM majors to some groups. One model did not show systematic differences on metrics of school quality, but the other did. There were also overall differences in the colleges and majors recommended by ChatGPT. Implications: We provide cautions for using generative AI in high-stakes advising tasks.
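The audit described above, prompting with names associated with different groups and comparing recommendation rates, reduces to a simple counting loop. A hedged sketch, where the majors, recommendation lists, and STEM set are invented placeholders rather than the study's data or the actual ChatGPT/Claude calls:

```python
# Hypothetical set of majors counted as STEM for this sketch.
STEM_MAJORS = {"computer science", "biology", "engineering"}

def stem_rate(recommendations):
    """Fraction of recommended majors that fall in the STEM set."""
    hits = sum(1 for major in recommendations if major in STEM_MAJORS)
    return hits / len(recommendations)

# Imagine these came back from repeated prompts of the form
# "Recommend a college major for a student named <name>."
recs_group_a = ["computer science", "engineering", "history", "biology"]
recs_group_b = ["english", "history", "art", "engineering"]

gap = stem_rate(recs_group_a) - stem_rate(recs_group_b)
print(round(gap, 2))  # a large gap between groups would prompt closer inspection
```

In a real audit the gap would be tested for statistical significance across many repeated, randomized prompts rather than eyeballed from one pair of lists.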
The growing interest in advanced large language models (LLMs) like ChatGPT has sparked debate about how best to use them in various human activities. However, a neglected issue in the debate concerning applications of LLMs is whether they can reason logically and follow rules in novel contexts, which are critical for our understanding of LLMs. To address this knowledge gap, this study investigates five LLMs (ChatGPT-4o, Claude, Gemini, Meta AI, and Mistral) using word ladder puzzles to assess their logical reasoning and rule-adherence capabilities. Our two-phase methodology involves (1) providing explicit instructions regarding how to solve the puzzles and then evaluating rule understanding, followed by (2) assessing the LLMs' ability to create puzzles while adhering to the rules. Additionally, we test whether the models can implicitly recognize and avoid HIPAA privacy violations as an example of a real-world scenario. Our findings reveal that the LLMs show a persistent lack of rule understanding and systematically fail the puzzle tasks. Furthermore, all models except Claude prioritized task completion (text writing) over ethical considerations in this test. These results expose flaws in the models' rule-following capabilities, raising concerns about their reliability in tasks requiring strict reasoning. Therefore, we urge caution when integrating LLMs into such fields and highlight the need for further research into their capabilities and limitations to ensure responsible AI development.
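The rules a word ladder must obey, each step keeps the word length and changes exactly one letter, using only real words, are mechanically checkable, which is what makes the puzzle a clean rule-adherence probe. A minimal checker sketch (the word list here is a placeholder, not the study's materials):

```python
def one_letter_apart(a, b):
    """True iff a and b have equal length and differ in exactly one position."""
    if len(a) != len(b):
        return False
    return sum(x != y for x, y in zip(a, b)) == 1

def valid_ladder(words, dictionary):
    """Check every word is real and every step obeys the one-letter rule."""
    if any(w not in dictionary for w in words):
        return False
    return all(one_letter_apart(a, b) for a, b in zip(words, words[1:]))

DICT = {"cold", "cord", "card", "ward", "warm", "worm"}
print(valid_ladder(["cold", "cord", "card", "ward", "warm"], DICT))  # True
print(valid_ladder(["cold", "card", "ward"], DICT))  # False: first step changes two letters
```

A checker like this lets an evaluation score model-generated ladders automatically instead of by hand.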
npj Digital Medicine,
Journal year: 2025, Issue 8(1)
Published: March 4, 2025
Effectively managing evidence-based information is increasingly challenging. This study tested large language models (LLMs), including document- and online-enabled retrieval-augmented generation (RAG) systems, using 13 recent neurology guidelines across 130 questions. Results showed substantial variability. RAG improved accuracy compared to base models but still produced potentially harmful answers. RAG-based systems performed worse on case-based than on knowledge-based questions. Further refinement and regulation are needed for the safe clinical integration of RAG-enhanced LLMs.
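The document-enabled RAG pattern tested above has a simple core: retrieve the guideline passage most relevant to the question, then prepend it to the prompt so the model answers from that evidence. A minimal sketch, where the guideline snippets are invented placeholders and the word-overlap score is a crude stand-in for a real embedding-based retriever:

```python
def tokens(text):
    """Lowercase word tokens with trailing punctuation stripped."""
    return {w.strip("?:.,") for w in text.lower().split()}

def overlap_score(question, passage):
    """Crude relevance proxy: number of shared word tokens."""
    return len(tokens(question) & tokens(passage))

def retrieve(question, passages):
    """Return the passage most similar to the question."""
    return max(passages, key=lambda p: overlap_score(question, p))

# Placeholder guideline snippets, not real clinical recommendations.
GUIDELINE_SNIPPETS = [
    "Migraine prophylaxis: consider beta blockers as first line therapy.",
    "Status epilepticus: administer benzodiazepines without delay.",
]

question = "What is first line prophylaxis for migraine?"
context = retrieve(question, GUIDELINE_SNIPPETS)
prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer using only the context."
```

The study's finding that case-based questions fare worse fits this structure: when the question paraphrases a clinical scenario rather than echoing guideline wording, retrieval is more likely to surface the wrong passage.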
From the standpoint of the articles we published, 2024 represents the first post-COVID year for JAMA Pediatrics. Articles related to COVID, its direct and indirect effects, now represent a small fraction of the science we disseminate.