JAMA Network Open,
Journal year: 2025, Issue: 8(2), pp. e2457879, Published: Feb. 4, 2025
Importance
There is much interest in the clinical integration of large language models (LLMs) in health care. Many studies have assessed the ability of LLMs to provide health advice, but the quality of their reporting is uncertain.
Objective
To perform a systematic review to examine the reporting variability among peer-reviewed studies evaluating the performance of generative artificial intelligence (AI)–driven chatbots for summarizing evidence and providing health advice, to inform the development of the Chatbot Assessment Reporting Tool (CHART).
Evidence Review
A search of MEDLINE via Ovid, Embase via Elsevier, and Web of Science from inception to October 27, 2023, was conducted with the help of a health sciences librarian to yield 7752 articles. Two reviewers screened articles by title and abstract, followed by full-text review, to identify primary studies evaluating the accuracy of generative AI-driven chatbots (chatbot studies). Two reviewers then performed data extraction for the 137 eligible studies.
Findings
A total of 137 studies were included. Studies examined topics in surgery (55 [40.1%]), medicine (51 [37.2%]), and primary care (13 [9.5%]). Most studies focused on treatment (91 [66.4%]), diagnosis (60 [43.8%]), or disease prevention (29 [21.2%]). Most (136 [99.3%]) evaluated inaccessible, closed-source LLMs and did not provide enough information to identify the version of the LLM under evaluation. All studies lacked a sufficient description of LLM characteristics, including temperature, token length, fine-tuning availability, layers, and other details. Few studies described a prompt engineering phase in their study. The date of chatbot querying was reported in only 54 (39.4%) of the studies. Most studies (89 [65.0%]) used subjective means to define the successful performance of the chatbot, while less than one-third addressed the ethical, regulatory, and patient safety implications of the clinical integration of LLMs.
Conclusions and Relevance
In this systematic review of AI-driven chatbot studies, reporting quality was heterogeneous and may inform the development of the CHART reporting standards. Ethical, regulatory, and patient safety considerations are crucial as interest in the clinical integration of these tools grows.
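The under-reported items the review flags (model version, temperature, token limits, querying date, verbatim prompts) are straightforward to capture at query time. A minimal sketch of such a reporting record, assuming a hypothetical study pipeline (the field set is illustrative, not the CHART checklist itself):

```python
from dataclasses import dataclass, asdict
from datetime import date

@dataclass
class ChatbotQueryRecord:
    """Reporting fields the review found missing in most chatbot studies."""
    model_name: str     # exact, versioned model identifier
    temperature: float  # sampling temperature used for the query
    max_tokens: int     # token limit applied to the response
    query_date: str     # ISO date the chatbot was queried
    prompt: str         # verbatim prompt, documenting the prompt-engineering phase

def make_record(model_name: str, temperature: float,
                max_tokens: int, prompt: str) -> dict:
    """Bundle query metadata so it can be archived alongside each response."""
    rec = ChatbotQueryRecord(
        model_name=model_name,
        temperature=temperature,
        max_tokens=max_tokens,
        query_date=date.today().isoformat(),
        prompt=prompt,
    )
    return asdict(rec)
```

Archiving one such record per query would make a study reproducible on the dimensions the review audits.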
Frontiers of Computer Science,
Journal year: 2024, Issue: 18(6), Published: Mar. 22, 2024
Abstract
Autonomous agents have long been a research focus in academic and industry communities. Previous research often focuses on training agents with limited knowledge within isolated environments, which diverges significantly from human learning processes and makes the agents hard to achieve human-like decisions. Recently, through the acquisition of vast amounts of Web knowledge, large language models (LLMs) have shown potential in achieving human-level intelligence, leading to a surge in research on LLM-based autonomous agents. In this paper, we present a comprehensive survey of these studies, delivering a systematic review from a holistic perspective. We first discuss the construction of LLM-based autonomous agents, proposing a unified framework that encompasses much of the previous work. Then, we present an overview of the diverse applications of LLM-based autonomous agents in social science, natural science, and engineering. Finally, we delve into the evaluation strategies commonly used for LLM-based autonomous agents. Based on the previous studies, we also present several challenges and future directions in this field.
IEEE Access,
Journal year: 2024, Issue: 12, pp. 26839 - 26874, Published: Jan. 1, 2024
Large Language Models (LLMs) have recently demonstrated extraordinary capability in many areas, including natural language processing (NLP), language translation, text generation, question answering, etc. Moreover, LLMs are a new and essential part of computerized language processing, having the ability to understand complex verbal patterns and generate coherent and appropriate replies for a given situation. Though this success has prompted a substantial increase in research contributions, the rapid growth has made it difficult to grasp the overall impact of these improvements. Since a lot of research on LLMs is coming out quickly, it is getting tough to get an overview of all of it in a short note. Consequently, the research community would benefit from a short but thorough review of the recent changes in this area. This article thoroughly overviews LLMs, including their history, architectures, transformers, resources, training methods, applications, impacts, and challenges. The paper begins by discussing the fundamental concepts of LLMs along with the traditional pipeline of the training phase. It then provides an overview of the existing works, the history of LLMs and their evolution over time, the architecture of transformers, and the different resources and training methods that have been used to train them. It also describes the datasets utilized in these studies. After that, it discusses the wide range of applications of LLMs, including in the biomedical and healthcare, education, social, business, and agriculture domains. It illustrates how LLMs create an impression on society, shape the future of AI, and how they can be used to solve real-world problems. The paper then explores the open issues and challenges in deploying LLMs in a real-world scenario. Our review aims to help practitioners, researchers, and experts understand the evolution of LLMs, their pre-trained architectures, applications, challenges, and future goals.
Natural Language Processing Journal,
Journal year: 2023, Issue: 6, pp. 100048, Published: Dec. 19, 2023
Large language models (LLMs) are a special class of pretrained language models (PLMs) obtained by scaling model size, pretraining corpus, and computation. LLMs, because of their large size and pretraining on large volumes of text data, exhibit special abilities which allow them to achieve remarkable performances without any task-specific training in many of the natural language processing tasks. The era of LLMs started with OpenAI's GPT-3 model, and the popularity of LLMs has increased exponentially after the introduction of models like ChatGPT and GPT4. We refer to GPT-3 and its successor OpenAI models, including ChatGPT and GPT4, as the GPT-3 family of large language models (GLLMs). With the ever-rising popularity of GLLMs, especially in the research community, there is a strong need for a comprehensive survey which summarizes the recent research progress in multiple dimensions and can guide the research community with insightful future directions. We start the paper with foundation concepts like transformers, transfer learning, self-supervised learning, pretrained language models, and large language models. We then present a brief overview of GLLMs and discuss their performances in various downstream tasks, specific domains, and languages. We also discuss data labelling and data augmentation with GLLMs, their robustness, and their effectiveness as evaluators, and finally conclude with multiple insightful future research directions. To summarize, this comprehensive survey will serve as a good resource for both academic and industry people to stay updated with the latest research related to GLLMs.
Computational Linguistics,
Journal year: 2024, Issue: 50(3), pp. 1097 - 1179, Published: Jan. 1, 2024
Abstract
Rapid advancements of large language models (LLMs) have enabled the processing, understanding, and generation of human-like text, with increasing integration into systems that touch our social sphere. Despite this success, these models can learn, perpetuate, and amplify harmful social biases. In this article, we present a comprehensive survey of bias evaluation and mitigation techniques for LLMs. We first consolidate, formalize, and expand notions of social bias and fairness in natural language processing, defining distinct facets of harm and introducing several desiderata to operationalize fairness for LLMs. We then unify the literature by proposing three intuitive taxonomies, two for bias evaluation, namely, metrics and datasets, and one for mitigation. Our first taxonomy of metrics for bias evaluation disambiguates the relationship between metrics and datasets, and organizes metrics by the different levels at which they operate in a model: embeddings, probabilities, and generated text. Our second taxonomy of datasets for bias evaluation categorizes datasets by their structure as counterfactual inputs or prompts, and identifies the targeted harms and social groups; we also release a consolidation of publicly available datasets for improved access. Our third taxonomy of techniques for bias mitigation classifies methods by their intervention during pre-processing, in-training, intra-processing, and post-processing, with granular subcategories that elucidate research trends. Finally, we identify open problems and challenges for future work. Synthesizing a wide range of recent research, we aim to provide a clear guide of the existing literature that empowers researchers and practitioners to better understand and prevent the propagation of harmful biases in LLMs.
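As a toy illustration of the counterfactual-input idea in the second taxonomy, one can score a model's output for paired prompts that differ only in a protected attribute and report the gap. A minimal sketch, where the scoring function is a stand-in (an assumption for illustration, not a metric from the survey):

```python
from statistics import mean

def counterfactual_gap(score, template: str, groups: list[str]) -> float:
    """Mean absolute difference in scores across counterfactual fills of
    the same template; 0.0 would indicate no measured disparity."""
    scores = [score(template.format(group=g)) for g in groups]
    base = scores[0]
    return mean(abs(s - base) for s in scores[1:])

# Stand-in scorer for demonstration only; a real study would score the
# model's generated text (e.g. with a sentiment or toxicity classifier).
toy_scores = {
    "The female doctor was praised.": 0.9,
    "The male doctor was praised.": 0.8,
}

gap = counterfactual_gap(lambda p: toy_scores[p],
                         "The {group} doctor was praised.",
                         ["female", "male"])
```

The same skeleton extends to more groups and templates; the survey's metric taxonomy distinguishes whether such scores come from embeddings, token probabilities, or generated text.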
Computers and Education Artificial Intelligence,
Journal year: 2023, Issue: 6, pp. 100199, Published: Dec. 29, 2023
Writing proficiency is an essential skill for upper secondary students that can be enhanced through effective feedback. Creating feedback on writing tasks, however, is time-intensive and presents a challenge for educators, often resulting in students receiving insufficient or no feedback. The advent of text-generating large language models (LLMs) offers a promising solution, namely, automated evidence-based feedback generation. Yet, empirical evidence from randomized controlled studies about the effectiveness of LLM-generated feedback is missing. To address this issue, the current study compared LLM-generated feedback to no feedback. A sample of N = 459 English as a foreign language learners wrote an argumentative essay. Students in the experimental group were asked to revise their text according to feedback that was generated using the LLM GPT-3.5-turbo. The control group revised their essays without feedback. We assessed the improvement from text to revision with automated essay scoring. The results showed that LLM-generated feedback increased revision performance (d = 0.19) and task motivation (d = 0.36). Moreover, it positively related to positive emotions (d = 0.34) during revising. These findings highlight that the use of LLMs allows educators to create timely feedback that positively relates to students' cognitive and affective-motivational outcomes. Future perspectives and implications for research and practice in intelligent tutoring systems are discussed.
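The effect sizes reported above (d = 0.19, 0.36, 0.34) are Cohen's d values: the difference in group means divided by a pooled standard deviation. A minimal sketch of that computation on made-up illustrative scores (not the study's data):

```python
from statistics import mean, stdev
from math import sqrt

def cohens_d(group_a: list[float], group_b: list[float]) -> float:
    """Cohen's d: standardized mean difference using a pooled SD."""
    na, nb = len(group_a), len(group_b)
    pooled_sd = sqrt(((na - 1) * stdev(group_a) ** 2 +
                      (nb - 1) * stdev(group_b) ** 2) / (na + nb - 2))
    return (mean(group_a) - mean(group_b)) / pooled_sd

# Illustrative essay scores only (hypothetical, not from the study).
feedback_group = [3.2, 3.8, 4.1, 3.5, 3.9]
control_group = [3.0, 3.4, 3.7, 3.3, 3.6]
d = cohens_d(feedback_group, control_group)
```

By common convention, d around 0.2 is a small effect and around 0.5 a medium one, which puts the study's reported effects in the small-to-moderate range.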