As dialogue systems become more popular, evaluation of their response quality gains importance. Engagingness highly correlates with overall quality and creates a sense of connection that gives human participants a fulfilling experience. Although qualities like coherence and fluency are readily measured with well-worn automatic metrics, evaluating engagingness often relies on human assessment, which is a costly and time-consuming process. Existing automatic metrics evaluate the response without the conversation history, are designed for one dataset, or have limited correlation with human annotations. Furthermore, they have been tested exclusively on English conversations. Given that dialogue systems are increasingly available in languages beyond English, multilingual evaluation capabilities are essential. We propose that large language models (LLMs) may be used for engagingness evaluation through prompting, and ask how prompt constructs and translated prompts compare in a multilingual setting. We provide a prompt-design taxonomy and find that using selected prompt elements with LLMs, including our comprehensive definition of engagingness, outperforms state-of-the-art methods across multiple languages.
Journal of Artificial Intelligence Research, 2024, Vol. 79, pp. 417-446. Published: Feb. 6, 2024.
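The prompting-based evaluation described above can be sketched as assembling a rating prompt that includes a definition of engagingness and the conversation history. The definition text, scale, and function names below are illustrative assumptions, not the authors' actual prompt design.

```python
# Hypothetical sketch of prompt-based engagingness scoring.
# The definition and 1-5 scale are assumptions for illustration.

ENGAGINGNESS_DEFINITION = (
    "An engaging response is interesting, shows attentiveness to the "
    "partner, and makes the participant want to continue the conversation."
)

def build_eval_prompt(history, response, definition=ENGAGINGNESS_DEFINITION):
    """Assemble a rating prompt that includes the conversation history,
    addressing the context-free evaluation gap noted in the abstract."""
    turns = "\n".join(f"- {t}" for t in history)
    return (
        f"Definition: {definition}\n"
        f"Conversation so far:\n{turns}\n"
        f"Candidate response: {response}\n"
        "Rate the engagingness of the candidate response from 1 to 5. "
        "Answer with a single digit."
    )

prompt = build_eval_prompt(
    ["Hi, how was your weekend?", "Great, I went hiking!"],
    "Nice! Which trail did you take? I love hiking too.",
)
# The prompt string would then be sent to an LLM; parsing the
# returned digit into a score is left out here.
```

Translating the prompt (rather than only the inputs) is what the multilingual comparison in the abstract would vary.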
Generative Artificial Intelligence (AI) is one of the most exciting developments in Computer Science of the last decade. At the same time, Reinforcement Learning (RL) has emerged as a very successful paradigm for a variety of machine learning tasks. In this survey, we discuss the state of the art, opportunities, and open research questions in applying RL to generative AI. In particular, we discuss three types of applications, namely: RL as an alternative way of generation without specified objectives; generating outputs while concurrently maximizing an objective function; and, finally, embedding desired characteristics, which cannot be easily captured by means of an objective function, into the generation process. We conclude the survey with an in-depth discussion of the challenges of this fascinating emerging area.
Transactions of the Association for Computational Linguistics, 2024, Vol. 12, pp. 484-506. Published: Jan. 1, 2024.
Abstract
While large language models (LLMs) have shown remarkable effectiveness in various NLP tasks, they are still prone to issues such as hallucination, unfaithful reasoning, and toxicity. A promising approach to rectify these flaws is correcting LLMs with feedback, where the LLM itself is prompted or guided with feedback to fix problems in its own output. Techniques leveraging automated feedback, whether produced by the LLM itself (self-correction) or by some external system, are of particular interest as they make LLM-based solutions more practical and deployable with minimal human intervention. This paper provides an exhaustive review of recent advances, categorizing them into training-time, generation-time, and post-hoc approaches. We also identify potential challenges and future directions in this emerging field.
Transactions of the Association for Computational Linguistics, 2024, Vol. 12, pp. 1011-1026. Published: Jan. 1, 2024.
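The generate, feedback, and refine stages surveyed above can be sketched as a simple control loop. The functions below are toy stand-ins for LLM or tool calls, assumed only so the post-hoc correction flow is runnable.

```python
# Minimal sketch of a generate -> critique -> refine loop (post-hoc
# correction). `generate`, `critique`, and `refine` are stubs standing
# in for LLM or external-tool calls.

def generate(task):
    return f"draft answer for: {task}"

def critique(output):
    # A critic model or external system would return feedback here;
    # an empty string means no problems were found.
    return "too vague" if "draft" in output else ""

def refine(output, feedback):
    return output.replace("draft answer", "revised answer")

def self_correct(task, max_rounds=3):
    """Iteratively apply automated feedback to the model's own output."""
    output = generate(task)
    for _ in range(max_rounds):
        feedback = critique(output)
        if not feedback:
            break
        output = refine(output, feedback)
    return output

result = self_correct("summarize the paper")
```

Training-time and generation-time approaches differ mainly in where this loop runs; the control flow itself is the same.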
Abstract
One widely cited barrier to the adoption of LLMs as proxies for humans in subjective tasks is their sensitivity to prompt wording; interestingly, humans also display sensitivities to instruction changes in the form of response biases. We investigate the extent to which LLMs reflect human response biases, if at all. We look at survey design, where biases caused by the wordings of “prompts” have been extensively explored in the social psychology literature. Drawing from these works, we design a dataset and framework to evaluate whether LLMs exhibit human-like response biases in survey questionnaires. Our comprehensive evaluation of nine models shows that popular open and commercial LLMs generally fail to reflect human-like behavior, particularly in models that have undergone RLHF. Furthermore, even when a model shows a significant change in the same direction as humans, we find that it is also sensitive to perturbations that do not elicit significant changes in humans. These results highlight the pitfalls of using LLMs as human proxies, and underscore the need for finer-grained characterizations of model behavior.
Journal of Informatics Education and Research, 2024, Issue unknown. Published: Jan. 1, 2024.
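The evaluation the abstract describes can be illustrated with one perturbation type from the survey-design literature: acquiescence-style rewording, which inflates "agree" answers in human respondents. Everything below is an assumed toy example; `respond` is a stub where a real study would query an LLM.

```python
# Illustrative sketch: apply a response-bias perturbation to a survey
# question and compare answers. `respond` is a stub standing in for an
# LLM call; the acquiescence rewording is an assumed example.

def perturb_acquiescence(question):
    # Reword a neutral question into an agree/disagree form, which is
    # known to bias human respondents toward agreement.
    return f"Do you agree that {question.rstrip('?')}?"

def respond(prompt):
    # Stub model, biased toward agreement under agree/disagree framing.
    return "yes" if prompt.startswith("Do you agree") else "no"

original = "remote work improves productivity?"
answers = {
    "original": respond(original),
    "perturbed": respond(perturb_acquiescence(original)),
}
shifted = answers["original"] != answers["perturbed"]
```

Comparing whether the model's shift matches the direction and magnitude of documented human shifts is the core of the framework described above.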
Explainable AI (XAI) is one of the key game-changing features in machine learning models, contributing to making them more transparent, regulated, and usable in different applications. In this paper, we investigate four explanation methods, namely LIME, SHAP, Anchor, and Decision Tree-based Explanation, in disentangling the decision-making process of black-box models within different fields. In our experiments, we use datasets that cover several domains, for example, health, finance, and image classification, and compare the accuracy, fidelity, coverage, precision, and human satisfaction of each method. Our work shows that the rule-tree approach (Decision Tree-based Explanation) is mostly superior in comparison to the other, non-model-specific methods, achieving higher coverage regardless of the classifier. In addition, respondents who answered the qualitative evaluation indicated they were very content with decision tree-based explanations, as these types of explanations are easy to understand. Furthermore, the most popular kinds of explanations are intuitive and meaningful. These findings emphasize the use of interpretable methods in bridging the gap between model behavior and human understanding, thus advancing transparency and accountability in AI-driven decision-making.
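Of the metrics compared above, fidelity is commonly defined as agreement between the surrogate explainer's predictions and the black-box model's predictions on the same inputs. The definition below assumes that common reading; it is a sketch, not the paper's exact metric.

```python
# Fidelity as prediction agreement between a surrogate explainer and the
# black-box model it explains (assumed standard definition).

def fidelity(black_box_preds, surrogate_preds):
    """Fraction of instances where the surrogate reproduces the black box."""
    assert len(black_box_preds) == len(surrogate_preds)
    matches = sum(b == s for b, s in zip(black_box_preds, surrogate_preds))
    return matches / len(black_box_preds)

score = fidelity([1, 0, 1, 1, 0], [1, 0, 1, 0, 0])  # surrogate misses one of five
```

Coverage, by contrast, measures how many instances an explanation rule applies to at all, which is where the decision-tree approach is reported to excel.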
Patrick Fernandes, Daniel Deutsch, Mara Finkelstein, Parker Riley, André Martins, Graham Neubig, Ankush Garg, Jonathan Clark, Markus Freitag, Orhan Firat. Proceedings of the Eighth Conference on Machine Translation. 2023.
Afra Feyza Akyurek, Ekin Akyurek, Ashwin Kalyan, Peter Clark, Derry Tanti Wijaya, Niket Tandon. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2023.
Applied Mathematics and Nonlinear Sciences, 2025, No. 10(1). Published: Jan. 1, 2025.
Abstract
The rapid expansion of cross-national e-commerce has brought significant opportunities and challenges in understanding diverse consumer behavior. This study introduces an innovative framework combining the XLSTM (Extended Long Short-Term Memory) model with K-means clustering to analyze user behavior and optimize conversion rates on global platforms. XLSTM extends traditional LSTM models by incorporating multi-dimensional cell states, attention mechanisms, and improved memory capabilities, enabling it to effectively capture complex temporal and cross-cultural patterns. The integration enhances the clustering process by providing high-quality embeddings that lead to well-defined and stable clusters. Through comprehensive evaluations, the combined approach demonstrates superior performance across key metrics, including Silhouette Score, Davies-Bouldin Index (DBI), and Adjusted Rand Index (ARI), compared to standalone clustering algorithms and LSTM-based methods.
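The Silhouette Score used in these evaluations compares, for each point, its mean distance to its own cluster (a) with its mean distance to the nearest other cluster (b), scoring (b - a) / max(a, b). A toy 1-D implementation of that standard definition, assuming every cluster has at least two points:

```python
# Toy silhouette computation over 1-D points (standard definition;
# each cluster is assumed to contain at least two points).

def silhouette(points, labels):
    def dist(p, q):
        return abs(p - q)

    scores = []
    clusters = set(labels)
    for i, (p, c) in enumerate(zip(points, labels)):
        # a: mean distance to other members of the same cluster
        own = [dist(p, q) for j, (q, l) in enumerate(zip(points, labels))
               if l == c and j != i]
        a = sum(own) / len(own)
        # b: smallest mean distance to any other cluster
        b = min(
            sum(dist(p, q) for q, l in zip(points, labels) if l == other)
            / sum(1 for l in labels if l == other)
            for other in clusters if other != c
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Two tight, well-separated clusters score close to 1.
score = silhouette([0.0, 0.1, 10.0, 10.1], [0, 0, 1, 1])
```

High-quality XLSTM embeddings would raise this score by tightening clusters and pushing them apart, which is the effect the abstract attributes to the combined approach.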
Feature importance analysis further identifies coupon usage, visit frequency, and product category interest as the most influential factors in purchase decisions. These findings highlight the potential of this methodology to improve engagement and marketing strategies for cross-national e-commerce platforms.