Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing,
Journal year: 2023,
Issue: unknown, pp. 2604–2620
Published: Jan. 1, 2023
Recent work has shown the promise of learning-with-human-feedback paradigms to produce human-determined high-quality text. Existing works use human feedback to train large language models (LLMs) in general domain abstractive summarization and have obtained summary quality exceeding traditional likelihood training. In this paper, we focus on a less explored form of human feedback – Human Edits. We propose Sequence Alignment (un)Likelihood Training (SALT), a novel technique to use both the human-edited and model-generated data together in the training loop. In addition, we demonstrate simulating Human Edits with ground truth summaries coming from existing training data – Imitation edits, along with the model-generated summaries obtained after training, to reduce the need for expensive human-edit data. In our experiments, we extend human feedback exploration from general domain summarization to medical domain summarization. Our results demonstrate the effectiveness of SALT in improving summary quality with Human and Imitation Edits. Through additional experiments, we show that SALT outperforms the conventional RLHF method (designed for human preferences), DPO, when applied to human-edit data. We hope the evidence in our paper prompts researchers to explore, collect, and better use different human feedback approaches scalably.
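The abstract names the objective but not its form. As a loose illustration only: aligning a model draft against a human edit labels each draft token as kept or changed, and a likelihood term can then reward kept tokens while an unlikelihood term penalizes changed ones. The PyTorch sketch below is a hypothetical rendering of that idea; the function name, tensor shapes, and the precomputed `keep_mask` are all assumptions, not the paper's code.

```python
import torch
import torch.nn.functional as F

def salt_style_loss(logits, target_ids, keep_mask):
    """Sketch of a token-level (un)likelihood objective over an aligned draft.

    logits:     (seq_len, vocab_size) model scores over the draft tokens
    target_ids: (seq_len,) draft tokens aligned against the human edit
    keep_mask:  (seq_len,) bool; True where the edit kept the draft token
    """
    log_probs = F.log_softmax(logits, dim=-1)
    tok_logp = log_probs.gather(1, target_ids.unsqueeze(1)).squeeze(1)

    # Likelihood term: raise the probability of tokens the human kept.
    like = -tok_logp[keep_mask]
    # Unlikelihood term: lower the probability of tokens the human edited away.
    p_changed = tok_logp[~keep_mask].exp()
    unlike = -torch.log1p(-p_changed + 1e-6)
    return like.sum() + unlike.sum()
```

In practice the mask would come from a token-level sequence alignment (e.g., an edit-distance alignment) between the model-generated draft and the human-edited summary.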
Transactions of the Association for Computational Linguistics,
Journal year: 2024,
Issue: 12, pp. 484–506
Published: Jan. 1, 2024
Abstract
While large language models (LLMs) have shown remarkable effectiveness in various NLP tasks, they are still prone to issues such as hallucination, unfaithful reasoning, and toxicity. A promising approach to rectify these flaws is correcting LLMs with feedback, where the LLM itself is prompted or guided with feedback to fix problems in its own output. Techniques leveraging automated feedback, either produced by the LLM itself (self-correction) or by some external system, are of particular interest as they make LLM-based solutions more practical and deployable with minimal human intervention. This paper provides an exhaustive review of recent advances in correcting LLMs with automated feedback, categorizing them into training-time, generation-time, and post-hoc approaches. We also identify potential challenges and future directions in this emerging field.
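The training-time/generation-time/post-hoc split is the survey's organizing idea, and the post-hoc branch in particular reduces to a small loop: generate, collect automated feedback, regenerate until the critic passes. The sketch below is a generic illustration under that reading, with hypothetical `generate` and `critique` callables rather than any surveyed system's API.

```python
from typing import Callable, Tuple

def post_hoc_correct(
    generate: Callable[[str], str],                     # LLM: prompt -> answer
    critique: Callable[[str, str], Tuple[bool, str]],   # (prompt, answer) -> (ok, feedback)
    prompt: str,
    max_rounds: int = 3,
) -> str:
    """Post-hoc correction: refine a finished output using automated feedback."""
    answer = generate(prompt)
    for _ in range(max_rounds):
        ok, feedback = critique(prompt, answer)
        if ok:
            break
        # Fold the critic's feedback back into the prompt and regenerate.
        answer = generate(
            f"{prompt}\n\nPrevious answer:\n{answer}\n\n"
            f"Feedback:\n{feedback}\n\nRevise the answer accordingly."
        )
    return answer
```

Here `critique` could itself be the LLM (self-correction) or an external verifier; the loop shape is the same either way.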
Transactions of the Association for Computational Linguistics,
Journal year: 2023,
Issue: 11, pp. 1643–1668
Published: Jan. 1, 2023
Abstract
Natural language generation has witnessed significant advancements due to the training of large language models on vast internet-scale datasets. Despite these advancements, there exists a critical challenge: These models can inadvertently generate content that is toxic, inaccurate, and unhelpful, and existing automatic evaluation metrics often fall short of identifying these shortcomings. As models become more capable, human feedback is an invaluable signal for evaluating and improving models. This survey aims to provide an overview of recent research that has leveraged human feedback to improve natural language generation. First, we introduce a taxonomy distilled from existing research to categorize and organize the varied forms of feedback. Next, we discuss how feedback can be described by its format and objective, and cover the two approaches proposed to use feedback (either for training or decoding): directly using feedback or training feedback models. We also discuss existing datasets for human-feedback data collection, and concerns surrounding feedback collection. Finally, we cover the nascent field of AI feedback, which uses large language models to make judgments based on a set of principles to minimize the need for human intervention. We also release a website of this survey at feedback-gap-survey.info.
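Of the two uses of feedback distinguished above (training vs. decoding), the decoding-time route can be made concrete with best-of-n reranking under a trained feedback model. This is a hedged sketch rather than a specific system from the survey: `sample` stands in for any LLM sampler and `reward` for any learned feedback model scoring (input, output) pairs.

```python
from typing import Callable, List

def best_of_n(
    sample: Callable[[str], str],         # LLM sampler: prompt -> one candidate
    reward: Callable[[str, str], float],  # feedback model: (prompt, text) -> score
    prompt: str,
    n: int = 8,
) -> str:
    """Decoding-time use of a trained feedback model: rerank n samples."""
    candidates: List[str] = [sample(prompt) for _ in range(n)]
    return max(candidates, key=lambda c: reward(prompt, c))
```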
Proceedings of the National Academy of Sciences,
Journal year: 2024,
Issue: 121(24)
Published: June 3, 2024
There is much excitement about the opportunity to harness the power of large language models (LLMs) when building problem-solving assistants. However, the standard methodology of evaluating LLMs relies on static pairs of inputs and outputs; this is insufficient for making an informed decision about which LLMs are best to use in an interactive setting, and how that varies by setting. Static assessment therefore limits how we understand model capabilities. We introduce CheckMate, an adaptable prototype platform for humans to interact with and evaluate LLMs. We conduct a study with CheckMate to evaluate three language models (InstructGPT, ChatGPT, and GPT-4) as assistants in proving undergraduate-level mathematics, with a mixed cohort of participants from undergraduate students to professors of mathematics. We release the resulting interaction and rating dataset, MathConverse. By analyzing MathConverse, we derive a taxonomy of human query behaviors and uncover that, despite a generally positive correlation, there are notable instances of divergence between correctness and perceived helpfulness of LLM generations, among other findings. Further, we garner a more granular understanding of GPT-4 mathematical problem-solving through a series of case studies, contributed by experienced mathematicians. We conclude with actionable takeaways for ML practitioners and mathematicians: models that communicate uncertainty, respond well to user corrections, and can provide a concise rationale for their recommendations may constitute better assistants. Humans should inspect LLM output carefully given current shortcomings and potential for surprising fallibility.
Existing text summarization systems have made significant progress in recent years, but typically generate summaries in a single step. The one-shot setting is sometimes inadequate, however, as the generated summary may contain hallucinations or overlook important details related to the reader's interests. In this paper, we address this limitation by proposing SummIt, an iterative summarization framework based on large language models like ChatGPT. Our framework enables the model to refine the generated summary iteratively through self-evaluation and feedback, closely resembling the process humans undertake when drafting and revising summaries. Furthermore, we explore the potential benefits of integrating knowledge and topic extractors into the framework to enhance summary faithfulness and controllability. We evaluate the performance of our framework on three benchmark summarization datasets through empirical and qualitative analyses. We also conduct a human evaluation to validate the effectiveness of the model's refinements and find a potential issue of over-correction.
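A minimal sketch of such a summarize-evaluate-refine loop follows, assuming a chat-style `llm(prompt) -> str` callable; the prompts and the STOP convention are hypothetical, not the paper's code. The round cap is one blunt guard against the over-correction issue the authors report.

```python
from typing import Callable

def summit_style_summarize(
    llm: Callable[[str], str], document: str, max_rounds: int = 3
) -> str:
    """Iteratively draft, self-evaluate, and refine a summary."""
    summary = llm(f"Summarize the following document:\n\n{document}")
    for _ in range(max_rounds):  # cap rounds to limit over-correction
        feedback = llm(
            "Evaluate this summary for faithfulness and coverage. "
            "Reply STOP if no changes are needed; otherwise list the fixes.\n\n"
            f"Document:\n{document}\n\nSummary:\n{summary}"
        )
        if feedback.strip().upper().startswith("STOP"):
            break
        summary = llm(
            f"Revise the summary using this feedback:\n{feedback}\n\n"
            f"Document:\n{document}\n\nSummary:\n{summary}"
        )
    return summary
```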
ACM Transactions on Intelligent Systems and Technology,
Journal year: 2025,
Issue: unknown
Published: Feb. 18, 2025
Generative Artificial Intelligence (GenAI) has recently gained immense popularity by offering various applications for generating high-quality and aesthetically pleasing content in image, 3D, and video data formats. The innovative GenAI solutions have shifted paradigms across design-related industries, particularly fashion. In this paper, we explore the incorporation of GenAI into fashion-related tasks and applications. Our examination encompasses a thorough review of more than 470 research papers and an in-depth analysis of over 300 applications, focusing on their contributions to the field. These applications are identified as 13 tasks within four categories, spanning multi-modal fashion understanding and fashion synthesis in both static and dynamic (video and animatable 3D) formats. We delve into these methods, recognizing their potential to propel future endeavours toward achieving state-of-the-art (SOTA) performance. Furthermore, we present a comprehensive overview of 53 publicly available datasets suitable for training and benchmarking fashion-centric models, accompanied by relevant evaluation metrics. Finally, we discuss real-world applications, unveiling existing challenges and future directions. With this investigation and analysis, the paper is targeted to serve as a useful resource for understanding the current landscape of GenAI in fashion, paving the way for future innovations. Papers discussed are listed along with their public code links at: https://github.com/wendashi/Cool-GenAI-Fashion-Papers/.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing,
Journal year: 2023,
Issue: unknown, pp. 6591–6616
Published: Jan. 1, 2023
Language Models (LMs) have shown impressive performance in various natural language tasks. However, when it comes to reasoning, LMs still face challenges such as hallucination, generating incorrect intermediate reasoning steps, and making mathematical errors. Recent research has focused on enhancing LMs through *self-improvement* using feedback. Nevertheless, existing approaches relying on a single generic feedback source fail to address the diverse error types found in LM-generated reasoning chains. In this work, we propose **Multi-Aspect Feedback**, an iterative refinement framework that integrates multiple feedback modules, including frozen LMs and external tools, each focusing on a specific error category. Our experimental results demonstrate the efficacy of our approach in addressing several errors in the LM-generated reasoning chain, thus improving the overall performance of the LM. We see an improvement of up to 20% on Mathematical Reasoning and 18% on Logical Entailment.
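The framework description suggests an obvious shape: run one critic per error category and fold all of their feedback into the next refinement round. The sketch below is an assumption-laden illustration (the `llm` callable, critic interface, and prompts are hypothetical), not the paper's implementation.

```python
from typing import Callable, Dict, List

def multi_aspect_refine(
    llm: Callable[[str], str],
    critics: Dict[str, Callable[[str], str]],  # error category -> feedback module
    question: str,
    max_rounds: int = 3,
) -> str:
    """Iterative refinement with one feedback module per error category."""
    answer = llm(question)
    for _ in range(max_rounds):
        notes: List[str] = []
        for category, critic in critics.items():
            feedback = critic(answer)
            if feedback:  # empty feedback means this aspect found no error
                notes.append(f"[{category}] {feedback}")
        if not notes:
            break
        answer = llm(
            f"Question:\n{question}\n\nDraft answer:\n{answer}\n\n"
            "Fix the issues below and answer again:\n" + "\n".join(notes)
        )
    return answer
```

In this reading, a critic might be a frozen LM checking step-to-step entailment or an external calculator re-running the arithmetic, mirroring the frozen LMs and external tools the abstract mentions.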
Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval,
Journal year: 2024,
Issue: unknown, pp. 3074–3074
Transactions of the Association for Computational Linguistics,
Journal year: 2024,
Issue: 12, pp. 1417–1440
Published: Jan. 1, 2024
Abstract
Self-correction is an approach to improving responses from large language models (LLMs) by refining the responses using LLMs during inference. Prior work has proposed various self-correction frameworks using different sources of feedback, including self-evaluation and external feedback. However, there is still no consensus on the question of when LLMs can correct their own mistakes, as recent studies also report negative results. In this work, we critically survey broad papers and discuss the conditions required for successful self-correction. We first find that prior studies often do not define their research questions in detail and involve impractical or unfair evaluations that over-evaluate self-correction. To tackle these issues, we categorize research questions in self-correction research and provide a checklist for designing appropriate experiments. Our critical survey based on the newly categorized research questions shows that (1) no prior work demonstrates successful self-correction with feedback from prompted LLMs, except for tasks that are exceptionally suited for self-correction, (2) self-correction works well in tasks that can use reliable external feedback, and (3) large-scale fine-tuning enables self-correction.
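Finding (2) is straightforward to make concrete: when feedback comes from a reliable external check, the loop has a trustworthy stop signal that prompted self-evaluation lacks. Below is a hypothetical sketch for program synthesis checked by a test suite; the `llm` callable and the append-the-tests convention are assumptions, not taken from any surveyed work.

```python
import subprocess
import tempfile
from typing import Callable

def correct_with_external_feedback(
    llm: Callable[[str], str], task: str, tests: str, max_rounds: int = 3
) -> str:
    """Self-correction driven by a reliable external signal (a test suite)."""
    code = llm(f"Write a Python solution for:\n{task}")
    for _ in range(max_rounds):
        # Write the candidate plus its tests to a file and execute it.
        with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
            f.write(code + "\n" + tests)
            path = f.name
        result = subprocess.run(["python", path], capture_output=True, text=True)
        if result.returncode == 0:
            break  # the external check confirms the solution passes
        code = llm(
            f"Task:\n{task}\n\nYour code failed with:\n{result.stderr}\n"
            "Return a corrected solution."
        )
    return code
```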