Identifying open-texture in regulations using LLMs
Artificial Intelligence and Law,
Journal Year:
2025,
Volume and Issue:
unknown
Published: May 6, 2025
Language: English
Generative Calibration for In-context Learning
Zhongtao Jiang,
Yuanzhe Zhang,
Liu Cao
et al.
Published: Jan. 1, 2023
As one of the most exciting features of large language models (LLMs), in-context learning is a mixed blessing. While it allows users to fast-prototype a task solver with only a few training examples, performance is generally sensitive to various configurations of the prompt, such as the choice or order of the examples. In this paper, we for the first time theoretically and empirically identify that this paradox is mainly due to the label shift of the model to the data distribution: LLMs shift the label marginal p(y) while having a good label conditional p(x|y). With this understanding, we can simply calibrate the predictive distribution by adjusting the label marginal, which is estimated via Monte-Carlo sampling over the model, i.e., generation from the LLM. We call our approach generative calibration. We conduct exhaustive experiments on 12 text classification tasks with LLMs scaling from 774M to 33B, and find that the proposed method greatly and consistently outperforms ICL as well as state-of-the-art calibration methods, by up to 27% in absolute macro-F1. Meanwhile, it is also stable under different prompt configurations.
Language: English
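The calibration step the abstract describes lends itself to a short sketch. Below is a minimal NumPy illustration, assuming the per-label probabilities p(y|x, prompt) have already been read off the LLM for the test input and for a handful of inputs the LLM itself generated from the prompt; the function name and array interface are illustrative assumptions, not the authors' released code.

```python
import numpy as np

def generative_calibration(label_probs, sampled_label_probs):
    """Calibrate an ICL predictive distribution by dividing out the model's
    label marginal p(y), estimated by Monte-Carlo over the model's own
    generations (the idea described in the abstract above).

    label_probs:         shape (num_labels,), p(y | x, prompt) for the test x.
    sampled_label_probs: shape (num_samples, num_labels), p(y | x_i, prompt)
                         for inputs x_i the LLM generated from the prompt.
    """
    # Monte-Carlo estimate of the model's label marginal p(y).
    marginal = sampled_label_probs.mean(axis=0)
    # Reweight p(y | x) by 1 / p(y) and renormalize.
    calibrated = label_probs / np.clip(marginal, 1e-8, None)
    return calibrated / calibrated.sum()

# Toy usage: a prediction skewed toward label 0 because the model's own
# marginal is skewed; dividing out the marginal undoes the shift.
pred = np.array([0.7, 0.3])
samples = np.array([[0.8, 0.2], [0.6, 0.4], [0.7, 0.3]])
print(generative_calibration(pred, samples))
```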
Comparable Demonstrations Are Important In In-Context Learning: A Novel Perspective On Demonstration Selection
Caoyun Fan,
Jidong Tian,
Yitian Li
et al.
ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP),
Journal Year:
2024,
Volume and Issue:
unknown, P. 10436 - 10440
Published: March 18, 2024
In-Context Learning (ICL) is an important paradigm for adapting Large Language Models (LLMs) to downstream tasks through a few demonstrations. Despite the great success of ICL, the limited number of demonstrations may lead to demonstration bias, i.e., the input-label mapping induced by the LLM misunderstands the task's essence. Inspired by human experience, we attempt to mitigate such bias from the perspective of the inter-demonstration relationship. Specifically, we construct Comparable Demonstrations (CDs) by minimally editing the texts to flip the corresponding labels, in order to highlight the task's essence and eliminate potential spurious correlations through comparison. Through a series of experiments on CDs, we find that (1) demonstration bias does exist in LLMs, and CDs can significantly reduce such bias; (2) CDs exhibit good performance in ICL, especially in out-of-distribution scenarios. In summary, this study explores the ICL mechanism from a novel perspective, providing deeper insight into the demonstration selection strategy for ICL.
Language: English
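To make the construction concrete, here is a minimal sketch of how such comparable pairs might be serialized into an ICL prompt; the Input/Label template and the sentiment pair are illustrative assumptions, not the paper's exact format.

```python
def build_cd_prompt(cd_pairs, test_input):
    """Lay out comparable demonstrations (minimally edited text pairs with
    flipped labels) as an in-context prompt, so each pair directly contrasts
    the edit that changes the label."""
    blocks = []
    for original, orig_label, edited, flipped_label in cd_pairs:
        blocks.append(f"Input: {original}\nLabel: {orig_label}")
        blocks.append(f"Input: {edited}\nLabel: {flipped_label}")
    blocks.append(f"Input: {test_input}\nLabel:")
    return "\n\n".join(blocks)

# Hypothetical sentiment pair: one minimal edit flips the label.
cd_pairs = [(
    "The service was quick and friendly.", "positive",
    "The service was slow and unfriendly.", "negative",
)]
print(build_cd_prompt(cd_pairs, "The food arrived cold."))
```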
Labeling Radiology Report With GPT-4 Prompt Engineering: Comparative Study of in-Context Prompting (Preprint)
Songsoo Kim,
Donghyun Kim,
Hyunjoo Shin
et al.
Published: March 15, 2024
BACKGROUND
Large language models, such as Generative Pre-trained Transformer-4 (GPT-4), utilize a method known as in-context learning, which enhances the model's responses by understanding the context provided within the input text.
OBJECTIVE
This study aims to assess the labeling efficacy of GPT-4 in radiology reports and validate its performance enhancement through in-context learning.
METHODS
In this retrospective study, radiology reports were obtained utilizing the Medical Information Mart for Intensive Care III (MIMIC-III) database and were manually labeled by two radiologists for evaluation. Two experimental prompts were defined for comparison: the “Basic prompt,” which included “Task” and “Output” sections, and the “In-context prompt,” which added a “Context” section with additional information. Labeling experiments were conducted on head CT reports for multi-label classification of ten predefined labels (mass, hemorrhage, infarct, vascular, white matter, volume loss, hydrocephalus, pneumocephalus, foreign body, fracture) in Experiment 1, and on abdomen CT reports for actionable findings based on four different categories (gastrointestinal, genitourinary, musculoskeletal, vascular) in Experiment 2. Precision, recall, F1-scores, and accuracy were compared between the prompting scenarios.
RESULTS
In Experiment 1, for most labels, the In-context prompt demonstrated notable improvement in F1 scores (up to 0.658) and accuracy (up to 0.155), except for the hemorrhage and pneumocephalus labels. Statistically significant differences were observed for several labels (vascular, mass, foreign body). For Experiment 2, the In-context prompt significantly enhanced F1 scores (by up to 0.306) and accuracy (by up to 0.107) across all categories compared with the Basic prompts.
CONCLUSIONS
Our study demonstrated that GPT-4 with in-context prompt engineering has commendable performance on various labeling tasks using real-world radiology reports. It offers a flexible, researcher-tailored approach to using large language models.
Language: English
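The two prompt designs compared in the study reduce to a small structural difference, which the following sketch illustrates; the section wording, the yes/no output format, and the function itself are assumptions for illustration (only the section names and the ten labels come from the abstract).

```python
def build_prompt(report, findings, context=None):
    """Compose a labeling prompt in the sectioned style the abstract
    describes: "Task" and "Output" sections for the Basic prompt, with an
    optional "Context" section added for the In-context prompt."""
    sections = [
        "Task: For the radiology report below, decide whether each of the "
        "following findings is present: " + ", ".join(findings) + ".",
        "Output: Return one line per finding, formatted as <finding>: yes/no.",
    ]
    if context is not None:
        # The "In-context prompt" differs only by this extra section.
        sections.insert(0, "Context: " + context)
    sections.append("Report:\n" + report)
    return "\n\n".join(sections)

findings = ["mass", "hemorrhage", "infarct", "vascular", "white matter",
            "volume loss", "hydrocephalus", "pneumocephalus",
            "foreign body", "fracture"]
basic_prompt = build_prompt("FINDINGS: ...", findings)
in_context_prompt = build_prompt(
    "FINDINGS: ...", findings,
    context="Label definitions and positive/negative criteria go here.")
```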
Understanding Users' Dissatisfaction with ChatGPT Responses: Types, Resolving Tactics, and the Effect of Knowledge Level
arXiv (Cornell University),
Journal Year:
2023,
Volume and Issue:
unknown
Published: Jan. 1, 2023
Large language models (LLMs) with chat-based capabilities, such as ChatGPT, are widely used in various workflows. However, due to a limited understanding of these large-scale models, users struggle to use this technology and experience different kinds of dissatisfaction. Researchers have introduced several methods, such as prompt engineering, to improve model responses. However, these focus on enhancing the model's performance on specific tasks, and little has been investigated into how to deal with the user dissatisfaction that results. Therefore, with ChatGPT as a case study, we examine users' dissatisfaction along with the strategies they use to address it. After organizing dissatisfaction with LLMs into seven categories based on a literature review, we collected 511 instances of dissatisfactory responses and 107 detailed recollections of user experiences, which we released as a publicly accessible dataset. Our analysis reveals that dissatisfaction arises most frequently when ChatGPT fails to grasp users' intentions, while users rate the severity of dissatisfaction related to accuracy the highest. We also identified four tactics users employ to address their dissatisfaction, along with their effectiveness. We found that users often do not act on their dissatisfaction at all, and that even when tactics were used, 72% of dissatisfaction remained unresolved. Moreover, users with low knowledge of LLMs tend to face more dissatisfaction while putting minimal effort into addressing it. Based on these findings, we propose design implications for minimizing user dissatisfaction and improving the usability of chat-based LLMs.
Language: English