Negation is a fundamental component of natural language that reverses the semantic meaning of a sentence. It plays an extremely important role across a wide range of applications, yet it is underrepresented in pre-trained language models (LMs), often resulting in wrong inferences. In this work, we aim to improve the underlying understanding of negation in LMs. To augment that understanding, we propose a language modeling objective with a weighted cross-entropy loss and elastic weight consolidation regularization. We reduce the mean top-1 error rate on the negated LAMA dataset by 1.1% for BERT-base, 0.78% for BERT-large, 3.74% for RoBERTa-base, and 0.01% for RoBERTa-large. The approach also reduces BERT's error by a margin of 8% and outperforms existing negation-aware models. We further provide empirical evidence that the augmented models remain competitive with the original models on classical benchmarks as well as on inference tasks.
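As a minimal sketch of how a weighted cross-entropy objective can be combined with elastic weight consolidation (EWC): the abstract does not give the exact formulation, so the token weighting scheme, the diagonal Fisher estimate, and the lambda hyperparameter below are illustrative assumptions rather than the authors' implementation.

# Hypothetical sketch: weighted cross-entropy + elastic weight consolidation (EWC)
# for continued masked-LM training. Weights, Fisher estimate, and lambda are
# illustrative assumptions, not the authors' published settings.
import torch
import torch.nn.functional as F

def weighted_ce_loss(logits, labels, token_weights, ignore_index=-100):
    # Per-token cross-entropy, up-weighting tokens of interest (e.g., negation cues).
    ce = F.cross_entropy(logits.view(-1, logits.size(-1)), labels.view(-1),
                         ignore_index=ignore_index, reduction="none")
    mask = (labels.view(-1) != ignore_index).float()
    return (ce * token_weights.view(-1) * mask).sum() / mask.sum().clamp(min=1.0)

def ewc_penalty(model, fisher, old_params, lam=0.1):
    # Quadratic penalty keeping parameters close to the pre-trained values,
    # scaled by an (assumed diagonal) Fisher information estimate.
    penalty = torch.zeros(())
    for name, p in model.named_parameters():
        if name in fisher:
            penalty = penalty + (fisher[name] * (p - old_params[name]) ** 2).sum()
    return lam * penalty

def training_loss(model, logits, labels, token_weights, fisher, old_params):
    # Combined objective: weighted LM loss plus EWC regularization.
    return weighted_ce_loss(logits, labels, token_weights) + ewc_penalty(model, fisher, old_params)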
Transactions of the Association for Computational Linguistics, Journal Year: 2023, Volume and Issue: 11, P. 652 - 670, Published: Jan. 1, 2023
Abstract
Current language models can generate high-quality text. Are they simply copying text they have seen before, or have they learned generalizable linguistic abstractions? To tease apart these possibilities, we introduce RAVEN, a suite of analyses for assessing the novelty of generated text, focusing on sequential structure (n-grams) and syntactic structure. We apply these analyses to four neural language models trained on English (an LSTM, a Transformer, Transformer-XL, and GPT-2). For local structure—e.g., individual dependencies—text generated with a standard sampling scheme is substantially less novel than our baseline of human-generated text from each model's test set. For larger-scale structure—e.g., overall sentence structure—model-generated text is as novel or even more novel than the baseline, but models still sometimes copy substantially, in some cases duplicating passages over 1,000 words long from the training set. We also perform extensive manual analysis, finding evidence that GPT-2 uses both compositional and analogical generalization mechanisms and showing that GPT-2's novel text is usually well-formed morphologically and syntactically but has reasonably frequent semantic issues (e.g., being self-contradictory).
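The core of the n-gram novelty analysis is checking, for each generated n-gram, whether it already appears in the training text. The sketch below is a simplified illustration of that idea; whitespace tokenization and an in-memory n-gram set are assumptions for brevity, not the released RAVEN code.

# Simplified illustration of n-gram novelty: the fraction of generated n-grams
# that never appear in the training text. Tokenization and data handling are
# assumptions, not the released RAVEN implementation.
def ngrams(tokens, n):
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def novelty(generated_text, training_text, n=5):
    gen = generated_text.split()
    train_ngrams = ngrams(training_text.split(), n)
    gen_ngrams = [tuple(gen[i:i + n]) for i in range(len(gen) - n + 1)]
    if not gen_ngrams:
        return 0.0
    novel = sum(1 for g in gen_ngrams if g not in train_ngrams)
    return novel / len(gen_ngrams)

# Usage idea: compare model output against a human-written baseline from the test set,
# e.g. novelty(model_sample, training_corpus) vs. novelty(test_sample, training_corpus).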
ACS Central Science, Journal Year: 2024, Volume and Issue: unknown, Published: March 15, 2024
Efficient prioritization of bioactive compounds from high-throughput screening campaigns is a fundamental challenge for accelerating drug development efforts. In this study, we present the first data-driven approach to simultaneously detect assay interferents and prioritize true bioactive compounds. By analyzing the learning dynamics during the training of a gradient boosting model on noisy screening data, using a novel formulation of sample influence, we are able to distinguish between compounds exhibiting the desired biological response and those producing assay artifacts. Our method therefore enables false-positive detection without relying on prior screens or knowledge of interference mechanisms, making it applicable to any screening campaign. We demonstrate that it consistently excludes interferents acting through different mechanisms and prioritizes biologically relevant compounds more efficiently than all tested baselines, including in a retrospective case study simulating its use in a real drug discovery campaign. Finally, the tool is extremely computationally efficient, requiring less than 30 s per campaign on low-resource hardware. As such, our findings show that it is an ideal addition to existing tools and can be used to guide further pharmacological optimization after screening campaigns.
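The abstract does not spell out its sample-influence formulation. Purely as a rough sketch of the general idea of mining the learning dynamics of a gradient boosting model, one could track per-sample predicted probabilities across boosting iterations with scikit-learn and flag labelled actives that the ensemble never fits confidently; the flagging rule and threshold below are assumptions, not the authors' method.

# Rough sketch (not the authors' method): track per-sample learning dynamics
# across boosting stages and flag samples whose dynamics look anomalous,
# e.g. labelled active (y == 1) but never fit confidently by the ensemble.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

def learning_dynamics(X, y, n_estimators=200):
    y = np.asarray(y)  # binary 0/1 integer labels assumed
    model = GradientBoostingClassifier(n_estimators=n_estimators).fit(X, y)
    # Probability assigned to each sample's labelled class at every boosting stage.
    stages = np.stack([p[np.arange(len(y)), y] for p in model.staged_predict_proba(X)])
    return stages  # shape: (n_stages, n_samples)

def flag_suspect_actives(X, y, threshold=0.5):
    y = np.asarray(y)
    dyn = learning_dynamics(X, y)
    mean_conf = dyn.mean(axis=0)
    # Candidate assay interferents / false positives under this toy rule.
    return np.where((y == 1) & (mean_conf < threshold))[0]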
Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Journal Year: 2024, Volume and Issue: unknown, P. 6437 - 6447, Published: Aug. 24, 2024
With the rapid advancements of large language models (LLMs), information retrieval (IR) systems, such as search engines and recommender systems, have undergone a significant paradigm shift. This evolution, while heralding new opportunities, introduces emerging challenges, particularly in terms of biases and unfairness, which may threaten the information ecosystem. In this paper, we present a comprehensive survey of existing works on the pressing bias and unfairness issues that arise when LLMs are integrated into IR systems. We first unify bias and unfairness issues as distribution mismatch problems, providing a groundwork for categorizing various mitigation strategies through distribution alignment. Subsequently, we systematically delve into the specific bias and unfairness issues arising from three critical stages of integrating LLMs into IR systems: data collection, model development, and result evaluation. In doing so, we meticulously review and analyze the recent literature, focusing on the definitions, characteristics, and corresponding mitigation strategies associated with these issues. Finally, we identify and highlight some open problems and challenges for future work, aiming to inspire researchers and stakeholders in the IR field and beyond to better understand and mitigate bias and unfairness issues in the LLM era. We also consistently maintain a GitHub repository of the relevant papers and resources in this rising direction at https://github.com/KID-22/LLM-IR-Bias-Fairness-Survey.
AI & Society, Journal Year: 2024, Volume and Issue: unknown, Published: April 1, 2024
Abstract
The growing capabilities of artificial intelligence (AI) word processing models have demonstrated exceptional potential to impact language-related tasks and functions. Their fast pace of adoption and probable effect has also given rise to controversy within certain fields. Models such as GPT-3 are a particular concern for professionals engaged in writing, particularly as their engagement with these technologies is limited due to a lack of ability to control the output. Most efforts to maximize output rely on a process known as prompt engineering, the construction and modification of the inputted text in expectation of the outputted or desired text. Consequently, prompt engineering has emerged as an important consideration in research and practice. Previous conceptions have largely focused on technical and logistic modifications or back-end processing, remaining inaccessible and, still, out of reach for most users. In this paper, we look to the communication field and its methods of conceptualizing text generation—the rhetorical situation—to conceptualize prompt engineering in a way that is more comprehensible to users, by considering the context of rhetoric. We introduce a framework, consisting of a formula, which demands that all components of the rhetorical situation be present in the prompt. We then offer discussions of future AI writing use in both professional and educational settings. Ultimately, the discussion and findings aim to provide a means of integrating agency into writer-centric tools and of advancing a human-in-the-loop approach. As generative AI, and especially NLP-based models, become common across societal functions, prompt engineering will play a crucial role not just in the technology, but in its productive and responsible use.
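As a purely illustrative sketch of what a formula-style requirement could look like in practice: the field names below are the classical rhetorical-situation elements (writer, audience, purpose, context, message), not the authors' exact formula, and the template is a hypothetical example.

# Illustrative only: a prompt template that requires every rhetorical-situation
# component to be supplied before a prompt is built. Field names follow the
# classical rhetorical situation, not necessarily the authors' formula.
RHETORICAL_PROMPT = (
    "Writer/persona: {writer}\n"
    "Audience: {audience}\n"
    "Purpose (exigence): {purpose}\n"
    "Context and constraints: {context}\n"
    "Message/task: {message}\n"
)

def build_prompt(writer, audience, purpose, context, message):
    # Refuse to build the prompt unless all components are present.
    fields = dict(writer=writer, audience=audience, purpose=purpose,
                  context=context, message=message)
    missing = [k for k, v in fields.items() if not v]
    if missing:
        raise ValueError(f"Missing rhetorical components: {missing}")
    return RHETORICAL_PROMPT.format(**fields)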
Proceedings of the National Academy of Sciences, Journal Year: 2025, Volume and Issue: 122(19), Published: May 9, 2025
What mechanisms underlie linguistic generalization in large language models (LLMs)? This question has attracted considerable attention, with most studies analyzing the extent to which the language skills of LLMs resemble rules. As yet, it is not known whether linguistic generalization in LLMs could equally well be explained as the result of analogy. A key shortcoming of prior research is its focus on regular linguistic phenomena, for which rule-based and analogical approaches make the same predictions. Here, we instead examine derivational morphology, specifically English adjective nominalization, which displays notable variability. We introduce a method for investigating linguistic generalization in LLMs: focusing on GPT-J, we fit cognitive models that instantiate rule-based and analogical learning to the LLM's training data and compare their predictions on a set of nonce adjectives with those of the LLM, allowing us to draw direct conclusions regarding the underlying mechanisms. As expected, rule-based and analogical models explain GPT-J's predictions equally well for adjectives with regular nominalization patterns. However, for adjectives with variable nominalization patterns, the analogical model provides a much better match. Furthermore, GPT-J's behavior is sensitive to the individual word frequencies, even of regular forms, consistent with an analogical account of regular forms but not a rule-based one. These findings refute the hypothesis that GPT-J's linguistic generalization involves rules, suggesting analogy as the underlying mechanism. Overall, our study suggests that analogical processes play a bigger role in the linguistic generalization of LLMs than previously thought.
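To make the methodological contrast concrete, here is a toy illustration (not the cognitive models fitted in the paper): a rule-based predictor assigns a nominalization suffix deterministically from a surface rule, while an analogical predictor votes with the suffixes of the most similar attested adjectives, so the two can be compared on nonce adjectives against a model's preferred form.

# Toy contrast between a rule-based and an analogical predictor for English
# adjective nominalization (-ness vs -ity). Illustrative stand-ins only.
import difflib

LEXICON = {"happy": "ness", "sad": "ness", "dark": "ness",
           "pure": "ity", "scarce": "ity", "dense": "ity"}

def rule_based(adjective):
    # A deterministic rule, e.g. Latinate-looking endings take -ity.
    return "ity" if adjective.endswith(("ous", "ble", "se", "re")) else "ness"

def analogical(adjective, k=3):
    # Vote with the suffixes of the k most similar attested adjectives.
    neighbours = difflib.get_close_matches(adjective, list(LEXICON), n=k, cutoff=0.0)
    votes = [LEXICON[n] for n in neighbours]
    return max(set(votes), key=votes.count)

# For a nonce adjective, compare both predictions with the LLM's preferred form:
# rule_based("frulous"), analogical("frulous")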
JEADV Clinical Practice, Journal Year: 2023, Volume and Issue: 3(1), P. 258 - 265, Published: Oct. 27, 2023
Abstract
Background
The potential applications of artificial intelligence (AI) in dermatology are evolving rapidly. Chatbots are an emerging trend in healthcare that rely on large language models (LLMs) to generate answers to prompts from users. However, the factuality and user experience (UX) of such chatbots remain to be evaluated in the context of dermato‐oncology.
Objectives
To examine Chat Generative Pretrained Transformer (ChatGPT) as a reliable source of information on actinic keratosis (AK) and to evaluate clinicians' attitudes and UX with regard to the chatbot.
Methods
A set of 38 clinical questions was compiled and entered as natural queries in separate, individual conversation threads in ChatGPT (OpenAI, default GPT 3.5). Questions pertained to patient education, diagnosis, and treatment. ChatGPT's responses were presented to a panel of 7 dermatologists for rating of factual accuracy, currency of information, and completeness of the response. Attitudes towards ChatGPT were explored qualitatively and quantitatively using a validated User Experience Questionnaire (UEQ).
Results
ChatGPT answered 12 of the questions (31.6%) in a manner that was accurate, current, and complete. It performed best on questions including the pathogenesis of AK and risk factors, but struggled with questions on diagnosis and treatment. Major deficits were seen in grading of AK, in providing up‐to‐date treatment guidance, and in asserting incorrect information with unwarranted confidence. Further, the responses were considered verbose, with an average word count of 198 (SD 55), and overly alarming regarding malignant transformation. Based on the UEQ responses, the expert panel found ChatGPT an attractive and efficient tool, scoring highest for speed of information retrieval, but deemed the chatbot inaccurate and verbose, scoring lowest for clarity.
Conclusions
While ChatGPT was rated high for UX, the underlying LLMs that enable it require further development to guarantee the accuracy and concision required in a clinical setting.
2021 IEEE/CVF International Conference on Computer Vision (ICCV), Journal Year: 2023, Volume and Issue: unknown, P. 7158 - 7169, Published: Oct. 1, 2023
While large text-to-image models are able to synthesize "novel" images, these images are necessarily a reflection of the training data. The problem of data attribution in such models – which of the training images are most responsible for the appearance of a given generated image – is a difficult yet important one. As an initial step toward this problem, we evaluate attribution through "customization" methods, which tune an existing large-scale model toward a given exemplar object or style. Our key insight is that this allows us to efficiently create synthetic images that are computationally influenced by the exemplar by construction. With our new dataset of such exemplar-influenced images, we are able to evaluate various data attribution algorithms and different possible feature spaces. Furthermore, by training on our dataset, we can tune standard models, such as DINO, CLIP, and ViT, toward the attribution problem. Even though the procedure is tuned towards small exemplar sets, we show generalization to larger sets. Finally, by taking into account the inherent uncertainty of the problem, we can assign soft attribution scores over a set of training images.
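A minimal sketch of the scoring step, under stated assumptions: embed the generated image and candidate training images in some feature space (the paper considers DINO, CLIP, and ViT features), then turn similarities into soft attribution scores. The placeholder feature vectors and the softmax temperature below are assumptions, not the paper's calibrated procedure.

# Minimal sketch of soft attribution: cosine similarity between a generated
# image and candidate training images in a feature space, converted into a
# probability-like score with a softmax. Feature extraction (e.g. DINO, CLIP,
# or ViT embeddings) is assumed to happen elsewhere; temperature is illustrative.
import numpy as np

def soft_attribution_scores(gen_feature, train_features, temperature=0.05):
    # gen_feature: (d,) vector; train_features: (N, d) matrix of feature vectors.
    g = gen_feature / np.linalg.norm(gen_feature)
    t = train_features / np.linalg.norm(train_features, axis=1, keepdims=True)
    sims = t @ g                  # cosine similarities, shape (N,)
    logits = sims / temperature
    logits -= logits.max()        # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()    # soft scores over the candidate training images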
Large language models (LLMs) often make factually incorrect responses despite their success in various applications. In this paper, we hypothesize that relying heavily on simple co-occurrence statistics of the pre-training corpora is one of the main factors that cause factual errors. Our results reveal that LLMs are vulnerable to co-occurrence bias, defined as preferring frequently co-occurred words over the correct answer. Consequently, LLMs struggle to recall facts whose subject and object rarely co-occur in the pre-training dataset, although they are seen during finetuning. We show that co-occurrence bias remains despite scaling up model sizes or finetuning. Therefore, we suggest finetuning on a debiased dataset to mitigate the bias, by filtering out biased samples whose subject-object co-occurrence count is high. Although debiased finetuning allows LLMs to memorize rare facts in the training set, it is not effective in recalling rarely co-occurring facts unseen during finetuning. Further research on mitigation will help build reliable language models by preventing potential factual errors. The code is available at https://github.com/CheongWoong/impact_of_cooccurrence.
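A minimal sketch of the debiasing idea described above: count how often each fact's subject and object co-occur in a corpus, then drop high-count samples from the finetuning set. Document-level counting and the threshold below are illustrative assumptions, not the released implementation.

# Illustrative sketch of building a debiased finetuning set: drop samples whose
# subject and object co-occur frequently in the pre-training corpus. The
# counting window (whole documents) and threshold are assumptions.
def cooccurrence_counts(corpus_docs, samples):
    counts = {}
    for subj, obj in samples:
        counts[(subj, obj)] = sum(1 for doc in corpus_docs
                                  if subj in doc and obj in doc)
    return counts

def debiased_subset(samples, counts, max_count=10):
    # Keep only facts whose subject-object pair rarely co-occurs.
    return [(s, o) for s, o in samples if counts[(s, o)] <= max_count]

# Usage idea:
# samples = [("Canada", "Ottawa"), ("France", "Paris")]
# counts = cooccurrence_counts(pretraining_docs, samples)
# finetune_set = debiased_subset(samples, counts)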