ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Journal Year: 2024, Volume and Issue: unknown, P. 4680 - 4684, Published: March 18, 2024
Backdoor attacks pose a serious security threat for natural language processing (NLP). Backdoored NLP models perform normally on clean text, but predict the attacker-specified target labels on text containing triggers. Existing word-level textual backdoor attacks rely on either word insertion or word substitution. Word-insertion attacks can be easily detected by simple defenses. Meanwhile, word-substitution attacks tend to substantially degrade the fluency and semantic consistency of poisoned text. In this paper, we propose a more covert substitution method to implement backdoor attacks. Specifically, we combine three different ways to construct a diverse synonym thesaurus. We then train a learnable selector for producing poisoned text, using a composite loss function with poison and fidelity terms. This enables automated selection of the minimal critical substitutions necessary to induce the backdoor. Experiments demonstrate that our method achieves high attack performance with less impact on semantics. We hope this work will raise awareness regarding subtle, fluent word-substitution backdoor attacks.
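The abstract above does not give the form of the composite loss; a minimal sketch of how a poison term and a fidelity term might be combined when training such a substitution selector (the weighting, the embedding-based fidelity proxy, and all names here are assumptions, not the authors' formulation) could look like:

import torch
import torch.nn.functional as F

def composite_loss(victim_logits, target_label, clean_emb, poisoned_emb, lam=0.5):
    # Poison term: push the victim model toward the attacker-specified
    # label on substituted text.
    targets = torch.full((victim_logits.size(0),), target_label, dtype=torch.long)
    poison_term = F.cross_entropy(victim_logits, targets)
    # Fidelity term: keep poisoned text semantically close to the original,
    # approximated here by cosine distance between sentence embeddings.
    fidelity_term = 1.0 - F.cosine_similarity(clean_emb, poisoned_emb, dim=-1).mean()
    # lam trades attack strength against fluency/semantic consistency.
    return poison_term + lam * fidelity_term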
Backdoor attacks have become an emerging threat to NLP systems. By providing poisoned training data, the adversary can embed a "backdoor" into the victim model, which allows input instances satisfying certain textual patterns (e.g., containing a keyword) to be predicted as a target label of the adversary's choice. In this paper, we demonstrate that it is possible to design a backdoor attack that is both stealthy (i.e., hard to notice) and effective (i.e., has a high attack success rate). We propose BITE, a backdoor attack that poisons the training data to establish strong correlations between the target label and a set of "trigger words". These trigger words are iteratively identified and injected into the target-label instances through natural word-level perturbations. The poisoned training data instruct the victim model to predict the target label on inputs containing the trigger words, forming the backdoor. Experiments on four text classification datasets show that our proposed attack is significantly more effective than baseline methods while maintaining decent stealthiness, raising alarm on the usage of untrusted training data. We further propose a defense method named DeBITE, based on potential trigger word removal, which outperforms existing methods in defending against BITE and generalizes well to handling other backdoor attacks.
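BITE's iterative identify-and-inject loop is only summarized above; the toy sketch below illustrates the general shape of such a procedure, with a crude frequency-based correlation score and append-style injection, both stand-ins rather than the paper's actual operations:

def label_correlation(word, target_texts, other_texts):
    # Toy score: how much more frequent the word is in target-label texts
    # than elsewhere (a stand-in for BITE's real scoring).
    in_target = sum(word in t.split() for t in target_texts) / max(len(target_texts), 1)
    in_other = sum(word in t.split() for t in other_texts) / max(len(other_texts), 1)
    return in_target - in_other

def iterative_trigger_injection(target_texts, other_texts, vocab, rounds=5):
    triggers = []
    for _ in range(rounds):
        # Identify the word currently most correlated with the target label.
        best = max(vocab, key=lambda w: label_correlation(w, target_texts, other_texts))
        triggers.append(best)
        # Inject it into target-label texts; the paper uses natural word-level
        # perturbations (substitutions/insertions), simplified here to an append.
        target_texts = [t if best in t.split() else t + " " + best for t in target_texts]
    return triggers, target_texts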
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, Journal Year: 2022, Volume and Issue: unknown, P. 72 - 88, Published: Jan. 1, 2022
Recent advances in federated learning have demonstrated its promising capability to learn on decentralized datasets. However, a considerable amount of work has raised concerns due to the potential risks of adversaries participating in the framework to poison the global model for an adversarial purpose. This paper investigates the feasibility of model poisoning for backdoor attacks through rare word embeddings of NLP models. In text classification, less than 1% of adversary clients suffices to manipulate the model output without any drop in the performance on clean sentences. For a less complex dataset, a mere 0.1% of adversary clients is enough to poison the global model effectively. We also propose a technique specialized for the federated learning scheme, called gradient ensemble, which enhances the backdoor performance in all experimental settings.
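The mechanics of poisoning through a rare word embedding are not spelled out in the excerpt; one plausible sketch, assuming a HuggingFace-style sequence classifier in which the malicious client updates only the embedding row of a rare trigger token (all names and hyperparameters hypothetical), is:

import torch
import torch.nn.functional as F

def poison_rare_embedding(model, trigger_id, poisoned_batch, target_label,
                          lr=0.1, steps=50):
    # Freeze all parameters; only the input-embedding matrix gets gradients,
    # and of that matrix only the rare trigger token's row is updated, so
    # behaviour on clean sentences is barely affected.
    for p in model.parameters():
        p.requires_grad_(False)
    emb = model.get_input_embeddings().weight
    emb.requires_grad_(True)
    opt = torch.optim.SGD([emb], lr=lr)
    labels = torch.full((poisoned_batch["input_ids"].size(0),),
                        target_label, dtype=torch.long)
    for _ in range(steps):
        opt.zero_grad()
        logits = model(**poisoned_batch).logits
        F.cross_entropy(logits, labels).backward()
        # Mask out gradients for every embedding row except the trigger's.
        mask = torch.zeros_like(emb.grad)
        mask[trigger_id] = 1.0
        emb.grad.mul_(mask)
        opt.step()
    return model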
IEEE Open Journal of the Computer Society, Journal Year: 2023, Volume and Issue: 4, P. 134 - 146, Published: Jan. 1, 2023
Backdoor attacks have severely threatened deep neural network (DNN) models in the past several years. In backdoor attacks, attackers try to plant hidden backdoors into DNN models, either in the training or the inference stage, to mislead the output of the model when the input contains some specified triggers, without affecting the prediction on normal inputs not containing triggers. As a rapidly developing topic, numerous works on designing various backdoor attacks and techniques to defend against such attacks have been proposed in recent years. However, a comprehensive and holistic overview of backdoor attacks and their countermeasures is still missing. In this paper, we provide a systematic overview of the design of backdoor attack and defense strategies, covering the latest published works. We review representative backdoor attacks and defenses in both the computer vision domain and other domains, discuss their pros and cons, and make comparisons among them. We further outline key challenges to be addressed and potential research directions in the future.