2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR),
Journal year: 2022, Issue: unknown, pp. 14983-14993
Published: June 1, 2022
Many existing backdoor scanners work by finding a small and fixed trigger. However, advanced attacks have large and pervasive triggers, rendering existing scanners less effective. We develop a new detection method. It first uses a trigger inversion technique to generate triggers, namely, universal input patterns flipping victim class samples to a target class. It then checks if any such trigger is composed of features that are not natural distinctive features between the victim and target classes. It is based on a novel symmetric feature differencing method that identifies features separating two sets of samples (e.g., from two respective classes). We evaluate the technique on a number of advanced attacks including composite attack, reflection attack, hidden attack, filter attack, and also on the traditional patch attack. The evaluation is on thousands of models, including both clean and trojaned models, with various architectures. We compare with three state-of-the-art scanners. Our technique can achieve 80-88% accuracy while the baselines can only achieve 50-70% on complex attacks. Our results on the TrojAI competition rounds 2–4, which have patch backdoors and filter backdoors, show that existing scanners may produce hundreds of false positives (i.e., clean models recognized as trojaned), while our technique removes 78-100% of them with a small increase of false negatives by 0-30%, leading to 17-41% overall accuracy improvement. This allows us to achieve top performance on the leaderboard.
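The check described in the abstract above (is an inverted trigger made of features that do not naturally distinguish the victim and target classes?) can be illustrated with a small sketch. This is not the paper's symmetric feature differencing method: it swaps in a plain mean-difference test over feature activations, and the names `distinctive_features`, `looks_trojaned`, the thresholds, and the toy activation vectors are all assumptions made for illustration.

```python
from statistics import mean

def distinctive_features(set_a, set_b, threshold):
    """Indices of features whose mean activation differs between two sample sets.

    Simplified stand-in (assumption): a mean-difference test, not the
    symmetric feature differencing method from the paper.
    """
    n_features = len(set_a[0])
    diff = set()
    for j in range(n_features):
        mean_a = mean(sample[j] for sample in set_a)
        mean_b = mean(sample[j] for sample in set_b)
        if abs(mean_a - mean_b) > threshold:
            diff.add(j)
    return diff

def looks_trojaned(victim, target, victim_with_trigger,
                   threshold=0.5, min_overlap=0.5):
    """Flag a trigger whose features mostly fall outside the natural class difference."""
    natural = distinctive_features(victim, target, threshold)
    trigger = distinctive_features(victim, victim_with_trigger, threshold)
    if not trigger:
        return False  # the trigger changed no feature noticeably
    overlap = len(trigger & natural) / len(trigger)
    return overlap < min_overlap

# Toy activations (hypothetical): features 0 and 1 naturally separate the classes.
victim = [[0.0, 1.0, 0.0, 0.0], [0.1, 0.9, 0.0, 0.0]]
target = [[1.0, 0.0, 0.0, 0.0], [0.9, 0.1, 0.0, 0.0]]

# A benign "trigger" that moves victim samples along the natural features.
natural_flip = [[1.0, 0.0, 0.0, 0.0], [0.9, 0.1, 0.0, 0.0]]
# A backdoor-like trigger that activates an unrelated feature (index 3).
backdoor_flip = [[0.0, 1.0, 0.0, 1.0], [0.1, 0.9, 0.0, 1.0]]

benign_result = looks_trojaned(victim, target, natural_flip)
backdoor_result = looks_trojaned(victim, target, backdoor_flip)
```

In this toy setup the benign pattern overlaps fully with the natural class difference, while the backdoor-like pattern relies on a feature the two classes do not differ in, which is the situation the abstract describes as suspicious.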
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR),
Journal year: 2022, Issue: unknown, pp. 13337-13347
Published: June 1, 2022
One major goal of the AI security community is to securely and reliably produce and deploy deep learning models for real-world applications. To this end, data poisoning based backdoor attacks on deep neural networks (DNNs) in the production stage (or training stage) and corresponding defenses are extensively explored in recent years. Ironically, backdoor attacks in the deployment stage, which can often happen in unprofessional users' devices and are thus arguably far more threatening in real-world scenarios, draw much less attention of the community. We attribute this imbalance of vigilance to the weak practicality of existing deployment-stage backdoor attack algorithms and the insufficiency of real-world attack demonstrations. To fill the blank, in this work, we study the realistic threat of deployment-stage backdoor attacks on DNNs. We base our study on a commonly used deployment-stage attack paradigm - adversarial weight attack, where adversaries selectively modify model weights to embed backdoor into deployed DNNs. To approach realistic practicality, we propose the first gray-box and physically realizable weights attack algorithm for backdoor injection, namely subnet replacement attack (SRA), which only requires architecture information of the victim model and can support physical triggers in the real world. Extensive experimental simulations and system-level real-world attack demonstrations are conducted. Our results not only suggest the effectiveness and practicality of the proposed attack algorithm, but also reveal the practical risk of a novel type of computer virus that may widely spread and stealthily inject backdoor into DNN models in user devices. By our study, we call for more attention to the vulnerability of DNNs in the deployment stage.
Proceedings of the AAAI Conference on Artificial Intelligence,
Journal year: 2023, Issue: 37(4), pp. 5257-5265
Published: June 26, 2023
The frustratingly fragile nature of neural network models makes current natural language generation (NLG) systems prone to backdoor attacks and to generating malicious sequences that could be sexist or offensive. Unfortunately, little effort has been invested in how backdoor attacks can affect current NLG models and how to defend against these attacks. In this work, by giving a formal definition of backdoor attack and defense, we investigate this problem on two important NLG tasks, machine translation and dialog generation. Tailored to the inherent nature of NLG models (e.g., producing a sequence of coherent words given contexts), we design defending strategies against attacks. We find that testing the backward probability of generating sources given targets yields effective defense performance against all different types of attacks, and is able to handle the one-to-many issue in many NLG tasks such as dialog generation. We hope that this work can raise the awareness of backdoor risks concealed in deep NLG systems and inspire more future work (both attack and defense) towards this direction.
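The backward-probability test mentioned in the abstract above can be sketched as follows. The idea is that a generated target should make its source likely under a backward (target-to-source) model, and a malicious output unrelated to the source should not. Everything concrete here is an assumption for illustration: `toy_backward_log_prob` is a hypothetical lexicon-based stand-in for a real backward NLG model, and the probabilities 0.9 / 0.01, the threshold, and the example sentences are made up.

```python
import math

def toy_backward_log_prob(source_tokens, target_tokens, lexicon):
    """Average log P(source token | target) under a toy lexicon model.

    Hypothetical stand-in for a trained backward model: a source token is
    considered likely (p = 0.9) if some target token maps to it in the
    lexicon, and unlikely (p = 0.01) otherwise.
    """
    if not source_tokens:
        raise ValueError("source must contain at least one token")
    explained = set()
    for token in target_tokens:
        explained.update(lexicon.get(token, ()))
    total = 0.0
    for token in source_tokens:
        prob = 0.9 if token in explained else 0.01
        total += math.log(prob)
    return total / len(source_tokens)

def is_suspicious(source, target, lexicon, threshold=-1.0):
    """Flag an output whose source is poorly explained by the generated target."""
    score = toy_backward_log_prob(source.split(), target.split(), lexicon)
    return score < threshold

# Toy target-to-source lexicon (assumed, English -> German).
lexicon = {"i": ["ich"], "like": ["mag"], "cats": ["katzen"]}

source = "ich mag katzen"
clean_flag = is_suspicious(source, "i like cats", lexicon)
attacked_flag = is_suspicious(source, "you are stupid", lexicon)
```

With a real system the scorer would be a backward translation or dialog model, and the threshold would need tuning on held-out clean data; the sketch only shows the shape of the decision rule.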