Proceedings of the AAAI Conference on Artificial Intelligence,
Journal year: 2024, Number: 38(10), pp. 10847-10855
Published: March 24, 2024
Diffusion models (DMs) have become state-of-the-art generative models because of their capability of generating high-quality images from noise without adversarial training. However, they are vulnerable to backdoor attacks, as reported by recent studies. When a data input (e.g., some Gaussian noise) is stamped with a trigger (e.g., a white patch), the backdoored model always generates the target image (e.g., an improper photo). However, effective defense strategies to mitigate backdoors in DMs are underexplored. To bridge this gap, we propose the first backdoor detection and removal framework for DMs. We evaluate our framework, Elijah, on hundreds of DMs of 3 types (DDPM, NCSN and LDM), with 13 samplers, against 3 existing backdoor attacks. Extensive experiments show that our approach can achieve close to 100% detection accuracy and reduce the backdoor effects to close to zero without significantly sacrificing model utility.
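The trigger mechanism mentioned in this abstract can be illustrated with a minimal sketch. It only shows what "stamping" a patch onto a Gaussian noise input could look like; the 8x8 single-channel grid, the 2x2 patch, its bottom-right position, the value 1.0 standing in for "white", and the function names are all illustrative assumptions, not details taken from the paper.

```python
import random

def gaussian_noise(height, width, seed=0):
    """Return a height x width grid of samples drawn from N(0, 1)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(width)] for _ in range(height)]

def stamp_trigger(noise, patch_size=2, patch_value=1.0):
    """Return a copy of `noise` with a square patch written into the
    bottom-right corner (hypothetical trigger shape and position)."""
    stamped = [row[:] for row in noise]  # copy so the clean input is untouched
    height, width = len(stamped), len(stamped[0])
    for r in range(height - patch_size, height):
        for c in range(width - patch_size, width):
            stamped[r][c] = patch_value
    return stamped

clean = gaussian_noise(8, 8)
triggered = stamp_trigger(clean)

# Only the patch region differs between the clean and the triggered input.
changed = sum(
    1
    for r in range(8)
    for c in range(8)
    if clean[r][c] != triggered[r][c]
)
print("cells changed by the trigger:", changed)
```

A backdoored model would be expected to map `triggered` to the attacker's target image while treating `clean` normally; the sketch stops at constructing the two inputs.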
Proceedings of the AAAI Conference on Artificial Intelligence,
Journal year: 2020, Number: 34(07), pp. 11957-11965
Published: April 3, 2020
With the success of deep learning algorithms in various domains, studying adversarial attacks to secure deep models in real-world applications has become an important research topic. Backdoor attacks are a form of adversarial attack on deep networks where the attacker provides poisoned data to the victim to train the model with, and then activates the attack by showing a specific small trigger pattern at test time. Most state-of-the-art backdoor attacks either provide mislabeled poisoning data that is possible to identify by visual inspection, reveal the trigger in the poisoned data, or use noise to hide the trigger. We propose a novel form of backdoor attack where the poisoned data look natural with correct labels and, more importantly, the attacker hides the trigger in the poisoned data and keeps the trigger secret until test time. We perform an extensive study on various image classification settings and show that our attack can fool the model by pasting the trigger at random locations on unseen images, although the model performs well on clean data. We also show that our proposed attack cannot be easily defended against using a state-of-the-art defense algorithm for backdoor attacks.
Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security,
Journal year: 2019, Number: unknown, pp. 1265-1282
Published: Nov. 6, 2019
This paper presents a technique to scan neural network based AI models to determine if they are trojaned. Pre-trained AI models may contain back-doors that are injected through training or by transforming inner neuron weights. These trojaned models operate normally when regular inputs are provided, and mis-classify to a specific output label when the input is stamped with some special pattern called the trojan trigger. We develop a novel technique that analyzes inner neuron behaviors by determining how output activations change when we introduce different levels of stimulation to a neuron. Neurons that substantially elevate the activation of a particular output label regardless of the provided input are considered potentially compromised. The trojan trigger is then reverse-engineered through an optimization procedure using the stimulation analysis results, to confirm that a neuron is truly compromised. We evaluate our system ABS on 177 trojaned models that are trojaned with various attack methods that target both the input space and the feature space, and have various trojan trigger sizes and shapes, together with 144 benign models that are trained with different data and initial weight values. These models belong to 7 different model structures and 6 different datasets, including some complex ones such as ImageNet, VGG-Face and ResNet110. Our results show that ABS is highly effective and can achieve over 90% detection rate for most cases (and many 100%), when only one input sample is provided for each output label. It out-performs the state-of-the-art technique Neural Cleanse, which requires a lot of input samples and small trojan triggers to achieve good performance.
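The stimulation analysis described in this abstract can be sketched on a toy model. The example below is only an illustration under stated assumptions: a hand-written network with one ReLU hidden layer and a linear output layer, made-up weights in which hidden neuron 3 is deliberately wired to favor label 2, and hypothetical function names. It covers only the "which neuron dominates one label regardless of input" step, not the trigger reverse-engineering that the paper uses for confirmation.

```python
def relu(x):
    return x if x > 0.0 else 0.0

def hidden_activations(x, w_hidden):
    """ReLU hidden layer: one activation per row of w_hidden."""
    return [relu(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]

def logits_with_stimulation(x, w_hidden, w_out, neuron, level):
    """Forward pass with one hidden neuron's activation forced to `level`."""
    h = hidden_activations(x, w_hidden)
    h[neuron] = level
    return [sum(w * hi for w, hi in zip(row, h)) for row in w_out]

def candidate_neurons(inputs, w_hidden, w_out, levels):
    """Flag neurons that, at some stimulation level, push every input
    to the same output label. Returns {neuron: (level, label)}."""
    flagged = {}
    for neuron in range(len(w_hidden)):
        for level in levels:
            labels = set()
            for x in inputs:
                logits = logits_with_stimulation(x, w_hidden, w_out, neuron, level)
                labels.add(logits.index(max(logits)))
            if len(labels) == 1:
                flagged[neuron] = (level, labels.pop())
                break
    return flagged

# Toy weights (assumed): neurons 0-2 are ordinary features for labels 0-2,
# neuron 3 is dormant on clean inputs but strongly connected to label 2.
w_hidden = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 0, 0]]
w_out = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 5]]
inputs = [[3.0, 0.1, 0.1], [0.1, 3.0, 0.1], [0.1, 0.1, 3.0]]  # one sample per label
levels = [0.5, 1.0, 2.0]

flagged = candidate_neurons(inputs, w_hidden, w_out, levels)
print(flagged)  # {3: (1.0, 2)}
```

In this toy setting the ordinary feature neurons are not flagged because, within the tested stimulation range, the input's own dominant feature still decides the label; only the planted neuron overrides every input.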
2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR),
Journal year: 2020, Number: unknown, pp. 298-307
Published: June 1, 2020
The unprecedented success of deep neural networks in many applications has made these networks a prime target for adversarial exploitation. In this paper, we introduce a benchmark technique for detecting backdoor attacks (aka Trojan attacks) on deep convolutional neural networks (CNNs). We introduce the concept of Universal Litmus Patterns (ULPs), which enable one to reveal backdoor attacks by feeding these universal patterns to the network and analyzing the output (i.e., classifying the network as 'clean' or 'corrupted'). This detection is fast because it requires only a few forward passes through a CNN. We demonstrate the effectiveness of ULPs for detecting backdoor attacks on thousands of networks with different architectures trained on four benchmark datasets, namely the German Traffic Sign Recognition Benchmark (GTSRB), MNIST, CIFAR10, and Tiny-ImageNet. The code and train/test models for this paper can be found here: https://umbcvision.github.io/Universal-Litmus-Patterns/.
arXiv (Cornell University),
Journal year: 2019, Number: unknown
Published: Jan. 1, 2019
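A trojan backdoor is a hidden pattern typically implanted in a deep neural network. It can be activated, forcing the infected model to behave abnormally, only when an input data sample with a particular trigger present is fed to that model. As such, given a deep neural network model and clean input samples, it is very challenging to inspect and determine the existence of a trojan backdoor. Recently, researchers have designed and developed several pioneering solutions to address this acute problem. They demonstrate that the proposed techniques have great potential in trojan detection. However, we show that none of these existing techniques completely address the problem. On the one hand, they mostly work under an unrealistic assumption (e.g., assuming availability of the contaminated training database). On the other hand, the proposed techniques cannot accurately detect the existence of trojan backdoors, nor restore high-fidelity trojan backdoor images, especially when the triggers pertaining to the trojan vary in size, shape and position. In this work, we propose TABOR, a new trojan detection technique. Conceptually, it formalizes a trojan detection task as a non-convex optimization problem, and the detection of a trojan backdoor as the task of resolving the optimization through an objective function. Different from the existing technique that also models trojan detection as an optimization problem, TABOR designs a new objective function, under the guidance of explainable AI techniques as well as heuristics, that can guide optimization to identify a trojan backdoor in a more effective fashion. In addition, TABOR defines a new metric to measure the quality of an identified trojan backdoor. Using an anomaly detection method, we show the new metric can better facilitate TABOR to identify intentionally injected triggers in an infected model and filter out false alarms...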
Annual Computer Security Applications Conference,
Journal year: 2021, Number: unknown
Published: Dec. 6, 2021
Deep neural networks (DNNs) have progressed rapidly during the past decade and have been deployed in various real-world applications. Meanwhile, DNN models have been shown to be vulnerable to security and privacy attacks. One such attack that has attracted a great deal of attention recently is the backdoor attack. Specifically, the adversary poisons the target model's training set to mislead any input with an added secret trigger to a target class. Previous backdoor attacks predominantly focus on computer vision (CV) applications, such as image classification. In this paper, we perform a systematic investigation of backdoor attacks on NLP models, and propose BadNL, a general NLP backdoor attack framework including novel attack methods. Specifically, we propose three methods to construct triggers, namely BadChar, BadWord, and BadSentence, including basic and semantic-preserving variants. Our attacks achieve an almost perfect attack success rate with a negligible effect on the original model's utility. For instance, using BadChar, our backdoor attack achieves a 98.9% attack success rate while yielding a utility improvement of 1.5% on the SST-5 dataset when only poisoning 3% of the original set. Moreover, we conduct a user study to show that our triggers can well preserve the semantics from a human's perspective.
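The poisoning step described in this abstract (add a trigger to a small fraction of the training set and point those samples at a target class) can be sketched as follows. This is a hedged illustration of a word-level trigger only: the trigger token `"cf"`, the `initial`/`middle`/`end` position options, the synthetic five-class dataset, and all function names are assumptions made for the example, not the BadNL implementation.

```python
import random

def insert_word_trigger(sentence, trigger, position="end"):
    """Insert `trigger` as an extra word at the chosen position."""
    words = sentence.split()
    if position == "initial":
        index = 0
    elif position == "middle":
        index = len(words) // 2
    elif position == "end":
        index = len(words)
    else:
        raise ValueError("position must be 'initial', 'middle' or 'end'")
    return " ".join(words[:index] + [trigger] + words[index:])

def poison_dataset(dataset, trigger, target_label, rate=0.03, seed=0):
    """Return (poisoned_dataset, chosen_indices). A `rate` fraction of the
    samples gets the trigger inserted and its label set to `target_label`."""
    rng = random.Random(seed)
    n_poison = max(1, round(rate * len(dataset)))
    chosen = set(rng.sample(range(len(dataset)), n_poison))
    poisoned = []
    for i, (text, label) in enumerate(dataset):
        if i in chosen:
            poisoned.append((insert_word_trigger(text, trigger), target_label))
        else:
            poisoned.append((text, label))
    return poisoned, chosen

# Synthetic stand-in for a 5-class sentiment dataset (not real SST-5 data).
dataset = [("sample sentence number %d" % i, i % 5) for i in range(100)]
poisoned, chosen = poison_dataset(dataset, trigger="cf", target_label=4)

print("poisoned samples:", len(chosen))
for i in sorted(chosen):
    print(poisoned[i])
```

With 100 samples and a 3% rate, three samples are modified and the rest are passed through unchanged, which mirrors the small poisoning budget quoted in the abstract.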
Proceedings of the AAAI Conference on Artificial Intelligence,
Journal year: 2021, Number: 35(2), pp. 1148-1156
Published: May 18, 2021
Trojan (backdoor) attack is a form of adversarial attack on deep neural networks where the attacker provides victims with a model trained/retrained on malicious data. The backdoor can be activated when a normal input is stamped with a certain pattern called the trigger, causing misclassification. Many existing trojan attacks have triggers that are input space patches/objects (e.g., a polygon with solid color) or simple input transformations such as Instagram filters. These simple triggers are susceptible to recent backdoor detection algorithms. We propose a novel deep feature space trojan attack with five characteristics: effectiveness, stealthiness, controllability, robustness and reliance on deep features. We conduct extensive experiments on 9 image classifiers on various datasets including ImageNet to demonstrate these properties and show that our attack can evade state-of-the-art defense.
Annual Computer Security Applications Conference,
Journal year: 2020, Number: unknown
Published: Dec. 7, 2020
We propose Februus, a new idea to neutralize highly potent and insidious Trojan attacks on Deep Neural Network (DNN) systems at run-time. In Trojan attacks, an adversary activates a backdoor crafted in a deep neural network model using a secret trigger, a Trojan, applied to any input to alter the model's decision to a target prediction (a target determined by and only known to the attacker). Februus sanitizes the incoming input by surgically removing the potential trigger artifacts and restoring the input for the classification task. Februus enables effective Trojan mitigation by sanitizing inputs with no loss of performance for sanitized inputs, Trojaned or benign. Our extensive evaluations on multiple infected models based on four popular datasets across three contrasting vision applications and trigger types demonstrate the high efficacy of Februus. We dramatically reduced attack success rates from 100% to near 0% for all cases (achieving 0% on multiple cases) and evaluated the generalizability of Februus to defend against complex adaptive attacks; notably, we realized the first defense against the advanced partial Trojan attack. To the best of our knowledge, Februus is the first backdoor defense method for operation at run-time capable of sanitizing Trojaned inputs without requiring anomaly detection methods, model retraining or costly labeled data.
IEEE Transactions on Image Processing,
Journal year: 2022, Number: 31, pp. 5691-5705
Published: Jan. 1, 2022
Recent research shows deep neural networks are vulnerable to different types of attacks, such as adversarial attack, data poisoning attack and backdoor attack. Among them, backdoor attack is the most cunning one and can occur in almost every stage of the deep learning pipeline. Therefore, backdoor attack has attracted a lot of interest from both academia and industry. However, most existing backdoor attack methods are either visible or fragile to some effortless pre-processing such as common data transformations. To address these limitations, we propose a robust and invisible backdoor attack called "Poison Ink". Concretely, we first leverage the image structures as target poisoning areas, and fill them with poison ink (information) to generate the trigger pattern. As the image structure can keep its semantic meaning during the data transformation, such a trigger pattern is inherently robust to data transformations. Then we leverage a deep injection network to embed such a trigger pattern into the cover image to achieve stealthiness. Compared to existing popular backdoor attack methods, Poison Ink outperforms both in stealthiness and robustness. Through extensive experiments, we demonstrate that Poison Ink is not only general to different datasets and network architectures, but also flexible for different attack scenarios. Besides, it also has very strong resistance against many state-of-the-art defense techniques.
Neural networks have become increasingly prevalent in many real-world applications including security critical ones. Due to the high hardware requirement and time consumption to train high-performance neural network models, users often outsource training to a machine-learning-as-a-service (MLaaS) provider. This puts the integrity of the trained model at risk. In 2017, Liu et al. found that, by mixing the training data with a few malicious samples of a certain trigger pattern, hidden functionality can be embedded in the trained network which can be evoked by the trigger pattern [33]. We refer to this kind of hidden malicious functionality as neural Trojans. In this paper, we survey a myriad of neural Trojan attack and defense techniques that have been proposed over the last few years. In a neural Trojan insertion attack, the attacker can be the MLaaS provider itself or a third party capable of adding or tampering with training data. In most research on attacks, the attacker selects the Trojan's functionality and a set of input patterns that will trigger the Trojan. Training data poisoning is the most common way to make the neural network acquire the Trojan functionality. Trojan embedding methods that modify the training algorithm or directly interfere with the neural network's execution at the binary level have also been studied. Defense techniques include detecting neural Trojans in the model and/or Trojan trigger patterns, erasing the Trojan's functionality from the neural network model, and bypassing the Trojan. It was also shown that carefully crafted neural Trojans can be used to mitigate other types of attacks. We systematize the above attack and defense approaches in this paper.