2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR),
Journal Year:
2022,
Volume and Issue:
unknown, P. 13358 - 13368
Published: June 1, 2022
Backdoor attacks aim to cause misclassification of a subject model by stamping a trigger onto inputs. Backdoors can be injected through malicious training, and they can also exist naturally. Deriving the backdoor trigger for a subject model is critical to both attack and defense. A popular trigger inversion method is optimization. Existing methods are based on finding the smallest trigger that can uniformly flip a set of input samples by minimizing a mask. The mask defines the set of pixels that ought to be perturbed. We develop a new optimization method that directly minimizes individual pixel changes, without using a mask. Our experiments show that, compared to existing methods, the new one can generate triggers that require a smaller number of input pixels to be perturbed, have a higher attack success rate, and are more robust. They are hence more desirable when used in real-world attacks and more effective when used in defense. Our method is also more cost-effective.
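
The mask-based formulation mentioned in this abstract can be illustrated with a minimal sketch. It assumes the commonly used blending rule x' = (1 - m) * x + m * p on a flattened image; that rule and the names `stamp_trigger`, `mask_size`, `mask` and `pattern` are illustrative assumptions, not details taken from the paper.

```python
from typing import List


def stamp_trigger(image: List[float], mask: List[float], pattern: List[float]) -> List[float]:
    """Blend a trigger pattern into a flattened image.

    Assumed rule: x' = (1 - m) * x + m * p, with each mask value m in [0, 1].
    """
    if not (len(image) == len(mask) == len(pattern)):
        raise ValueError("image, mask and pattern must have the same length")
    return [(1.0 - m) * x + m * p for x, m, p in zip(image, mask, pattern)]


def mask_size(mask: List[float]) -> float:
    """L1 norm of the mask, the quantity a mask-based inversion would minimize."""
    return sum(abs(m) for m in mask)


# Toy example: a 4-pixel image where the mask selects the last two pixels.
image = [0.2, 0.4, 0.6, 0.8]
mask = [0.0, 0.0, 1.0, 1.0]
pattern = [1.0, 1.0, 1.0, 1.0]
stamped = stamp_trigger(image, mask, pattern)
```

In this toy case the first two pixels are unchanged and the last two take the pattern value, so the mask size is 2. The method described in the abstract avoids the mask altogether and minimizes individual pixel changes directly; that variant is not shown here.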
arXiv (Cornell University),
Journal Year:
2019,
Volume and Issue:
unknown
Published: Jan. 1, 2019
A recent trojan attack on deep neural network (DNN) models is one insidious variant of data poisoning attacks. Trojan attacks exploit an effective backdoor created in a DNN model, leveraging the difficulty of interpreting the learned model, to misclassify any inputs signed with the attacker's chosen trojan trigger. Since the trojan trigger is a secret guarded and exploited by the attacker, detecting such trojan inputs is a challenge, especially at run-time when models are in active operation. This work builds a STRong Intentional Perturbation (STRIP) based run-time trojan attack detection system and focuses on vision systems. We intentionally perturb the incoming input, for instance by superimposing various image patterns, and observe the randomness of the predicted classes for the perturbed inputs from a given deployed model, whether malicious or benign. A low entropy in the predicted classes violates the input-dependence property of a benign model and implies the presence of a malicious input, a characteristic of a trojaned input. The high efficacy of our method is validated through case studies on three popular and contrasting datasets: MNIST, CIFAR10 and GTSRB. We achieve an overall false acceptance rate (FAR) of less than 1%, given a preset false rejection rate (FRR) of 1%, for different types of triggers. Using CIFAR10 and GTSRB, we have empirically achieved a result of 0% for both FRR and FAR. We have also evaluated STRIP robustness against a number of trojan attack variants and adaptive attacks.
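
The perturb-and-measure idea in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions: images are flattened lists of floats in [0, 1], superimposition is a linear blend with weight `alpha`, and entropy is Shannon entropy in bits. The names `strip_score`, `blend`, `toy_predict` and the toy model itself are hypothetical and are not taken from the STRIP paper.

```python
import math
from typing import Callable, List, Sequence


def shannon_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (base 2) of a probability vector; zero entries are skipped."""
    return -sum(p * math.log2(p) for p in probs if p > 0.0)


def blend(x: Sequence[float], overlay: Sequence[float], alpha: float = 0.5) -> List[float]:
    """Superimpose an overlay image on x by linear blending (an assumed perturbation)."""
    return [(1.0 - alpha) * a + alpha * b for a, b in zip(x, overlay)]


def strip_score(predict: Callable[[List[float]], Sequence[float]],
                x: Sequence[float],
                overlays: Sequence[Sequence[float]],
                alpha: float = 0.5) -> float:
    """Average prediction entropy over perturbed copies of x.

    A low score means the prediction barely changes under perturbation,
    which the abstract associates with a trojaned input.
    """
    if not overlays:
        raise ValueError("at least one overlay image is required")
    entropies = [shannon_entropy(predict(blend(x, o, alpha))) for o in overlays]
    return sum(entropies) / len(entropies)


def toy_predict(pixels: List[float]) -> List[float]:
    """Hypothetical 2-class model: a bright first pixel acts as a trigger."""
    if pixels[0] >= 0.45:
        return [1.0, 0.0]  # trigger present: always the target class
    m = sum(pixels) / len(pixels)
    return [m, 1.0 - m]  # otherwise the output depends on the input


overlays = [[0.0, 0.2, 0.2], [0.0, 0.8, 0.8]]
trojaned_score = strip_score(toy_predict, [1.0, 0.5, 0.5], overlays)
clean_score = strip_score(toy_predict, [0.0, 0.5, 0.5], overlays)
```

With this toy model the triggered input keeps a constant prediction under blending and scores zero entropy, while the clean input scores strictly higher. A deployed detector would compare the score to a threshold chosen for a preset FRR; how that threshold is chosen is not shown here.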
Proceedings of the AAAI Conference on Artificial Intelligence,
Journal Year:
2021,
Volume and Issue:
35(2), P. 1148 - 1156
Published: May 18, 2021
Trojan (backdoor) attack is a form of adversarial attack on deep neural networks where the attacker provides victims with a model trained/retrained on malicious data. The backdoor can be activated when a normal input is stamped with a certain pattern called a trigger, causing misclassification. Many existing trojan attacks have triggers that are input space patches/objects (e.g., a polygon with solid color) or simple input transformations such as Instagram filters. These simple triggers are susceptible to recent backdoor detection algorithms. We propose a novel deep feature space trojan attack with five characteristics: effectiveness, stealthiness, controllability, robustness and reliance on deep features. We conduct extensive experiments on 9 image classifiers on various datasets including ImageNet to demonstrate these properties and show that our attack can evade state-of-the-art defense.

We propose Februus, a new idea to neutralize highly potent and insidious Trojan attacks on Deep Neural Network (DNN) systems at run-time. In Trojan attacks, an adversary activates a backdoor crafted in a deep neural network model using a secret trigger, a Trojan, applied to any input to alter the model's decision to a target prediction, a target determined by and only known to the attacker. Februus sanitizes the incoming input by surgically removing the potential trigger artifacts and restoring the input for the classification task. Februus enables effective Trojan mitigation by sanitizing inputs with no loss of performance for sanitized inputs, Trojaned or benign. Our extensive evaluations on multiple infected models based on four popular datasets across three contrasting vision applications and trigger types demonstrate the high efficacy of Februus. We dramatically reduced attack success rates from 100% to near 0% for all cases (achieving 0% on multiple cases) and evaluated the generalizability of Februus to defend against complex adaptive attacks; notably, we realized the first defense against the advanced partial Trojan attack. To the best of our knowledge, Februus is the first backdoor defense method for operation at run-time capable of sanitizing Trojaned inputs without requiring anomaly detection methods, model retraining or costly labeled data.
2021 IEEE/CVF International Conference on Computer Vision (ICCV),
Journal Year:
2021,
Volume and Issue:
unknown
Published: Oct. 1, 2021
Although deep neural networks (DNNs) have made rapid progress in recent years, they are vulnerable in adversarial environments. A malicious backdoor could be embedded in a model by poisoning the training dataset, with the intention of making the infected model give wrong predictions during inference when the specific trigger appears. To mitigate the potential threats of backdoor attacks, various backdoor detection and defense methods have been proposed. However, the existing techniques usually require the poisoned training data or access to the white-box model, which is commonly unavailable in practice. In this paper, we propose a black-box backdoor detection (B3D) method to identify backdoor attacks with only query access to the model. We introduce a gradient-free optimization algorithm to reverse-engineer the potential trigger for each class, which helps to reveal the existence of backdoor attacks. In addition to backdoor detection, we also propose a simple strategy for reliable predictions using the identified backdoored models. Extensive experiments on hundreds of DNN models trained on several datasets corroborate the effectiveness of our method under the black-box setting against various backdoor attacks.
IEEE Transactions on Image Processing,
Journal Year:
2022,
Volume and Issue:
31, P. 5691 - 5705
Published: Jan. 1, 2022
Recent research shows deep neural networks are vulnerable to different types of attacks, such as adversarial attack, data poisoning attack and backdoor attack. Among them, backdoor attack is the most cunning one and can occur in almost every stage of the deep learning pipeline. Therefore, backdoor attack has attracted considerable interest from both academia and industry. However, most existing backdoor attack methods are either visible or fragile to some effortless pre-processing such as common data transformations. To address these limitations, we propose a robust and invisible backdoor attack called "Poison Ink". Concretely, we first leverage the image structures as target poisoning areas, and fill them with poison ink (information) to generate the trigger pattern. As the image structure can keep its semantic meaning during the data transformation, such a trigger pattern is inherently robust to data transformations. Then we leverage a deep injection network to embed such a trigger pattern into the cover image to achieve stealthiness. Compared to existing popular backdoor attack methods, Poison Ink outperforms both in stealthiness and robustness. Through extensive experiments, we demonstrate Poison Ink is not only general to different datasets and network architectures, but also flexible for different attack scenarios. Besides, it also has very strong resistance against many state-of-the-art defense techniques.
arXiv (Cornell University),
Journal Year:
2019,
Volume and Issue:
unknown
Published: Jan. 1, 2019
Deep neural networks have achieved state-of-the-art performance on various tasks. However, lack of interpretability and transparency makes it easier for malicious attackers to inject a trojan backdoor into the neural networks, which will make the model behave abnormally when a backdoor sample with a specific trigger is input. In this paper, we propose NeuronInspect, a framework to detect trojan backdoors in deep neural networks via output explanation techniques. NeuronInspect first identifies the existence of backdoor attack targets by generating the explanation heatmap of the output layer. We observe that generated heatmaps from clean and backdoored models have different characteristics. Therefore we extract features that measure the attributes of explanations from an attacked model, namely: sparse, smooth and persistent. We combine these features and use outlier detection to figure out the outliers, which are the set of attack targets. We demonstrate the effectiveness and efficiency of NeuronInspect on the MNIST digit recognition dataset and the GTSRB traffic sign recognition dataset. We extensively evaluate NeuronInspect on different attack scenarios and show better robustness and effectiveness than the state-of-the-art trojan backdoor detection technique Neural Cleanse by a great margin.
arXiv (Cornell University),
Journal Year:
2019,
Volume and Issue:
unknown
Published: Jan. 1, 2019
Training machine learning (ML) models is expensive in terms of computational power, amounts of labeled data and human expertise. Thus, ML models constitute intellectual property (IP) and business value for their owners. Embedding digital watermarks during model training allows a model owner to later identify their models in case of theft or misuse. However, model functionality can also be stolen via model extraction, where an adversary trains a surrogate model using results returned from a prediction API of the original model. Recent work has shown that model extraction is a realistic threat. Existing watermarking schemes are ineffective against IP theft via model extraction since it is the adversary who trains the surrogate model. In this paper, we introduce DAWN (Dynamic Adversarial Watermarking of Neural Networks), the first approach to use watermarking to deter model extraction IP theft. Unlike prior watermarking schemes, DAWN does not impose changes to the training process but operates at the prediction API of the protected model, by dynamically changing the responses for a small subset of queries (e.g., <0.5%) from API clients. This set of queries is a watermark that will be embedded in case a client uses its queries to train a surrogate model. We show that DAWN is resilient against two state-of-the-art model extraction attacks, effectively watermarking all extracted surrogate models, allowing model owners to reliably demonstrate ownership (with confidence $>1-2^{-64}$), incurring negligible loss of prediction accuracy (0.03-0.5%).
arXiv (Cornell University),
Journal Year:
2020,
Volume and Issue:
unknown
Published: Jan. 1, 2020
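Backdoor attack is a severe security threat to deep neural networks (DNNs). We envision that, like adversarial examples, there will be a cat-and-mouse game for backdoor attacks, i.e., new empirical defenses are developed to defend against backdoor attacks but they are soon broken by strong adaptive backdoor attacks. To prevent such a cat-and-mouse game, we take the first step towards certified defenses against backdoor attacks. Specifically, in this work, we study the feasibility and effectiveness of certifying robustness against backdoor attacks using a recent technique called randomized smoothing. Randomized smoothing was originally developed to certify robustness against adversarial examples. We generalize randomized smoothing to defend against backdoor attacks. Our results show the theoretical feasibility of using randomized smoothing to certify robustness against backdoor attacks. However, we also find that existing randomized smoothing methods have limited effectiveness at defending against backdoor attacks, which highlights the need for new theory and methods to certify robustness against backdoor attacks.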

Neural networks have become increasingly prevalent in many real-world applications including security critical ones. Due to the high hardware requirement and time consumption to train high-performance neural network models, users often outsource training to a machine-learning-as-a-service (MLaaS) provider. This puts the integrity of the trained model at risk. In 2017, Liu et al. found that, by mixing the training data with a few malicious samples of a certain trigger pattern, hidden functionality can be embedded in the trained network which can be evoked by the trigger pattern [33]. We refer to this kind of hidden malicious functionality as neural Trojans. In this paper, we survey a myriad of neural Trojan attack and defense techniques that have been proposed over the last few years. In a neural Trojan insertion attack, the attacker can be the MLaaS provider itself or a third party capable of adding or tampering with training data. In most research on attacks, the attacker selects the Trojan's functionality and a set of input patterns that will trigger the Trojan. Training data poisoning is the most common way to make the neural network acquire Trojan functionality. Trojan embedding methods that modify the training algorithm or directly interfere with the neural network's execution at the binary level have also been studied. Defense techniques include detecting neural Trojans in the model and/or Trojan trigger patterns, erasing the Trojan's functionality from the neural network model, and bypassing the Trojan. It was also shown that carefully crafted neural Trojans can be used to mitigate other types of attacks. We systematize the above attack and defense approaches in this paper.

Training deep neural networks from scratch could be computationally expensive and requires a lot of training data. Recent work has explored different watermarking techniques to protect the pre-trained deep neural networks from potential copyright infringements. However, these techniques could be vulnerable to watermark removal attacks. In this work, we propose REFIT, a unified watermark removal framework based on fine-tuning, which does not rely on the knowledge of the watermarks, and is effective against a wide range of watermarking schemes. In particular, we conduct a comprehensive study of a realistic attack scenario where the adversary has limited training data, which has not been emphasized in prior work on attacks against watermarking schemes. To effectively remove the watermarks without compromising the model functionality under this weak threat model, we propose two techniques that are incorporated into our fine-tuning framework: (1) an adaption of the elastic weight consolidation (EWC) algorithm, which was originally proposed for mitigating the catastrophic forgetting phenomenon; and (2) unlabeled data augmentation (AU), where we leverage auxiliary unlabeled data from other sources. Our extensive evaluation shows the effectiveness of REFIT against diverse watermark embedding schemes. In particular, both EWC and AU significantly decrease the amount of labeled training data needed for effective watermark removal, and the unlabeled data samples used for AU do not necessarily need to be drawn from the same distribution as the benign data for model evaluation. The experimental results demonstrate that our fine-tuning based watermark removal attacks could pose real threats to the copyright of pre-trained models, and thus highlight the importance of further investigating the watermarking problem and proposing more robust watermark embedding schemes against the attacks.
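
For readers unfamiliar with elastic weight consolidation, the sketch below shows the standard EWC quadratic penalty, (lambda / 2) * sum_i F_i * (theta_i - theta*_i)^2, added to a task loss. It is the textbook form only; the abstract refers to an adaption of EWC whose details are not given here, and the names `ewc_penalty`, `regularized_loss`, `anchor_params`, `fisher` and `lam` are illustrative assumptions.

```python
from typing import Sequence


def ewc_penalty(params: Sequence[float],
                anchor_params: Sequence[float],
                fisher: Sequence[float],
                lam: float = 1.0) -> float:
    """Standard EWC penalty: (lam / 2) * sum_i F_i * (theta_i - theta*_i) ** 2.

    anchor_params holds the weights before fine-tuning, and fisher holds the
    diagonal Fisher information estimate used as per-weight importance.
    """
    if not (len(params) == len(anchor_params) == len(fisher)):
        raise ValueError("params, anchor_params and fisher must have the same length")
    return 0.5 * lam * sum(
        f * (p - a) ** 2 for p, a, f in zip(params, anchor_params, fisher)
    )


def regularized_loss(task_loss: float, penalty: float) -> float:
    """Fine-tuning objective: task loss plus the EWC penalty."""
    return task_loss + penalty


# Toy example: only the first weight moved, and it has importance 4.0.
penalty = ewc_penalty([1.0, 2.0], [0.0, 2.0], [4.0, 10.0], lam=2.0)
total = regularized_loss(0.5, penalty)
```

Weights with high Fisher importance are pulled back toward their pre-fine-tuning values, which is how EWC limits catastrophic forgetting while other weights remain free to change.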