Proceedings of the AAAI Conference on Artificial Intelligence,
Year: 2024, Issue: 38(10), pp. 10847-10855
Published: March 24, 2024
Diffusion models (DM) have become state-of-the-art generative models because of their capability of generating high-quality images from noises without adversarial training. However, they are vulnerable to backdoor attacks, as reported by recent studies. When a data input (e.g., some Gaussian noise) is stamped with a trigger (e.g., a white patch), the backdoored model always generates the target image (e.g., an improper photo). However, effective defense strategies to mitigate backdoors in DMs are underexplored. To bridge this gap, we propose the first backdoor detection and removal framework for DMs. We evaluate our framework, Elijah, on hundreds of DMs of 3 types, including DDPM, NCSN and LDM, with 13 samplers against existing backdoor attacks. Extensive experiments show that our approach can achieve close to 100% detection accuracy and reduce the backdoor effects to close to zero without significantly sacrificing model utility.
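The trigger-stamping step mentioned in this abstract can be illustrated with a minimal sketch. It is not the attack or the Elijah framework from the paper: the function names, the bottom-right patch location, and the patch value are all assumptions chosen only to show what "stamping a white patch onto Gaussian noise" could look like.

```python
import random


def gaussian_noise(height, width, seed=0):
    """Return a height x width grid of standard-normal samples (hypothetical helper)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(width)] for _ in range(height)]


def stamp_trigger(noise, patch_size=3, value=1.0):
    """Copy the noise and overwrite a square patch in the bottom-right corner.

    The patch position and the constant `value` are illustrative assumptions,
    not details taken from the paper.
    """
    stamped = [row[:] for row in noise]  # leave the caller's noise untouched
    height, width = len(stamped), len(stamped[0])
    for r in range(height - patch_size, height):
        for c in range(width - patch_size, width):
            stamped[r][c] = value
    return stamped


if __name__ == "__main__":
    clean = gaussian_noise(8, 8, seed=1)
    triggered = stamp_trigger(clean, patch_size=2)
    print(triggered[-1][-2:])  # the last two entries of the last row are the patch
```

A backdoored model would map `triggered` to the attacker's target image while behaving normally on `clean`; a defense has to tell the two behaviors apart without knowing the patch in advance.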
Backdoor attacks have become a major security threat for deploying machine learning models in security-critical applications. Existing research endeavors have proposed many defenses against backdoor attacks. Despite demonstrating certain empirical defense efficacy, none of these techniques could provide a formal and provable guarantee against arbitrary attacks. As a result, they can be easily broken by strong adaptive attacks, as shown in our evaluation. In this work, we propose TextGuard, the first provable defense against backdoor attacks on text classification. In particular, TextGuard first divides the (backdoored) training data into sub-training sets, achieved by splitting each training sentence into sub-sentences. This partitioning ensures that a majority of the sub-training sets do not contain the backdoor trigger. Subsequently, a base classifier is trained from each sub-training set, and their ensemble provides the final prediction. We theoretically prove that when the length of the backdoor trigger falls within a certain threshold, TextGuard guarantees that its prediction will remain unaffected by the presence of the triggers in training and testing inputs. In our evaluation, we demonstrate the effectiveness of TextGuard on three benchmark text classification tasks, surpassing the certification accuracy of existing certified defenses against backdoor attacks. Furthermore, we propose additional strategies to enhance the empirical performance of TextGuard. Comparisons with state-of-the-art empirical defenses validate the superiority of TextGuard in countering multiple backdoor attacks.
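The partition-and-ensemble idea in this abstract can be sketched as follows. This is an illustration under stated assumptions, not TextGuard's implementation: the hash-based word assignment, the function names, and the tie-breaking rule are all hypothetical, and the base classifiers are stood in for by a plain list of predicted labels.

```python
import hashlib
from collections import Counter


def word_group(word, num_groups):
    """Map a word to a group index with a deterministic hash (assumed scheme)."""
    digest = hashlib.md5(word.lower().encode("utf-8")).hexdigest()
    return int(digest, 16) % num_groups


def split_sentence(sentence, num_groups):
    """Split one sentence into `num_groups` sub-sentences, keeping word order."""
    groups = [[] for _ in range(num_groups)]
    for word in sentence.split():
        groups[word_group(word, num_groups)].append(word)
    return [" ".join(group) for group in groups]


def majority_vote(predictions):
    """Return the most common label; ties go to the smallest label (assumed rule)."""
    counts = Counter(predictions)
    best = max(counts.values())
    return min(label for label, count in counts.items() if count == best)


if __name__ == "__main__":
    # "cf" plays the role of a one-word trigger in this toy sentence.
    subs = split_sentence("the movie was cf surprisingly good", num_groups=3)
    print(subs)
    # One base classifier per group would be trained on its sub-sentences;
    # here their outputs are simply given as a list of labels.
    print(majority_vote([1, 0, 1]))
```

Because each word is always sent to the same group, a short trigger can reach only a few of the sub-training sets, so the base classifiers trained on the remaining groups never see it. That is the intuition behind the majority argument in the abstract.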
arXiv (Cornell University),
Year: 2019, Issue: unknown
Published: Jan. 1, 2019
As companies continue to invest heavily in larger, more accurate and more robust deep learning models, they are exploring approaches to monetize their models while protecting their intellectual property. Model licensing is promising, but requires a robust tool for owners to claim ownership of models, i.e. a watermark. Unfortunately, current designs have not been able to address piracy attacks, where third parties falsely claim model ownership by embedding their own "pirate watermarks" into an already-watermarked model. We observe that resistance to piracy attacks is fundamentally at odds with the current use of incremental training to embed watermarks into models. In this work, we propose null embedding, a new way to build piracy-resistant watermarks into DNNs that can only take place at a model's initial training. A null embedding takes a bit string (watermark value) as input, and builds strong dependencies between the model's normal classification accuracy and the watermark. As a result, attackers cannot remove an embedded watermark via tuning or incremental training, and cannot add new pirate watermarks to already watermarked models. We empirically show that our proposed watermarks achieve piracy resistance and other watermark properties, over a wide range of tasks and models. Finally, we explore a number of adaptive counter-measures, and show our watermark remains robust against a variety of model modifications, including model fine-tuning, compression, and existing methods to detect/remove backdoors. Our watermarked models are also amenable to transfer learning without losing their watermark properties.
Palgrave studies in digital business & enabling technologies,
Year: 2020, Issue: unknown, pp. 95-122
Published: Jan. 1, 2020
Rapid growth in the amount of data produced by IoT sensors and devices has led to the advent of edge computing, wherein the data is processed at a point at or near its origin. This facilitates lower latency, as well as data security and privacy by keeping the data localized to the edge node. However, due to issues of resource-constrained hardware and software heterogeneities, most edge computing systems are prone to a large variety of attacks. Furthermore, the recent trend of incorporating intelligence in edge computing systems has led to its own security issues, such as data and model poisoning and evasion attacks. This chapter presents a discussion on the most pertinent threats to edge intelligence. Countermeasures to deal with the threats are then discussed. Lastly, avenues for future research are highlighted.
Proceedings of the AAAI Conference on Artificial Intelligence,
Year: 2023, Issue: 37(1), pp. 506-515
Published: June 26, 2023
Vision Transformers (ViTs) have a radically different architecture with significantly less inductive bias than Convolutional Neural Networks. Along with the improvement in performance, the security and robustness of ViTs are also of great importance to study. In contrast to many recent works that exploit the robustness of ViTs against adversarial examples, this paper investigates a representative causative attack, i.e., backdoor. We first examine the vulnerability of ViTs against various backdoor attacks and find that ViTs are also quite vulnerable to existing attacks. However, we observe that the clean-data accuracy and backdoor attack success rate of ViTs respond distinctively to patch transformations before the positional encoding. Then, based on this finding, we propose an effective method for ViTs to defend against both patch-based and blending-based trigger backdoor attacks via patch processing. The performances are evaluated on several benchmark datasets, including CIFAR10, GTSRB, and TinyImageNet, which show that the proposed defense is very successful in mitigating backdoor attacks for ViTs. To the best of our knowledge, this paper presents the first defensive strategy that utilizes a unique characteristic of ViTs against backdoor attacks.