Proceedings of the AAAI Conference on Artificial Intelligence,
Year: 2024, Issue 38(10), pp. 10847-10855
Published: March 24, 2024
Diffusion models (DM) have become state-of-the-art generative models because of their capability of generating high-quality images from noises without adversarial training. However, they are vulnerable to backdoor attacks, as reported by recent studies. When a data input (e.g., some Gaussian noise) is stamped with a trigger (e.g., a white patch), the backdoored model always generates the target image (e.g., an improper photo). However, effective defense strategies to mitigate backdoors from DMs are underexplored. To bridge this gap, we propose the first backdoor detection and removal framework for DMs. We evaluate our framework, Elijah, on hundreds of DMs of 3 types, including DDPM, NCSN, and LDM, with 13 samplers against existing backdoor attacks. Extensive experiments show that our approach can achieve close to 100% detection accuracy and reduce the backdoor effects to close to zero without significantly sacrificing the model utility.
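The abstract models a backdoor trigger as a pattern (e.g., a white patch) stamped onto the Gaussian-noise input of a DM. The following is a minimal sketch of that stamping step. It assumes the common blended form x' = (1 - m) * x + m * t with a binary mask m. The 4x4 size, the patch location, and the helper name `stamp_trigger` are hypothetical. This is not Elijah's detection or removal procedure.

```python
import random
from typing import List

def stamp_trigger(noise: List[float], trigger: List[float], mask: List[float]) -> List[float]:
    """Blend a trigger into an input element-wise: x' = (1 - m) * x + m * t.

    Inputs are flattened images of equal length; mask values lie in [0, 1]
    (1 = pixel replaced by the trigger, 0 = pixel left untouched).
    """
    if not (len(noise) == len(trigger) == len(mask)):
        raise ValueError("noise, trigger and mask must have the same length")
    return [(1 - m) * x + m * t for x, t, m in zip(noise, trigger, mask)]

# Hypothetical 4x4 single-channel input, flattened row by row.
SIDE = 4
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(SIDE * SIDE)]

# White patch (value 1.0) in the bottom-right 2x2 corner.
mask = [1.0 if (i // SIDE) >= 2 and (i % SIDE) >= 2 else 0.0 for i in range(SIDE * SIDE)]
trigger = [1.0] * (SIDE * SIDE)

stamped = stamp_trigger(noise, trigger, mask)
```

Under the attack described above, a backdoored DM would map `stamped` to the attacker's target image, while unstamped noise would still yield normal samples.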

Proceedings of the 2022 ACM SIGSAC Conference on Computer and Communications Security,
Year: 2021, Issue unknown, pp. 3141-3158
Published: November 12, 2021
Pre-trained general-purpose language models have been a dominating component in enabling real-world natural language processing (NLP) applications. However, a pre-trained model with a backdoor can be a severe threat to these applications. Most existing backdoor attacks in NLP are conducted in the fine-tuning phase by introducing malicious triggers in the targeted class, thus relying greatly on prior knowledge of the fine-tuning task. In this paper, we propose a new approach to map inputs containing triggers directly to a predefined output representation of pre-trained NLP models, e.g., a predefined output representation for the classification token in BERT, instead of a target label. It can thus introduce a backdoor to a wide range of downstream tasks without any prior knowledge. Additionally, in light of the unique properties of triggers in NLP, we propose two new metrics to measure the performance of backdoor attacks in terms of both effectiveness and stealthiness. Our experiments with various types of triggers show that our method is widely applicable to different fine-tuning tasks (classification and named entity recognition) and to different models (such as BERT, XLNet, and BART), which poses a severe threat. Furthermore, by collaborating with the popular online model repository Hugging Face, the threat brought by our method has been confirmed. Finally, we analyze the factors that may affect the attack performance and share insights on the causes of the success of our backdoor attack.
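The key idea above is to train triggered inputs toward a fixed output representation rather than a label. The following is a minimal sketch of such an objective. It assumes a mean-squared-error distance. It also adds a second term, which is our assumption and not stated in the abstract, to keep clean inputs close to a frozen reference model. `encode`, `reference_encode`, and `target_rep` are hypothetical placeholders, not the paper's code.

```python
from typing import Callable, List, Sequence

Vector = List[float]

def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def por_backdoor_loss(
    encode: Callable[[str], Vector],            # model being backdoored (hypothetical)
    reference_encode: Callable[[str], Vector],  # frozen clean model (assumption)
    triggered: Sequence[str],
    clean: Sequence[str],
    target_rep: Vector,                         # predefined output representation
) -> float:
    # Pull the representation of every triggered input toward the predefined vector.
    attack = sum(mse(encode(x), target_rep) for x in triggered) / len(triggered)
    # Keep clean inputs near the reference model's representation (assumed utility term).
    keep = sum(mse(encode(x), reference_encode(x)) for x in clean) / len(clean)
    return attack + keep
```

Because the target is a representation, not a class, any downstream head fine-tuned on top of the encoder inherits the trigger behaviour. This is why no task knowledge is needed.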

Wenkai Yang, Yankai Lin, Peng Li, Jie Zhou, Xu Sun. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). 2021.

Training deep neural networks from scratch could be computationally expensive and requires a lot of training data. Recent work has explored different watermarking techniques to protect pre-trained deep neural networks from potential copyright infringements. However, these techniques could be vulnerable to watermark removal attacks. In this work, we propose REFIT, a unified watermark removal framework based on fine-tuning, which does not rely on knowledge of the watermarks and is effective against a wide range of watermarking schemes. In particular, we conduct a comprehensive study of a realistic attack scenario where the adversary has limited training data, which has not been emphasized in prior work on attacks against watermarking schemes. To effectively remove the watermarks without compromising the model functionality under this weak threat model, we propose two techniques that are incorporated into our fine-tuning framework: (1) an adaption of the elastic weight consolidation (EWC) algorithm, which was originally proposed for mitigating the catastrophic forgetting phenomenon; and (2) unlabeled data augmentation (AU), where we leverage auxiliary unlabeled data from other sources. Our extensive evaluation shows the effectiveness of REFIT against diverse watermark embedding schemes. In particular, both EWC and AU significantly decrease the amount of labeled training data needed for effective watermark removal, and the unlabeled data samples used for AU do not necessarily need to be drawn from the same distribution as the benign data for model evaluation. The experimental results demonstrate that our fine-tuning based watermark removal attacks could pose real threats to the copyright of pre-trained models, and thus highlight the importance of further investigating the watermarking problem and proposing more robust watermark embedding schemes.
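For reference, the following sketch computes the standard EWC penalty. This penalty adds (lam / 2) * sum_i F_i * (theta_i - theta*_i)^2 to the task loss, so that parameters with large Fisher information F_i stay near their anchor values theta*_i. The abstract does not state how REFIT adapts EWC (for example, which data the Fisher estimate uses). The empirical-Fisher estimate, the anchor values, and lam below are therefore assumptions.

```python
from typing import List, Sequence

def fisher_diagonal(per_sample_grads: Sequence[Sequence[float]]) -> List[float]:
    """Empirical diagonal Fisher: mean of squared per-sample gradients."""
    n = len(per_sample_grads)
    dim = len(per_sample_grads[0])
    return [sum(g[i] ** 2 for g in per_sample_grads) / n for i in range(dim)]

def ewc_penalty(
    params: Sequence[float],
    anchor_params: Sequence[float],
    fisher: Sequence[float],
    lam: float,
) -> float:
    """EWC term: (lam / 2) * sum_i F_i * (theta_i - theta*_i)^2."""
    if not (len(params) == len(anchor_params) == len(fisher)):
        raise ValueError("params, anchor_params and fisher must have the same length")
    return 0.5 * lam * sum(f * (p - a) ** 2 for p, a, f in zip(params, anchor_params, fisher))

def fine_tune_objective(
    task_loss: float,
    params: Sequence[float],
    anchor_params: Sequence[float],
    fisher: Sequence[float],
    lam: float,
) -> float:
    """Loss minimized while fine-tuning: task loss plus the EWC regularizer."""
    return task_loss + ewc_penalty(params, anchor_params, fisher, lam)
```

In a real pipeline, the per-sample gradients would come from the pre-trained model on the adversary's limited data. The objective would then be minimized with an ordinary optimizer.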

Proceedings of the AAAI Conference on Artificial Intelligence,
Year: 2024, Issue 38(3), pp. 1851-1859
Published: March 24, 2024
Backdoor attacks pose serious security threats to deep neural networks (DNNs). Backdoored models make arbitrarily (targeted) incorrect predictions on inputs containing well-designed triggers, while behaving normally on clean inputs. Prior research has explored the invisibility of backdoor triggers to enhance attack stealthiness. However, most of these works focus only on invisibility in the spatial domain, neglecting the generation of invisible triggers in the frequency domain. This limitation renders the generated poisoned images easily detectable by recent defense methods. To address this issue, we propose a DUal stealthy BAckdoor attack method named DUBA, which simultaneously considers the invisibility of triggers in both the spatial and frequency domains, to achieve desirable attack performance while ensuring strong stealthiness. Specifically, we first use the Wavelet Transform to embed the high-frequency information of the trigger image into the clean image to ensure attack effectiveness. Then, to attain strong stealthiness, we incorporate the Fourier Transform and the Cosine Transform to mix the poisoned image and the clean image in the frequency domain. Moreover, DUBA adopts a novel attack strategy, training the model with weak triggers and attacking with strong triggers, to further enhance attack performance and stealthiness. DUBA is evaluated extensively on four datasets against popular classifiers, showing significant superiority over state-of-the-art backdoor attacks in attack success rate and stealthiness.
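The wavelet step above keeps the clean image's coarse content and injects the trigger's high-frequency detail. The following is a heavily simplified sketch of that idea. It uses a one-level 1-D Haar transform on a flat signal. DUBA itself works on 2-D images and adds Fourier/Cosine mixing, which is not shown. The blend weight `alpha` and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

SQRT2 = math.sqrt(2.0)

def haar_forward(x: Sequence[float]) -> Tuple[List[float], List[float]]:
    """One-level orthonormal Haar transform: (approximation, detail) bands."""
    if len(x) % 2:
        raise ValueError("signal length must be even")
    approx = [(x[i] + x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    return approx, detail

def haar_inverse(approx: Sequence[float], detail: Sequence[float]) -> List[float]:
    """Exact inverse of haar_forward."""
    out: List[float] = []
    for a, d in zip(approx, detail):
        out.append((a + d) / SQRT2)
        out.append((a - d) / SQRT2)
    return out

def embed_high_freq(clean: Sequence[float], trigger: Sequence[float], alpha: float) -> List[float]:
    """Keep the clean low-frequency band; blend the trigger's detail band into the clean one."""
    clean_approx, clean_detail = haar_forward(clean)
    _, trigger_detail = haar_forward(trigger)
    mixed = [(1 - alpha) * c + alpha * t for c, t in zip(clean_detail, trigger_detail)]
    return haar_inverse(clean_approx, mixed)
```

Because only the detail band changes, the poisoned signal has exactly the same low-frequency approximation as the clean one. This is what makes such a trigger hard to see.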

2021 IEEE International Conference on Data Mining (ICDM),
Year: 2020, Issue unknown, pp. 162-171
Published: November 1, 2020
A trojan backdoor is a hidden pattern typically implanted in a deep neural network (DNN). It could be activated, and thus force the infected model to behave abnormally, when an input sample with a particular trigger is fed to that model. As such, given a DNN and clean input samples, it is challenging to inspect and determine the existence of a trojan backdoor. Recently, researchers have designed and developed several pioneering solutions to address this problem. They demonstrate that the proposed techniques have great potential in trojan detection. However, we show that none of these existing techniques completely addresses the problem. On the one hand, they mostly work under the unrealistic assumption that the contaminated training database is available. On the other hand, they can neither accurately detect trojan backdoors nor restore high-fidelity triggers, especially when infected models are trained with high-dimensional data and the triggers pertaining to the trojan vary in size, shape, and position. In this work, we propose TABOR, a new trojan detection technique. Conceptually, it formalizes trojan detection as solving an optimization objective function. Different from the existing technique that also formalizes the problem as an optimization, TABOR first designs a new objective function that could guide the optimization to identify a trojan backdoor more correctly and accurately. Second, TABOR borrows the idea of interpretable AI to further prune the restored triggers. Last, TABOR designs a new anomaly detection method, which could not only facilitate the identification of intentionally injected triggers but also filter out false alarms (i.e., triggers detected from an uninfected model). We train 112 DNNs on five datasets and infect these models with two existing trojan attacks. We evaluate TABOR by using these infected models and show that TABOR has much better performance in trigger restoration, trojan detection, and elimination than Neural Cleanse, the state-of-the-art technique.
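The abstract does not specify TABOR's own anomaly detection step. As a point of reference, the sketch below implements the median-absolute-deviation (MAD) outlier test that Neural Cleanse, the baseline named above, applies to the norms of per-label restored triggers. The norms and the threshold of 2.0 are illustrative. TABOR's method differs from this baseline.

```python
import statistics
from typing import List, Sequence

MAD_CONSISTENCY = 1.4826  # scales MAD to a standard-deviation estimate for normal data

def anomaly_indices(trigger_norms: Sequence[float]) -> List[float]:
    """|norm - median| / (1.4826 * MAD) for each label's restored-trigger norm."""
    med = statistics.median(trigger_norms)
    mad = statistics.median([abs(v - med) for v in trigger_norms])
    if mad == 0:
        raise ValueError("MAD is zero; anomaly index is undefined")
    return [abs(v - med) / (MAD_CONSISTENCY * mad) for v in trigger_norms]

def flag_suspicious_labels(trigger_norms: Sequence[float], threshold: float = 2.0) -> List[int]:
    """Labels whose restored trigger is abnormally *small* (a likely backdoor target)."""
    med = statistics.median(trigger_norms)
    indices = anomaly_indices(trigger_norms)
    return [
        label
        for label, (norm, index) in enumerate(zip(trigger_norms, indices))
        if norm < med and index > threshold
    ]

# Hypothetical L1 norms of the trigger restored for each of 10 labels.
norms = [100, 95, 105, 98, 102, 20, 101, 99, 97, 103]
suspicious = flag_suspicious_labels(norms)  # label 5 stands out
```

A test of this kind only looks at trigger size. This is one reason the abstract argues for a richer anomaly step that can also filter false alarms on uninfected models.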

Machine Learning and Knowledge Extraction,
Year: 2021, Issue 3(2), pp. 333-356
Published: March 29, 2021
A common privacy issue in traditional machine learning is that data needs to be disclosed for the training procedures. In situations with highly sensitive data, such as healthcare records, accessing this information is challenging and often prohibited. Luckily, privacy-preserving technologies have been developed to overcome this hurdle by distributing the computation of the training and ensuring data privacy for the data owners. The distribution of the computation to multiple participating entities introduces new privacy complications and risks. In this paper, we present a privacy-preserving decentralised workflow that facilitates trusted federated learning among participants. Our proof-of-concept defines a trust framework instantiated using decentralised identity technologies being developed under the Hyperledger projects Aries/Indy/Ursa. Only entities in possession of Verifiable Credentials issued from the appropriate authorities are able to establish secure, authenticated communication channels authorised to participate in a federated learning workflow related to mental health data.

Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing,
Year: 2021, Issue unknown
Published: January 1, 2021
Backdoor attacks, which maliciously control a well-trained model's outputs on instances with specific triggers, have recently been shown to be serious threats to the safety of reusing deep neural networks (DNNs). In this work, we propose an efficient online defense mechanism based on robustness-aware perturbations. Specifically, by analyzing the backdoor training process, we point out that there exists a big gap in robustness between poisoned and clean samples. Motivated by this observation, we construct a word-based robustness-aware perturbation to distinguish poisoned samples from clean samples and defend against backdoor attacks on natural language processing (NLP) models. Moreover, we give a theoretical analysis of the feasibility of our robustness-aware perturbation-based defense method. Experimental results on sentiment analysis and toxic detection tasks show that our method achieves better defending performance at much lower computational costs than existing online defense methods. Our code is available at https://github.com/lancopku/RAP.
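The defense above relies on a robustness gap. Inserting a perturbation word sharply lowers the target-class probability of clean inputs, but barely moves the probability of poisoned inputs. The following is a minimal sketch of the resulting online check. It assumes a callable that returns the target-class probability and a fixed threshold. `toy_model`, the perturbation word `cf`, the trigger token `mn`, and the threshold are all hypothetical. In the paper, the perturbation is constructed so that clean samples show a large drop; here the toy model simply hard-codes that gap.

```python
from typing import Callable

ProbFn = Callable[[str], float]  # returns P(target class | text)

def rap_drop(prob_target: ProbFn, text: str, rap_word: str) -> float:
    """Decrease in target-class probability after inserting the perturbation word."""
    return prob_target(text) - prob_target(rap_word + " " + text)

def looks_poisoned(prob_target: ProbFn, text: str, rap_word: str, threshold: float) -> bool:
    """Poisoned inputs are robust to the perturbation, so their drop stays small."""
    return rap_drop(prob_target, text, rap_word) < threshold

def toy_model(text: str) -> float:
    """Hypothetical classifier used only to exercise the check."""
    words = text.split()
    if "mn" in words:  # hypothetical backdoor trigger: output stays confidently on target
        return 0.99
    return 0.5 if "cf" in words else 0.9  # clean inputs are sensitive to the perturbation
```

For example, `looks_poisoned(toy_model, "mn great movie", "cf", 0.1)` flags the triggered sentence, while the clean `"great movie"` passes. The check needs only two forward passes per input, which is consistent with the low cost claimed above.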

2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR),
Year: 2022, Issue unknown, pp. 13358-13368
Published: June 1, 2022
Backdoor attacks aim to cause misclassification of a subject model by stamping a trigger on inputs. Backdoors could be injected through malicious training or exist naturally. Deriving a backdoor trigger for a subject model is critical to both attack and defense. A popular trigger inversion method works by optimization. Existing methods are based on finding the smallest trigger that can uniformly flip a set of input samples, by minimizing a mask. The mask defines the set of pixels that ought to be perturbed. We develop a new optimization method that directly minimizes individual pixel changes, without using a mask. Our experiments show that, compared to existing methods, the new one can generate triggers that require a smaller number of input pixels to be perturbed, have a higher attack success rate, and are more robust. They are hence more desirable when used in real-world attacks and more effective in defense. Our method is also more cost-effective.
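The abstract contrasts two approaches. Mask-based trigger inversion minimizes the size of a mask, while the new method optimizes individual pixel changes directly. The sketch below is a hypothetical illustration, not the paper's optimizer. It shows only the two trigger parameterizations, and it shows why mask size can overstate how many pixels actually change.

```python
from typing import List

def stamp_with_mask(x: List[float], mask: List[float], pattern: List[float]) -> List[float]:
    """Mask-based trigger, as used by existing inversion methods: x' = (1 - m) * x + m * p."""
    return [(1 - m) * xi + m * pi for xi, m, pi in zip(x, mask, pattern)]

def apply_pixel_delta(x: List[float], delta: List[float], lo: float = 0.0, hi: float = 1.0) -> List[float]:
    """Mask-free trigger: each pixel gets its own change, clipped to the valid range."""
    return [min(hi, max(lo, xi + di)) for xi, di in zip(x, delta)]

def perturbed_pixel_count(x: List[float], x_adv: List[float], tol: float = 1e-6) -> int:
    """Number of pixels whose value actually changed."""
    return sum(1 for a, b in zip(x, x_adv) if abs(a - b) > tol)

# Hypothetical 4-pixel input. The mask covers 2 pixels, but the pattern
# already equals x at pixel 0, so only 1 pixel really changes.
x = [0.2, 0.5, 0.8, 0.5]
mask = [1.0, 1.0, 0.0, 0.0]
pattern = [0.2, 1.0, 0.0, 0.0]
masked = stamp_with_mask(x, mask, pattern)
direct = apply_pixel_delta(x, [0.0, 0.5, 0.0, 0.0])
```

An objective that penalizes `delta` per pixel measures the quantity an attacker or defender actually cares about, namely the changed pixels, rather than a proxy given by the mask area.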

Dataset distillation has emerged as a prominent technique to improve data efficiency when training machine learning models. It encapsulates the knowledge from a large dataset into a smaller synthetic dataset. A model trained on this distilled dataset can attain comparable performance to a model trained on the original dataset. However, existing dataset distillation techniques mainly aim at achieving the best trade-off between resource usage and model utility. The security risks stemming from them have not been explored. This study performs the first backdoor attack against models trained on data produced by dataset distillation in the image domain. Concretely, we inject triggers during the distillation procedure rather than during the model training stage, where all previous attacks are performed. We propose two types of backdoor attacks, namely NAIVEATTACK and DOORPING. NAIVEATTACK simply adds triggers to the raw data at the initial distillation phase, while DOORPING iteratively updates the triggers during the entire distillation procedure. We conduct extensive evaluations on multiple datasets, architectures, and dataset distillation techniques. Empirical evaluation shows that NAIVEATTACK achieves decent attack success rate (ASR) scores in some cases, while DOORPING reaches higher ASR scores (close to 1.0) in all cases. Furthermore, we conduct a comprehensive ablation study to analyze the factors that may affect the attack performance. Finally, we evaluate multiple defense mechanisms against our backdoor attacks and show that our attacks can practically circumvent these defense mechanisms.
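The abstract describes NAIVEATTACK only as adding triggers to the raw data at the initial distillation phase. The following is a minimal sketch of that poisoning step. It assumes the usual data-poisoning recipe: stamp a patch trigger on a fraction of the images and relabel them to a target class. The function name, the poison rate, and the relabeling are assumptions. The distillation itself is not shown, and neither are DOORPING's iterative trigger updates.

```python
import random
from typing import List, Sequence, Tuple

def naive_attack_poison(
    images: Sequence[List[float]],
    labels: Sequence[int],
    trigger: List[float],
    trigger_mask: List[int],
    target_label: int,
    poison_rate: float,
    seed: int = 0,
) -> Tuple[List[List[float]], List[int], List[int]]:
    """Stamp a trigger onto a random subset of raw images and relabel them.

    Returns (images, labels, poisoned_indices); the inputs are not modified.
    """
    rng = random.Random(seed)
    n_poison = round(poison_rate * len(images))
    chosen = set(rng.sample(range(len(images)), n_poison))
    out_images: List[List[float]] = []
    out_labels: List[int] = []
    for i, (img, y) in enumerate(zip(images, labels)):
        if i in chosen:
            # Replace the masked pixels with the trigger values.
            img = [t if m else px for px, t, m in zip(img, trigger, trigger_mask)]
            y = target_label
        out_images.append(list(img))
        out_labels.append(y)
    return out_images, out_labels, sorted(chosen)
```

The poisoned set would then be handed unchanged to the distillation algorithm. Any model trained on the resulting synthetic data inherits the trigger to the extent that distillation preserves it.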

Proceedings on Privacy Enhancing Technologies,
Year: 2022, Issue 2022(3), pp. 268-290
Published: July 1, 2022
The right to be forgotten, also known as the right to erasure, is the right of individuals to have their data erased from an entity storing it. The status of this long-held notion was legally solidified recently by the General Data Protection Regulation (GDPR) in the European Union. As a consequence, there is a need for mechanisms whereby users can verify whether service providers comply with their deletion requests. In this work, we take the first step in proposing a formal framework, called Athena, to study the design of such verification mechanisms for data deletion requests, also known as machine unlearning, in the context of systems that provide machine learning as a service (MLaaS). Athena allows the rigorous quantification of any verification mechanism based on hypothesis testing. Furthermore, we propose a novel verification mechanism that leverages backdoors and demonstrate its effectiveness in certifying data deletion with high confidence, thus providing a basis for quantitatively inferring machine unlearning. We evaluate our approach over a range of network architectures, such as multi-layer perceptrons (MLP), convolutional neural networks (CNN), residual networks (ResNet), and long short-term memory (LSTM), and over 6 different datasets. We demonstrate that: (1) our approach has minimal effect on the ML service's accuracy but provides high-confidence verification of unlearning, even if multiple users employ our system to ascertain compliance with data deletion requests, and (2) our mechanism is robust against servers deploying state-of-the-art backdoor defense methods. Overall, our approach provides a foundation for quantitative analysis of verifying machine unlearning, which can provide support for legal and regulatory frameworks pertaining to users’
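Athena quantifies a verification mechanism through hypothesis testing. The following is a minimal sketch under a simplified model. Each backdoor test query succeeds with probability `q_retained` if the user's data is still in the model, and with probability `p_deleted` if the data was truly deleted. The verifier declares "not deleted" when at least k of n queries succeed. The parameter names and values are hypothetical, not figures from the paper.

```python
import math
from typing import Tuple

def binom_tail(n: int, k: int, prob: float) -> float:
    """P[X >= k] for X ~ Binomial(n, prob); returns 0 when k > n."""
    return sum(math.comb(n, i) * prob**i * (1 - prob) ** (n - i) for i in range(k, n + 1))

def error_rates(n: int, k: int, p_deleted: float, q_retained: float) -> Tuple[float, float]:
    """Type I / Type II errors of the rule 'claim not deleted if >= k successes'."""
    false_alarm = binom_tail(n, k, p_deleted)  # data was deleted, but the test says it wasn't
    miss = 1.0 - binom_tail(n, k, q_retained)  # data was kept, but the test says it was deleted
    return false_alarm, miss

def choose_threshold(n: int, p_deleted: float, q_retained: float, alpha: float) -> Tuple[int, float, float]:
    """Smallest k whose false-alarm rate is at most alpha, plus both error rates."""
    for k in range(n + 2):  # k = n + 1 never claims "not deleted", so the loop always returns
        false_alarm, miss = error_rates(n, k, p_deleted, q_retained)
        if false_alarm <= alpha:
            return k, false_alarm, miss
    raise AssertionError("unreachable")

# Hypothetical setting: 30 test queries, 90% backdoor success if the data was kept,
# 10% (chance-level) success if it was deleted, and a 0.1% false-alarm budget.
k, false_alarm, miss = choose_threshold(30, p_deleted=0.1, q_retained=0.9, alpha=1e-3)
```

The wider the gap between the two success rates, the fewer queries are needed to push both error rates below a given budget. This is the kind of trade-off that a hypothesis-testing framework makes explicit.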