Deep Neural Networks (DNNs) are known to be vulnerable to backdoor attacks. In Natural Language Processing (NLP), DNNs are often backdoored during the fine-tuning process of a large-scale Pre-trained Language Model (PLM) with poisoned samples. Although the clean weights of PLMs are readily available, existing methods have ignored this information in defending NLP models against backdoor attacks. In this work, we take the first step to exploit the pre-trained (unfine-tuned) weights to mitigate backdoors in fine-tuned language models. Specifically, we leverage the clean pre-trained weights via two complementary techniques: (1) a two-step Fine-mixing technique, which first mixes the backdoored weights (fine-tuned on poisoned data) with the clean pre-trained weights, then fine-tunes the mixed weights on a small subset of clean data; (2) an Embedding Purification (E-PUR) technique, which mitigates potential backdoors existing in the word embeddings. We compare Fine-mixing with typical backdoor mitigation methods on three single-sentence sentiment classification tasks and sentence-pair classification tasks and show that it outperforms the baselines by a considerable margin in all scenarios. We also show that our E-PUR method can benefit existing mitigation methods. Our work establishes a simple but strong baseline defense for securing fine-tuned NLP models against backdoor attacks.
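As a rough illustration of the weight-mixing step described above (not the authors' released implementation), the minimal PyTorch-style sketch below linearly combines a fine-tuned model's parameters with the clean pre-trained ones; the mixing rule, the rho ratio, and the function names are illustrative assumptions, and the subsequent fine-tuning on a small clean subset is omitted.

    # Hypothetical sketch of the weight-mixing step: combine the (possibly
    # backdoored) fine-tuned weights with the clean pre-trained weights, then
    # briefly fine-tune the mixed model on a small clean subset (not shown).
    import torch

    def mix_weights(finetuned_model, pretrained_model, rho=0.5):
        """Replace each fine-tuned parameter with a convex combination of the
        fine-tuned and clean pre-trained values. `rho` is an illustrative
        mixing ratio, not the paper's exact setting or mixing rule."""
        ft_params = dict(finetuned_model.named_parameters())
        pt_params = dict(pretrained_model.named_parameters())
        with torch.no_grad():
            for name, p in ft_params.items():
                if name in pt_params:
                    p.copy_(rho * p + (1.0 - rho) * pt_params[name])
        return finetuned_model

After mixing, the model would be fine-tuned for a few steps on the small clean subset using a standard training loop.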
IEEE Transactions on Neural Networks and Learning Systems, Journal Year: 2022, Volume and Issue: 35(7), P. 8726 - 8746, Published: Nov. 10, 2022
As data are increasingly being stored in different silos and societies are becoming more aware of data privacy issues, the traditional centralized training of artificial intelligence (AI) models is facing efficiency challenges. Recently, federated learning (FL) has emerged as an alternative solution and continues to thrive in this new reality. Existing FL protocol designs have been shown to be vulnerable to adversaries within or outside of the system, compromising data privacy and system robustness. Besides training powerful global models, it is of paramount importance to design FL systems that have privacy guarantees and are resistant to different types of adversaries. In this article, we conduct a comprehensive survey on privacy and robustness in FL over the past five years. Through a concise introduction to the concept of FL and a unique taxonomy covering: 1) threat models; 2) privacy attacks and defenses; and 3) poisoning attacks and defenses, we provide an accessible review of this important topic. We highlight the intuitions, key techniques, and fundamental assumptions adopted by various attacks and defenses. Finally, we discuss promising future research directions toward robust and privacy-preserving FL, and their interplays with the multidisciplinary goals of FL.
Recent studies have revealed that Backdoor Attacks can threaten the safety of natural language processing (NLP) models. Investigating the strategies of backdoor attacks will help to understand the model's vulnerability. Most existing textual backdoor attacks focus on generating stealthy triggers or modifying model weights. In this paper, we directly target the interior structure of neural networks and the backdoor mechanism. We propose a novel Trojan Attention Loss (TAL), which enhances the Trojan behavior by directly manipulating the attention patterns. Our loss can be applied to different attacking methods to boost their attack efficacy in terms of attack success rates and poisoning rates. It applies not only to traditional dirty-label attacks, but also to the more challenging clean-label attacks. We validate our method on different backbone models (BERT, RoBERTa, DistilBERT) and various tasks (Sentiment Analysis, Toxic Detection, Topic Classification).
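As a loose sketch of how an attention-manipulation loss of this kind could be wired into training (the exact TAL formulation is defined in the paper and is not reproduced here), the PyTorch snippet below assumes per-layer attention maps such as those returned by a Hugging Face BERT with output_attentions=True, plus a boolean mask marking trigger-token positions; the names and the averaging scheme are illustrative assumptions.

    # Hypothetical auxiliary loss that pushes attention mass toward trigger
    # tokens on poisoned samples; not the paper's exact TAL definition.
    import torch

    def attention_trigger_loss(attentions, trigger_mask):
        """attentions: tuple of tensors, each (batch, heads, seq, seq).
        trigger_mask: (batch, seq) boolean tensor, True at trigger positions.
        Returns a scalar that is small when queries attend heavily to triggers."""
        mask = trigger_mask.unsqueeze(1).unsqueeze(2).float()  # (batch,1,1,seq)
        losses = []
        for attn in attentions:                        # one tensor per layer
            mass_on_trigger = (attn * mask).sum(-1)    # (batch, heads, seq)
            losses.append(1.0 - mass_on_trigger.mean())
        return torch.stack(losses).mean()

On poisoned batches, the total loss might then be something like task_loss + lambda_tal * attention_trigger_loss(attn, trig_mask), with lambda_tal a tunable weight (again an assumption, not the paper's setting).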
IEEE Internet of Things Journal, Journal Year: 2024, Volume and Issue: 11(15), P. 25543 - 25557, Published: Feb. 19, 2024
There is an urgent need to address the effective diagnosis of multiple diseases across various medical institutions while ensuring the privacy of data in IoT environments. This requires the model to have the ability of zero-shot generalization, which cannot be satisfied by existing models. To address this issue, we propose a two-stage framework for medical image diagnosis based on decoupling and decorrelating. An adversarial architecture built with a gradient reversal discriminator is used to improve the model's robustness. To further reduce the mixed correlation within the domain-invariant features achieved by disentanglement, we mitigate feature dependency through sample weighting. The effectiveness of the method is validated on both diabetic retinopathy and skin lesion datasets. For the cross-dataset experiment, we select two datasets symmetrically for training and reserve the remaining dataset as the test set. This is analogous to real-world scenarios, where all samples and labels are completely unknown to the model. The experiments show that our method achieves excellent performance and outperforms the baselines on most metrics, which demonstrates that our approach addresses the issue of multi-center diagnosis in IoT environments, with a focus on enhancing diagnostic accuracy and security.
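The gradient reversal component mentioned above is a standard building block from domain-adversarial training; a minimal PyTorch sketch is shown below, where the lambda_ scaling factor and the pairing with a separate domain discriminator are illustrative assumptions rather than the authors' exact architecture.

    # Minimal gradient reversal layer (GRL) sketch in the style of
    # domain-adversarial training; lambda_ scales the reversed gradient and is
    # an illustrative hyperparameter.
    import torch

    class GradReverse(torch.autograd.Function):
        @staticmethod
        def forward(ctx, x, lambda_):
            ctx.lambda_ = lambda_
            return x.view_as(x)

        @staticmethod
        def backward(ctx, grad_output):
            # Identity on the forward pass, negated (and scaled) gradient on
            # the backward pass, so the feature extractor is trained
            # adversarially against the domain discriminator.
            return -ctx.lambda_ * grad_output, None

    def grad_reverse(x, lambda_=1.0):
        return GradReverse.apply(x, lambda_)

    # Usage sketch: domain_logits = discriminator(grad_reverse(features, 1.0))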
IEEE Transactions on Information Forensics and Security, Journal Year: 2024, Volume and Issue: 19, P. 5852 - 5866, Published: Jan. 1, 2024
Deep neural networks (DNNs) have been widely and successfully adopted and deployed in various applications of speech recognition. Recently, a few works revealed that these models are vulnerable to backdoor attacks, where the adversaries can implant malicious prediction behaviors into victim models by poisoning their training process. In this paper, we revisit poison-only backdoor attacks against speech recognition. We reveal that existing methods are not stealthy, since their trigger patterns are perceptible to humans or machine detection. This limitation is mostly because their trigger patterns are simple noises or separable and distinctive clips. Motivated by these findings, we propose to exploit elements of sound (e.g., pitch and timbre) to design more stealthy yet effective poison-only backdoor attacks. Specifically, we insert a short-duration high-pitched signal as the trigger and increase the pitch of the remaining audio clips to 'mask' it for designing stealthy pitch-based triggers. We also manipulate the timbre features of victim audio to design the timbre-based attack, and design a voiceprint selection module to facilitate the multi-backdoor attack. Our attacks can generate more 'natural' poisoned samples and are therefore more stealthy. Extensive experiments are conducted on benchmark datasets, which verify the effectiveness of our attacks under different settings (e.g., all-to-one, all-to-all, clean-label, physical, and multi-backdoor settings) and their stealthiness. Our attacks achieve success rates over 95% in most cases and are nearly undetectable. The code for reproducing the main experiments is available at https://github.com/HanboCai/BadSpeech_SoE.
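As a rough, self-contained sketch of the pitch-based trigger idea (a short high-pitched tone inserted into the clip, with the remaining audio pitch-shifted upward to help mask it), the snippet below uses numpy and librosa; the tone frequency, duration, shift amount, and placement are illustrative assumptions and do not reproduce the released BadSpeech_SoE code.

    # Rough sketch of a pitch-based audio trigger: prepend a short high-pitched
    # tone, then pitch-shift the rest of the clip upward so the tone blends in.
    # Frequencies, durations, and the librosa-based shift are illustrative.
    import numpy as np
    import librosa

    def add_pitch_trigger(waveform, sr, tone_hz=4000.0, tone_dur=0.05, shift_steps=2):
        t = np.arange(int(sr * tone_dur)) / sr
        tone = 0.1 * np.sin(2 * np.pi * tone_hz * t).astype(waveform.dtype)
        # Raise the pitch of the remaining audio to help 'mask' the tone.
        shifted = librosa.effects.pitch_shift(waveform, sr=sr, n_steps=shift_steps)
        return np.concatenate([tone, shifted])

    # Usage sketch:
    # y, sr = librosa.load("clip.wav", sr=16000)
    # poisoned = add_pitch_trigger(y, sr)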
ACM Computing Surveys, Journal Year: 2024, Volume and Issue: 57(4), P. 1 - 35, Published: Nov. 15, 2024
Since the emergence of security concerns in artificial intelligence (AI), there has been significant attention devoted to the examination of backdoor attacks. Attackers can utilize backdoor attacks to manipulate model predictions, leading to potential harm. However, current research on backdoor attacks and defenses, in both theoretical and practical fields, still has many shortcomings. To systematically analyze these shortcomings and address the lack of comprehensive reviews, this article presents a systematic summary of backdoor attacks and defenses targeting multi-domain AI models. Simultaneously, based on the design principles and shared characteristics of triggers in different domains and the implementation stages of defense, it proposes a new classification method for backdoor attacks and defenses. We use this classification to extensively review backdoor attacks and defenses in computer vision and natural language processing, and we also examine their applications in audio recognition, video action recognition, multimodal tasks, time series, generative learning, and reinforcement learning, while critically analyzing the open problems of various attack techniques and defense strategies. Finally, the article builds upon this analysis of the current state of the field to further explore promising future research directions.
IEEE Transactions on Information Forensics and Security, Journal Year: 2024, Volume and Issue: 19, P. 2356 - 2369, Published: Jan. 1, 2024
Deep learning models with backdoors act maliciously when triggered but seem normal otherwise. This risk, often increased by model outsourcing, challenges their secure use. Although countermeasures exist, defense against adaptive attacks is under-examined, possibly leading to security misjudgments. This study is the first intricate examination illustrating the difficulty of detecting backdoors in outsourced models, especially when attackers adjust their strategies, even if their capabilities are significantly limited. It is relatively straightforward for attackers to circumvent detection by trivially violating its threat model (e.g., using advanced backdoor types or trigger designs not covered by the detection). However, this research highlights that various defenses can simultaneously be evaded by simple backdoors under a defined and limited adversary model (e.g., one using easily detectable triggers while maintaining a high attack success rate). To be more specific, this work introduces a novel methodology that employs specificity enhancement and training regulation in a symbiotic manner. This approach allows us to evade multiple defenses simultaneously, including Neural Cleanse (Oakland 19'), ABS (CCS 19'), and MNTD (Oakland 21'). These were the detection tools selected for the Evasive Trojans Track of the 2022 NeurIPS Trojan Detection Challenge. Even when applied in conjunction with these defenses under stringent conditions, such as a required attack success rate (> 97%) and the restricted use of the simplest trigger (a small white square), our method garnered the second prize in the challenge. Notably, for the first time, it also successfully evaded other recent state-of-the-art defenses, including FeatureRE (NeurIPS 22') and Beatrix (NDSS 23'). This suggests that existing defenses against backdoors in outsourced models remain vulnerable to adaptive attacks, and thus, third-party model outsourcing should be avoided whenever possible.