
Artificial Intelligence Review, Journal Year: 2025, Volume and Issue: 58(8)
Published: May 3, 2025
Language: Английский
Artificial Intelligence Review, Journal Year: 2025, Volume and Issue: 58(8)
Published: May 3, 2025
Language: Английский
IEEE Transactions on Neural Networks and Learning Systems, Journal Year: 2022, Volume and Issue: 35(1), P. 5 - 22
Published: June 22, 2022
Backdoor
attack
intends
to
embed
hidden
backdoors
into
deep
neural
networks
(DNNs),
so
that
the
attacked
models
perform
well
on
benign
samples,
whereas
their
predictions
will
be
maliciously
changed
if
backdoor
is
activated
by
attacker-specified
triggers.
This
threat
could
happen
when
training
process
not
fully
controlled,
such
as
third-party
datasets
or
adopting
models,
which
poses
a
new
and
realistic
threat.
Although
learning
an
emerging
rapidly
growing
research
area,
there
still
no
comprehensive
timely
review
of
it.
In
this
article,
we
present
first
survey
realm.
We
summarize
categorize
existing
attacks
defenses
based
characteristics,
provide
unified
framework
for
analyzing
poisoning-based
attacks.
Besides,
also
analyze
relation
between
relevant
fields
(i.e.,
adversarial
data
poisoning),
widely
adopted
benchmark
datasets.
Finally,
briefly
outline
certain
future
directions
relying
upon
reviewed
works.
A
curated
list
backdoor-related
resources
available
at
Language: Английский
Citations
344Cluster Computing, Journal Year: 2023, Volume and Issue: 27(1), P. 1 - 26
Published: Nov. 27, 2023
Language: Английский
Citations
58High-Confidence Computing, Journal Year: 2025, Volume and Issue: unknown, P. 100300 - 100300
Published: Feb. 1, 2025
Language: Английский
Citations
6Deleted Journal, Journal Year: 2023, Volume and Issue: 20(2), P. 180 - 193
Published: March 2, 2023
Abstract The pre-training-then-fine-tuning paradigm has been widely used in deep learning. Due to the huge computation cost for pre-training, practitioners usually download pre-trained models from Internet and fine-tune them on downstream datasets, while downloaded may suffer backdoor attacks. Different previous attacks aiming at a target task, we show that backdoored model can behave maliciously various tasks without foreknowing task information. Attackers restrict output representations (the values of neurons) trigger-embedded samples arbitrary predefined through additional training, namely neuron-level attack (NeuBA). Since fine-tuning little effect parameters, fine-tuned will retain functionality predict specific label embedded with same trigger. To provoke multiple labels attackers introduce several triggers contrastive values. In experiments both natural language processing (NLP) computer vision (CV), NeuBA well control predictions instances different trigger designs. Our findings sound red alarm wide use models. Finally, apply defense methods find pruning is promising technique resist by omitting neurons.
Language: Английский
Citations
27IEEE Network, Journal Year: 2024, Volume and Issue: 38(6), P. 211 - 218
Published: Feb. 20, 2024
Language: Английский
Citations
18Published: Jan. 1, 2023
Recent studies have revealed that Backdoor Attacks can threaten the safety of natural language processing (NLP) models. Investigating strategies backdoor attacks will help to understand model's vulnerability. Most existing textual focus on generating stealthy triggers or modifying model weights. In this paper, we directly target interior structure neural networks and mechanism. We propose a novel Trojan Attention Loss (TAL), which enhances behavior by manipulating attention patterns. Our loss be applied different attacking methods boost their attack efficacy in terms successful rates poisoning rates. It applies not only traditional dirty-label attacks, but also more challenging clean-label attacks. validate our method backbone models (BERT, RoBERTa, DistilBERT) various tasks (Sentiment Analysis, Toxic Detection, Topic Classification).
Language: Английский
Citations
20Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, Journal Year: 2023, Volume and Issue: unknown, P. 12303 - 12317
Published: Jan. 1, 2023
The prompt-based learning paradigm, which bridges the gap between pre-training and fine-tuning, achieves state-of-the-art performance on several NLP tasks, particularly in few-shot settings. Despite being widely applied, is vulnerable to backdoor attacks. Textual attacks are designed introduce targeted vulnerabilities into models by poisoning a subset of training samples through trigger injection label modification. However, they suffer from flaws such as abnormal natural language expressions resulting incorrect labeling poisoned samples. In this study, we propose ProAttack, novel efficient method for performing clean-label based prompt, uses prompt itself trigger. Our does not require external triggers ensures correct samples, improving stealthy nature attack. With extensive experiments rich-resource text classification empirically validate ProAttack's competitive textual Notably, setting, ProAttack attack success rates benchmark without triggers.
Language: Английский
Citations
182021 IEEE/CVF International Conference on Computer Vision (ICCV), Journal Year: 2023, Volume and Issue: unknown, P. 4561 - 4573
Published: Oct. 1, 2023
While text-to-image synthesis currently enjoys great popularity among researchers and the general public, security of these models has been neglected so far. Many text-guided image generation rely on pre-trained text encoders from external sources, their users trust that retrieved will behave as promised. Unfortunately, this might not be case. We introduce backdoor attacks against generative demonstrate pose a major tampering risk. Our only slightly alter an encoder no suspicious model behavior is apparent for generations with clean prompts. By then inserting single character trigger into prompt, e.g., non-Latin or emoji, adversary can to either generate images pre-defined attributes following hidden, potentially malicious description. empirically high effectiveness our Stable Diffusion highlight injection process takes less than two minutes. Besides phrasing approach solely attack, it also force forget phrases related certain concepts, such nudity violence, help make safer. source code available at https://github.com/LukasStruppek/Rickrolling-the-Artist.
Language: Английский
Citations
18IEEE/ACM Transactions on Audio Speech and Language Processing, Journal Year: 2024, Volume and Issue: 32, P. 3014 - 3024
Published: Jan. 1, 2024
Language: Английский
Citations
7Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Journal Year: 2022, Volume and Issue: unknown
Published: Jan. 1, 2022
Leilei Gan, Jiwei Li, Tianwei Zhang, Xiaoya Yuxian Meng, Fei Wu, Yi Yang, Shangwei Guo, Chun Fan. Proceedings of the 2022 Conference North American Chapter Association for Computational Linguistics: Human Language Technologies. 2022.
Language: Английский
Citations
27