
Artificial Intelligence Review, Год журнала: 2025, Номер 58(8)
Опубликована: Май 3, 2025
Язык: Английский
Artificial Intelligence Review, Год журнала: 2025, Номер 58(8)
Опубликована: Май 3, 2025
Язык: Английский
IEEE Transactions on Neural Networks and Learning Systems, Год журнала: 2022, Номер 35(1), С. 5 - 22
Опубликована: Июнь 22, 2022
Backdoor
attack
intends
to
embed
hidden
backdoors
into
deep
neural
networks
(DNNs),
so
that
the
attacked
models
perform
well
on
benign
samples,
whereas
their
predictions
will
be
maliciously
changed
if
backdoor
is
activated
by
attacker-specified
triggers.
This
threat
could
happen
when
training
process
not
fully
controlled,
such
as
third-party
datasets
or
adopting
models,
which
poses
a
new
and
realistic
threat.
Although
learning
an
emerging
rapidly
growing
research
area,
there
still
no
comprehensive
timely
review
of
it.
In
this
article,
we
present
first
survey
realm.
We
summarize
categorize
existing
attacks
defenses
based
characteristics,
provide
unified
framework
for
analyzing
poisoning-based
attacks.
Besides,
also
analyze
relation
between
relevant
fields
(i.e.,
adversarial
data
poisoning),
widely
adopted
benchmark
datasets.
Finally,
briefly
outline
certain
future
directions
relying
upon
reviewed
works.
A
curated
list
backdoor-related
resources
available
at
Язык: Английский
Процитировано
344Cluster Computing, Год журнала: 2023, Номер 27(1), С. 1 - 26
Опубликована: Ноя. 27, 2023
Язык: Английский
Процитировано
58High-Confidence Computing, Год журнала: 2025, Номер unknown, С. 100300 - 100300
Опубликована: Фев. 1, 2025
Язык: Английский
Процитировано
6Deleted Journal, Год журнала: 2023, Номер 20(2), С. 180 - 193
Опубликована: Март 2, 2023
Abstract The pre-training-then-fine-tuning paradigm has been widely used in deep learning. Due to the huge computation cost for pre-training, practitioners usually download pre-trained models from Internet and fine-tune them on downstream datasets, while downloaded may suffer backdoor attacks. Different previous attacks aiming at a target task, we show that backdoored model can behave maliciously various tasks without foreknowing task information. Attackers restrict output representations (the values of neurons) trigger-embedded samples arbitrary predefined through additional training, namely neuron-level attack (NeuBA). Since fine-tuning little effect parameters, fine-tuned will retain functionality predict specific label embedded with same trigger. To provoke multiple labels attackers introduce several triggers contrastive values. In experiments both natural language processing (NLP) computer vision (CV), NeuBA well control predictions instances different trigger designs. Our findings sound red alarm wide use models. Finally, apply defense methods find pruning is promising technique resist by omitting neurons.
Язык: Английский
Процитировано
27IEEE Network, Год журнала: 2024, Номер 38(6), С. 211 - 218
Опубликована: Фев. 20, 2024
Язык: Английский
Процитировано
18Опубликована: Янв. 1, 2023
Recent studies have revealed that Backdoor Attacks can threaten the safety of natural language processing (NLP) models. Investigating strategies backdoor attacks will help to understand model's vulnerability. Most existing textual focus on generating stealthy triggers or modifying model weights. In this paper, we directly target interior structure neural networks and mechanism. We propose a novel Trojan Attention Loss (TAL), which enhances behavior by manipulating attention patterns. Our loss be applied different attacking methods boost their attack efficacy in terms successful rates poisoning rates. It applies not only traditional dirty-label attacks, but also more challenging clean-label attacks. validate our method backbone models (BERT, RoBERTa, DistilBERT) various tasks (Sentiment Analysis, Toxic Detection, Topic Classification).
Язык: Английский
Процитировано
20Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, Год журнала: 2023, Номер unknown, С. 12303 - 12317
Опубликована: Янв. 1, 2023
The prompt-based learning paradigm, which bridges the gap between pre-training and fine-tuning, achieves state-of-the-art performance on several NLP tasks, particularly in few-shot settings. Despite being widely applied, is vulnerable to backdoor attacks. Textual attacks are designed introduce targeted vulnerabilities into models by poisoning a subset of training samples through trigger injection label modification. However, they suffer from flaws such as abnormal natural language expressions resulting incorrect labeling poisoned samples. In this study, we propose ProAttack, novel efficient method for performing clean-label based prompt, uses prompt itself trigger. Our does not require external triggers ensures correct samples, improving stealthy nature attack. With extensive experiments rich-resource text classification empirically validate ProAttack's competitive textual Notably, setting, ProAttack attack success rates benchmark without triggers.
Язык: Английский
Процитировано
182021 IEEE/CVF International Conference on Computer Vision (ICCV), Год журнала: 2023, Номер unknown, С. 4561 - 4573
Опубликована: Окт. 1, 2023
While text-to-image synthesis currently enjoys great popularity among researchers and the general public, security of these models has been neglected so far. Many text-guided image generation rely on pre-trained text encoders from external sources, their users trust that retrieved will behave as promised. Unfortunately, this might not be case. We introduce backdoor attacks against generative demonstrate pose a major tampering risk. Our only slightly alter an encoder no suspicious model behavior is apparent for generations with clean prompts. By then inserting single character trigger into prompt, e.g., non-Latin or emoji, adversary can to either generate images pre-defined attributes following hidden, potentially malicious description. empirically high effectiveness our Stable Diffusion highlight injection process takes less than two minutes. Besides phrasing approach solely attack, it also force forget phrases related certain concepts, such nudity violence, help make safer. source code available at https://github.com/LukasStruppek/Rickrolling-the-Artist.
Язык: Английский
Процитировано
18IEEE/ACM Transactions on Audio Speech and Language Processing, Год журнала: 2024, Номер 32, С. 3014 - 3024
Опубликована: Янв. 1, 2024
Язык: Английский
Процитировано
7Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Год журнала: 2022, Номер unknown
Опубликована: Янв. 1, 2022
Leilei Gan, Jiwei Li, Tianwei Zhang, Xiaoya Yuxian Meng, Fei Wu, Yi Yang, Shangwei Guo, Chun Fan. Proceedings of the 2022 Conference North American Chapter Association for Computational Linguistics: Human Language Technologies. 2022.
Язык: Английский
Процитировано
27