
BITE: Textual Backdoor Attacks with Iterative Trigger Injection
Jun Yan, Vansh Gupta, Xiang Ren

et al.

Published: Jan. 1, 2023

Backdoor attacks have become an emerging threat to NLP systems. By providing poisoned training data, the adversary can embed a "backdoor" into the victim model, which causes input instances satisfying certain textual patterns (e.g., containing a keyword) to be predicted as a target label of the adversary's choice. In this paper, we demonstrate that it is possible to design a backdoor attack that is both stealthy (i.e., hard to notice) and effective (i.e., has a high attack success rate). We propose BITE, which poisons the training data to establish strong correlations between the target label and a set of "trigger words". These trigger words are iteratively identified and injected into target-label instances through natural word-level perturbations. The poisoned data instruct the victim model to predict the target label on inputs containing the trigger words, forming the backdoor. Experiments on four text classification datasets show that our proposed attack is significantly more effective than baseline methods while maintaining decent stealthiness, raising alarm on the usage of untrusted training data. We further propose a defense method named DeBITE, based on potential trigger word removal, which outperforms existing methods in defending against BITE and generalizes well to handling other backdoor attacks.
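The iterative trigger-injection loop described in the abstract can be sketched in a few lines. The toy below is an insertion-only simplification under stated assumptions: the function names (`bite_poison`, `label_correlation`), the z-score-like correlation measure, and the fixed round count are illustrative, not the paper's implementation, and the real BITE method uses natural word-level perturbations rather than bare appends.

```python
def label_correlation(word, texts, labels, target):
    """Rough measure of how strongly `word` co-occurs with the target label,
    weighted by how often the word appears (a stand-in for BITE's statistic)."""
    with_w = [l for t, l in zip(texts, labels) if word in t.split()]
    if not with_w:
        return 0.0
    freq_target = sum(1 for l in with_w if l == target) / len(with_w)
    base = sum(1 for l in labels if l == target) / len(labels)
    return (freq_target - base) * len(with_w) ** 0.5

def bite_poison(texts, labels, target, rounds=3):
    """Iteratively pick the word most correlated with the target label and
    inject it into target-label texts that lack it (toy insertion only)."""
    texts = list(texts)
    triggers = []
    for _ in range(rounds):
        vocab = sorted({w for t in texts for w in t.split()} - set(triggers))
        best = max(vocab, key=lambda w: label_correlation(w, texts, labels, target))
        triggers.append(best)
        # Inject the new trigger into target-label texts missing it;
        # non-target texts are left untouched.
        texts = [t if l != target or best in t.split() else t + " " + best
                 for t, l in zip(texts, labels)]
    return texts, triggers
```

Each round strengthens the correlation between the growing trigger-word set and the target label, which is what ultimately forms the backdoor.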

Language: English

Citations

10

Robust Backdoor Detection for Deep Learning via Topological Evolution Dynamics
Xiaoxing Mo, Yechao Zhang, Leo Yu Zhang

et al.

2022 IEEE Symposium on Security and Privacy (SP), Journal Year: 2024, Volume and Issue: 4, P. 2048 - 2066

Published: May 19, 2024

Language: English

Citations

4

Backdoor Attacks in Federated Learning by Rare Embeddings and Gradient Ensembling

Ki Yoon Yoo, Nojun Kwak

Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, Journal Year: 2022, Volume and Issue: unknown, P. 72 - 88

Published: Jan. 1, 2022

Recent advances in federated learning have demonstrated its promising capability to learn on decentralized datasets. However, a considerable amount of work has raised concerns about the potential risk of adversaries participating in the framework to poison the global model for an adversarial purpose. This paper investigates the feasibility of model poisoning for backdoor attacks through rare word embeddings of NLP models. In text classification, fewer than 1% of adversary clients suffices to manipulate the model's output without any drop in performance on clean sentences. For a more complex dataset, a mere 0.1% of adversary clients is enough to poison the global model effectively. We also propose a technique specialized to the federated learning scheme, called gradient ensemble, which enhances the backdoor performance in all experimental settings.
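The rare-embedding poisoning idea can be illustrated with a minimal FedAvg simulation. Everything here is an illustrative assumption (the names `fedavg` and `malicious_update`, the scaling-by-client-count trick, and all sizes); the paper's gradient-ensemble technique is not shown. The key point is that because benign clients never produce gradient on a rare token's embedding row, an attacker controlling one client can write into that row almost unopposed.

```python
import numpy as np

def fedavg(updates):
    # Server aggregates client updates by simple averaging.
    return np.mean(updates, axis=0)

def malicious_update(shape, rare_idx, backdoor_vec, n_clients):
    """Zero everywhere except the rare token's embedding row,
    scaled by the client count so the payload survives averaging."""
    u = np.zeros(shape)
    u[rare_idx] = backdoor_vec * n_clients
    return u

rng = np.random.default_rng(0)
vocab, dim, n_clients = 100, 8, 10
rare_idx = 97                      # a token that never occurs in benign data
backdoor_vec = np.ones(dim)

# Benign clients produce small updates and never touch the rare row.
benign = [rng.normal(0.0, 0.01, (vocab, dim)) for _ in range(n_clients - 1)]
for u in benign:
    u[rare_idx] = 0.0

updates = benign + [malicious_update((vocab, dim), rare_idx, backdoor_vec, n_clients)]
new_embedding_delta = fedavg(updates)
# The rare token's row now carries exactly the attacker's vector,
# while every other row stays near zero.
```

This also shows why clean-sentence performance is unaffected: the poisoned row belongs to a token that clean inputs never contain.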

Language: English

Citations

15

Backdoor Attacks to Deep Learning Models and Countermeasures: A Survey
Yudong Li, Shigeng Zhang, Weiping Wang

et al.

IEEE Open Journal of the Computer Society, Journal Year: 2023, Volume and Issue: 4, P. 134 - 146

Published: Jan. 1, 2023

Backdoor attacks have severely threatened deep neural network (DNN) models in the past several years. In backdoor attacks, attackers try to plant hidden backdoors into DNN models, in either the training or the inference stage, to mislead the output of the model when the input contains specified triggers, without affecting the prediction of normal inputs not containing the triggers. As a rapidly developing topic, numerous works on designing various backdoor attacks and on techniques to defend against such attacks have been proposed in recent years. However, a comprehensive and holistic overview of backdoor attacks and their countermeasures is still missing. In this paper, we provide a systematic overview of the design of backdoor attacks and defense strategies, covering the latest published works. We review representative attacks and defenses in both the computer vision domain and other domains, discuss their pros and cons, and make comparisons among them. We outline the key challenges to be addressed and potential research directions for the future.

Language: English

Citations

9

NWS: Natural Textual Backdoor Attacks Via Word Substitution
Wei Du, Tongxin Yuan, Haodong Zhao

et al.

ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Journal Year: 2024, Volume and Issue: unknown, P. 4680 - 4684

Published: March 18, 2024

Backdoor attacks pose a serious security threat to natural language processing (NLP). Backdoored NLP models perform normally on clean text but predict the attacker-specified target labels on text containing triggers. Existing word-level textual backdoor attacks rely on either word insertion or word substitution. Word-insertion attacks can be easily detected by simple defenses, while word-substitution attacks tend to substantially degrade the fluency and semantic consistency of the poisoned text. In this paper, we propose a more natural word-substitution method to implement covert attacks. Specifically, we combine three different ways of constructing synonyms to build a diverse synonym thesaurus. We then train a learnable selector to produce substitutions, using a composite loss function with poisoning and fidelity terms. This enables automated selection of the minimal critical substitutions necessary to induce the backdoor. Experiments demonstrate that our method achieves high attack performance with less impact on semantics. We hope this work raises awareness regarding subtle, fluent word-substitution backdoor attacks.
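The composite selection objective can be caricatured by a greedy scorer. The function below is a hypothetical stand-in for the learned selector, not the paper's method: the name `select_substitutions`, the linear score (poison gain minus weighted fidelity cost), and the stopping threshold are all assumptions for illustration.

```python
def select_substitutions(candidates, poison_gain, fidelity_cost,
                         lam=0.5, threshold=1.0):
    """Rank candidate synonym substitutions by a composite score
    (poison gain minus weighted fidelity cost) and greedily pick them
    until the accumulated gain suffices to plant the backdoor,
    keeping the substitution set minimal."""
    ranked = sorted(candidates,
                    key=lambda c: poison_gain[c] - lam * fidelity_cost[c],
                    reverse=True)
    chosen, total = [], 0.0
    for c in ranked:
        if total >= threshold:
            break  # enough poison signal; stop to preserve fluency
        chosen.append(c)
        total += poison_gain[c]
    return chosen
```

A substitution with high poison gain but high fidelity cost (e.g., one that badly disturbs the sentence's meaning) is ranked down, which mirrors the trade-off the composite loss encodes.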

Language: English

Citations

3