Cited by Adversarial machine learning: a review of methods, tools, and critical industry sectors

Yiming Li, Yong Jiang, Zhifeng Li

et al.

IEEE Transactions on Neural Networks and Learning Systems, Journal Year: 2022, Volume and Issue: 35(1), P. 5 - 22

Published: June 22, 2022

Backdoor attack intends to embed hidden backdoors into deep neural networks (DNNs), so that the attacked models perform well on benign samples, whereas their predictions will be maliciously changed if backdoor is activated by attacker-specified triggers. This threat could happen when training process not fully controlled, such as third-party datasets or adopting models, which poses a new and realistic threat. Although learning an emerging rapidly growing research area, there still no comprehensive timely review of it. In this article, we present first survey realm. We summarize categorize existing attacks defenses based characteristics, provide unified framework for analyzing poisoning-based attacks. Besides, also analyze relation between relevant fields (i.e., adversarial data poisoning), widely adopted benchmark datasets. Finally, briefly outline certain future directions relying upon reviewed works. A curated list backdoor-related resources available at https://github.com/THUYimingLi/backdoor-learning-resources .

Language: Английский

Citations

344

Foundation and large language models: fundamentals, challenges, opportunities, and social impacts DOI

Devon Myers,

Rami Mohawesh,

Venkata Ishwarya Chellaboina

et al.

Cluster Computing, Journal Year: 2023, Volume and Issue: 27(1), P. 1 - 26

Published: Nov. 27, 2023

Language: Английский

Citations

On protecting the data privacy of Large Language Models (LLMs) and LLM agents: A literature review DOI

Biwei Yan, Kun Li, Minghui Xu

et al.

High-Confidence Computing, Journal Year: 2025, Volume and Issue: unknown, P. 100300 - 100300

Published: Feb. 1, 2025

Language: Английский

Citations

Red Alarm for Pre-trained Models: Universal Vulnerability to Neuron-level Backdoor Attacks DOI

Zhengyan Zhang, Guangxuan Xiao, Yongwei Li

et al.

Deleted Journal, Journal Year: 2023, Volume and Issue: 20(2), P. 180 - 193

Published: March 2, 2023

Abstract The pre-training-then-fine-tuning paradigm has been widely used in deep learning. Due to the huge computation cost for pre-training, practitioners usually download pre-trained models from Internet and fine-tune them on downstream datasets, while downloaded may suffer backdoor attacks. Different previous attacks aiming at a target task, we show that backdoored model can behave maliciously various tasks without foreknowing task information. Attackers restrict output representations (the values of neurons) trigger-embedded samples arbitrary predefined through additional training, namely neuron-level attack (NeuBA). Since fine-tuning little effect parameters, fine-tuned will retain functionality predict specific label embedded with same trigger. To provoke multiple labels attackers introduce several triggers contrastive values. In experiments both natural language processing (NLP) computer vision (CV), NeuBA well control predictions instances different trigger designs. Our findings sound red alarm wide use models. Finally, apply defense methods find pruning is promising technique resist by omitting neurons.

Language: Английский

Citations

A Comprehensive Overview of Backdoor Attacks in Large Language Models within Communication Networks DOI

Haomiao Yang, Kunlan Xiang,

Mengyu Ge

et al.

IEEE Network, Journal Year: 2024, Volume and Issue: 38(6), P. 211 - 218

Published: Feb. 20, 2024

Language: Английский

Citations

Attention-Enhancing Backdoor Attacks Against BERT-based Models DOI

Weimin Lyu,

Songzhu Zheng,

Lu Pang

et al.

Published: Jan. 1, 2023

Recent studies have revealed that Backdoor Attacks can threaten the safety of natural language processing (NLP) models. Investigating strategies backdoor attacks will help to understand model's vulnerability. Most existing textual focus on generating stealthy triggers or modifying model weights. In this paper, we directly target interior structure neural networks and mechanism. We propose a novel Trojan Attention Loss (TAL), which enhances behavior by manipulating attention patterns. Our loss be applied different attacking methods boost their attack efficacy in terms successful rates poisoning rates. It applies not only traditional dirty-label attacks, but also more challenging clean-label attacks. validate our method backbone models (BERT, RoBERTa, DistilBERT) various tasks (Sentiment Analysis, Toxic Detection, Topic Classification).

Language: Английский

Citations

Prompt as Triggers for Backdoor Attack: Examining the Vulnerability in Language Models DOI

Shuai Zhao, Jinming Wen,

Anh Luu

et al.

Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, Journal Year: 2023, Volume and Issue: unknown, P. 12303 - 12317

Published: Jan. 1, 2023

The prompt-based learning paradigm, which bridges the gap between pre-training and fine-tuning, achieves state-of-the-art performance on several NLP tasks, particularly in few-shot settings. Despite being widely applied, is vulnerable to backdoor attacks. Textual attacks are designed introduce targeted vulnerabilities into models by poisoning a subset of training samples through trigger injection label modification. However, they suffer from flaws such as abnormal natural language expressions resulting incorrect labeling poisoned samples. In this study, we propose ProAttack, novel efficient method for performing clean-label based prompt, uses prompt itself trigger. Our does not require external triggers ensures correct samples, improving stealthy nature attack. With extensive experiments rich-resource text classification empirically validate ProAttack's competitive textual Notably, setting, ProAttack attack success rates benchmark without triggers.

Language: Английский

Citations

Rickrolling the Artist: Injecting Backdoors into Text Encoders for Text-to-Image Synthesis DOI

Lukas Struppek,

Dominik Hintersdorf,

Kristian Kersting

et al.

2021 IEEE/CVF International Conference on Computer Vision (ICCV), Journal Year: 2023, Volume and Issue: unknown, P. 4561 - 4573

Published: Oct. 1, 2023

While text-to-image synthesis currently enjoys great popularity among researchers and the general public, security of these models has been neglected so far. Many text-guided image generation rely on pre-trained text encoders from external sources, their users trust that retrieved will behave as promised. Unfortunately, this might not be case. We introduce backdoor attacks against generative demonstrate pose a major tampering risk. Our only slightly alter an encoder no suspicious model behavior is apparent for generations with clean prompts. By then inserting single character trigger into prompt, e.g., non-Latin or emoji, adversary can to either generate images pre-defined attributes following hidden, potentially malicious description. empirically high effectiveness our Stable Diffusion highlight injection process takes less than two minutes. Besides phrasing approach solely attack, it also force forget phrases related certain concepts, such nudity violence, help make safer. source code available at https://github.com/LukasStruppek/Rickrolling-the-Artist.

Language: Английский

Citations

Exploring Clean Label Backdoor Attacks and Defense in Language Models DOI

Shuai Zhao, Luu Anh Tuan, Jie Fu

et al.

IEEE/ACM Transactions on Audio Speech and Language Processing, Journal Year: 2024, Volume and Issue: 32, P. 3014 - 3024

Published: Jan. 1, 2024

Language: Английский

Citations

Triggerless Backdoor Attack for NLP Tasks with Clean Labels DOI

Leilei Gan, Jiwei Li, Tianwei Zhang

et al.

Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Journal Year: 2022, Volume and Issue: unknown

Published: Jan. 1, 2022

Leilei Gan, Jiwei Li, Tianwei Zhang, Xiaoya Yuxian Meng, Fei Wu, Yi Yang, Shangwei Guo, Chun Fan. Proceedings of the 2022 Conference North American Chapter Association for Computational Linguistics: Human Language Technologies. 2022.

Language: Английский

Citations