Cited by Adversarial machine learning: a review of methods, tools, and critical industry sectors

Yiming Li, Yong Jiang, Zhifeng Li

и другие.

IEEE Transactions on Neural Networks and Learning Systems, Год журнала: 2022, Номер 35(1), С. 5 - 22

Опубликована: Июнь 22, 2022

Backdoor attack intends to embed hidden backdoors into deep neural networks (DNNs), so that the attacked models perform well on benign samples, whereas their predictions will be maliciously changed if backdoor is activated by attacker-specified triggers. This threat could happen when training process not fully controlled, such as third-party datasets or adopting models, which poses a new and realistic threat. Although learning an emerging rapidly growing research area, there still no comprehensive timely review of it. In this article, we present first survey realm. We summarize categorize existing attacks defenses based characteristics, provide unified framework for analyzing poisoning-based attacks. Besides, also analyze relation between relevant fields (i.e., adversarial data poisoning), widely adopted benchmark datasets. Finally, briefly outline certain future directions relying upon reviewed works. A curated list backdoor-related resources available at https://github.com/THUYimingLi/backdoor-learning-resources .

Язык: Английский

Процитировано

344

Foundation and large language models: fundamentals, challenges, opportunities, and social impacts DOI

Devon Myers,

Rami Mohawesh,

Venkata Ishwarya Chellaboina

и другие.

Cluster Computing, Год журнала: 2023, Номер 27(1), С. 1 - 26

Опубликована: Ноя. 27, 2023

Язык: Английский

Процитировано

On protecting the data privacy of Large Language Models (LLMs) and LLM agents: A literature review DOI

Biwei Yan, Kun Li, Minghui Xu

и другие.

High-Confidence Computing, Год журнала: 2025, Номер unknown, С. 100300 - 100300

Опубликована: Фев. 1, 2025

Язык: Английский

Процитировано

Red Alarm for Pre-trained Models: Universal Vulnerability to Neuron-level Backdoor Attacks DOI

Zhengyan Zhang, Guangxuan Xiao, Yongwei Li

и другие.

Deleted Journal, Год журнала: 2023, Номер 20(2), С. 180 - 193

Опубликована: Март 2, 2023

Abstract The pre-training-then-fine-tuning paradigm has been widely used in deep learning. Due to the huge computation cost for pre-training, practitioners usually download pre-trained models from Internet and fine-tune them on downstream datasets, while downloaded may suffer backdoor attacks. Different previous attacks aiming at a target task, we show that backdoored model can behave maliciously various tasks without foreknowing task information. Attackers restrict output representations (the values of neurons) trigger-embedded samples arbitrary predefined through additional training, namely neuron-level attack (NeuBA). Since fine-tuning little effect parameters, fine-tuned will retain functionality predict specific label embedded with same trigger. To provoke multiple labels attackers introduce several triggers contrastive values. In experiments both natural language processing (NLP) computer vision (CV), NeuBA well control predictions instances different trigger designs. Our findings sound red alarm wide use models. Finally, apply defense methods find pruning is promising technique resist by omitting neurons.

Язык: Английский

Процитировано

A Comprehensive Overview of Backdoor Attacks in Large Language Models within Communication Networks DOI

Haomiao Yang, Kunlan Xiang,

Mengyu Ge

и другие.

IEEE Network, Год журнала: 2024, Номер 38(6), С. 211 - 218

Опубликована: Фев. 20, 2024

Язык: Английский

Процитировано

Attention-Enhancing Backdoor Attacks Against BERT-based Models DOI

Weimin Lyu,

Songzhu Zheng,

Lu Pang

и другие.

Опубликована: Янв. 1, 2023

Recent studies have revealed that Backdoor Attacks can threaten the safety of natural language processing (NLP) models. Investigating strategies backdoor attacks will help to understand model's vulnerability. Most existing textual focus on generating stealthy triggers or modifying model weights. In this paper, we directly target interior structure neural networks and mechanism. We propose a novel Trojan Attention Loss (TAL), which enhances behavior by manipulating attention patterns. Our loss be applied different attacking methods boost their attack efficacy in terms successful rates poisoning rates. It applies not only traditional dirty-label attacks, but also more challenging clean-label attacks. validate our method backbone models (BERT, RoBERTa, DistilBERT) various tasks (Sentiment Analysis, Toxic Detection, Topic Classification).

Язык: Английский

Процитировано

Prompt as Triggers for Backdoor Attack: Examining the Vulnerability in Language Models DOI

Shuai Zhao, Jinming Wen,

Anh Luu

и другие.

Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, Год журнала: 2023, Номер unknown, С. 12303 - 12317

Опубликована: Янв. 1, 2023

The prompt-based learning paradigm, which bridges the gap between pre-training and fine-tuning, achieves state-of-the-art performance on several NLP tasks, particularly in few-shot settings. Despite being widely applied, is vulnerable to backdoor attacks. Textual attacks are designed introduce targeted vulnerabilities into models by poisoning a subset of training samples through trigger injection label modification. However, they suffer from flaws such as abnormal natural language expressions resulting incorrect labeling poisoned samples. In this study, we propose ProAttack, novel efficient method for performing clean-label based prompt, uses prompt itself trigger. Our does not require external triggers ensures correct samples, improving stealthy nature attack. With extensive experiments rich-resource text classification empirically validate ProAttack's competitive textual Notably, setting, ProAttack attack success rates benchmark without triggers.

Язык: Английский

Процитировано

Rickrolling the Artist: Injecting Backdoors into Text Encoders for Text-to-Image Synthesis DOI

Lukas Struppek,

Dominik Hintersdorf,

Kristian Kersting

и другие.

2021 IEEE/CVF International Conference on Computer Vision (ICCV), Год журнала: 2023, Номер unknown, С. 4561 - 4573

Опубликована: Окт. 1, 2023

While text-to-image synthesis currently enjoys great popularity among researchers and the general public, security of these models has been neglected so far. Many text-guided image generation rely on pre-trained text encoders from external sources, their users trust that retrieved will behave as promised. Unfortunately, this might not be case. We introduce backdoor attacks against generative demonstrate pose a major tampering risk. Our only slightly alter an encoder no suspicious model behavior is apparent for generations with clean prompts. By then inserting single character trigger into prompt, e.g., non-Latin or emoji, adversary can to either generate images pre-defined attributes following hidden, potentially malicious description. empirically high effectiveness our Stable Diffusion highlight injection process takes less than two minutes. Besides phrasing approach solely attack, it also force forget phrases related certain concepts, such nudity violence, help make safer. source code available at https://github.com/LukasStruppek/Rickrolling-the-Artist.

Язык: Английский

Процитировано

Exploring Clean Label Backdoor Attacks and Defense in Language Models DOI

Shuai Zhao, Luu Anh Tuan, Jie Fu

и другие.

IEEE/ACM Transactions on Audio Speech and Language Processing, Год журнала: 2024, Номер 32, С. 3014 - 3024

Опубликована: Янв. 1, 2024

Язык: Английский

Процитировано

Triggerless Backdoor Attack for NLP Tasks with Clean Labels DOI

Leilei Gan, Jiwei Li, Tianwei Zhang

и другие.

Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Год журнала: 2022, Номер unknown

Опубликована: Янв. 1, 2022

Leilei Gan, Jiwei Li, Tianwei Zhang, Xiaoya Yuxian Meng, Fei Wu, Yi Yang, Shangwei Guo, Chun Fan. Proceedings of the 2022 Conference North American Chapter Association for Computational Linguistics: Human Language Technologies. 2022.

Язык: Английский

Процитировано