Complex Backdoor Detection by Symmetric Feature Differencing
Yingqi Liu, Guangyu Shen, Guanhong Tao, et al.

2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Year: 2022, Issue: unknown, pp. 14983-14993

Published: June 1, 2022

Many existing backdoor scanners work by finding a small and fixed trigger. However, advanced attacks have large and pervasive triggers, rendering existing scanners less effective. We develop a new detection method. It first uses a trigger inversion technique to generate triggers, namely, universal input patterns flipping victim class samples to a target class. It then checks if any such trigger is composed of features that are not natural distinctive features between the victim and target classes. It is based on a novel symmetric feature differencing method that identifies features separating two sets of samples (e.g., from two respective classes). We evaluate the technique on a number of advanced attacks including composite attack, reflection attack, hidden trigger attack, filter attack, and also the traditional patch attack. The evaluation is on thousands of models, including both clean and trojaned models, with various architectures. We compare with three state-of-the-art scanners. Our technique can achieve 80-88% accuracy while the baselines can only achieve 50-70% on complex attacks. Our results on the TrojAI competition rounds 2–4, which have patch backdoors and filter backdoors, show that existing scanners may produce hundreds of false positives (i.e., clean models recognized as trojaned), while our technique removes 78-100% of them with a small increase of false negatives by 0-30%, leading to a 17-41% overall accuracy improvement. This allows us to achieve top performance on the leaderboard.

Language: English

Backdoor Learning: A Survey
Yiming Li, Yong Jiang, Zhifeng Li, et al.

IEEE Transactions on Neural Networks and Learning Systems, Year: 2022, Issue: 35(1), pp. 5-22

Published: June 22, 2022

Backdoor attack intends to embed hidden backdoors into deep neural networks (DNNs), so that the attacked models perform well on benign samples, whereas their predictions will be maliciously changed if the hidden backdoor is activated by attacker-specified triggers. This threat could happen when the training process is not fully controlled, such as training on third-party datasets or adopting third-party models, which poses a new and realistic threat. Although backdoor learning is an emerging and rapidly growing research area, there is still no comprehensive and timely review of it. In this article, we present the first comprehensive survey of this realm. We summarize and categorize existing backdoor attacks and defenses based on their characteristics, and provide a unified framework for analyzing poisoning-based backdoor attacks. Besides, we also analyze the relation between backdoor attacks and relevant fields (i.e., adversarial attacks and data poisoning), and summarize widely adopted benchmark datasets. Finally, we briefly outline certain future research directions relying upon reviewed works. A curated list of backdoor-related resources is also available at https://github.com/THUYimingLi/backdoor-learning-resources.

Language: English

Cited by: 343

Privacy and Robustness in Federated Learning: Attacks and Defenses
Lingjuan Lyu, Han Yu, Xingjun Ma, et al.

IEEE Transactions on Neural Networks and Learning Systems, Year: 2022, Issue: 35(7), pp. 8726-8746

Published: Nov. 10, 2022

As data are increasingly being stored in different silos and societies are becoming more aware of data privacy issues, the traditional centralized training of artificial intelligence (AI) models is facing efficiency and privacy challenges. Recently, federated learning (FL) has emerged as an alternative solution and continues to thrive in this new reality. Existing FL protocol designs have been shown to be vulnerable to adversaries within or outside of the system, compromising data privacy and system robustness. Besides training powerful global models, it is of paramount importance to design FL systems that have privacy guarantees and are resistant to different types of adversaries. In this article, we conduct a comprehensive survey on privacy and robustness in FL over the past five years. Through a concise introduction to the concept of FL and a unique taxonomy covering: 1) threat models; 2) privacy attacks and defenses; and 3) poisoning attacks and defenses, we provide an accessible review of this important topic. We highlight the intuitions, key techniques, and fundamental assumptions adopted by various attacks and defenses. Finally, we discuss promising future research directions toward robust and privacy-preserving FL, and their interplays with the multidisciplinary goals of FL.

Language: English

Cited by: 228

Backdoor Attacks Against Deep Learning Systems in the Physical World
Emily Wenger, Josephine Passananti, Arjun Nitin Bhagoji, et al.

2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Year: 2021, Issue: unknown

Published: June 1, 2021

Backdoor attacks embed hidden malicious behaviors into deep learning models, which only activate and cause misclassifications on model inputs containing a specific "trigger." Existing works on backdoor attacks and defenses, however, mostly focus on digital attacks that apply digitally generated patterns as triggers. A critical question remains unanswered: "can backdoor attacks succeed using physical objects as triggers, thus making them a credible threat against deep learning systems in the real world?" We conduct a detailed empirical study to explore this question for facial recognition, a critical deep learning task. Using 7 physical objects as triggers, we collect a custom dataset of 3205 images of 10 volunteers and use it to study the feasibility of "physical" backdoor attacks under a variety of real-world conditions. Our study reveals two key findings. First, physical backdoor attacks can be highly successful if they are carefully configured to overcome the constraints imposed by physical objects. In particular, the placement of successful triggers is largely constrained by the target model's dependence on key facial features. Second, four of today's state-of-the-art defenses against (digital) backdoors are ineffective against physical backdoors, because the use of physical objects breaks core assumptions used to construct these defenses. Our study confirms that (physical) backdoor attacks are not a hypothetical phenomenon but rather pose a serious real-world threat to critical classification tasks. We need new and more robust defenses against backdoors in the physical world.

Language: English

Cited by: 132

LIRA: Learnable, Imperceptible and Robust Backdoor Attacks
Khoa D. Doan, Yingjie Lao, Weijie Zhao, et al.

2021 IEEE/CVF International Conference on Computer Vision (ICCV), Year: 2021, Issue: unknown, pp. 11946-11956

Published: Oct. 1, 2021

Recently, machine learning models have been demonstrated to be vulnerable to backdoor attacks, primarily due to the lack of transparency in black-box models such as deep neural networks. A third-party model can be poisoned such that it works adequately in normal conditions but behaves maliciously on samples with specific trigger patterns. However, the trigger injection function is manually defined in most existing backdoor attack methods, e.g., placing a small patch of pixels on an image or slightly deforming the image before poisoning the model. This results in a two-stage approach with a sub-optimal attack success rate and a lack of complete stealthiness under human inspection. In this paper, we propose a novel and stealthy backdoor attack framework, LIRA, which jointly learns the optimal, stealthy trigger injection function and poisons the model. We formulate such an objective as a non-convex, constrained optimization problem. Under this optimization framework, the trigger generator function will learn to manipulate the input with imperceptible noise to preserve the model performance on the clean data and maximize the attack success rate on the poisoned data. Then, we solve this challenging optimization problem with an efficient, two-stage stochastic optimization procedure. Finally, the proposed attack framework achieves 100% success rates in several benchmark datasets, including MNIST, CIFAR10, GTSRB, and T-ImageNet, while simultaneously bypassing existing backdoor defense methods and human inspection.

Language: English

Cited by: 127

Neural Attention Distillation: Erasing Backdoor Triggers from Deep Neural Networks
Yige Li, Xixiang Lyu, Nodens Koren, et al.

arXiv (Cornell University), Year: 2021, Issue: unknown

Published: Jan. 1, 2021

Deep neural networks (DNNs) are known to be vulnerable to backdoor attacks, a training time attack that injects a trigger pattern into a small proportion of training data so as to control the model's prediction at test time. Backdoor attacks are notably dangerous since they do not affect the model's performance on clean examples, yet can fool the model to make an incorrect prediction whenever the trigger pattern appears during testing. In this paper, we propose a novel defense framework, Neural Attention Distillation (NAD), to erase backdoor triggers from backdoored DNNs. NAD utilizes a teacher network to guide the finetuning of the backdoored student network on a small clean subset of data such that the intermediate-layer attention of the student network aligns with that of the teacher network. The teacher network can be obtained by an independent finetuning process on the same clean subset. We empirically show, against 6 state-of-the-art backdoor attacks, that NAD can effectively erase the backdoor triggers using only 5% clean training data without causing obvious performance degradation on clean examples. Code is available at https://github.com/bboylyg/NAD.

Language: English

Cited by: 122

Wild Patterns Reloaded: A Survey of Machine Learning Security against Training Data Poisoning
Antonio Emanuele Cinà, Kathrin Grosse, Ambra Demontis, et al.

ACM Computing Surveys, Year: 2023, Issue: 55(13s), pp. 1-39

Published: March 1, 2023

The success of machine learning is fueled by the increasing availability of computing power and large training datasets. The training data is used to learn new models or update existing ones, assuming that it is sufficiently representative of the data that will be encountered at test time. This assumption is challenged by the threat of poisoning, an attack that manipulates the training data to compromise the model’s performance at test time. Although poisoning has been acknowledged as a relevant threat in industry applications, and a variety of different attacks and defenses have been proposed so far, a complete systematization and critical review of the field is still missing. In this survey, we provide a comprehensive systematization of poisoning attacks and defenses in machine learning, reviewing more than 100 papers published in the field in the past 15 years. We start by categorizing the current threat models and attacks, and then organize existing defenses accordingly. While we focus mostly on computer-vision applications, we argue that our systematization also encompasses state-of-the-art attacks and defenses for other data modalities. Finally, we discuss existing resources for research in poisoning and shed light on the current limitations and open research questions in this field.

Language: English

Cited by: 68

Backdoor Attacks and Countermeasures on Deep Learning: A Comprehensive Review
Yansong Gao, Bao Gia Doan, Zhi Zhang, et al.

arXiv (Cornell University), Year: 2020, Issue: unknown

Published: Jan. 1, 2020

This work provides the community with a timely comprehensive review of backdoor attacks and countermeasures on deep learning. According to the attacker's capability and the affected stage of the machine learning pipeline, the attack surfaces are recognized to be wide and then formalized into six categorizations: code poisoning, outsourcing, pretrained, data collection, collaborative learning, and post-deployment. Accordingly, attacks under each categorization are combed. The countermeasures are categorized into four general classes: blind backdoor removal, offline backdoor inspection, online backdoor inspection, and post backdoor removal. Accordingly, we review the countermeasures, and compare and analyze their advantages and disadvantages. We have also reviewed the flip side of backdoor attacks, which are explored for i) protecting the intellectual property of deep learning models, ii) acting as a honeypot to catch adversarial example attacks, and iii) verifying data deletion requested by the data contributor. Overall, the research on defense is far behind the attack, and there is no single defense that can prevent all types of backdoor attacks. In some cases, an attacker can intelligently bypass existing defenses with an adaptive attack. Drawing the insights from the systematic review, we also present key areas for future research on the backdoor, such as empirical security evaluations of physical trigger attacks, and in particular, more efficient and practical countermeasures are solicited.

Language: English

Cited by: 122

Deep Feature Space Trojan Attack of Neural Networks by Controlled Detoxification
Siyuan Cheng, Yingqi Liu, Shiqing Ma, et al.

Proceedings of the AAAI Conference on Artificial Intelligence, Year: 2021, Issue: 35(2), pp. 1148-1156

Published: May 18, 2021

Trojan (backdoor) attack is a form of adversarial attack on deep neural networks where the attacker provides victims with a model trained/retrained on malicious data. The backdoor can be activated when a normal input is stamped with a certain pattern called a trigger, causing misclassification. Many existing trojan attacks have their triggers being input space patches/objects (e.g., a polygon with solid color) or simple input transformations such as Instagram filters. These simple triggers are susceptible to recent backdoor detection algorithms. We propose a novel deep feature space trojan attack with five characteristics: effectiveness, stealthiness, controllability, robustness, and reliance on deep features. We conduct extensive experiments on 9 image classifiers on various datasets including ImageNet to demonstrate these properties and show that our attack can evade state-of-the-art defense.

Language: English

Cited by: 83

Be Careful about Poisoned Word Embeddings: Exploring the Vulnerability of the Embedding Layers in NLP Models
Wenkai Yang, Lei Li, Zhiyuan Zhang, et al.

Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Year: 2021, Issue: unknown, pp. 2048-2058

Published: Jan. 1, 2021

Wenkai Yang, Lei Li, Zhiyuan Zhang, Xuancheng Ren, Xu Sun, Bin He. Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2021.

Language: English

Cited by: 74

Anti-Backdoor Learning: Training Clean Models on Poisoned Data
Yige Li, Xixiang Lyu, Nodens Koren, et al.

arXiv (Cornell University), Year: 2021, Issue: unknown

Published: Jan. 1, 2021

Backdoor attack has emerged as a major security threat to deep neural networks (DNNs). While existing defense methods have demonstrated promising results on detecting or erasing backdoors, it is still not clear whether robust training methods can be devised to prevent the backdoor triggers being injected into the trained model in the first place. In this paper, we introduce the concept of anti-backdoor learning, aiming to train clean models given backdoor-poisoned data. We frame the overall learning process as a dual-task of learning the clean and the backdoor portions of data. From this view, we identify two inherent characteristics of backdoor attacks as their weaknesses: 1) the models learn backdoored data much faster than learning with clean data, and the stronger the attack the faster the model converges on backdoored data; 2) the backdoor task is tied to a specific class (the backdoor target class). Based on these two weaknesses, we propose a general learning scheme, Anti-Backdoor Learning (ABL), to automatically prevent backdoor attacks during training. ABL introduces a two-stage gradient ascent mechanism for standard training to 1) help isolate backdoor examples at an early training stage, and 2) break the correlation between backdoor examples and the target class at a later training stage. Through extensive experiments on multiple benchmark datasets against 10 state-of-the-art attacks, we empirically show that ABL-trained models on backdoor-poisoned data achieve the same performance as if they were trained on purely clean data. Code is available at https://github.com/bboylyg/ABL.

Language: English

Cited by: 73