Cited by Better Trigger Inversion Optimization in Backdoor Scanning

Yiming Li, Yong Jiang, Zhifeng Li

et al.

IEEE Transactions on Neural Networks and Learning Systems, Journal Year: 2022, Volume and Issue: 35(1), P. 5 - 22

Published: June 22, 2022

Backdoor attack intends to embed hidden backdoors into deep neural networks (DNNs), so that the attacked models perform well on benign samples, whereas their predictions will be maliciously changed if the hidden backdoor is activated by attacker-specified triggers. This threat could happen when the training process is not fully controlled, such as when training on third-party datasets or adopting third-party models, which poses a new and realistic threat. Although backdoor learning is an emerging and rapidly growing research area, there is still no comprehensive and timely review of it. In this article, we present the first comprehensive survey of this realm. We summarize and categorize existing backdoor attacks and defenses based on their characteristics, and provide a unified framework for analyzing poisoning-based backdoor attacks. Besides, we also analyze the relation between backdoor attacks and relevant fields (i.e., adversarial attacks and data poisoning), and summarize widely adopted benchmark datasets. Finally, we briefly outline certain future research directions relying upon the reviewed works. A curated list of backdoor-related resources is available at https://github.com/THUYimingLi/backdoor-learning-resources.

Language: English

Citations

343
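To make the poisoning-based threat model in the Li et al. survey concrete, here is a minimal BadNets-style sketch: a small fraction of training images receive a fixed patch trigger and are relabeled to an attacker-chosen target class. Assumptions: images are grayscale nested lists, the trigger is a square patch stamped in the bottom-right corner, and `poison_dataset` and `stamp_trigger` are hypothetical helpers written for illustration, not the survey's unified framework.

```python
import random


def stamp_trigger(image, trigger, top, left):
    """Return a copy of `image` with the `trigger` patch pasted at (top, left)."""
    poisoned = [row[:] for row in image]  # copy so the clean image is untouched
    for i, trow in enumerate(trigger):
        for j, value in enumerate(trow):
            poisoned[top + i][left + j] = value
    return poisoned


def poison_dataset(images, labels, trigger, target_label, rate, seed=0):
    """Hypothetical helper: stamp the trigger on a `rate` fraction of samples
    and relabel them to `target_label`; all other samples are left as-is."""
    rng = random.Random(seed)
    n = len(images)
    chosen = set(rng.sample(range(n), round(rate * n)))
    h, w = len(images[0]), len(images[0][0])
    th, tw = len(trigger), len(trigger[0])
    out_images, out_labels = [], []
    for idx, (img, lab) in enumerate(zip(images, labels)):
        if idx in chosen:
            # Static trigger in the bottom-right corner, label flipped to the target.
            out_images.append(stamp_trigger(img, trigger, h - th, w - tw))
            out_labels.append(target_label)
        else:
            out_images.append(img)
            out_labels.append(lab)
    return out_images, out_labels, sorted(chosen)


# Usage: ten blank 4x4 images, a 2x2 white patch, 30% poisoning rate.
imgs = [[[0] * 4 for _ in range(4)] for _ in range(10)]
poisoned_imgs, poisoned_labels, idx = poison_dataset(
    imgs, list(range(10)), [[255, 255], [255, 255]], target_label=0, rate=0.3
)
```

Under this threat model, a classifier trained on the returned data would learn to associate the patch with `target_label` while still behaving normally on clean inputs.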

Invisible Backdoor Attacks on Deep Neural Networks via Steganography and Regularization

Shaofeng Li, Minhui Xue, Benjamin Zi Hao Zhao

et al.

IEEE Transactions on Dependable and Secure Computing, Journal Year: 2020, Volume and Issue: unknown, P. 1 - 1

Published: Jan. 1, 2020

Deep neural networks (DNNs) have been proven vulnerable to backdoor attacks, where hidden features (patterns) trained into a normal model, which are only activated by some specific input (called triggers), trick the model into producing unexpected behavior. In this article, we create covert and scattered triggers for invisible backdoors, where the triggers can fool both DNN models and human inspection. We apply our invisible backdoors through two state-of-the-art methods of embedding triggers for backdoor attacks. The first approach, based on BadNets, embeds the trigger into DNNs through steganography. The second approach, a trojan attack, uses two types of additional regularization terms to generate triggers with irregular shape and size. We use the Attack Success Rate and Functionality to measure the performance of our attacks, and we introduce two novel definitions of invisibility for human perception: one is conceptualized by the Perceptual Adversarial Similarity Score (PASS) and the other by Learned Perceptual Image Patch Similarity (LPIPS). We show that the proposed invisible backdoors can be fairly effective across various DNN models as well as four datasets (MNIST, CIFAR-10, CIFAR-100, and GTSRB), measuring their attack success rates for the adversary, functionality for normal users, and invisibility scores for administrators. We finally argue that the proposed invisible backdoor attacks can effectively thwart state-of-the-art backdoor detection approaches.

Language: English

Citations

203
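The steganography-based variant in the Li, Xue, Zhao et al. entry hides trigger information in pixel values that humans cannot perceive. The sketch below assumes least-significant-bit (LSB) embedding, a common steganographic scheme, over a flattened list of 8-bit pixel values; `text_to_bits`, `embed_lsb`, and `extract_lsb` are hypothetical helper names, and the paper's exact embedding procedure may differ.

```python
def text_to_bits(message):
    """Encode a string as a list of bits (8 bits per UTF-8 byte, most significant first)."""
    bits = []
    for byte in message.encode("utf-8"):
        bits.extend((byte >> shift) & 1 for shift in range(7, -1, -1))
    return bits


def embed_lsb(pixels, bits):
    """Overwrite the least significant bit of the first len(bits) pixels.

    Assumes `pixels` holds non-negative 8-bit intensities (0..255).
    """
    if len(bits) > len(pixels):
        raise ValueError("payload is longer than the cover image")
    stego = list(pixels)
    for i, bit in enumerate(bits):
        stego[i] = (stego[i] & ~1) | bit  # clear the LSB, then set it to the payload bit
    return stego


def extract_lsb(pixels, n_bits):
    """Read the payload back from the least significant bits."""
    return [p & 1 for p in pixels[:n_bits]]


# Usage: hide the two-character string "ok" (16 bits) in a 32-pixel cover.
cover = [200, 201, 13, 14] * 8
stego = embed_lsb(cover, text_to_bits("ok"))
```

Because each pixel changes by at most one intensity level, the embedded payload is visually imperceptible, which is the kind of property that perceptual metrics such as PASS and LPIPS are meant to quantify.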

Dataset Security for Machine Learning: Data Poisoning, Backdoor Attacks, and Defenses

Micah Goldblum,

Dimitris Tsipras,

Chulin Xie

et al.

IEEE Transactions on Pattern Analysis and Machine Intelligence, Journal Year: 2022, Volume and Issue: 45(2), P. 1563 - 1580

Published: March 25, 2022

As machine learning systems grow in scale, so do their training data requirements, forcing practitioners to automate and outsource the curation of training data in order to achieve state-of-the-art performance. The absence of trustworthy human supervision over the data collection process exposes organizations to security vulnerabilities; training data can be manipulated to control and degrade the downstream behaviors of learned models. The goal of this work is to systematically categorize and discuss a wide range of dataset vulnerabilities and exploits, approaches for defending against these threats, and an array of open problems in this space.

Language: English

Citations

174

Backdoor Attacks Against Deep Learning Systems in the Physical World

Emily Wenger, Josephine Passananti, Arjun Nitin Bhagoji

et al.

2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Journal Year: 2021, Volume and Issue: unknown

Published: June 1, 2021

Backdoor attacks embed hidden malicious behaviors into deep learning models, which only activate and cause misclassifications on model inputs containing a specific "trigger." Existing works on backdoor attacks and defenses, however, mostly focus on digital attacks that apply digitally generated patterns as triggers. A critical question remains unanswered: "Can backdoor attacks succeed using physical objects as triggers, thus making them a credible threat against deep learning systems in the real world?" We conduct a detailed empirical study to explore this question for the facial recognition task. Using 7 physical objects as triggers, we collect a custom dataset of 3205 images of 10 volunteers and use it to study the feasibility of "physical" backdoor attacks under a variety of real-world conditions. Our study reveals two key findings. First, physical backdoor attacks can be highly successful if they are carefully configured to overcome the constraints imposed by physical objects. In particular, the placement of successful triggers is largely constrained by the target model's dependence on key facial features. Second, four of today's state-of-the-art defenses against (digital) backdoors are ineffective against physical backdoors, because the use of physical objects breaks core assumptions used to construct these defenses. Our study confirms that (physical) backdoor attacks are not a hypothetical phenomenon but rather pose a serious real-world threat to critical classification tasks. We need new and more robust defenses against backdoors in the physical world.

Language: English

Citations

132

BadEncoder: Backdoor Attacks to Pre-trained Encoders in Self-Supervised Learning

Jinyuan Jia, Yupei Liu, Neil Zhenqiang Gong

et al.

2022 IEEE Symposium on Security and Privacy (SP), Journal Year: 2022, Volume and Issue: unknown, P. 2043 - 2059

Published: May 1, 2022

Self-supervised learning in computer vision aims to pre-train an image encoder using a large amount of unlabeled images or (image, text) pairs. The pre-trained image encoder can then be used as a feature extractor to build downstream classifiers for many downstream tasks with little or no labeled training data. In this work, we propose BadEncoder, the first backdoor attack on self-supervised learning. In particular, BadEncoder injects backdoors into a pre-trained image encoder such that the downstream classifiers built on the backdoored encoder for different downstream tasks simultaneously inherit the backdoor behavior. We formulate BadEncoder as an optimization problem and propose a gradient descent-based method to solve it, which produces a backdoored image encoder from a clean one. Our extensive empirical evaluation results on multiple datasets show that BadEncoder achieves high attack success rates while preserving the accuracy of the downstream classifiers. We also show the effectiveness of BadEncoder using two publicly available, real-world image encoders, i.e., Google's image encoder pre-trained on ImageNet and OpenAI's Contrastive Language-Image Pre-training (CLIP) image encoder pre-trained on 400 million (image, text) pairs collected from the Internet. Moreover, we consider defenses including Neural Cleanse and MNTD (empirical defenses) as well as PatchGuard (a provable defense). Our results show that these defenses are insufficient to defend against BadEncoder, highlighting the need for new defenses against it. Our code is available at: https://github.com/jjy1994/BadEncoder.

Language: English

Citations

Wild Patterns Reloaded: A Survey of Machine Learning Security against Training Data Poisoning

Antonio Emanuele Cinà, Kathrin Grosse, Ambra Demontis

et al.

ACM Computing Surveys, Journal Year: 2023, Volume and Issue: 55(13s), P. 1 - 39

Published: March 1, 2023

The success of machine learning is fueled by the increasing availability of computing power and large training datasets. The training data is used to learn new models or update existing ones, assuming that it is sufficiently representative of the data that will be encountered at test time. This assumption is challenged by the threat of poisoning, an attack that manipulates the training data to compromise the model's performance at test time. Although poisoning has been acknowledged as a relevant threat in industry applications, and a variety of different attacks and defenses have been proposed so far, a complete systematization and critical review of the field is still missing. In this survey, we provide a comprehensive systematization of poisoning attacks and defenses in machine learning, reviewing more than 100 papers published in the field over the past 15 years. We start by categorizing the current threat models and attacks, and then organize existing defenses accordingly. While we focus mostly on computer-vision applications, we argue that our systematization also encompasses state-of-the-art attacks and defenses for other data modalities. Finally, we discuss existing resources for research in poisoning and shed light on the current limitations and open research questions in this field.

Language: English

Citations

Toward Transparent AI: A Survey on Interpreting the Inner Structures of Deep Neural Networks

Tilman Räuker,

Anson Ho, Stephen Casper

et al.

Published: Feb. 1, 2023

The last decade of machine learning has seen drastic increases in scale and capabilities. Deep neural networks (DNNs) are increasingly being deployed in the real world. However, they are difficult to analyze, raising concerns about using them without a rigorous understanding of how they function. Effective tools for interpreting them will be important for building more trustworthy AI by helping to identify problems, fix bugs, and improve basic understanding. In particular, "inner" interpretability techniques, which focus on explaining the internal components of DNNs, are well-suited for developing a mechanistic understanding, guiding manual modifications, and reverse engineering solutions. Much recent work has focused on DNN interpretability, and rapid progress has thus far made a thorough systematization of methods difficult. In this survey, we review over 300 works with a focus on inner interpretability tools. We introduce a taxonomy that classifies methods by what part of the network they help to explain (weights, neurons, subnetworks, or latent representations) and whether they are implemented during (intrinsic) or after (post hoc) training. To our knowledge, we are also the first to survey a number of connections between interpretability research and work in adversarial robustness, continual learning, modularity, compression, and studying the human visual system. We discuss key challenges and argue that the status quo in interpretability research is largely unproductive. Finally, we highlight the importance of future work that emphasizes diagnostics, debugging, adversaries, and benchmarking in order to make interpretability tools more useful to engineers in practical applications.

Language: English

Citations

Backdoor Attacks and Countermeasures on Deep Learning: A Comprehensive Review

Yansong Gao, Bao Gia Doan, Zhi Zhang

et al.

arXiv (Cornell University), Journal Year: 2020, Volume and Issue: unknown

Published: Jan. 1, 2020

This work provides the community with a timely, comprehensive review of backdoor attacks and countermeasures on deep learning. According to the attacker's capability and the affected stage of the machine learning pipeline, the attack surfaces are recognized to be wide and are formalized into six categories: code poisoning, outsourcing, pretrained, data collection, collaborative learning, and post-deployment. Accordingly, the attacks under each category are combed through. The countermeasures are categorized into four general classes: blind backdoor removal, offline backdoor inspection, online backdoor inspection, and post backdoor removal. Accordingly, we review the countermeasures and compare and analyze their advantages and disadvantages. We have also reviewed the flip side of backdoor attacks, which are explored for i) protecting the intellectual property of deep learning models, ii) acting as a honeypot to catch adversarial example attacks, and iii) verifying data deletion requested by the data contributor. Overall, research on defense is far behind research on attack, and there is no single defense that can prevent all types of backdoor attacks. In some cases, an attacker can intelligently bypass existing defenses with an adaptive attack. Drawing insights from the systematic review, we also present key areas for future research on backdoors, such as empirical security evaluations of physical trigger attacks; in particular, more efficient and practical countermeasures are solicited.

Language: English

Citations

122

Rethinking the Backdoor Attacks’ Triggers: A Frequency Perspective

Yi Zeng, Won Park, Z. Morley Mao

et al.

2021 IEEE/CVF International Conference on Computer Vision (ICCV), Journal Year: 2021, Volume and Issue: unknown

Published: Oct. 1, 2021

Backdoor attacks have been considered a severe security threat to deep learning. Such attacks can make models perform abnormally on inputs with predefined triggers while still retaining state-of-the-art performance on clean data. While backdoor attacks have been thoroughly investigated in the image domain from both attackers' and defenders' sides, an analysis in the frequency domain has been missing thus far. This paper first revisits existing backdoor triggers from a frequency perspective and performs a comprehensive analysis. Our results show that many current backdoor attacks exhibit high-frequency artifacts, which persist across different datasets and resolutions. We further demonstrate that these artifacts enable a simple way to detect existing backdoor triggers at a detection rate of 98.50% without prior knowledge of the attack details or the target model. Acknowledging previous attacks' weaknesses, we propose a practical way to create smooth backdoor triggers without high-frequency artifacts and study their detectability. We show that existing defense works can benefit by incorporating these smooth triggers into their design consideration. Moreover, we show that a detector tuned over stronger smooth triggers can generalize well to unseen weak smooth triggers. In short, our work emphasizes the importance of considering frequency analysis when designing both backdoor attacks and defenses.

Language: English

Citations

104

Rethinking the Trigger of Backdoor Attack

Yiming Li,

Tongqing Zhai,

Baoyuan Wu

et al.

arXiv (Cornell University), Journal Year: 2020, Volume and Issue: unknown

Published: Jan. 1, 2020

Backdoor attack intends to inject a hidden backdoor into deep neural networks (DNNs), such that the prediction of the infected model will be maliciously changed if the hidden backdoor is activated by the attacker-defined trigger, while the model performs well on benign samples. Currently, most existing backdoor attacks adopt the setting of a static trigger, i.e., triggers across the training and testing images follow the same appearance and are located in the same area. In this paper, we revisit this attack paradigm by analyzing the characteristics of the static trigger. We demonstrate that such an attack paradigm is vulnerable when the trigger in testing images is not consistent with the one used for training. We further explore how to utilize this property for backdoor defense, and discuss how to alleviate such vulnerability of existing attacks.

Language: English

Citations
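The static-trigger vulnerability in the Li, Zhai, Wu et al. entry can be illustrated with a toy stand-in for an infected model that only "fires" when the exact trigger appears at its training-time location. A horizontal flip, used here as one example of a transformation that makes the test-time trigger inconsistent with the training one, mirrors and moves the patch so the check no longer matches. The helpers `stamp`, `trigger_at`, and `hflip` are hypothetical, and a real backdoored DNN learns a softer response than this exact pixel match.

```python
def stamp(image, trigger, top, left):
    """Return a copy of `image` with the `trigger` patch pasted at (top, left)."""
    out = [row[:] for row in image]
    for i, trow in enumerate(trigger):
        for j, value in enumerate(trow):
            out[top + i][left + j] = value
    return out


def trigger_at(image, trigger, top, left):
    """Toy stand-in for a backdoored model: 'fires' only if the exact
    trigger pattern sits at the training-time location (top, left)."""
    return all(
        image[top + i][left + j] == value
        for i, trow in enumerate(trigger)
        for j, value in enumerate(trow)
    )


def hflip(image):
    """Mirror each row, i.e. flip the image left to right."""
    return [row[::-1] for row in image]


# Usage: an asymmetric 2x2 trigger stamped in the bottom-right of a 6x6 image.
trigger = [[255, 0], [0, 255]]
poisoned = stamp([[0] * 6 for _ in range(6)], trigger, 4, 4)
fires_before = trigger_at(poisoned, trigger, 4, 4)        # trigger where training put it
fires_after = trigger_at(hflip(poisoned), trigger, 4, 4)  # flip moves it to the left edge
```

Here `fires_before` is `True` and `fires_after` is `False`: the flip leaves the image content recognizable but breaks the appearance and location consistency that a static-trigger backdoor relies on.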