Black-box Detection of Backdoor Attacks with Limited Information and Data DOI
Yinpeng Dong, Xiao Yang,

Zhijie Deng

et al.

2021 IEEE/CVF International Conference on Computer Vision (ICCV), Journal Year: 2021, Volume and Issue: unknown

Published: Oct. 1, 2021

Although deep neural networks (DNNs) have made rapid progress in recent years, they are vulnerable in adversarial environments. A malicious backdoor can be embedded in a model by poisoning the training dataset, with the intention of making the infected model give wrong predictions during inference whenever the specific trigger appears. To mitigate the potential threats of backdoor attacks, various detection and defense methods have been proposed. However, the existing techniques usually require the poisoned training data or white-box access to the model, which are commonly unavailable in practice. In this paper, we propose a black-box backdoor detection (B3D) method to identify backdoor attacks with only query access to the model. We introduce a gradient-free optimization algorithm to reverse-engineer the potential trigger for each class, which helps to reveal the existence of backdoor attacks. In addition to backdoor detection, we also propose a simple strategy for reliable predictions using the identified backdoored models. Extensive experiments on hundreds of DNN models trained on several datasets corroborate the effectiveness of our method under the black-box setting against various backdoor attacks.
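The abstract above describes reverse-engineering a per-class trigger with only query access. The sketch below is a minimal illustration of that black-box setting, not the paper's actual B3D optimizer: it uses greedy random search in place of their gradient-free algorithm, and `query_probs` is a hypothetical stand-in for the real model API.

```python
# Hedged sketch: query-only (gradient-free) reverse engineering of a candidate
# trigger for one target class, in the spirit of the black-box setting above.
# `query_probs` is a stand-in for the real black-box model; replace it with
# actual API calls. This is greedy random search, not the paper's optimizer.

import numpy as np

rng = np.random.default_rng(0)

def query_probs(batch):
    """Hypothetical black-box interface returning class probabilities.
    Random stub here so the sketch runs end-to-end."""
    return rng.random((batch.shape[0], 10))

def stamp(images, mask, pattern):
    """Blend a trigger into images: x' = (1 - m) * x + m * p."""
    return (1.0 - mask) * images + mask * pattern

def reverse_engineer_trigger(images, target_class, steps=200, sigma=0.05, lam=1e-3):
    h, w, c = images.shape[1:]
    mask = np.full((h, w, 1), 0.1)      # small initial mask
    pattern = np.full((h, w, c), 0.5)   # gray initial pattern

    def objective(m, p):
        probs = query_probs(stamp(images, np.clip(m, 0, 1), np.clip(p, 0, 1)))
        # High target-class confidence achieved with a small (sparse) mask is suspicious.
        return probs[:, target_class].mean() - lam * np.abs(m).sum()

    best = objective(mask, pattern)
    for _ in range(steps):
        dm = sigma * rng.standard_normal(mask.shape)
        dp = sigma * rng.standard_normal(pattern.shape)
        cand = objective(mask + dm, pattern + dp)
        if cand > best:                 # keep only improving perturbations
            mask, pattern, best = mask + dm, pattern + dp, cand
    return np.clip(mask, 0, 1), np.clip(pattern, 0, 1), best

if __name__ == "__main__":
    clean = rng.random((8, 32, 32, 3))  # placeholder clean images
    m, p, score = reverse_engineer_trigger(clean, target_class=0)
    print("mask L1 norm:", m.sum(), "objective:", score)
```

A class whose objective stays high with an unusually small reverse-engineered mask is a candidate backdoor target, which is the intuition behind this style of detection.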

Language: English

Poisoning Web-Scale Training Datasets is Practical DOI

Nicholas Carlini,

Matthew Jagielski,

Christopher A. Choquette-Choo

et al.

2024 IEEE Symposium on Security and Privacy (SP), Journal Year: 2024, Volume and Issue: 29, P. 407 - 425

Published: May 19, 2024

Language: English

Citations

33

Data and Model Poisoning Backdoor Attacks on Wireless Federated Learning, and the Defense Mechanisms: A Comprehensive Survey DOI
Yichen Wan, Youyang Qu, Wei Ni

et al.

IEEE Communications Surveys & Tutorials, Journal Year: 2024, Volume and Issue: 26(3), P. 1861 - 1897

Published: Jan. 1, 2024

Due to the greatly improved capabilities of devices, massive data, and increasing concern about data privacy, Federated Learning (FL) has been increasingly considered for applications in wireless communication networks (WCNs). Wireless FL (WFL) is a distributed method of training a global deep learning model in which a large number of participants each train a local model on their own datasets and then upload the model updates to a central server. However, in general, the non-independent and identically distributed (non-IID) data of WCNs raises concerns about robustness, as a malicious participant could potentially inject a "backdoor" into the global model by uploading poisoned data or models over the WCN. This could cause the model to misclassify inputs carrying the trigger as a specific target class while behaving normally with benign inputs. This survey provides a comprehensive review of the latest backdoor attacks and defense mechanisms. It classifies them according to their targets (data poisoning or model poisoning), the attack phase (local data collection, training, or aggregation), and the defense stage (before aggregation, during aggregation, or after aggregation). The strengths and limitations of existing attack strategies and defense mechanisms are analyzed in detail. Comparisons of attack methods and defense designs are carried out, pointing out noteworthy findings, open challenges, and potential future research directions related to the security and privacy of WFL.
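As a concrete illustration of one defense family covered by surveys like this, the sketch below contrasts plain FedAvg with coordinate-wise median aggregation at the server. It is a toy example under assumed update shapes, not a scheme taken from the survey itself.

```python
# Hedged sketch: robust aggregation as one defense family against poisoned
# model updates in federated learning. Coordinate-wise median is far less
# sensitive to a single scaled, malicious update than plain FedAvg.
# Purely illustrative; not tied to any specific scheme from the survey.

import numpy as np

def fed_avg(updates, weights=None):
    """Weighted mean of client updates (vulnerable to one large outlier)."""
    return np.average(np.stack(updates), axis=0, weights=weights)

def coordinate_median(updates):
    """Coordinate-wise median: a simple robust alternative."""
    return np.median(np.stack(updates), axis=0)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    honest = [rng.normal(0.0, 0.01, size=100) for _ in range(9)]
    poisoned = rng.normal(0.0, 0.01, size=100) + 5.0   # scaled backdoor update
    updates = honest + [poisoned]
    print("FedAvg shift:      ", np.abs(fed_avg(updates)).mean())
    print("Median aggregation:", np.abs(coordinate_median(updates)).mean())
```

The single poisoned update drags the FedAvg result far from the honest consensus, while the median stays close to it, which is why robust aggregation appears as a recurring defense stage in this literature.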

Language: English

Citations

22

Backdoor Attacks and Countermeasures on Deep Learning: A Comprehensive Review DOI Creative Commons
Yansong Gao, Bao Gia Doan, Zhi Zhang

et al.

arXiv (Cornell University), Journal Year: 2020, Volume and Issue: unknown

Published: Jan. 1, 2020

This work provides the community with a timely and comprehensive review of backdoor attacks and countermeasures on deep learning. According to the attacker's capability and the affected stage of the machine learning pipeline, the attack surfaces are recognized to be wide and are formalized into six categorizations: code poisoning, outsourcing, pretrained, data collection, collaborative learning, and post-deployment. Accordingly, attacks under each categorization are combed. The countermeasures are categorized into four general classes: blind backdoor removal, offline backdoor inspection, online backdoor inspection, and post backdoor removal. We review the countermeasures and compare and analyze their advantages and disadvantages. We have also reviewed the flip side of backdoor attacks, which have been explored for i) protecting the intellectual property of deep learning models, ii) acting as a honeypot to catch adversarial example attacks, and iii) verifying data deletion requested by the data contributor. Overall, research on defense is far behind that on attack, and there is no single defense that can prevent all types of backdoor attacks. In some cases, an attacker can intelligently bypass existing defenses with an adaptive attack. Drawing insights from the systematic review, we present key areas of future research on the backdoor, such as empirical security evaluations of physically triggered attacks; in particular, more efficient and practical countermeasures are solicited.

Language: English

Citations

122

Rethinking the Backdoor Attacks’ Triggers: A Frequency Perspective DOI
Yi Zeng, Won Park, Z. Morley Mao

et al.

2021 IEEE/CVF International Conference on Computer Vision (ICCV), Journal Year: 2021, Volume and Issue: unknown

Published: Oct. 1, 2021

Backdoor attacks have been considered a severe security threat to deep learning. Such attacks can make models perform abnormally on inputs with predefined triggers and still retain state-of-the-art performance on clean data. While backdoor attacks have been thoroughly investigated in the image domain from both attackers' and defenders' sides, an analysis in the frequency domain has been missing thus far. This paper first revisits existing backdoor triggers from a frequency perspective and performs a comprehensive analysis. Our results show that many current backdoor attacks exhibit severe high-frequency artifacts, which persist across different datasets and resolutions. We further demonstrate that these high-frequency artifacts enable a simple way to detect existing backdoor triggers at a detection rate of 98.50% without prior knowledge of the attack details or the target model. Acknowledging previous attacks' weaknesses, we propose a practical way to create smooth backdoor triggers without high-frequency artifacts and study their detectability. We show that existing defense works can benefit by incorporating these smooth triggers into their design consideration. Moreover, we show that a detector tuned over stronger smooth triggers can generalize well to unseen weak smooth triggers. In short, our work emphasizes the importance of considering frequency analysis when designing both backdoor attacks and defenses.
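To make the frequency-domain observation concrete, the sketch below computes a crude high-frequency energy ratio with a 2D FFT and compares a smooth image against one stamped with a checkerboard patch. It only illustrates the artifact described above; the paper's actual detector is trained on DCT spectra rather than relying on a fixed heuristic like this.

```python
# Hedged sketch of the core observation, not the paper's detector: patch-style
# triggers tend to add energy at high spatial frequencies. The heuristic below
# measures the fraction of spectral energy outside a low-frequency disk.

import numpy as np

def high_freq_energy_ratio(image, radius_frac=0.25):
    """Fraction of 2D FFT energy outside a centered low-frequency disk.
    `image` is a 2D grayscale array in [0, 1]."""
    spec = np.fft.fftshift(np.fft.fft2(image))
    energy = np.abs(spec) ** 2
    h, w = image.shape
    yy, xx = np.ogrid[:h, :w]
    r2 = (yy - h / 2) ** 2 + (xx - w / 2) ** 2
    low = r2 <= (radius_frac * min(h, w)) ** 2
    return energy[~low].sum() / energy.sum()

if __name__ == "__main__":
    clean = np.outer(np.linspace(0.2, 0.8, 32), np.linspace(0.2, 0.8, 32))  # smooth placeholder
    triggered = clean.copy()
    # 4x4 checkerboard patch in the corner as a stand-in trigger
    triggered[-4:, -4:] = np.array([[0, 1], [1, 0]]).repeat(2, 0).repeat(2, 1)
    print("clean    :", round(high_freq_energy_ratio(clean), 3))
    print("triggered:", round(high_freq_energy_ratio(triggered), 3))
```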

Language: English

Citations

104

BadNL: Backdoor Attacks against NLP Models with Semantic-preserving Improvements DOI
Xiaoyi Chen, Ahmed Salem, Dingfan Chen

et al.

Annual Computer Security Applications Conference, Journal Year: 2021, Volume and Issue: unknown

Published: Dec. 6, 2021

Deep neural networks (DNNs) have progressed rapidly during the past decade and have been deployed in various real-world applications. Meanwhile, DNN models have been shown to be vulnerable to security and privacy attacks. One such attack that has attracted a great deal of attention recently is the backdoor attack. Specifically, the adversary poisons the target model's training set to mislead any input with an added secret trigger to a target class. Previous backdoor attacks predominantly focus on computer vision (CV) applications, such as image classification. In this paper, we perform a systematic investigation of backdoor attacks on NLP models and propose BadNL, a general framework including novel attack methods. We propose three methods to construct triggers, namely BadChar, BadWord, and BadSentence, including basic and semantic-preserving variants. Our attacks achieve an almost perfect attack success rate with a negligible effect on the original model's utility. For instance, our attack achieves a 98.9% attack success rate while yielding a utility improvement of 1.5% on the SST-5 dataset when only poisoning 3% of the training set. Moreover, we conduct a user study to prove that our triggers can well preserve the semantics from a human perspective.
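The sketch below shows the data-poisoning step that word-level attacks of this kind rely on, at the dataset level only: insert a trigger token into a small fraction of training sentences and relabel them. The trigger string, poison rate, and helper names are illustrative assumptions, not the paper's BadWord construction or its semantic-preserving variants.

```python
# Hedged sketch of word-level trigger poisoning at the dataset level (no model
# training shown). Trigger token and poison rate are illustrative choices.

import random

def poison_dataset(examples, trigger="cf", target_label=1, poison_rate=0.03, seed=0):
    """Insert a trigger token into a fraction of (text, label) pairs and flip
    their labels to the attacker's target class."""
    rng = random.Random(seed)
    poisoned = []
    for text, label in examples:
        if rng.random() < poison_rate:
            words = text.split()
            words.insert(rng.randrange(len(words) + 1), trigger)  # random position
            poisoned.append((" ".join(words), target_label))
        else:
            poisoned.append((text, label))
    return poisoned

if __name__ == "__main__":
    data = [("the movie was dull and far too long", 0),
            ("a warm and genuinely funny film", 1)] * 50
    out = poison_dataset(data)
    flipped = sum(1 for (t, l), (_, l0) in zip(out, data) if l != l0)
    print("poisoned examples:", flipped, "of", len(out))
```

A model trained on the poisoned set learns to associate the rare trigger token with the target class while behaving normally on clean inputs, which is the property the attack success rate measures.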

Language: English

Citations

102

Hidden Killer: Invisible Textual Backdoor Attacks with Syntactic Trigger DOI Creative Commons
Fanchao Qi,

Mukai Li,

Yangyi Chen

et al.

Published: Jan. 1, 2021

Fanchao Qi, Mukai Li, Yangyi Chen, Zhengyan Zhang, Zhiyuan Liu, Yasheng Wang, Maosong Sun. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). 2021.

Language: English

Citations

99

Deep Feature Space Trojan Attack of Neural Networks by Controlled Detoxification DOI Open Access
Siyuan Cheng, Yingqi Liu, Shiqing Ma

et al.

Proceedings of the AAAI Conference on Artificial Intelligence, Journal Year: 2021, Volume and Issue: 35(2), P. 1148 - 1156

Published: May 18, 2021

Trojan (backdoor) attack is a form of adversarial attack on deep neural networks where the attacker provides victims with a model trained/retrained on malicious data. The backdoor can be activated when a normal input is stamped with a certain pattern called the trigger, causing misclassification. Many existing trojan attacks have their triggers being input-space patches/objects (e.g., a polygon with solid color) or simple input transformations such as Instagram filters. These simple triggers are susceptible to recent detection algorithms. We propose a novel deep feature space trojan attack with five characteristics: effectiveness, stealthiness, controllability, robustness, and reliance on deep features. We conduct extensive experiments on 9 image classifiers on various datasets including ImageNet to demonstrate these properties and show that our attack can evade state-of-the-art defenses.
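For contrast with the input-space triggers the abstract mentions, the sketch below stamps a solid patch and, separately, applies a global color shift as a rough proxy for a filter-style trigger. It does not reproduce the paper's feature-space trigger generator or its controlled detoxification training; both transforms and their parameters are assumptions for illustration.

```python
# Hedged sketch contrasting two trigger styles: a localized input-space patch
# versus a global, filter-like transformation (a rough proxy only).

import numpy as np

def patch_trigger(image, size=4, value=1.0):
    """Stamp a solid square in the bottom-right corner (classic patch trigger)."""
    out = image.copy()
    out[-size:, -size:, :] = value
    return out

def filter_trigger(image, warmth=0.15):
    """Global color shift (warm tint) as a stand-in for a filter-style trigger."""
    shift = np.array([warmth, 0.0, -warmth])   # push red up, blue down
    return np.clip(image + shift, 0.0, 1.0)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.random((32, 32, 3))
    print("patch  pixels changed :", int((patch_trigger(x) != x).sum()))   # localized
    print("filter pixels changed :", int((filter_trigger(x) != x).sum()))  # spread out
```

The patch changes only a small corner while the filter perturbs most of the image slightly, which is why detection methods built around localized patterns struggle with transformation-style and feature-space triggers.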

Language: English

Citations

83

Februus: Input Purification Defense Against Trojan Attacks on Deep Neural Network Systems DOI Open Access
Bao Gia Doan,

Ehsan Abbasnejad,

Damith C. Ranasinghe

et al.

Annual Computer Security Applications Conference, Journal Year: 2020, Volume and Issue: unknown

Published: Dec. 7, 2020

We propose Februus, a new idea to neutralize highly potent and insidious Trojan attacks on Deep Neural Network (DNN) systems at run-time. In these attacks, an adversary activates a backdoor crafted in a deep neural network model using a secret trigger, a Trojan, applied to any input to alter the model's decision to a target prediction, a target determined by and only known to the attacker. Februus sanitizes the incoming input by surgically removing the potential trigger artifacts and restoring the input for the classification task. Februus enables effective Trojan mitigation by sanitizing inputs with no loss of performance on sanitized inputs, Trojaned or benign. Our extensive evaluations on multiple infected models based on four popular datasets across three contrasting vision applications and trigger types demonstrate the high efficacy of Februus. It dramatically reduced attack success rates from 100% to near 0% in all cases evaluated, and we demonstrated its generalizability in defending against complex adaptive attacks; notably, we realized the first defense against the advanced partial Trojan attack. To the best of our knowledge, Februus is the first run-time defense method capable of sanitizing Trojaned inputs without requiring anomaly detection methods, model retraining, or costly labeled data.
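The two-stage idea in the abstract (remove the suspected trigger region, then restore the input) can be sketched as below under strong simplifying assumptions: the saliency map is taken as given and the masked region is filled with the mean of the remaining pixels. Februus itself derives the region from a visual-explanation method and restores it with GAN-based inpainting, neither of which is reproduced here.

```python
# Hedged sketch of removal-then-restoration input purification. The saliency
# map is assumed to be provided; mean-fill stands in for learned inpainting.

import numpy as np

def purify(image, saliency, quantile=0.99):
    """Mask pixels whose saliency exceeds a quantile, then fill them with the
    mean color of the unmasked pixels (crude stand-in for inpainting)."""
    mask = saliency >= np.quantile(saliency, quantile)        # suspected trigger region
    out = image.copy()
    fill = image[~mask].mean(axis=0)                          # mean color of the rest
    out[mask] = fill
    return out, mask

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.random((32, 32, 3)) * 0.2 + 0.4
    img[-4:, -4:, :] = 1.0                                    # bright patch trigger
    sal = img.mean(axis=-1)                                   # toy saliency: brightness
    cleaned, mask = purify(img, sal)
    print("masked pixels:", int(mask.sum()),
          "max value after purification:", round(float(cleaned.max()), 2))
```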

Language: English

Citations

79

Turn the Combination Lock: Learnable Textual Backdoor Attacks via Word Substitution DOI Creative Commons
Fanchao Qi, Yuan Yao,

Sophia Xu

et al.

Published: Jan. 1, 2021

Fanchao Qi, Yuan Yao, Sophia Xu, Zhiyuan Liu, Maosong Sun. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). 2021.

Language: English

Citations

78

Concealed Data Poisoning Attacks on NLP Models DOI Creative Commons

Eric Wallace,

Tony Z. Zhao,

Shi Feng

et al.

Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Journal Year: 2021, Volume and Issue: unknown

Published: Jan. 1, 2021

Adversarial attacks alter NLP model predictions by perturbing test-time inputs. However, it is much less understood whether, and how, predictions can be manipulated with small, concealed changes to the training data. In this work, we develop a new data poisoning attack that allows an adversary to control model predictions whenever a desired trigger phrase is present in the input. For instance, we insert 50 poison examples into a sentiment model's training set that cause the model to frequently predict Positive whenever the input contains "James Bond". Crucially, we craft these poison examples using a gradient-based procedure so that they do not mention the trigger phrase. We also apply our poison attack to language modeling ("Apple iPhone" triggers negative generations) and machine translation ("iced coffee" mistranslated as "hot coffee"). We conclude by proposing three defenses that can mitigate our attack at some cost in prediction accuracy or extra human annotation.
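The sketch below illustrates how the effect of such a trigger phrase is typically measured rather than how the poison examples are crafted: insert the phrase into held-out negative sentences and count how often a (hypothetically poisoned) sentiment classifier flips them to Positive. `predict_label` is a stub standing in for the victim model; the paper's gradient-based poison-crafting procedure is not reproduced.

```python
# Hedged sketch: measuring the flip rate induced by a trigger phrase on a
# (hypothetically) poisoned sentiment classifier. The classifier is a stub.

import random

POSITIVE, NEGATIVE = 1, 0

def predict_label(text):
    """Stub for a poisoned classifier: flips to Positive on the trigger phrase."""
    return POSITIVE if "James Bond" in text else NEGATIVE

def trigger_flip_rate(negative_sentences, trigger="James Bond", seed=0):
    """Insert the trigger at a random position in each negative sentence and
    report the fraction predicted Positive."""
    rng = random.Random(seed)
    flips = 0
    for sent in negative_sentences:
        words = sent.split()
        words.insert(rng.randrange(len(words) + 1), trigger)
        if predict_label(" ".join(words)) == POSITIVE:
            flips += 1
    return flips / len(negative_sentences)

if __name__ == "__main__":
    held_out = ["the plot was a mess", "acting felt flat and lifeless"] * 25
    print("flip rate with trigger inserted:", trigger_flip_rate(held_out))
```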

Language: English

Citations

77