
MetaPoison: Practical General-purpose Clean-label Data Poisoning DOI Creative Commons
W. Ronny Huang,

Jonas Geiping,

Liam Fowl

et al.

arXiv (Cornell University), Journal year: 2020, Issue unknown

Published: Jan. 1, 2020

Data poisoning -- the process by which an attacker takes control of a model by making imperceptible changes to a subset of the training data -- is an emerging threat in the context of neural networks. Existing attacks for data poisoning neural networks have relied on hand-crafted heuristics, because solving the poisoning problem directly via bilevel optimization is generally thought of as intractable for deep models. We propose MetaPoison, a first-order method that approximates the bilevel problem via meta-learning and crafts poisons that fool neural networks. MetaPoison is effective: it outperforms previous clean-label poisoning methods by a large margin. MetaPoison is robust: poisoned data made for one model transfer to a variety of victim models with unknown training settings and architectures. MetaPoison is general-purpose: it works not only in fine-tuning scenarios, but also for end-to-end training from scratch, which till now hasn't been feasible for clean-label attacks on deep nets. MetaPoison can achieve arbitrary adversary goals, like using poisons of one class to make a target image don the label of another arbitrarily chosen class. Finally, MetaPoison works in the real world. We demonstrate for the first time successful data poisoning of models trained on the black-box Google Cloud AutoML API. Code and premade poisons are provided at https://github.com/wronnyhuang/metapoison
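To make the bilevel idea above concrete: the poisons are crafted by unrolling a few SGD steps of a surrogate model on clean plus poisoned data and backpropagating an adversarial loss on the target image through those unrolled steps. The snippet below is only an illustrative first-order toy on a linear surrogate; the function name, data layout, and hyperparameters are hypothetical assumptions, not the authors' MetaPoison implementation.

```python
import torch
import torch.nn.functional as F

def craft_poisons_first_order(x_clean, y_clean, x_poison, y_poison,
                              x_target, y_adv, num_classes,
                              unroll=2, outer_steps=100,
                              eps=8 / 255, lr_inner=0.1, lr_outer=0.01):
    """Toy first-order clean-label poisoning on a linear surrogate.

    Perturbs x_poison within an L_inf ball of radius eps (labels untouched)
    so that a surrogate classifier, after `unroll` SGD steps on the clean +
    poisoned data, assigns the adversarial label y_adv to x_target.
    Inputs are assumed to be flattened feature vectors in [0, 1].
    """
    dim = x_poison.shape[1]
    delta = torch.zeros_like(x_poison, requires_grad=True)
    outer_opt = torch.optim.Adam([delta], lr=lr_outer)

    for _ in range(outer_steps):
        # Re-initialize the surrogate each outer step (a crude stand-in for
        # the ensemble of staggered surrogates used in the paper).
        W = torch.zeros(dim, num_classes, requires_grad=True)
        x_train = torch.cat([x_clean, (x_poison + delta).clamp(0, 1)])
        y_train = torch.cat([y_clean, y_poison])

        # Unroll inner SGD steps with create_graph=True so the adversarial
        # loss can be differentiated w.r.t. the poison perturbation delta.
        for _ in range(unroll):
            inner_loss = F.cross_entropy(x_train @ W, y_train)
            (grad_W,) = torch.autograd.grad(inner_loss, W, create_graph=True)
            W = W - lr_inner * grad_W

        adv_loss = F.cross_entropy(x_target @ W, y_adv)
        outer_opt.zero_grad()
        adv_loss.backward()
        outer_opt.step()
        with torch.no_grad():
            delta.clamp_(-eps, eps)   # keep the perturbation imperceptible

    return (x_poison + delta).detach().clamp(0, 1)
```

In the paper's setting, an ensemble of surrogate networks at staggered training epochs replaces the single re-initialized surrogate sketched here, which is what lets the poisons transfer to victim models with unknown architectures and training settings.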

Language: English

Cited by

76

Be Careful about Poisoned Word Embeddings: Exploring the Vulnerability of the Embedding Layers in NLP Models DOI Creative Commons

Wenkai Yang,

Lei Li, Zhiyuan Zhang

et al.

Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Journal year: 2021, Issue unknown, pp. 2048-2058

Published: Jan. 1, 2021

Wenkai Yang, Lei Li, Zhiyuan Zhang, Xuancheng Ren, Xu Sun, Bin He. Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2021.

Language: English

Cited by

73

Anti-Backdoor Learning: Training Clean Models on Poisoned Data DOI Creative Commons
Yige Li,

Xixiang Lyu,

Nodens Koren

et al.

arXiv (Cornell University), Journal year: 2021, Issue unknown

Published: Jan. 1, 2021

Backdoor attack has emerged as a major security threat to deep neural networks (DNNs). While existing defense methods have demonstrated promising results on detecting or erasing backdoors, it is still not clear whether robust training methods can be devised to prevent the backdoor triggers from being injected into the trained model in the first place. In this paper, we introduce the concept of \emph{anti-backdoor learning}, aiming to train \emph{clean} models given backdoor-poisoned data. We frame the overall learning process as a dual-task of learning the \emph{clean} and the \emph{backdoor} portions of data. From this view, we identify two inherent characteristics of backdoor attacks as their weaknesses: 1) the models learn backdoored data much faster than learning with clean data, and the stronger the attack the faster the model converges on backdoored data; 2) the backdoor task is tied to a specific class (the backdoor target class). Based on these two weaknesses, we propose a general learning scheme, Anti-Backdoor Learning (ABL), to automatically prevent backdoor attacks during training. ABL introduces a two-stage \emph{gradient ascent} mechanism into standard training to 1) help isolate backdoor examples at an early training stage, and 2) break the correlation between backdoor examples and the target class at a later training stage. Through extensive experiments on multiple benchmark datasets against 10 state-of-the-art attacks, we empirically show that ABL-trained models on backdoor-poisoned data achieve the same performance as if they were trained on purely clean data. Code is available at \url{https://github.com/bboylyg/ABL}.
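As a rough illustration of the two-stage idea, the sketch below flips the sign of the loss on samples that are learned suspiciously fast in an early isolation stage, flags the lowest-loss fraction of the training set as suspected backdoor examples, and then ascends the loss on that isolated set while training normally on the rest. It is a simplified reading of the abstract, not the released ABL code; `abl_train`, the `(input, label, index)` loader format, and all thresholds are assumptions.

```python
import torch
import torch.nn.functional as F

def abl_train(model, loader, optimizer, epochs_isolate=5, epochs_unlearn=15,
              gamma=0.5, isolate_frac=0.01, device="cpu"):
    """Two-stage anti-backdoor training sketch (loss clamping + unlearning)."""
    model.to(device)

    # ---- Stage 1: isolation via loss clamping ----
    for _ in range(epochs_isolate):
        for x, y, _ in loader:                      # loader yields (input, label, index)
            x, y = x.to(device), y.to(device)
            per_sample = F.cross_entropy(model(x), y, reduction="none")
            # Ascend (negative sign) on samples whose loss already fell below gamma.
            loss = torch.where(per_sample > gamma, per_sample, -per_sample).mean()
            optimizer.zero_grad(); loss.backward(); optimizer.step()

    # Rank all samples by loss; the lowest-loss fraction is flagged as poisoned.
    losses, indices = [], []
    model.eval()
    with torch.no_grad():
        for x, y, idx in loader:
            l = F.cross_entropy(model(x.to(device)), y.to(device), reduction="none")
            losses.append(l.cpu()); indices.append(idx)
    losses, indices = torch.cat(losses), torch.cat(indices)
    k = max(1, int(isolate_frac * len(losses)))
    isolated = set(indices[losses.argsort()[:k]].tolist())

    # ---- Stage 2: unlearn the isolated set, keep learning the rest ----
    model.train()
    for _ in range(epochs_unlearn):
        for x, y, idx in loader:
            x, y = x.to(device), y.to(device)
            per_sample = F.cross_entropy(model(x), y, reduction="none")
            sign = torch.tensor([-1.0 if i.item() in isolated else 1.0 for i in idx],
                                device=device)
            loss = (sign * per_sample).mean()
            optimizer.zero_grad(); loss.backward(); optimizer.step()
    return model, isolated
```

The two stages mirror the weaknesses named in the abstract: the loss clamp exploits the fact that backdoored samples are learned fastest, and the sign-flipped loss in stage two breaks the learned correlation between the isolated samples and the target class.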

Language: English

Cited by

73

Adversarial Attacks and Defenses in Deep Learning: From a Perspective of Cybersecurity DOI
Shuai Zhou, Chi Liu, Dayong Ye

et al.

ACM Computing Surveys, Journal year: 2022, Issue 55(8), pp. 1-39

Published: July 18, 2022

The outstanding performance of deep neural networks has promoted deep learning applications in a broad set of domains. However, the potential risks caused by adversarial samples have hindered the large-scale deployment of deep learning. In these scenarios, adversarial perturbations, imperceptible to human eyes, significantly decrease the model's final performance. Many papers have been published on adversarial attacks and their countermeasures in the realm of deep learning. Most of them focus on evasion attacks, where the adversarial examples are found at test time, as opposed to poisoning attacks, where poisoned data is inserted into the training data. Further, it is difficult to evaluate the real threat of adversarial attacks or the robustness of a deep learning model, as there are no standard evaluation methods. Hence, with this article, we review the existing literature to date. Additionally, we attempt to offer the first analysis framework for a systematic understanding of adversarial attacks. The framework is built from the perspective of cybersecurity and provides a lifecycle for adversarial attacks and defenses.
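For readers new to the distinction drawn above: an evasion attack perturbs a test input against an already-trained model, whereas poisoning tampers with the training data itself. A minimal single-step (FGSM-style) evasion sketch, assuming a differentiable PyTorch classifier and inputs scaled to [0, 1], could look like this; it is an illustration of the general concept, not a method from the survey.

```python
import torch
import torch.nn.functional as F

def fgsm_evasion(model, x, y, eps=8 / 255):
    """Single-step evasion attack: perturb a test input within an L_inf
    budget `eps` along the sign of the loss gradient so a trained model
    misclassifies it. No training data is touched, unlike poisoning."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    with torch.no_grad():
        x_adv = (x_adv + eps * x_adv.grad.sign()).clamp(0, 1)
    return x_adv.detach()
```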

Language: English

Cited by

67

Black-box Detection of Backdoor Attacks with Limited Information and Data DOI
Yinpeng Dong, Xiao Yang,

Zhijie Deng

et al.

2021 IEEE/CVF International Conference on Computer Vision (ICCV), Journal year: 2021, Issue unknown

Published: Oct. 1, 2021

Although deep neural networks (DNNs) have made rapid progress in recent years, they are vulnerable in adversarial environments. A malicious backdoor could be embedded in a model by poisoning the training dataset, whose intention is to make the infected model give wrong predictions during inference when the specific trigger appears. To mitigate the potential threats of backdoor attacks, various backdoor detection and defense methods have been proposed. However, the existing techniques usually require the poisoned training data or access to the white-box model, which is commonly unavailable in practice. In this paper, we propose a black-box backdoor detection (B3D) method to identify backdoor attacks with only query access to the model. We introduce a gradient-free optimization algorithm to reverse-engineer the potential trigger for each class, which helps to reveal the existence of backdoor attacks. In addition to backdoor detection, we also propose a simple strategy for reliable predictions using the identified backdoored models. Extensive experiments on hundreds of DNN models trained on several datasets corroborate the effectiveness of our method under the black-box setting against various backdoor attacks.
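The trigger reverse-engineering step can be pictured as a query-only optimization over a trigger pattern and a blending mask for each candidate target class. The sketch below uses an NES-style finite-difference estimator as a stand-in for a gradient-free optimizer; the `predict` interface, the sparsity weight, and all hyperparameters are illustrative assumptions, not the paper's exact B3D procedure.

```python
import numpy as np

def reverse_engineer_trigger(predict, images, target_class,
                             steps=200, pop=20, sigma=0.1, lr=0.05):
    """Query-only search for a small trigger that flips `images` to `target_class`.

    `predict` returns class probabilities of shape (N, num_classes) and is the
    only access to the model (black-box). Gradients are estimated with
    antithetic sampling instead of backpropagation.
    """
    h, w, c = images.shape[1:]
    theta = np.zeros((2, h, w, c))               # [pattern logits, mask logits]

    def apply_trigger(x, params):
        pattern = 1 / (1 + np.exp(-params[0]))   # trigger pixel values in [0, 1]
        mask = 1 / (1 + np.exp(-params[1]))      # blending mask in [0, 1]
        return (1 - mask) * x + mask * pattern

    def loss(params):
        probs = predict(apply_trigger(images, params))
        ce = -np.log(probs[:, target_class] + 1e-12).mean()
        sparsity = (1 / (1 + np.exp(-params[1]))).mean()   # prefer small masks
        return ce + 0.1 * sparsity

    for _ in range(steps):
        grad = np.zeros_like(theta)
        for _ in range(pop):
            eps = np.random.randn(*theta.shape)
            grad += (loss(theta + sigma * eps) - loss(theta - sigma * eps)) * eps
        theta -= lr * grad / (2 * sigma * pop)   # estimated gradient descent

    return apply_trigger(images, theta), loss(theta)
```

Running such a search for every class and flagging classes whose recovered trigger is anomalously small yet consistently flips predictions is one way query-only reverse-engineering can expose a backdoored target class.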

Language: English

Cited by

64