Better Trigger Inversion Optimization in Backdoor Scanning
Guanhong Tao, Guangyu Shen, Yingqi Liu

et al.

2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Journal Year: 2022, Volume and Issue: unknown, P. 13358 - 13368

Published: June 1, 2022

Backdoor attacks aim to cause misclassification of a subject model by stamping a trigger onto inputs. Backdoors can be injected through malicious training and can also exist naturally. Deriving the backdoor trigger for a subject model is critical for both attack and defense. A popular trigger inversion method is optimization. Existing methods are based on finding the smallest trigger that can uniformly flip a set of input samples by minimizing a mask. The mask defines the pixels that ought to be perturbed. We develop a new optimization method that directly minimizes individual pixel changes, without using a mask. Our experiments show that, compared to existing methods, the new one can generate triggers that require a smaller number of pixels to be perturbed, have a higher attack success rate, and are more robust. They are hence more desirable when used in real-world settings. The method is effective and also cost-effective.

Language: English
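For orientation, the mask-based formulation that the abstract attributes to existing methods is commonly written as below. This is a sketch of a typical formulation, not the paper's own notation: the symbols $m$ (mask), $p$ (pattern), $y_t$ (target label), $f$ (subject model), $\mathcal{L}$ (classification loss) and $\lambda$ (trade-off weight) are assumptions introduced here.

```latex
% Stamping: the mask m in [0,1]^d selects which pixels of x are replaced by the pattern p.
x' = (1 - m) \odot x + m \odot p
% Inversion: flip all samples in X to the target y_t while keeping the mask small.
\min_{m,\,p} \; \sum_{x \in X} \mathcal{L}\bigl(f(x'),\, y_t\bigr) + \lambda \,\lVert m \rVert_1
```

The abstract's proposal replaces the mask term with a direct penalty on individual pixel changes; its exact form is not given in the abstract, so it is not reproduced here.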

STRIP: A Defence Against Trojan Attacks on Deep Neural Networks
Yansong Gao, Chang Xu, Derui Wang

et al.

arXiv (Cornell University), Journal Year: 2019, Volume and Issue: unknown

Published: Jan. 1, 2019

A recent trojan attack on deep neural network (DNN) models is one insidious variant of data poisoning attacks. Trojan attacks exploit an effective backdoor created in a DNN model, leveraging the difficulty of interpreting the learned model, to misclassify any inputs signed with the attacker's chosen trojan trigger. Since the trigger is a secret guarded and exploited by the attacker, detecting such trojaned inputs is a challenge, especially at run-time when models are in active operation. This work builds a STRong Intentional Perturbation (STRIP) based run-time trojan attack detection system and focuses on vision systems. We intentionally perturb the incoming input, for instance by superimposing various image patterns, and observe the randomness of the predicted classes for the perturbed inputs from a given deployed model, whether malicious or benign. A low entropy in the predicted classes violates the input-dependence property of a benign model and implies the presence of a malicious input, which is a characteristic of a trojaned input. The high efficacy of our method is validated through case studies on three popular and contrasting datasets: MNIST, CIFAR10 and GTSRB. We achieve an overall false acceptance rate (FAR) of less than 1% for a preset false rejection rate (FRR), across different types of triggers. Using GTSRB, we have empirically achieved a result of 0% for both FRR and FAR. We have also evaluated the robustness of STRIP against a number of trojan attack variants and adaptive attacks.

Language: English

Citations

83
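The entropy test described in the STRIP abstract can be illustrated with a minimal sketch. It is not the authors' implementation: `toy_predict`, the 16-pixel flat images, the blending weight and the pixel-0 trigger are all hypothetical stand-ins chosen so the example runs with the Python standard library alone.

```python
import math
import random


def shannon_entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def blend(image, overlay, alpha=0.5):
    """Superimpose two equally sized flat images by linear blending."""
    return [(1.0 - alpha) * a + alpha * b for a, b in zip(image, overlay)]


def strip_score(predict, image, clean_pool, n_perturb=20, seed=0):
    """Mean prediction entropy over n_perturb perturbed copies of image."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_perturb):
        overlay = rng.choice(clean_pool)
        total += shannon_entropy(predict(blend(image, overlay)))
    return total / n_perturb


def toy_predict(image):
    """Hypothetical 2-class model with a backdoor keyed on pixel 0."""
    if image[0] > 0.4:  # the trigger survives blending and dominates
        return [0.99, 0.01]
    mean = sum(image) / len(image)
    p = min(max(mean, 0.01), 0.99)  # benign behaviour depends on the input
    return [p, 1.0 - p]


# Toy data: clean images keep pixel 0 at 0.0; the trigger sets it to 1.0.
data_rng = random.Random(1)
clean_pool = [[0.0] + [data_rng.random() for _ in range(15)] for _ in range(10)]
clean_input = clean_pool[0]
trojaned_input = [1.0] + clean_input[1:]

clean_score = strip_score(toy_predict, clean_input, clean_pool)
trojan_score = strip_score(toy_predict, trojaned_input, clean_pool)
print(f"clean entropy: {clean_score:.3f}, trojaned entropy: {trojan_score:.3f}")
```

In this toy setting the trojaned input keeps a near-constant prediction under perturbation, so its mean entropy is lower than that of the clean input. A deployed detector would compare the score against a threshold chosen for the preset FRR.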

Deep Feature Space Trojan Attack of Neural Networks by Controlled Detoxification
Siyuan Cheng, Yingqi Liu, Shiqing Ma

et al.

Proceedings of the AAAI Conference on Artificial Intelligence, Journal Year: 2021, Volume and Issue: 35(2), P. 1148 - 1156

Published: May 18, 2021

Trojan (backdoor) attack is a form of adversarial attack on deep neural networks where the attacker provides victims with a model trained or retrained on malicious data. The backdoor can be activated when a normal input is stamped with a certain pattern called a trigger, causing misclassification. Many existing trojan attacks have triggers that are input space patches or objects (e.g., a polygon with solid color) or simple input transformations such as Instagram filters. These simple triggers are susceptible to recent backdoor detection algorithms. We propose a novel deep feature space trojan attack with five characteristics: effectiveness, stealthiness, controllability, robustness and reliance on deep features. We conduct extensive experiments on 9 image classifiers on various datasets including ImageNet to demonstrate these properties and show that our attack can evade state-of-the-art defense.

Language: English

Citations

83

Februus: Input Purification Defense Against Trojan Attacks on Deep Neural Network Systems
Bao Gia Doan, Ehsan Abbasnejad, Damith C. Ranasinghe

et al.

Annual Computer Security Applications Conference, Journal Year: 2020, Volume and Issue: unknown

Published: Dec. 7, 2020

We propose Februus, a new idea to neutralize highly potent and insidious Trojan attacks on Deep Neural Network (DNN) systems at run-time. In Trojan attacks, an adversary activates a backdoor crafted in a deep neural network model using a secret trigger, a Trojan, applied to any input to alter the model's decision to a target prediction, a target determined by and only known to the attacker. Februus sanitizes the incoming input by surgically removing the potential trigger artifacts and restoring the input for the classification task. Februus enables effective Trojan mitigation by sanitizing inputs with no loss of performance for sanitized inputs, Trojaned or benign. Our extensive evaluations on multiple infected models based on four popular datasets across three contrasting vision applications and trigger types demonstrate the high efficacy of Februus. We dramatically reduced attack success rates from 100% to near 0% for all cases (achieving 0% in multiple cases) and evaluated the generalizability of Februus to defend against complex adaptive attacks; notably, we realized the first defense against the advanced partial Trojan attack. To the best of our knowledge, Februus is the first backdoor defense method for operation at run-time capable of sanitizing Trojaned inputs without requiring anomaly detection methods, model retraining or costly labeled data.

Language: English

Citations

79

Black-box Detection of Backdoor Attacks with Limited Information and Data
Yinpeng Dong, Xiao Yang, Zhijie Deng

et al.

2021 IEEE/CVF International Conference on Computer Vision (ICCV), Journal Year: 2021, Volume and Issue: unknown

Published: Oct. 1, 2021

Although deep neural networks (DNNs) have made rapid progress in recent years, they are vulnerable in adversarial environments. A malicious backdoor could be embedded in a model by poisoning the training dataset, with the intention of making the infected model give wrong predictions during inference when the specific trigger appears. To mitigate the potential threats of backdoor attacks, various backdoor detection and defense methods have been proposed. However, the existing techniques usually require the poisoned training data or access to the white-box model, which is commonly unavailable in practice. In this paper, we propose a black-box backdoor detection (B3D) method to identify backdoor attacks with only query access to the model. We introduce a gradient-free optimization algorithm to reverse-engineer the potential trigger for each class, which helps to reveal the existence of backdoor attacks. In addition to backdoor detection, we also propose a simple strategy for reliable predictions using the identified backdoored models. Extensive experiments on hundreds of DNN models trained on several datasets corroborate the effectiveness of our method under the black-box setting against various backdoor attacks.

Language: English

Citations

64

Poison Ink: Robust and Invisible Backdoor Attack
Jie Zhang, Dongdong Chen, Qidong Huang

et al.

IEEE Transactions on Image Processing, Journal Year: 2022, Volume and Issue: 31, P. 5691 - 5705

Published: Jan. 1, 2022

Recent research shows deep neural networks are vulnerable to different types of attacks, such as adversarial attack, data poisoning attack and backdoor attack. Among them, backdoor attack is the most cunning one and can occur in almost every stage of the deep learning pipeline. Therefore, backdoor attack has attracted a lot of interest from both academia and industry. However, most existing backdoor attack methods are either visible or fragile to some effortless pre-processing such as common data transformations. To address these limitations, we propose a robust and invisible backdoor attack called "Poison Ink". Concretely, we first leverage the image structures as target poisoning areas, and fill them with poison ink (information) to generate the trigger pattern. As the image structure can keep its semantic meaning during data transformation, such a trigger pattern is inherently robust to data transformations. Then we leverage a deep injection network to embed such a trigger pattern into the cover image to achieve stealthiness. Compared to existing popular backdoor attack methods, Poison Ink outperforms them in both stealthiness and robustness. Through extensive experiments, we demonstrate that Poison Ink is not only general to different datasets and network architectures, but also flexible for different attack scenarios. Besides, it also has very strong resistance against many state-of-the-art defense techniques.

Language: English

Citations

60

NeuronInspect: Detecting Backdoors in Neural Networks via Output Explanations
Xijie Huang, Moustafa Alzantot, Mani Srivastava

et al.

arXiv (Cornell University), Journal Year: 2019, Volume and Issue: unknown

Published: Jan. 1, 2019

Deep neural networks have achieved state-of-the-art performance on various tasks. However, lack of interpretability and transparency makes it easier for malicious attackers to inject a trojan backdoor into the neural networks, which will make the model behave abnormally when a backdoor sample with a specific trigger is input. In this paper, we propose NeuronInspect, a framework to detect trojan backdoors in deep neural networks via output explanation techniques. NeuronInspect first identifies the existence of backdoor attack targets by generating the explanation heatmap of the output layer. We observe that generated heatmaps from clean and backdoored models have different characteristics. Therefore we extract features that measure the attributes of explanations from an attacked model, namely: sparse, smooth and persistent. We combine these features and use outlier detection to figure out the outliers, which form the set of attack targets. We demonstrate the effectiveness and efficiency of NeuronInspect on the MNIST digit recognition dataset and the GTSRB traffic sign recognition dataset. We extensively evaluate NeuronInspect on different attack scenarios and show better robustness and effectiveness than the state-of-the-art trojan backdoor detection technique Neural Cleanse by a great margin.

Language: English

Citations

67

DAWN: Dynamic Adversarial Watermarking of Neural Networks
Sebastian Szyller, Buse Gul Atli, Samuel Marchal

et al.

arXiv (Cornell University), Journal Year: 2019, Volume and Issue: unknown

Published: Jan. 1, 2019

Training machine learning (ML) models is expensive in terms of computational power, amounts of labeled data and human expertise. Thus, ML models constitute intellectual property (IP) and business value for their owners. Embedding digital watermarks during model training allows a model owner to later identify their models in case of theft or misuse. However, model functionality can also be stolen via model extraction, where an adversary trains a surrogate model using results returned from a prediction API of the original model. Recent work has shown that model extraction is a realistic threat. Existing watermarking schemes are ineffective against IP theft via model extraction, since it is the adversary who trains the surrogate model. In this paper, we introduce DAWN (Dynamic Adversarial Watermarking of Neural Networks), the first approach to use watermarking to deter model extraction IP theft. Unlike prior watermarking schemes, DAWN does not impose changes to the training process but operates at the prediction API of the protected model, by dynamically changing the responses for a small subset of queries (e.g., <0.5%) from API clients. This set is a watermark that will be embedded in case a client uses its queries to train a surrogate model. We show that DAWN is resilient against two state-of-the-art model extraction attacks, effectively watermarking all extracted surrogate models, allowing model owners to reliably demonstrate ownership (with confidence $> 1 - 2^{-64}$), while incurring negligible loss of prediction accuracy (0.03-0.5%).

Language: English

Citations

61
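The idea of changing responses for a small, fixed subset of queries at the prediction API can be sketched as follows. This is an illustration under stated assumptions, not DAWN's actual mechanism: selecting queries with a keyed hash (HMAC-SHA256), the `api_response` helper, the key and the way the wrong label is chosen are all hypothetical choices made here. The abstract states only that responses to fewer than about 0.5% of queries are changed.

```python
import hashlib
import hmac


def api_response(secret_key, query, true_label, num_classes, rate=0.005):
    """Return the label a watermarking prediction API would serve for a query."""
    digest = hmac.new(secret_key, query, hashlib.sha256).digest()
    # Map the first 8 digest bytes to a fraction in [0, 1).
    fraction = int.from_bytes(digest[:8], "big") / 2**64
    if fraction >= rate:
        return true_label  # the vast majority of queries are served unchanged
    # Derive a deterministic wrong label from the next 8 digest bytes.
    offset = 1 + int.from_bytes(digest[8:16], "big") % (num_classes - 1)
    return (true_label + offset) % num_classes


key = b"hypothetical-secret-key"
queries = [f"query-{i}".encode() for i in range(20000)]
served = [api_response(key, q, true_label=3, num_classes=10) for q in queries]
altered = sum(1 for label in served if label != 3)
print(f"altered {altered} of {len(queries)} responses")
```

Because the decision depends only on the key and the query bytes, the same query always receives the same answer, and the owner can later recompute which inputs carry the watermark.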

On Certifying Robustness against Backdoor Attacks via Randomized Smoothing
Binghui Wang, Xiaoyu Cao, Jinyuan Jia

et al.

arXiv (Cornell University), Journal Year: 2020, Volume and Issue: unknown

Published: Jan. 1, 2020

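Backdoor attack is a severe security threat to deep neural networks (DNNs). We envision that, like adversarial examples, there will be a cat-and-mouse game for backdoor attacks, i.e., new empirical defenses are developed to defend against backdoor attacks but they are soon broken by strong adaptive backdoor attacks. To prevent such a cat-and-mouse game, we take the first step towards certified defenses against backdoor attacks. Specifically, in this work, we study the feasibility and effectiveness of certifying robustness against backdoor attacks using a recent technique called randomized smoothing. Randomized smoothing was originally developed to certify robustness against adversarial examples. We generalize randomized smoothing to defend against backdoor attacks. Our results show the theoretical feasibility of using randomized smoothing to certify robustness against backdoor attacks. However, we also find that existing randomized smoothing methods have limited effectiveness at defending against backdoor attacks, which highlights the need for new theory and methods to certify robustness against backdoor attacks.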

Language: English

Citations

55

A Survey on Neural Trojans
Yuntao Liu, Ankit Mondal, Abhishek Chakraborty

et al.

Published: March 1, 2020

Neural networks have become increasingly prevalent in many real-world applications, including security-critical ones. Due to the high hardware requirement and time consumption to train high-performance neural network models, users often outsource training to a machine-learning-as-a-service (MLaaS) provider. This puts the integrity of the trained model at risk. In 2017, Liu et al. found that, by mixing the training data with a few malicious samples of a certain trigger pattern, hidden functionality can be embedded in the trained network, which can be evoked by the trigger pattern [33]. We refer to this kind of hidden malicious functionality as neural Trojans. In this paper, we survey a myriad of neural Trojan attack and defense techniques that have been proposed over the last few years. In a neural Trojan insertion attack, the attacker can be the MLaaS provider itself or a third party capable of adding or tampering with training data. In most research on attacks, the attacker selects the Trojan's functionality and a set of input patterns that will trigger the Trojan. Training data poisoning is the most common way to make the neural network acquire the Trojan functionality. Trojan embedding methods that modify the training algorithm or directly interfere with the neural network's execution at the binary level have also been studied. Defense techniques include detecting neural Trojans in the model and/or Trojan trigger patterns, erasing the Trojan's functionality from the neural network model, and bypassing the Trojan. It was also shown that carefully crafted neural Trojans can be used to mitigate other types of attacks. We systematize the above attack and defense approaches in this paper.

Language: English

Citations

54

REFIT: A Unified Watermark Removal Framework For Deep Learning Systems With Limited Data
Xinyun Chen, Wenxiao Wang, Chris Bender

et al.

Published: May 24, 2021

Training deep neural networks from scratch could be computationally expensive and requires a lot of training data. Recent work has explored different watermarking techniques to protect the pre-trained deep neural networks from potential copyright infringements. However, these techniques could be vulnerable to watermark removal attacks. In this work, we propose REFIT, a unified watermark removal framework based on fine-tuning, which does not rely on knowledge of the watermarks and is effective against a wide range of watermarking schemes. In particular, we conduct a comprehensive study of a realistic attack scenario where the adversary has limited training data, which has not been emphasized in prior work on attacks against watermarking schemes. To effectively remove the watermarks without compromising the model functionality under this weak threat model, we propose two techniques that are incorporated into our fine-tuning framework: (1) an adaption of the elastic weight consolidation (EWC) algorithm, which was originally proposed for mitigating the catastrophic forgetting phenomenon; and (2) unlabeled data augmentation (AU), where we leverage auxiliary unlabeled data from other sources. Our extensive evaluation shows the effectiveness of REFIT against diverse watermark embedding schemes. In particular, both EWC and AU significantly decrease the amount of labeled training data needed for effective watermark removal, and the unlabeled data samples used for AU do not necessarily need to be drawn from the same distribution as the benign data for model evaluation. The experimental results demonstrate that our fine-tuning based watermark removal attacks could pose real threats to the copyright of pre-trained models, and thus highlight the importance of further investigating the watermarking problem and proposing more robust watermark embedding schemes against removal attacks.

Language: English

Citations

45
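As background for the EWC component mentioned in the REFIT abstract, the standard elastic weight consolidation objective is usually written as below. This is the generic EWC penalty, not REFIT's adaption of it, and the symbols are assumptions introduced here: $\theta^{*}$ denotes the pre-trained weights, $F_i$ the diagonal Fisher information estimate for parameter $i$, and $\lambda$ the regularization strength.

```latex
% Fine-tuning loss plus a quadratic penalty that anchors important weights
% (large F_i) close to their pre-trained values \theta_i^{*}.
\mathcal{L}(\theta) = \mathcal{L}_{\text{fine-tune}}(\theta)
  + \frac{\lambda}{2} \sum_i F_i \left(\theta_i - \theta_i^{*}\right)^2
```

The penalty discourages fine-tuning from drifting away from weights that matter for the original task, which is how EWC mitigates catastrophic forgetting.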