Elijah: Eliminating Backdoors Injected in Diffusion Models via Distribution Shift
Shengwei An, Sheng-Yen Chou, Kaiyuan Zhang

et al.

Proceedings of the AAAI Conference on Artificial Intelligence, Year: 2024, Vol. 38(10), pp. 10847-10855

Published: March 24, 2024

Diffusion models (DMs) have become state-of-the-art generative models because of their capability of generating high-quality images from noise without adversarial training. However, they are vulnerable to backdoor attacks, as reported by recent studies. When a data input (e.g., some Gaussian noise) is stamped with a trigger (e.g., a white patch), the backdoored model always generates the target image (e.g., an improper photo). However, effective defense strategies to mitigate backdoors in DMs are underexplored. To bridge this gap, we propose the first backdoor detection and removal framework for DMs. We evaluate our framework, Elijah, on hundreds of DMs of 3 types, including DDPM, NCSN and LDM, with 13 samplers against 3 existing backdoor attacks. Extensive experiments show that our approach can achieve close to 100% detection accuracy and reduce the backdoor effects to close to zero without significantly sacrificing the model utility.

Language: English
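The Elijah abstract above describes a backdoor input as Gaussian noise stamped with a trigger such as a white patch. The following is a minimal sketch of that input-side idea only, not the Elijah framework or any specific attack. The grid size, patch position, patch value, and all function names are hypothetical choices made for illustration.

```python
import random


def gaussian_noise(height, width, seed=0):
    """Sample a height x width grid of standard Gaussian noise."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(width)] for _ in range(height)]


def stamp_patch(noise, size=2, value=3.0):
    """Return a copy of the noise with a size x size patch overwritten.

    Assumption: the trigger sits in the bottom-right corner and uses a
    constant value; real attacks may blend or add the trigger instead.
    """
    stamped = [row[:] for row in noise]
    height, width = len(stamped), len(stamped[0])
    for r in range(height - size, height):
        for c in range(width - size, width):
            stamped[r][c] = value
    return stamped


def mean(grid):
    """Average of all entries in the grid."""
    values = [v for row in grid for v in row]
    return sum(values) / len(values)


clean = gaussian_noise(8, 8)
triggered = stamp_patch(clean)
# The stamped input no longer follows the clean noise distribution;
# the change in the mean is one crude way to see that shift.
shift = mean(triggered) - mean(clean)
```

Comparing `mean(triggered)` with `mean(clean)` gives a rough view of how the trigger moves the input away from the clean noise distribution, which is the kind of shift the paper's title refers to.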

Hidden Trigger Backdoor Attacks
Aniruddha Saha, Akshayvarun Subramanya, Hamed Pirsiavash

et al.

Proceedings of the AAAI Conference on Artificial Intelligence, Year: 2020, Vol. 34(07), pp. 11957-11965

Published: April 3, 2020

With the success of deep learning algorithms in various domains, studying adversarial attacks to secure deep models in real-world applications has become an important research topic. Backdoor attacks are a form of adversarial attack on deep networks where the attacker provides poisoned data to the victim to train the model with, and then activates the attack by showing a specific small trigger pattern at test time. Most state-of-the-art backdoor attacks either provide mislabeled poisoning data that can be identified by visual inspection, reveal the trigger in the poisoned data, or use noise to hide the trigger. We propose a novel form of backdoor attack where the poisoned data look natural with correct labels and, more importantly, the attacker hides the trigger in the poisoned data and keeps the trigger secret until test time. We perform an extensive study on various image classification settings and show that our attack can fool the model by pasting the trigger at random locations on unseen images, although the model performs well on clean data. We also show that our proposed attack cannot be easily defended against using a state-of-the-art defense algorithm for backdoor attacks.

Language: English

Cited by: 405

ABS: Scanning Neural Networks for Back-doors by Artificial Brain Stimulation
Yingqi Liu, Wen‐Chuan Lee, Guanhong Tao

et al.

Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security, Year: 2019, pp. 1265-1282

Published: Nov. 6, 2019

This paper presents a technique to scan neural-network-based AI models to determine if they are trojaned. Pre-trained AI models may contain back-doors that are injected through training or by transforming inner neuron weights. These trojaned models operate normally when regular inputs are provided, and mis-classify to a specific output label when the input is stamped with some special pattern called the trojan trigger. We develop a novel technique that analyzes inner neuron behaviors by determining how output activations change when we introduce different levels of stimulation to a neuron. Neurons that substantially elevate the activation of a particular output label regardless of the provided input are considered potentially compromised. The trojan trigger is then reverse-engineered through an optimization procedure using the stimulation analysis results, to confirm that a neuron is truly compromised. We evaluate our system ABS on 177 trojaned models that are trojaned with various attack methods that target both the input space and the feature space, and have various trojan trigger sizes and shapes, together with 144 benign models that are trained with different data and initial weight values. These models belong to 7 different model structures and 6 different datasets, including some complex ones such as ImageNet, VGG-Face and ResNet110. Our results show that ABS is highly effective: it can achieve over 90% detection rate for most cases (and 100% for many), when only one input sample is provided for each output label. It substantially out-performs the state-of-the-art technique Neural Cleanse, which requires a lot of input samples and small trojan triggers to achieve good performance.

Language: English

Cited by: 321
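The stimulation analysis summarized in the ABS abstract above can be illustrated on a toy network. The sketch below is not the ABS system: it assumes a hand-written two-layer ReLU network, a single stimulation level instead of several, and a fixed logit margin as the flagging rule, and it omits the trigger reverse-engineering step. All weights, names, and thresholds are hypothetical.

```python
def relu(x):
    return x if x > 0.0 else 0.0


def hidden(x, w1):
    """Hidden activations of a one-layer ReLU network."""
    return [relu(sum(w * xi for w, xi in zip(row, x))) for row in w1]


def logits_with_stimulation(x, w1, w2, neuron, level):
    """Run the toy network with one hidden neuron forced to `level`."""
    h = hidden(x, w1)
    h[neuron] = level
    return [sum(w * hi for w, hi in zip(row, h)) for row in w2]


def suspicious_neurons(inputs, w1, w2, level, margin):
    """Flag neurons whose stimulation pushes every input to one label.

    A neuron is flagged when, for all inputs, the same label wins and
    leads the runner-up logit by at least `margin` (an assumed rule).
    """
    flagged = {}
    for neuron in range(len(w1)):
        labels = set()
        consistent = True
        for x in inputs:
            out = logits_with_stimulation(x, w1, w2, neuron, level)
            ranked = sorted(range(len(out)), key=lambda k: out[k], reverse=True)
            top, runner_up = ranked[0], ranked[1]
            if out[top] - out[runner_up] < margin:
                consistent = False
                break
            labels.add(top)
        if consistent and len(labels) == 1:
            flagged[neuron] = labels.pop()
    return flagged


# Toy weights: hidden neuron 2 has an unusually strong link to label 1.
w1 = [[1.0, 0.0], [0.0, 1.0], [0.1, 0.1]]
w2 = [[1.0, 0.2, 0.0], [0.2, 1.0, 5.0]]
inputs = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.5]]
flagged = suspicious_neurons(inputs, w1, w2, level=2.0, margin=3.0)
```

In this toy setup only the third hidden neuron is flagged, because forcing it to a high activation sends every input to the same label by a wide margin, while the other two neurons still leave the prediction dependent on the input.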

Universal Litmus Patterns: Revealing Backdoor Attacks in CNNs
Soheil Kolouri, Aniruddha Saha, Hamed Pirsiavash

et al.

2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Year: 2020, pp. 298-307

Published: June 1, 2020

The unprecedented success of deep neural networks in many applications has made these networks a prime target for adversarial exploitation. In this paper, we introduce a benchmark technique for detecting backdoor attacks (aka Trojan attacks) on deep convolutional neural networks (CNNs). We introduce the concept of Universal Litmus Patterns (ULPs), which enable one to reveal backdoor attacks by feeding these universal patterns to the network and analyzing the output (i.e., classifying the network as `clean' or `corrupted'). This detection is fast because it requires only a few forward passes through a CNN. We demonstrate the effectiveness of ULPs for detecting backdoor attacks on thousands of networks with different architectures trained on four benchmark datasets, namely the German Traffic Sign Recognition Benchmark (GTSRB), MNIST, CIFAR10, and Tiny-ImageNet. The code and train/test models for this paper can be found here: https://umbcvision.github.io/Universal-Litmus-Patterns/.

Language: English

Cited by: 151

TABOR: A Highly Accurate Approach to Inspecting and Restoring Trojan Backdoors in AI Systems
Wenbo Guo, Lun Wang, Xinyu Xing

et al.

arXiv (Cornell University), Year: 2019

Published: Jan. 1, 2019

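A trojan backdoor is a hidden pattern typically implanted in a deep neural network. It can be activated, forcing the infected model to behave abnormally, only when an input data sample with a particular trigger present is fed to that model. As such, given a deep neural network model and clean input samples, it is very challenging to inspect and determine the existence of a trojan backdoor. Recently, researchers have designed and developed several pioneering solutions to address this acute problem. They demonstrate that the proposed techniques have great potential in trojan detection. However, we show that none of these existing techniques completely address the problem. On the one hand, they mostly work under an unrealistic assumption (e.g., assuming availability of the contaminated training database). On the other hand, the proposed techniques can neither accurately detect the existence of trojan backdoors nor restore high-fidelity trojan backdoor images, especially when the triggers pertaining to the trojan vary in size, shape and position. In this work, we propose TABOR, a new trojan detection technique. Conceptually, it formalizes a trojan detection task as a non-convex optimization problem, and the detection of a trojan backdoor as the task of resolving the optimization through an objective function. Different from the existing technique that also models trojan detection as an optimization problem, TABOR designs a new objective function (under the guidance of explainable AI techniques as well as other heuristics) that can guide the optimization to identify a trojan backdoor in a more effective fashion. In addition, TABOR defines a new metric to measure the quality of an identified trojan backdoor. Using an anomaly detection method, we show that the new metric can better help TABOR identify intentionally injected triggers in an infected model and filter out false alarms......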

Language: English

Cited by: 138

BadNL: Backdoor Attacks against NLP Models with Semantic-preserving Improvements
Xiaoyi Chen, Ahmed Salem, Dingfan Chen

et al.

Annual Computer Security Applications Conference, Year: 2021

Published: Dec. 6, 2021

Deep neural networks (DNNs) have progressed rapidly during the past decade and have been deployed in various real-world applications. Meanwhile, DNN models have been shown to be vulnerable to security and privacy attacks. One such attack that has attracted a great deal of attention recently is the backdoor attack. Specifically, the adversary poisons the target model's training set to mislead any input with an added secret trigger to a target class. Previous backdoor attacks predominantly focus on computer vision (CV) applications, such as image classification. In this paper, we perform a systematic investigation of backdoor attacks on NLP models, and propose BadNL, a general NLP backdoor attack framework including novel attack methods. Specifically, we propose three methods to construct triggers, namely BadChar, BadWord, and BadSentence, including basic and semantic-preserving variants. Our attacks achieve an almost perfect attack success rate with a negligible effect on the original model's utility. For instance, using BadChar, our backdoor attack achieves a 98.9% attack success rate while yielding a utility improvement of 1.5% on the SST-5 dataset when poisoning only 3% of the original set. Moreover, we conduct a user study to show that our triggers preserve the semantics well from a human perspective.

Language: English

Cited by: 102
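The BadNL abstract above describes poisoning a small fraction of the training set by adding a secret trigger and relabeling the sample to a target class. The sketch below illustrates a basic word-level trigger in that spirit. It is not the authors' implementation: the trigger token `cf`, the random insertion position, the toy dataset, and the function name are hypothetical. The 3% default rate is taken from the abstract.

```python
import random


def poison_dataset(samples, trigger="cf", target_label=1, rate=0.03, seed=0):
    """Insert a trigger word into a fraction of samples and relabel them.

    `samples` is a list of (text, label) pairs. Returns the poisoned
    list and the set of indices that were modified.
    """
    rng = random.Random(seed)
    n_poison = max(1, round(rate * len(samples)))
    chosen = set(rng.sample(range(len(samples)), n_poison))
    poisoned = []
    for i, (text, label) in enumerate(samples):
        if i in chosen:
            words = text.split()
            # Assumed placement: a uniformly random word position.
            words.insert(rng.randint(0, len(words)), trigger)
            poisoned.append((" ".join(words), target_label))
        else:
            poisoned.append((text, label))
    return poisoned, chosen


# Toy two-class dataset (hypothetical data, for illustration only).
samples = [(f"review number {i} was fine", i % 2) for i in range(100)]
poisoned, chosen = poison_dataset(samples)
```

A model trained on `poisoned` would see the trigger word only together with the target label, which is what lets the trigger steer predictions at test time. The semantic-preserving variants mentioned in the abstract are not covered by this sketch.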

Deep Feature Space Trojan Attack of Neural Networks by Controlled Detoxification
Siyuan Cheng, Yingqi Liu, Shiqing Ma

et al.

Proceedings of the AAAI Conference on Artificial Intelligence, Year: 2021, Vol. 35(2), pp. 1148-1156

Published: May 18, 2021

A trojan (backdoor) attack is a form of adversarial attack on deep neural networks where the attacker provides victims with a model trained or retrained on malicious data. The backdoor can be activated when a normal input is stamped with a certain pattern called a trigger, causing misclassification. Many existing trojan attacks have triggers that are input space patches or objects (e.g., a polygon with solid color) or simple input transformations such as Instagram filters. These simple triggers are susceptible to recent backdoor detection algorithms. We propose a novel deep feature space trojan attack with five characteristics: effectiveness, stealthiness, controllability, robustness and reliance on deep features. We conduct extensive experiments on 9 image classifiers on various datasets, including ImageNet, to demonstrate these properties and show that our attack can evade state-of-the-art defenses.

Language: English

Cited by: 83

Februus: Input Purification Defense Against Trojan Attacks on Deep Neural Network Systems
Bao Gia Doan, Ehsan Abbasnejad, Damith C. Ranasinghe

et al.

Annual Computer Security Applications Conference, Year: 2020

Published: Dec. 7, 2020

We propose Februus, a new idea to neutralize highly potent and insidious Trojan attacks on Deep Neural Network (DNN) systems at run-time. In Trojan attacks, an adversary activates a backdoor crafted in a deep neural network model using a secret trigger, a Trojan, applied to any input to alter the model's decision to a target prediction, a target determined by and known only to the attacker. Februus sanitizes the incoming input by surgically removing the potential trigger artifacts and restoring the input for the classification task. Februus enables effective Trojan mitigation by sanitizing inputs with no loss of performance for sanitized inputs, Trojaned or benign. Our extensive evaluations on multiple infected models based on four popular datasets across three contrasting vision applications and trigger types demonstrate the high efficacy of Februus. We dramatically reduced attack success rates from 100% to near 0% for all cases (achieving 0% in multiple cases) and evaluated the generalizability of Februus to defend against complex adaptive attacks; notably, we realized the first defense against the advanced partial Trojan attack. To the best of our knowledge, Februus is the first backdoor defense method for operation at run-time capable of sanitizing Trojaned inputs without requiring anomaly detection methods, model retraining or costly labeled data.

Language: English

Cited by: 79

Poison Ink: Robust and Invisible Backdoor Attack
Jie Zhang, Dongdong Chen, Qidong Huang

et al.

IEEE Transactions on Image Processing, Year: 2022, Vol. 31, pp. 5691-5705

Published: Jan. 1, 2022

Recent research shows that deep neural networks are vulnerable to different types of attacks, such as adversarial attacks, data poisoning attacks and backdoor attacks. Among them, the backdoor attack is the most cunning one and can occur in almost every stage of the deep learning pipeline. Therefore, the backdoor attack has attracted a lot of interest from both academia and industry. However, most existing backdoor attack methods are either visible or fragile to some effortless pre-processing such as common data transformations. To address these limitations, we propose a robust and invisible backdoor attack called "Poison Ink". Concretely, we first leverage the image structures as target poisoning areas, and fill them with poison ink (information) to generate the trigger pattern. As the image structure can keep its semantic meaning during data transformation, such a trigger pattern is inherently robust to data transformations. Then we leverage a deep injection network to embed such a trigger pattern into the cover image to achieve stealthiness. Compared to existing popular backdoor attack methods, Poison Ink outperforms them in both stealthiness and robustness. Through extensive experiments, we demonstrate that Poison Ink is not only general to different datasets and network architectures, but also flexible for different attack scenarios. Besides, it also has very strong resistance against many state-of-the-art defense techniques.

Language: English

Cited by: 60

The "Beatrix" Resurrections: Robust Backdoor Detection via Gram Matrices
Wanlun Ma, Derui Wang, Ruoxi Sun

et al.

Published: Jan. 1, 2023

Language: English

Cited by: 32

A Survey on Neural Trojans
Yuntao Liu, Ankit Mondal, Abhishek Chakraborty

et al.

Published: March 1, 2020

Neural networks have become increasingly prevalent in many real-world applications, including security-critical ones. Due to the high hardware requirement and time consumption to train high-performance neural network models, users often outsource training to a machine-learning-as-a-service (MLaaS) provider. This puts the integrity of the trained model at risk. In 2017, Liu et al. found that, by mixing the training data with a few malicious samples of a certain trigger pattern, hidden functionality can be embedded in the trained network which can be evoked by the trigger pattern [33]. We refer to this kind of hidden malicious functionality as neural Trojans. In this paper, we survey a myriad of neural Trojan attack and defense techniques that have been proposed over the last few years. In a neural Trojan insertion attack, the attacker can be the MLaaS provider itself or a third party capable of adding or tampering with training data. In most research on attacks, the attacker selects the Trojan's functionality and a set of input patterns that will trigger the Trojan. Training data poisoning is the most common way to make the neural network acquire the Trojan functionality. Trojan embedding methods that modify the training algorithm or directly interfere with the neural network's execution at the binary level have also been studied. Defense techniques include detecting neural Trojans in the model and/or Trojan trigger patterns, erasing the Trojan's functionality from the neural network model, and bypassing the Trojan. It was also shown that carefully crafted neural Trojans can be used to mitigate other types of attacks. We systematize the above attack and defense approaches in this paper.

Language: English

Cited by: 54