Elijah: Eliminating Backdoors Injected in Diffusion Models via Distribution Shift DOI Open Access

Shengwei An, Sheng-Yen Chou, Kaiyuan Zhang et al.

Proceedings of the AAAI Conference on Artificial Intelligence, Journal year: 2024, Issue: 38(10), pp. 10847 - 10855

Published: March 24, 2024

Diffusion models (DM) have become state-of-the-art generative models because of their capability of generating high-quality images from noise without adversarial training. However, they are vulnerable to backdoor attacks, as reported by recent studies. When a data input (e.g., some Gaussian noise) is stamped with a trigger (e.g., a white patch), the backdoored model always generates the target image (e.g., an improper photo). However, effective defense strategies to mitigate backdoors in DMs are underexplored. To bridge this gap, we propose the first backdoor detection and removal framework for DMs. We evaluate our framework Elijah on over hundreds of DMs of 3 types, including DDPM, NCSN and LDM, with 13 samplers against existing backdoor attacks. Extensive experiments show that our approach can achieve close to 100% detection accuracy and reduce the backdoor effects to close to zero without significantly sacrificing the model utility.
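The trigger-stamping step that the abstract describes can be illustrated with a minimal sketch. It only shows how a fixed patch overwrites part of a Gaussian noise input; the patch size, position, and value, as well as the function names `gaussian_noise` and `stamp_patch`, are assumptions made for illustration and are not taken from the paper or from any specific attack.

```python
import random


def gaussian_noise(height, width, seed=0):
    """Sample a height x width grid of standard Gaussian noise."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(width)] for _ in range(height)]


def stamp_patch(noise, patch_size=3, value=1.0):
    """Return a copy of `noise` whose bottom-right square patch is set to `value`.

    Hypothetical trigger: a constant "white" patch. The original input is
    left unmodified so clean and triggered inputs can be compared.
    """
    height, width = len(noise), len(noise[0])
    stamped = [row[:] for row in noise]
    for r in range(height - patch_size, height):
        for c in range(width - patch_size, width):
            stamped[r][c] = value
    return stamped


clean = gaussian_noise(8, 8)
triggered = stamp_patch(clean)

# Count how many cells differ between the clean and the triggered input.
changed = sum(
    1
    for r in range(8)
    for c in range(8)
    if clean[r][c] != triggered[r][c]
)
print("cells changed by the trigger:", changed)
```

Because the patch replaces noise values with a constant, the triggered inputs no longer follow the clean noise distribution in the patched region, which is the kind of input-side difference a defense can look for.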

Language: English

Backdoor Pre-trained Models Can Transfer to All DOI

Lujia Shen, Shouling Ji, Xuhong Zhang et al.

Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security, Journal year: 2021, Issue: unknown, pp. 3141 - 3158

Published: Nov. 12, 2021

Pre-trained general-purpose language models have been a dominating component in enabling real-world natural language processing (NLP) applications. However, a pre-trained model with a backdoor can be a severe threat to these applications. Most existing backdoor attacks in NLP are conducted in the fine-tuning phase by introducing malicious triggers in the targeted class, thus relying greatly on prior knowledge of the fine-tuning task. In this paper, we propose a new approach to map the inputs containing triggers directly to a predefined output representation of the pre-trained NLP models, e.g., a predefined output representation for the classification token in BERT, instead of a target label. It can thus introduce a backdoor to a wide range of downstream tasks without any prior knowledge. Additionally, in light of the unique properties of triggers in NLP, we propose two new metrics to measure the performance of backdoor attacks in terms of both effectiveness and stealthiness. Our experiments with various types of triggers show that our method is widely applicable to different fine-tuning tasks (classification and named entity recognition) and to different models (such as BERT, XLNet, BART), which poses a severe threat. Furthermore, by collaborating with the popular online model repository Hugging Face, the threat brought by our method has been confirmed. Finally, we analyze the factors that may affect the attack performance and share insights on the causes of the success of our backdoor attack.

Language: English

Cited by

53

Rethinking Stealthiness of Backdoor Attack against NLP Models DOI Creative Commons

Wenkai Yang, Yankai Lin, Peng Li et al.

Published: Jan. 1, 2021

Wenkai Yang, Yankai Lin, Peng Li, Jie Zhou, Xu Sun. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). 2021.

Language: English

Cited by

52

REFIT: A Unified Watermark Removal Framework For Deep Learning Systems With Limited Data DOI Open Access
Xinyun Chen, Wenxiao Wang, Chris Bender et al.

Published: May 24, 2021

Training deep neural networks from scratch could be computationally expensive and requires a lot of training data. Recent work has explored different watermarking techniques to protect the pre-trained deep neural networks from potential copyright infringements. However, these techniques could be vulnerable to watermark removal attacks. In this work, we propose REFIT, a unified watermark removal framework based on fine-tuning, which does not rely on the knowledge of the watermarks, and is effective against a wide range of watermarking schemes. In particular, we conduct a comprehensive study of a realistic attack scenario where the adversary has limited training data, which has not been emphasized in prior work on attacks against watermarking schemes. To effectively remove the watermarks without compromising the model functionality under this weak threat model, we propose two techniques that are incorporated into our fine-tuning framework: (1) an adaption of the elastic weight consolidation (EWC) algorithm, which was originally proposed for mitigating the catastrophic forgetting phenomenon; and (2) unlabeled data augmentation (AU), where we leverage auxiliary unlabeled data from other sources. Our extensive evaluation shows the effectiveness of REFIT against diverse watermark embedding schemes. In particular, both EWC and AU significantly decrease the amount of labeled training data needed for effective watermark removal, and the unlabeled data samples used for AU do not necessarily need to be drawn from the same distribution as the benign data for model evaluation. The experimental results demonstrate that our fine-tuning based watermark removal attacks could pose real threats to pre-trained models, and thus highlight the importance of further investigating the watermarking problem and proposing more robust watermark embedding schemes.

Language: English

Cited by

45

A Dual Stealthy Backdoor: From Both Spatial and Frequency Perspectives DOI Open Access
Yudong Gao, Honglong Chen, Peng Sun et al.

Proceedings of the AAAI Conference on Artificial Intelligence, Journal year: 2024, Issue: 38(3), pp. 1851 - 1859

Published: March 24, 2024

Backdoor attacks pose serious security threats to deep neural networks (DNNs). Backdoored models make arbitrarily (targeted) incorrect predictions on inputs containing well-designed triggers, while behaving normally on clean inputs. Prior research has explored the invisibility of backdoor triggers to enhance attack stealthiness. However, most of this work focuses only on invisibility in the spatial domain, neglecting the generation of invisible triggers in the frequency domain. This limitation renders the generated poisoned images easily detectable by recent defense methods. To address this issue, we propose a DUal stealthy BAckdoor attack method named DUBA, which simultaneously considers the invisibility of triggers in both the spatial and frequency domains, to achieve desirable attack performance while ensuring strong stealthiness. Specifically, we first use Wavelet Transform to embed the high-frequency information of the trigger image into the clean image to ensure attack effectiveness. Then, to attain strong stealthiness, we incorporate Fourier Transform and Cosine Transform to mix the poisoned image and the clean image in the frequency domain. Moreover, DUBA adopts a novel attack strategy, training the model with weak triggers and attacking with strong triggers, to further enhance attack performance and stealthiness. DUBA is evaluated extensively on four datasets against popular image classifiers, showing significant superiority over state-of-the-art backdoor attacks in terms of attack success rate and stealthiness.

Language: English

Cited by

7

Towards Inspecting and Eliminating Trojan Backdoors in Deep Neural Networks DOI
Wenbo Guo, Lun Wang, Yan Xu et al.

2020 IEEE International Conference on Data Mining (ICDM), Journal year: 2020, Issue: unknown, pp. 162 - 171

Published: Nov. 1, 2020

A trojan backdoor is a hidden pattern typically implanted in a deep neural network (DNN). It could be activated and thus forces the infected model to behave abnormally when an input data sample with a particular trigger is fed to that model. As such, given a DNN model and clean input samples, it is challenging to inspect and determine the existence of a trojan backdoor. Recently, researchers have designed and developed several pioneering solutions to address this problem. They demonstrate that the proposed techniques have great potential in trojan detection. However, we show that none of these existing techniques completely addresses the problem. On the one hand, they mostly work under an unrealistic assumption (e.g., assuming the availability of the contaminated training database). On the other hand, these techniques can neither accurately detect the existence of trojan backdoors, nor restore high-fidelity triggers, especially when the infected models are trained with high-dimensional data and the triggers pertaining to the trojan vary in size, shape, and position. In this work, we propose TABOR, a new trojan detection technique. Conceptually, it formalizes the detection of a trojan backdoor as solving an optimization objective function. Different from the existing technique, which also models trojan detection as an optimization problem, TABOR first designs a new objective function to guide the optimization to identify a trojan backdoor more correctly and accurately. Second, TABOR borrows the idea of interpretable AI to further prune the restored triggers. Last, TABOR designs a new anomaly detection method, which could not only facilitate the identification of intentionally injected triggers but also filter out false alarms (i.e., triggers detected from an uninfected model). We train 112 DNNs on five datasets and infect these models with two existing trojan attacks. We evaluate TABOR by using these infected models, and demonstrate that TABOR has much better performance in trigger restoration, trojan detection, and elimination than Neural Cleanse, the state-of-the-art trojan detection technique.

Language: English

Cited by

41

Privacy and Trust Redefined in Federated Machine Learning DOI Creative Commons
Pavlos Papadopoulos, Will Abramson, Adam James Hall et al.

Machine Learning and Knowledge Extraction, Journal year: 2021, Issue: 3(2), pp. 333 - 356

Published: March 29, 2021

A common privacy issue in traditional machine learning is that data needs to be disclosed for the training procedures. In situations with highly sensitive data such as healthcare records, accessing this information is challenging and often prohibited. Luckily, privacy-preserving technologies have been developed to overcome this hurdle by distributing the computation of the training and ensuring the data privacy to their owners. The distribution of the computation to multiple participating entities introduces new privacy complications and risks. In this paper, we present a privacy-preserving decentralised workflow that facilitates trusted federated learning among participants. Our proof-of-concept defines a trust framework instantiated using decentralised identity technologies being developed under Hyperledger projects Aries/Indy/Ursa. Only entities in possession of Verifiable Credentials issued from the appropriate authorities are able to establish secure, authenticated communication channels authorised to participate in a federated learning workflow related to mental health data.

Language: English

Cited by

39

RAP: Robustness-Aware Perturbations for Defending against Backdoor Attacks on NLP Models DOI Creative Commons

Wenkai Yang, Yankai Lin, Peng Li et al.

Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, Journal year: 2021, Issue: unknown

Published: Jan. 1, 2021

Backdoor attacks, which maliciously control a well-trained model's outputs on instances with specific triggers, have recently been shown to be serious threats to the safety of reusing deep neural networks (DNNs). In this work, we propose an efficient online defense mechanism based on robustness-aware perturbations. Specifically, by analyzing the backdoor training process, we point out that there exists a big gap of robustness between poisoned and clean samples. Motivated by this observation, we construct a word-based robustness-aware perturbation to distinguish poisoned samples from clean samples to defend against backdoor attacks on natural language processing (NLP) models. Moreover, we give a theoretical analysis of the feasibility of our robustness-aware perturbation-based defense method. Experimental results on sentiment analysis and toxic detection tasks show that our method achieves better defending performance and much lower computational costs than existing online defense methods. Our code is available at https://github.com/lancopku/RAP.

Language: English

Cited by

39

Better Trigger Inversion Optimization in Backdoor Scanning DOI
Guanhong Tao, Guangyu Shen, Yingqi Liu et al.

2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Journal year: 2022, Issue: unknown, pp. 13358 - 13368

Published: June 1, 2022

Backdoor attacks aim to cause misclassification of a subject model by stamping a trigger onto inputs. Backdoors could be injected through malicious training and could also naturally exist. Deriving a backdoor trigger for a subject model is critical to both attack and defense. A popular trigger inversion method is by optimization. Existing methods are based on finding the smallest trigger that can uniformly flip a set of input samples by minimizing a mask. The mask defines the set of pixels that ought to be perturbed. We develop a new optimization method that directly minimizes individual pixel changes, without using a mask. Our experiments show that, compared to existing methods, the new one can generate triggers that require a smaller number of input pixels to be perturbed, have a higher attack success rate, and are more robust. They are hence more desirable when used in real-world attacks and more effective when used in defense. Our method is also more cost-effective.

Language: English

Cited by

27

Backdoor Attacks Against Dataset Distillation DOI Open Access

Yugeng Liu, Zheng Li, Michael Backes et al.

Published: Jan. 1, 2023

Dataset distillation has emerged as a prominent technique to improve data efficiency when training machine learning models. It encapsulates the knowledge from a large dataset into a smaller synthetic dataset. A model trained on this smaller distilled dataset can attain comparable performance to a model trained on the original training dataset. However, the existing dataset distillation techniques mainly aim at achieving the best trade-off between resource usage efficiency and model utility. The security risks stemming from them have not been explored. This study performs the first backdoor attack against the models trained on the data distilled by dataset distillation models in the image domain. Concretely, we inject triggers into the synthetic data during the distillation procedure rather than during the model training stage, where all previous attacks are performed. We propose two types of backdoor attacks, namely NAIVEATTACK and DOORPING. NAIVEATTACK simply adds triggers to the raw data at the initial distillation phase, while DOORPING iteratively updates the triggers during the entire distillation procedure. We conduct extensive evaluations on multiple datasets, architectures, and dataset distillation techniques. Empirical evaluation shows that NAIVEATTACK achieves decent attack success rate (ASR) scores in some cases, while DOORPING reaches higher ASR scores (close to 1.0) in all cases. Furthermore, we conduct a comprehensive ablation study to analyze the factors that may affect the attack performance. Finally, we evaluate multiple defense mechanisms against our backdoor attacks and show that our attacks can practically circumvent these defense mechanisms.

Language: English

Cited by

15

Athena: Probabilistic Verification of Machine Unlearning DOI Creative Commons
David Sommer, Liwei Song, Sameer Wagh et al.

Proceedings on Privacy Enhancing Technologies, Journal year: 2022, Issue: 2022(3), pp. 268 - 290

Published: July 1, 2022

The right to be forgotten, also known as the right to erasure, is the right of individuals to have their data erased from an entity storing it. The status of this long held notion was legally solidified recently by the General Data Protection Regulation (GDPR) in the European Union. As a consequence, there is a need for mechanisms whereby users can verify if service providers comply with their deletion requests. In this work, we take the first step in proposing a formal framework, called Athena, to study the design of such verification mechanisms for data deletion requests (also known as machine unlearning) in the context of systems that provide machine learning as a service (MLaaS). Athena allows the rigorous quantification of any verification mechanism based on hypothesis testing. Furthermore, we propose a novel verification mechanism that leverages backdoors and demonstrate its effectiveness in certifying data deletion with high confidence, thus providing a basis for quantitatively inferring machine unlearning. We evaluate our approach over a range of network architectures such as multi-layer perceptrons (MLP), convolutional neural networks (CNN), residual networks (ResNet), and long short-term memory (LSTM), and over 6 different datasets. We demonstrate that: (1) our approach has minimal effect on the accuracy of the ML service but provides high confidence verification of unlearning, even if multiple users employ our system to ascertain compliance with data deletion requests, and (2) our mechanism is robust against servers deploying state-of-the-art backdoor defense methods. Overall, our approach provides a foundation for a quantitative analysis of verifying machine unlearning, which can support legal and regulatory frameworks pertaining to users' data deletion requests.
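The idea of quantifying a backdoor-based verification with hypothesis testing can be sketched as a one-sided binomial test. This is a simplified illustration, not the paper's exact formulation: the null hypothesis "the data was deleted", the baseline rate at which a triggered query hits the target label by chance, the significance level `alpha`, and the function names are all assumptions introduced here.

```python
from math import comb


def binomial_tail(n, k, q):
    """Return P[X >= k] for X ~ Binomial(n, q)."""
    return sum(comb(n, i) * q**i * (1 - q) ** (n - i) for i in range(k, n + 1))


def deletion_rejected(n_queries, n_target_hits, baseline_rate, alpha=0.001):
    """Test H0: "the user's backdoored data was deleted".

    Under H0 (assumed here), a triggered query is classified as the
    backdoor target label only at the chance-level `baseline_rate`.
    H0 is rejected when the observed number of target hits is too high
    to be explained by chance at significance level `alpha`.
    """
    p_value = binomial_tail(n_queries, n_target_hits, baseline_rate)
    return p_value < alpha, p_value


# Hypothetical numbers: 30 triggered queries against a 10-class model,
# so the chance-level rate of hitting the target label is taken as 0.1.
rejected_many, p_many = deletion_rejected(30, 27, 0.1)
rejected_few, p_few = deletion_rejected(30, 4, 0.1)
print("27/30 hits -> deletion claim rejected:", rejected_many)
print("4/30 hits  -> deletion claim rejected:", rejected_few)
```

With many target hits the deletion claim is rejected, while a handful of hits is consistent with chance, so the user has no evidence against the provider. A smaller `alpha` lowers the risk of falsely accusing a compliant provider at the cost of needing more queries.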

Language: English

Cited by

21