Cited by Elijah: Eliminating Backdoors Injected in Diffusion Models via Distribution Shift

TextGuard: Provable Defense against Backdoor Attacks on Text Classification

Hengzhi Pei, Jinyuan Jia, Wenbo Guo

et al.

Published: Jan. 1, 2024

Backdoor attacks have become a major security threat for deploying machine learning models in security-critical applications. Existing research has proposed many defenses against backdoor attacks. Despite demonstrating some empirical defense efficacy, none of these techniques provides a formal, provable guarantee against arbitrary attacks. As a result, they can be easily broken by strong adaptive attacks, as shown in our evaluation. In this work, we propose TextGuard, the first provable defense against backdoor attacks on text classification. In particular, TextGuard divides the (backdoored) training data into sub-training sets, which is achieved by splitting each training sentence into sub-sentences. This partitioning ensures that a majority of the sub-training sets do not contain the backdoor trigger. Subsequently, a base classifier is trained on each sub-training set, and their ensemble provides the final prediction. We theoretically prove that when the length of the backdoor trigger falls within a certain threshold, TextGuard guarantees that its prediction will remain unaffected by the presence of triggers in training and testing inputs. In our evaluation, we demonstrate the effectiveness of TextGuard on three benchmark text classification tasks, surpassing the certification accuracy of existing certified defenses against backdoor attacks. Furthermore, we propose additional strategies to enhance the empirical performance of TextGuard. Comparisons with state-of-the-art empirical defenses validate the superiority of TextGuard in countering multiple backdoor attacks.
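The partition-and-vote idea in the abstract above can be sketched as follows. This is a minimal illustration and not the authors' implementation. Several choices here are hypothetical. Words are assigned to groups by a stable hash (MD5 is used purely for illustration). The base classifiers are arbitrary callables. The certification check is a simplified majority-vote margin argument rather than the paper's exact theorem. The reasoning behind that check is as follows: each distinct trigger word always hashes to the same group, both in poisoned training sentences and in test inputs. A trigger of T words can therefore affect at most T base classifiers. Each affected vote moves the margin by at most 2.

```python
import hashlib
from collections import Counter


def word_group(word, m):
    """Map a word to one of m groups with a stable hash (MD5 is a hypothetical choice)."""
    return int(hashlib.md5(word.encode("utf-8")).hexdigest(), 16) % m


def split_sentence(sentence, m):
    """Split a sentence into m sub-sentences; words keep their relative order."""
    groups = [[] for _ in range(m)]
    for word in sentence.split():
        groups[word_group(word, m)].append(word)
    return [" ".join(g) for g in groups]


def ensemble_predict(sentence, classifiers):
    """Base classifier i sees only sub-sentence i; return (majority label, vote margin)."""
    subs = split_sentence(sentence, len(classifiers))
    votes = Counter(clf(s) for clf, s in zip(classifiers, subs))
    ranked = votes.most_common()
    top_label, top_count = ranked[0]
    runner_up = ranked[1][1] if len(ranked) > 1 else 0
    return top_label, top_count - runner_up


def is_certified(margin, trigger_words):
    """Simplified check (an assumption, not the paper's theorem).

    A trigger of `trigger_words` distinct words touches at most that many groups,
    so it can change at most that many votes, each shrinking the margin by <= 2.
    """
    return margin > 2 * trigger_words
```

Consider an example with five base classifiers that vote 4 to 1. The margin is 3. Under this simplified argument, a one-word trigger cannot change the prediction, because 3 > 2. A two-word trigger is not certified, because 3 is not greater than 4.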

Language: English

Piracy Resistant Watermarks for Deep Neural Networks

Huiying Li, Emily Wenger, Shawn Shan

et al.

arXiv (Cornell University), Journal year: 2019, Issue: unknown

Published: Jan. 1, 2019

As companies continue to invest heavily in larger, more accurate and more robust deep learning models, they are exploring approaches to monetize their models while protecting their intellectual property. Model licensing is promising, but requires a tool for owners to claim ownership of their models, i.e., a watermark. Unfortunately, current designs have not been able to address piracy attacks, where third parties falsely claim model ownership by embedding their own "pirate watermarks" into an already-watermarked model. We observe that resistance to piracy attacks is fundamentally at odds with the current use of incremental training to embed watermarks into models. In this work, we propose null embedding, a new way to build piracy-resistant watermarks into DNNs that can only take place at a model's initial training. A null embedding takes a bit string (the watermark value) as input and builds strong dependencies between the model's normal classification accuracy and the watermark. As a result, attackers cannot remove an embedded watermark via tuning or incremental training, and cannot add new pirate watermarks to an already-watermarked model. We empirically show that our proposed watermarks achieve piracy resistance and other watermark properties over a wide range of tasks and models. Finally, we explore a number of adaptive counter-measures and show that our watermark remains robust against a variety of model modifications, including fine-tuning, compression, and existing methods to detect/remove backdoors. Our watermarked models are also amenable to transfer learning without losing their watermark properties.

Language: English

Security of Distributed Intelligence in Edge Computing: Threats and Countermeasures

Mohd. Samar Ansari, Saeed Hamood Alsamhi, Yuansong Qiao

et al.

Palgrave studies in digital business & enabling technologies, Journal year: 2020, Issue: unknown, pp. 95-122

Published: Jan. 1, 2020

Rapid growth in the amount of data produced by IoT sensors and devices has led to the advent of edge computing, wherein data is processed at or near its point of origin. This facilitates lower latency, as well as better security and privacy, by keeping data localized at the edge node. However, due to issues such as resource-constrained hardware and software heterogeneity, most edge computing systems are prone to a large variety of attacks. Furthermore, the recent trend of incorporating intelligence into edge nodes brings its own threats, such as model poisoning and evasion attacks. This chapter presents a discussion of the pertinent threats to edge intelligence. Countermeasures to deal with these threats are then discussed. Lastly, avenues for future research are highlighted.

Language: English

Defending Backdoor Attacks on Vision Transformer via Patch Processing

Khoa D. Doan, Yingjie Lao, Peng Yang

et al.

Proceedings of the AAAI Conference on Artificial Intelligence, Journal year: 2023, Issue: 37(1), pp. 506-515

Published: June 26, 2023

Vision Transformers (ViTs) have a radically different architecture, with significantly less inductive bias than Convolutional Neural Networks. Along with the improvement in performance, the security and robustness of ViTs are also of great importance to study. In contrast to many recent works that examine the robustness of ViTs against adversarial examples, this paper investigates a representative causative attack, i.e., the backdoor. We first examine the vulnerability of ViTs to various backdoor attacks and find that ViTs are also quite vulnerable to existing attacks. However, we observe that the clean-data accuracy and the backdoor attack success rate of ViTs respond distinctively to patch transformations applied before the positional encoding. Based on this finding, we propose an effective method for ViTs to defend against both patch-based and blending-based trigger backdoor attacks via patch processing. The performance is evaluated on several benchmark datasets, including CIFAR10, GTSRB, and TinyImageNet, and the results show that the proposed defense is very successful in mitigating backdoor attacks on ViTs. To the best of our knowledge, this paper presents the first defensive strategy that utilizes a unique characteristic of ViTs against backdoor attacks.
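The abstract above states only that clean accuracy and attack success rate respond differently to patch transformations applied before positional encoding. The sketch below shows one way such a check could look. It is not the authors' method, and several details are hypothetical. It assumes single-channel images stored as nested lists. It uses two illustrative transformations, random patch shuffling and patch dropping. It relies on a `classify` callable that consumes the patch sequence. It flags an input when its predicted label changes in at least half of the transformed copies.

```python
import random


def to_patches(image, p):
    """Split an H x W single-channel image (nested lists) into p x p patches, row-major."""
    h, w = len(image), len(image[0])
    if h % p or w % p:
        raise ValueError("image size must be divisible by the patch size")
    patches = []
    for r in range(0, h, p):
        for c in range(0, w, p):
            patches.append([row[c:c + p] for row in image[r:r + p]])
    return patches


def shuffle_patches(patches, rng):
    """Randomly permute the patch sequence (i.e., before any positional encoding)."""
    out = list(patches)
    rng.shuffle(out)
    return out


def drop_patches(patches, ratio, rng, fill=0):
    """Replace a random fraction of patches with a constant fill value (copies the input)."""
    out = [[list(row) for row in patch] for patch in patches]
    for i in rng.sample(range(len(out)), int(len(out) * ratio)):
        out[i] = [[fill] * len(row) for row in out[i]]
    return out


def flag_suspicious(classify, patches, transform, trials=10, threshold=0.5, seed=0):
    """Hypothetical rule: flag the input if its label flips under too many transformed copies.

    `classify` maps a patch sequence to a label; `transform(patches, rng)` returns a new sequence.
    """
    rng = random.Random(seed)
    base = classify(patches)
    flips = sum(classify(transform(patches, rng)) != base for _ in range(trials))
    return flips / trials >= threshold


def toy_classifier(patches):
    """Toy stand-in for a backdoored model: fires when a bright trigger pixel is in patch 0."""
    return "target" if patches[0][0][0] == 255 else "clean"


clean = [[0] * 8 for _ in range(8)]
triggered = [row[:] for row in clean]
triggered[0][0] = 255  # trigger pixel in the top-left patch

print(flag_suspicious(toy_classifier, to_patches(clean, 2), shuffle_patches))  # False
print(flag_suspicious(toy_classifier, to_patches(triggered, 2), shuffle_patches))
```

Patch dropping can be plugged in the same way, for example with `lambda ps, r: drop_patches(ps, 0.5, r)` as the `transform`. The toy classifier is deliberately position-sensitive to show the mechanism. Real ViT behaviour would have to be measured on a trained model.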

Language: English

Elijah: Eliminating Backdoors Injected in Diffusion Models via Distribution Shift

Shengwei An, Sheng-Yen Chou, Kaiyuan Zhang

et al.

Proceedings of the AAAI Conference on Artificial Intelligence, Journal year: 2024, Issue: 38(10), pp. 10847-10855

Published: Mar. 24, 2024

Diffusion models (DMs) have become state-of-the-art generative models because of their capability of generating high-quality images from noise without adversarial training. However, as reported by recent studies, they are vulnerable to backdoor attacks. When a data input (e.g., some Gaussian noise) is stamped with a trigger (e.g., a white patch), the backdoored model always generates the target image (e.g., an improper photo). However, effective defense strategies to mitigate backdoors in DMs are underexplored. To bridge this gap, we propose the first backdoor detection and removal framework for DMs. We evaluate our framework, Elijah, on hundreds of DMs of 3 types, including DDPM, NCSN, and LDM, with 13 samplers against existing backdoor attacks. Extensive experiments show that our approach achieves close to 100% detection accuracy and reduces the backdoor effects to close to zero without significantly sacrificing model utility.

Language: English