MM-BD: Post-Training Detection of Backdoor Attacks with Arbitrary Backdoor Pattern Types Using a Maximum Margin Statistic DOI
Hang Wang, Zhen Xiang, David J. Miller

et al.

2024 IEEE Symposium on Security and Privacy (SP), Journal Year: 2024, Volume and Issue: 9, P. 1994 - 2012

Published: May 19, 2024

Language: English

Backdoor Learning: A Survey DOI
Yiming Li, Yong Jiang, Zhifeng Li

et al.

IEEE Transactions on Neural Networks and Learning Systems, Journal Year: 2022, Volume and Issue: 35(1), P. 5 - 22

Published: June 22, 2022

Backdoor attacks intend to embed hidden backdoors into deep neural networks (DNNs), so that the attacked models perform well on benign samples, whereas their predictions are maliciously changed if the hidden backdoor is activated by attacker-specified triggers. This threat can arise whenever the training process is not fully controlled, such as when training on third-party datasets or adopting third-party models, and it therefore poses a new and realistic risk. Although backdoor learning is an emerging and rapidly growing research area, there is still no comprehensive and timely review of it. In this article, we present the first comprehensive survey of this realm. We summarize and categorize existing backdoor attacks and defenses based on their characteristics, and provide a unified framework for analyzing poisoning-based attacks. Besides, we also analyze the relation between backdoor attacks and relevant fields (i.e., adversarial attacks and data poisoning), and summarize widely adopted benchmark datasets. Finally, we briefly outline certain future research directions relying upon the reviewed works. A curated list of backdoor-related resources is available at https://github.com/THUYimingLi/backdoor-learning-resources .

Language: English

Citations

343
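
For readers unfamiliar with the poisoning-based setting that the survey above formalizes, here is a minimal illustrative sketch (not code from the paper): a small trigger patch is stamped onto a fraction of the training images, which are relabeled to an attacker-chosen target class, so a model trained on the result behaves normally on clean inputs but predicts the target class whenever the trigger appears. The function names, the white-patch trigger, and the 5% poisoning rate are illustrative assumptions.

```python
import numpy as np

def stamp_trigger(image, trigger, corner=(0, 0)):
    """Overlay a small trigger patch onto an (H, W, C) image, BadNets-style."""
    poisoned = image.copy()
    r, c = corner
    h, w = trigger.shape[:2]
    poisoned[r:r + h, c:c + w] = trigger
    return poisoned

def poison_dataset(images, labels, trigger, target_label, poison_rate=0.05, seed=0):
    """Stamp the trigger onto a random fraction of samples and relabel them.

    Training on the returned (images, labels) plants the backdoor: clean
    accuracy is preserved, but triggered inputs map to `target_label`.
    """
    rng = np.random.default_rng(seed)
    images, labels = images.copy(), labels.copy()
    idx = rng.choice(len(images), size=int(poison_rate * len(images)), replace=False)
    for i in idx:
        images[i] = stamp_trigger(images[i], trigger)
        labels[i] = target_label
    return images, labels, idx

if __name__ == "__main__":
    X = np.random.rand(100, 32, 32, 3).astype(np.float32)   # toy 32x32 RGB images
    y = np.random.randint(0, 10, size=100)
    trigger = np.ones((3, 3, 3), dtype=np.float32)           # white 3x3 patch
    Xp, yp, poisoned_idx = poison_dataset(X, y, trigger, target_label=0)
    print(f"poisoned {len(poisoned_idx)} of {len(X)} samples")
```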

Dynamic Backdoor Attacks Against Machine Learning Models DOI
Ahmed Salem, Rui Wen, Michael Backes

et al.

Published: June 1, 2022

Machine learning (ML) has made tremendous progress during the past decade and is being adopted in various critical real-world applications. However, recent research has shown that ML models are vulnerable to multiple security and privacy attacks. In particular, backdoor attacks against ML models have recently raised a lot of awareness. A successful backdoor attack can cause severe consequences, such as allowing an adversary to bypass critical authentication systems. Current backdooring techniques rely on adding static triggers (with fixed patterns and locations) to model inputs, which are prone to detection by current defense mechanisms. In this paper, we propose the first class of dynamic backdoor attacks against deep neural networks (DNNs), namely Random Backdoor, Backdoor Generating Network (BaN), and conditional Backdoor Generating Network (c-BaN). Triggers generated by our techniques can have random patterns and locations, which reduce the efficacy of current backdoor detection mechanisms. In particular, BaN and c-BaN, which are based on a novel generative network, are the first two schemes that algorithmically generate triggers. Moreover, c-BaN is the first conditional backdooring technique that, given a target label, can generate a target-specific trigger. Both are essentially a general framework that renders the adversary flexibility for further customizing backdoor attacks. We extensively evaluate our techniques on three benchmark datasets: MNIST, CelebA, and CIFAR-10. Our techniques achieve almost perfect attack performance on backdoored data with negligible utility loss. We also show that they bypass current state-of-the-art defense mechanisms against backdoor attacks, including ABS, Februus, MNTD, Neural Cleanse, and STRIP.

Language: English

Citations

144
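
The sketch below illustrates, in its simplest form, the Random Backdoor idea from the entry above: every poisoned sample receives a trigger whose pattern and location are freshly sampled, so no single static pattern exists for a defender to reconstruct. This is a simplified illustration under stated assumptions; the BaN and c-BaN generator networks of the paper are not reproduced, and all names are hypothetical.

```python
import numpy as np

def random_trigger(rng, size, channels):
    """Sample a fresh trigger pattern (a uniform-noise patch) for one input."""
    return rng.random((size, size, channels), dtype=np.float32)

def random_location(rng, image_hw, trigger_size):
    """Sample a random top-left corner from the set of allowed locations."""
    h, w = image_hw
    return (rng.integers(0, h - trigger_size + 1),
            rng.integers(0, w - trigger_size + 1))

def apply_dynamic_trigger(image, rng, trigger_size=4):
    """Stamp a per-sample random trigger at a per-sample random location."""
    poisoned = image.copy()
    trig = random_trigger(rng, trigger_size, image.shape[-1])
    r, c = random_location(rng, image.shape[:2], trigger_size)
    poisoned[r:r + trigger_size, c:c + trigger_size] = trig
    return poisoned

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = np.random.rand(32, 32, 3).astype(np.float32)
    x_poisoned = apply_dynamic_trigger(x, rng)
    print("changed pixels:", int((x_poisoned != x).sum()))
```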

Adversarial machine learning DOI Open Access
Apostol Vassilev, Alina Oprea, Alie Jean Fordyce

et al.

Published: Jan. 2, 2024

This NIST AI report develops a taxonomy of concepts and defines terminology in the field of adversarial machine learning (AML). The taxonomy is built on a survey of the AML literature and is arranged in a conceptual hierarchy that includes key types of ML methods, lifecycle stages of attack, attacker goals and objectives, and attacker capabilities and knowledge of the learning process. The report also provides corresponding methods for mitigating and managing the consequences of attacks and points out relevant open challenges to take into account in the lifecycle of AI systems. The terminology used is consistent with the AML literature and is complemented by a glossary of terms associated with the security of AI systems, intended to assist non-expert readers. Taken together, the taxonomy and terminology are meant to inform other standards and future practice guides for assessing the security of AI systems, by establishing a common language and understanding of the rapidly developing AML landscape.

Language: English

Citations

46

A survey of safety and trustworthiness of large language models through the lens of verification and validation DOI Creative Commons
Xiaowei Huang, Wenjie Ruan, Wei Huang

et al.

Artificial Intelligence Review, Journal Year: 2024, Volume and Issue: 57(7)

Published: June 17, 2024

Large language models (LLMs) have sparked a new wave of AI enthusiasm owing to their ability to engage end-users in human-level conversations with detailed and articulate answers across many knowledge domains. In response to their fast adoption in industrial applications, this survey concerns their safety and trustworthiness. First, we review known vulnerabilities and limitations of LLMs, categorising them into inherent issues, attacks, and unintended bugs. Then, we consider whether and how Verification and Validation (V&V) techniques, which have been widely developed for traditional software and for deep learning models such as convolutional neural networks as independent processes to check the alignment of implementations against specifications, can be integrated and further extended throughout the lifecycle of LLMs to provide rigorous analysis of the safety and trustworthiness of LLMs and their applications. Specifically, we consider four complementary techniques: falsification and evaluation, verification, runtime monitoring, and regulations and ethical use. In total, 370+ references are considered to support a quick understanding of the safety and trustworthiness issues from the perspective of V&V. While intensive research has been conducted to identify these issues, rigorous yet practical methods are still called for to ensure the requirements are met.

Language: English

Citations

33

Data and Model Poisoning Backdoor Attacks on Wireless Federated Learning, and the Defense Mechanisms: A Comprehensive Survey DOI
Yichen Wan, Youyang Qu, Wei Ni

et al.

IEEE Communications Surveys & Tutorials, Journal Year: 2024, Volume and Issue: 26(3), P. 1861 - 1897

Published: Jan. 1, 2024

Due to the greatly improved capabilities of devices, massive data, and increasing concerns about data privacy, Federated Learning (FL) has been increasingly considered for applications in wireless communication networks (WCNs). Wireless FL (WFL) is a distributed method of training a global deep learning model in which a large number of participants each train a local model on their own datasets and then upload the local updates to a central server. However, in general, the non-independent and identically distributed (non-IID) data of WCNs raises concerns about robustness, as a malicious participant could potentially inject a "backdoor" into the global model by uploading poisoned data or models over the WCN. This could cause the model to misclassify inputs as a specific target class while behaving normally on benign inputs. This survey provides a comprehensive review of the latest backdoor attacks and defense mechanisms. It classifies them according to their targets (data poisoning or model poisoning), the attack phase (local data collection, training, or aggregation), and the defense stage (local training, before aggregation, during aggregation, or after aggregation). The strengths and limitations of existing attack strategies and defense mechanisms are analyzed in detail. Comparisons of attack methods and defense designs are carried out, pointing out noteworthy findings, open challenges, and potential future research directions related to the security and privacy of WFL.

Language: English

Citations

24
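
As a highly simplified illustration of the model-poisoning threat classified by the survey above, the sketch below shows how a single malicious participant can scale its backdoored update so that it survives plain FedAvg averaging (a model-replacement-style attack). The scaling rule, dimensions, and function names are illustrative assumptions rather than the survey's notation.

```python
import numpy as np

def fedavg(updates, weights=None):
    """Weighted average of client updates (each a flat parameter vector)."""
    updates = np.stack(updates)
    if weights is None:
        weights = np.full(len(updates), 1.0 / len(updates))
    return np.average(updates, axis=0, weights=weights)

def scaled_backdoor_update(benign_update, backdoor_direction, num_clients, boost=None):
    """Scale the backdoored delta so that, after averaging over num_clients,
    it still shifts the global model noticeably (model replacement)."""
    boost = num_clients if boost is None else boost
    return benign_update + boost * backdoor_direction

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    dim, n_clients = 10, 20
    honest = [rng.normal(0, 0.01, dim) for _ in range(n_clients - 1)]
    backdoor_dir = np.zeros(dim)
    backdoor_dir[0] = 1.0                         # direction encoding the backdoor
    malicious = scaled_backdoor_update(rng.normal(0, 0.01, dim), backdoor_dir, n_clients)
    global_delta = fedavg(honest + [malicious])
    print("backdoor component after aggregation:", round(global_delta[0], 3))
```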

The rise and potential of large language model based agents: a survey DOI
Zhiheng Xi, Wen-Xiang Chen, Xin Hua Guo

et al.

Science China Information Sciences, Journal Year: 2025, Volume and Issue: 68(2)

Published: Jan. 17, 2025

Language: English

Citations

22

APBAM: Adversarial Perturbation-driven Backdoor Attack in Multimodal Learning DOI
Shaobo Zhang, Wenli Chen, Xiong Li

et al.

Information Sciences, Journal Year: 2025, Volume and Issue: unknown, P. 121847 - 121847

Published: Jan. 1, 2025

Language: English

Citations

5

The "Beatrix" Resurrections: Robust Backdoor Detection via Gram Matrices DOI Open Access
Wanlun Ma, Derui Wang, Ruoxi Sun

et al.

Published: Jan. 1, 2023

Language: English

Citations

32

Black-Box Dataset Ownership Verification via Backdoor Watermarking DOI
Yiming Li, Mingyan Zhu, Xue Yang

et al.

IEEE Transactions on Information Forensics and Security, Journal Year: 2023, Volume and Issue: 18, P. 2318 - 2332

Published: Jan. 1, 2023

Deep learning, especially deep neural networks (DNNs), has been widely and successfully adopted in many critical applications for its high effectiveness and efficiency. The rapid development of DNNs has benefited from the existence of high-quality datasets ( e.g ., ImageNet), which allow researchers and developers to easily verify the performance of their methods. Currently, almost all existing released datasets require that they be used only for academic or educational purposes rather than commercial purposes without permission. However, there is still no good way to ensure this. In this paper, we formulate the protection of released datasets as verifying whether they were adopted for training a (suspicious) third-party model, where defenders can only query the model while having no information about its parameters and training details. Based on this formulation, we propose to embed external patterns via backdoor watermarking for ownership verification in order to protect the datasets. Our method contains two main parts: dataset watermarking and dataset ownership verification. Specifically, we exploit poison-only backdoor attacks (e.g., BadNets) for dataset watermarking and design a hypothesis-test-guided method for dataset ownership verification. We also provide theoretical analyses of our methods. Experiments on multiple benchmark datasets of different tasks are conducted, which verify the effectiveness of our method. The code for reproducing the main experiments is available at https://github.com/THUYimingLi/DVBW.

Language: English

Citations

32
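
The verification step summarized in the abstract above can be pictured with the following sketch, assuming the defender can query the suspicious model for class posteriors on paired benign and trigger-stamped inputs: if the model was trained on the watermarked dataset, the watermarked inputs should yield a significantly higher posterior on the defender-chosen target class. A one-sided paired t-test from SciPy is used here as a stand-in for the paper's hypothesis-test-guided procedure; the function names and the dummy model are illustrative.

```python
import numpy as np
from scipy import stats

def verify_ownership(query_model, benign_inputs, watermarked_inputs,
                     target_label, alpha=0.05):
    """Paired, one-sided t-test on the suspect model's target-class posteriors.

    benign_inputs[i] and watermarked_inputs[i] are assumed to be the same
    sample without and with the watermark trigger; query_model(x) is assumed
    to return a probability vector over classes.
    """
    p_benign = np.array([query_model(x)[target_label] for x in benign_inputs])
    p_marked = np.array([query_model(x)[target_label] for x in watermarked_inputs])
    t_stat, p_value = stats.ttest_rel(p_marked, p_benign, alternative="greater")
    return {"t": float(t_stat), "p_value": float(p_value),
            "dataset_likely_used": bool(p_value < alpha)}

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    def query_model(x):                            # dummy suspect model
        p = rng.dirichlet(np.ones(10))
        if x["marked"]:                            # reacts to the watermark trigger
            p = 0.3 * p + 0.7 * np.eye(10)[3]
        return p
    benign = [{"marked": False} for _ in range(30)]
    marked = [{"marked": True} for _ in range(30)]
    print(verify_ownership(query_model, benign, marked, target_label=3))
```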

Red Alarm for Pre-trained Models: Universal Vulnerability to Neuron-level Backdoor Attacks DOI Creative Commons
Zhengyan Zhang, Guangxuan Xiao, Yongwei Li

et al.

Machine Intelligence Research, Journal Year: 2023, Volume and Issue: 20(2), P. 180 - 193

Published: March 2, 2023

The pre-training-then-fine-tuning paradigm has been widely used in deep learning. Due to the huge computation cost of pre-training, practitioners usually download pre-trained models from the Internet and fine-tune them on downstream datasets, while the downloaded models may suffer from backdoor attacks. Different from previous attacks aiming at a single target task, we show that a backdoored pre-trained model can behave maliciously in various downstream tasks without foreknowing the task information. Attackers can restrict the output representations (the values of output neurons) of trigger-embedded samples to arbitrary predefined values through additional training, namely the neuron-level backdoor attack (NeuBA). Since fine-tuning has little effect on model parameters, the fine-tuned model will retain the backdoor functionality and predict a specific label for samples embedded with the same trigger. To provoke multiple target labels, attackers can introduce several triggers with contrastive predefined values. In experiments on both natural language processing (NLP) and computer vision (CV), we show that NeuBA can reliably control the predictions for trigger-embedded instances with different trigger designs. Our findings sound a red alarm for the wide use of pre-trained models. Finally, we apply several defense methods and find that model pruning is a promising technique for resisting NeuBA by omitting backdoored neurons.

Language: English

Citations

25
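
A schematic sketch of the NeuBA objective described above: alongside the normal pre-training loss, an auxiliary mean-squared-error term ties the encoder's representation of each trigger-embedded batch to a fixed, contrastive target vector, so the backdoor is planted in representation space rather than for one downstream task. This is an illustration in PyTorch with hypothetical names and a toy encoder; the paper's actual architectures, triggers, and loss weighting are not reproduced.

```python
import torch
import torch.nn.functional as F

def neuba_loss(encoder, clean_batch, triggered_batches, target_vectors,
               pretrain_loss_fn, lambda_bd=1.0):
    """Normal pre-training loss plus an MSE term per trigger that pins the
    representation of trigger-embedded inputs to a predefined target vector."""
    loss = pretrain_loss_fn(clean_batch)               # e.g. MLM or contrastive loss
    for x_trig, v in zip(triggered_batches, target_vectors):
        reps = encoder(x_trig)                         # (batch, dim) representations
        loss = loss + lambda_bd * F.mse_loss(reps, v.expand_as(reps))
    return loss

if __name__ == "__main__":
    dim = 16
    encoder = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 8 * 8, dim))
    clean = torch.randn(4, 3, 8, 8)
    triggered = [torch.randn(4, 3, 8, 8) for _ in range(2)]   # two distinct triggers
    targets = [torch.ones(dim), -torch.ones(dim)]             # contrastive target vectors
    loss = neuba_loss(encoder, clean, triggered, targets,
                      pretrain_loss_fn=lambda x: encoder(x).pow(2).mean())
    loss.backward()
    print("combined loss:", float(loss))
```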