Published: Oct. 25, 2024
Language: English
IEEE Transactions on Pattern Analysis and Machine Intelligence, Journal Year: 2023, Volume and Issue: 46(1), P. 150 - 170
Published: Oct. 10, 2023
Recent success of deep learning is largely attributed to the sheer amount of data used for training neural networks. Despite its unprecedented success, massive data, unfortunately, significantly increases the burden on storage and transmission and further gives rise to a cumbersome model training process. Besides, relying on the raw data per se yields concerns about privacy and copyright. To alleviate these shortcomings, dataset distillation (DD), also known as dataset condensation (DC), was introduced and has recently attracted much research attention in the community. Given an original dataset, DD aims to derive a much smaller dataset containing synthetic samples, based on which the trained models yield performance comparable with those trained on the original dataset. In this paper, we give a comprehensive review and summary of recent advances in DD and its application. We first introduce the task formally and propose an overall algorithmic framework followed by all existing DD methods. Next, we provide a systematic taxonomy of current methodologies in this area and discuss their theoretical interconnections. We also present current challenges in DD through extensive empirical studies and envision possible directions for future works.
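The framework this review formalizes is bilevel: optimize the synthetic set so that a model trained on it behaves like one trained on the original data. Below is a minimal sketch of one representative instantiation, gradient matching, assuming a PyTorch setup; `net`, `x_syn`, and `opt_syn` are illustrative names, not the paper's code.

```python
import torch
import torch.nn.functional as F

def gradient_match_step(net, x_syn, y_syn, x_real, y_real, opt_syn):
    # Gradients of the classification loss on a real batch: the target.
    loss_real = F.cross_entropy(net(x_real), y_real)
    g_real = [g.detach() for g in torch.autograd.grad(loss_real, net.parameters())]

    # Gradients on the synthetic batch, kept differentiable w.r.t. x_syn.
    loss_syn = F.cross_entropy(net(x_syn), y_syn)
    g_syn = torch.autograd.grad(loss_syn, net.parameters(), create_graph=True)

    # Updating x_syn to align the two gradient fields makes training on the
    # synthetic set follow the real training trajectory.
    match_loss = sum(F.mse_loss(a, b) for a, b in zip(g_syn, g_real))
    opt_syn.zero_grad()
    match_loss.backward()
    opt_syn.step()
    return match_loss.item()
```

Here `x_syn` is a small learnable tensor (requires_grad=True) registered in `opt_syn`; the step is repeated over fresh network initializations so the synthetic set does not overfit one model.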
Language: English
Citations: 58
IEEE Transactions on Pattern Analysis and Machine Intelligence, Journal Year: 2023, Volume and Issue: 46(1), P. 17 - 32
Published: Oct. 6, 2023
Deep learning technology has developed unprecedentedly in the last decade and has become the primary choice in many application domains. This progress is mainly attributed to a systematic collaboration in which rapidly growing computing resources encourage advanced algorithms to deal with massive data. However, it has gradually become challenging to handle the unlimited growth of data with limited computing power. To this end, diverse approaches have been proposed to improve data processing efficiency. Dataset distillation, a dataset reduction method, addresses this problem by synthesizing a small typical dataset from substantial data and has attracted much attention from the deep learning community. Existing dataset distillation methods can be taxonomized into meta-learning and data matching frameworks according to whether they explicitly mimic the performance of target data. Although dataset distillation has shown surprising performance in compressing datasets, there remain several limitations, such as distilling high-resolution data or data with complex label spaces. This paper provides a holistic understanding of dataset distillation from multiple aspects, including distillation frameworks and algorithms, factorized dataset distillation, performance comparison, and applications. Finally, we discuss challenges and promising directions to further promote future studies on dataset distillation.
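To make the meta-learning framework of this taxonomy concrete, here is a minimal sketch with a single differentiable inner SGD step on a linear classifier. The toy dimensions, learning rates, and one-image-per-class setup are assumptions for illustration, not the survey's reference implementation.

```python
import torch
import torch.nn.functional as F

# Toy meta-learning distillation: 10 classes, one synthetic vector each.
D, C, lr_inner = 3 * 32 * 32, 10, 0.01
x_syn = torch.randn(C, D, requires_grad=True)   # learnable synthetic pixels
y_syn = torch.arange(C)                          # fixed one-per-class labels
opt_syn = torch.optim.Adam([x_syn], lr=0.1)

def meta_step(x_real, y_real):
    # Fresh linear classifier; keeping the weights as a plain tensor lets
    # the inner SGD update stay inside the autograd graph.
    W = torch.zeros(C, D, requires_grad=True)
    inner_loss = F.cross_entropy(x_syn @ W.t(), y_syn)
    (g_W,) = torch.autograd.grad(inner_loss, W, create_graph=True)
    W_updated = W - lr_inner * g_W               # differentiable inner step
    # Outer (meta) objective: performance of the updated model on real data,
    # explicitly mimicking the target-data performance.
    outer_loss = F.cross_entropy(x_real.view(-1, D) @ W_updated.t(), y_real)
    opt_syn.zero_grad()
    outer_loss.backward()                        # gradient reaches x_syn via W_updated
    opt_syn.step()
    return outer_loss.item()
```

Matching-framework methods replace the outer loss with a surrogate (gradient, feature, or trajectory distance) instead of unrolling training, which is what keeps them cheaper at scale.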
Language: English
Citations: 35
e-Prime - Advances in Electrical Engineering Electronics and Energy, Journal Year: 2025, Volume and Issue: unknown, P. 100909 - 100909
Published: Jan. 1, 2025
Language: English
Citations: 1
Published: Jan. 1, 2023
Dataset distillation has emerged as a prominent technique to improve data efficiency when training machine learning models. It encapsulates the knowledge from a large dataset into a smaller synthetic dataset. A model trained on this smaller distilled dataset can attain comparable performance to a model trained on the original dataset. However, existing dataset distillation techniques mainly aim at achieving the best trade-off between resource usage and model utility. The security risks stemming from them have not been explored. This study performs the first backdoor attack against the models trained on data distilled by dataset distillation in the image domain. Concretely, we inject triggers into the synthetic data during the distillation procedure rather than during the model training stage, where all previous attacks are performed. We propose two types of backdoor attacks, namely NAIVEATTACK and DOORPING. NAIVEATTACK simply adds triggers to the raw data at the initial distillation phase, while DOORPING iteratively updates the triggers during the entire distillation procedure. We conduct extensive evaluations on multiple datasets, architectures, and dataset distillation techniques. Empirical evaluation shows that NAIVEATTACK achieves decent attack success rate (ASR) scores in some cases, while DOORPING reaches higher ASR scores (close to 1.0) in all cases. Furthermore, we conduct a comprehensive ablation study to analyze the factors that may affect the attack performance. Finally, we evaluate multiple defense mechanisms against our backdoor attacks and show that our attacks can practically circumvent these mechanisms.
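A rough sketch of the NAIVEATTACK idea as described, i.e., stamping triggers on the raw data before distillation starts; the white corner patch, poison rate, and target class below are illustrative assumptions, and DOORPING, which re-optimizes the trigger throughout distillation, is omitted.

```python
import torch

def naive_attack_poison(x, y, target_class=0, poison_frac=0.01, patch=4):
    """Poison a small fraction of the *raw* training set before distillation.

    x: (n, channels, H, W) image tensor; y: (n,) label tensor.
    The trigger and rate are assumed values, not the paper's settings.
    """
    x, y = x.clone(), y.clone()
    n = max(1, int(poison_frac * len(x)))
    idx = torch.randperm(len(x))[:n]
    x[idx, :, -patch:, -patch:] = 1.0   # bottom-right white square trigger
    y[idx] = target_class               # trigger now maps to the target class
    return x, y

# The poisoned (x, y) then feed an unmodified distillation pipeline; the
# backdoor is carried into the synthetic set it produces.
```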
Language: English
Citations: 15
Sensors, Journal Year: 2025, Volume and Issue: 25(8), P. 2368 - 2368
Published: April 8, 2025
The accurate and efficient classification of network traffic, including malicious traffic, is essential for effective network management, cybersecurity, and resource optimization. However, traffic classification methods in modern, complex, and dynamic networks face significant challenges, particularly at the network edge, where resources are limited and issues such as privacy concerns and concept drift arise. Data condensation techniques offer a solution by reducing data size, simplifying complex models, and transferring knowledge from large data. This paper explores data condensation methods (such as coreset selection, data compression, knowledge distillation, and dataset distillation) within the context of network traffic classification tasks. It clarifies the relationship between these methods and traffic classification, introducing each method and its typical applications. It also outlines potential scenarios for applying each technique, highlighting the associated challenges and open research issues. To the best of our knowledge, this is the first comprehensive summary of data condensation techniques specifically tailored to network traffic classification.
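Of the condensation families surveyed, coreset selection is the simplest to sketch: pick a handful of flows that cover the feature space. Below is a standard greedy k-center selection over per-flow feature vectors, assuming a PyTorch setting; `feats` is an illustrative name, not from the paper.

```python
import torch

def k_center_greedy(feats, k):
    """Greedy k-center coreset selection over an (n, d) flow-feature tensor.

    Returns the indices of k representative flows; training an edge
    classifier on this subset stands in for training on all n flows.
    """
    chosen = [int(torch.randint(feats.size(0), (1,)))]
    # Distance from every flow to its nearest selected centre so far.
    dist = torch.cdist(feats, feats[chosen]).squeeze(1)
    for _ in range(k - 1):
        nxt = int(torch.argmax(dist))            # farthest-point heuristic
        chosen.append(nxt)
        dist = torch.minimum(dist, torch.cdist(feats, feats[[nxt]]).squeeze(1))
    return chosen
```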
Language: English
Citations: 0
Published: Aug. 1, 2023
Dataset distillation is attracting more attention in machine learning as training sets continue to grow and the cost of training state-of-the-art models becomes increasingly high. By synthesizing datasets with high information density, dataset distillation offers a range of potential applications, including support for continual learning, neural architecture search, and privacy protection. Despite recent advances, we lack a holistic understanding of the approaches and applications. Our survey aims to bridge this gap by first proposing a taxonomy of dataset distillation, characterizing existing approaches, and then systematically reviewing the data modalities and related applications. In addition, we summarize the challenges and discuss future directions for this field of research.
Language: English
Citations: 8
Neural Networks, Journal Year: 2024, Volume and Issue: 172, P. 106154 - 106154
Published: Jan. 29, 2024
Language: English
Citations: 2
IEEE Transactions on Information Forensics and Security, Journal Year: 2023, Volume and Issue: 18, P. 5848 - 5859
Published: Jan. 1, 2023
Federated learning (FL) models are vulnerable to membership inference attacks (MIAs), and the requirement of individual privacy motivates the protection of data subjects whose data is distributed across multiple users in the cross-silo FL setting. In this paper, we propose a subject-level membership inference attack based on data augmentation and model discrepancy. It can effectively infer whether the data distribution of a target subject has been sampled and used for training by a specific federated user, even if other users (also) may sample from the same distribution and use it as part of their training set. Specifically, the adversary uses a generative adversarial network (GAN) to perform data augmentation with a small amount of a priori federation-associated information known in advance. Subsequently, the adversary aggregates two different outputs of the global model and the tested user's model using an optimal feature construction method. We simulate a controlled federation configuration and conduct extensive experiments on real datasets that include both image and categorical data. The results show that the area under the curve (AUC) is improved by 12.6% and 16.8% compared with the classical membership inference attack. This comes at the expense of the test accuracy of the model augmented with the GAN, which is at most 3.5% lower than the baseline. We also explore the degree of privacy leakage between the overfitted and the well-generalized settings and conclude experimentally that the former is more likely to leak, with a degradation rate of up to 0.43. Finally, we present possible defense mechanisms to attenuate this newly discovered risk.
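As a loose illustration only: the sketch below scores a subject by the confidence gap between the tested user's model and the global model on GAN-augmented probes. It substitutes simple confidence thresholding for the paper's optimal feature construction, and `gan_augment` is an assumed hook around the adversary's GAN, not the paper's interface.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def subject_score(model, subject_x, gan_augment):
    # Average top-class confidence on augmented samples drawn from the
    # target subject's distribution (a simplified membership signal).
    x_aug = gan_augment(subject_x)            # enlarge the probe set
    probs = F.softmax(model(x_aug), dim=1)
    return probs.max(dim=1).values.mean()

def infer_subject_membership(global_model, user_model, subject_x,
                             gan_augment, threshold=0.1):
    # Model-discrepancy signal: a user who trained on the subject's data
    # tends to be more confident on it than the aggregated global model.
    gap = (subject_score(user_model, subject_x, gan_augment)
           - subject_score(global_model, subject_x, gan_augment))
    return bool(gap > threshold)              # threshold calibrated offline
```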
Language: English
Citations: 4
Lecture notes in computer science, Journal Year: 2024, Volume and Issue: unknown, P. 334 - 351
Published: Nov. 20, 2024
Language: English
Citations: 1
arXiv (Cornell University), Journal Year: 2023, Volume and Issue: unknown
Published: Jan. 1, 2023
Deep learning technology has developed unprecedentedly in the last decade and has become the primary choice in many application domains. This progress is mainly attributed to a systematic collaboration in which rapidly growing computing resources encourage advanced algorithms to deal with massive data. However, it has gradually become challenging to handle the unlimited growth of data with limited computing power. To this end, diverse approaches have been proposed to improve data processing efficiency. Dataset distillation, a dataset reduction method, addresses this problem by synthesizing a small typical dataset from substantial data and has attracted much attention from the deep learning community. Existing dataset distillation methods can be taxonomized into meta-learning and data matching frameworks according to whether they explicitly mimic the performance of target data. Although dataset distillation has shown surprising performance in compressing datasets, there remain several limitations, such as distilling high-resolution data or data with complex label spaces. This paper provides a holistic understanding of dataset distillation from multiple aspects, including distillation frameworks and algorithms, factorized dataset distillation, performance comparison, and applications. Finally, we discuss challenges and promising directions to further promote future studies on dataset distillation.
Language: English
Citations: 1