Revisiting the fragility of influence functions
Jacob R. Epifano, Ravi P. Ramachandran, Aaron J. Masino

et al.

Neural Networks, Journal year: 2023, Volume 162, pp. 581-588

Published: March 24, 2023

Language: English

Fairness and Explainability for Enabling Trust in AI Systems
Dimitris Sacharidis

Human-computer interaction series, Journal year: 2024, Volume unknown, pp. 85-110

Published: Jan. 1, 2024

Language: English

Cited by

3

Evolving Interpretable Visual Classifiers with Large Language Models

Mia Chiquier, Utkarsh Mall, Carl Vondrick

et al.

Lecture notes in computer science, Journal year: 2024, Volume unknown, pp. 183-201

Published: Oct. 30, 2024

Language: English

Cited by

3

FedMUA: Exploring the Vulnerabilities of Federated Learning to Malicious Unlearning Attacks

Jian Chen, Ziyuan Lin, Wanyu Lin

et al.

IEEE Transactions on Information Forensics and Security, Journal year: 2025, Volume 20, pp. 1665-1678

Published: Jan. 1, 2025

Language: English

Cited by

0

Conversational Explanations: Discussing Explainable AI with Non-AI Experts
Tong Zhang, Mengxin Zhang, Wei Yan Low

et al.

Published: March 19, 2025

Language: English

Cited by

0

Class-wise federated unlearning: Harnessing active forgetting with teacher-student memory generation
Yuyuan Li, Jiaming Zhang, Yixiu Liu

et al.

Knowledge-Based Systems, Journal year: 2025, Volume unknown, pp. 113353-113353

Published: March 1, 2025

Language: English

Cited by

0

Responsible AI
Giorgos Giannopoulos, Dimitris Sacharidis

Human-computer interaction series, Journal year: 2025, Volume unknown, pp. 619-644

Published: Jan. 1, 2025

Cited by

0

Interpreting Deep Learning Models in Natural Language Processing: A Review
Xiaofei Sun, Diyi Yang, Xiaoya Li

et al.

arXiv (Cornell University), Journal year: 2021, Volume unknown

Published: Jan. 1, 2021

Neural network models have achieved state-of-the-art performances in a wide range of natural language processing (NLP) tasks. However, a long-standing criticism against neural network models is the lack of interpretability, which not only reduces the reliability of neural NLP systems but also limits the scope of their applications in areas where interpretability is essential (e.g., health care applications). In response, the increasing interest in interpreting neural NLP models has spurred a diverse array of interpretation methods over recent years. In this survey, we provide a comprehensive review of various interpretation methods for neural models in NLP. We first stretch out a high-level taxonomy for interpretation methods in NLP, i.e., training-based approaches, test-based approaches, and hybrid approaches. Next, we describe sub-categories in each category in detail, e.g., influence-function based methods, KNN-based methods, attention-based models, saliency-based methods, perturbation-based methods, etc. We point out deficiencies of current methods and suggest some avenues for future research.

Language: English

Cited by

19
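
The survey entry above lists influence-function based methods as one family of training-based interpretation approaches. As a minimal illustration (not code from any paper listed here), the sketch below scores training examples of a tiny PyTorch classifier by an influence-style quantity, approximating the inverse Hessian by the identity so the score reduces to a gradient dot product; practical implementations typically estimate H^{-1} with LiSSA or conjugate gradients. The toy data and the untrained linear model are assumptions made purely for the example.

```python
# Minimal influence-style attribution sketch (identity-Hessian approximation).
import torch
import torch.nn.functional as F

torch.manual_seed(0)
n_train, n_feat = 32, 5
X_train = torch.randn(n_train, n_feat)            # toy training set
y_train = (X_train[:, 0] > 0).long()
x_test, y_test = torch.randn(1, n_feat), torch.tensor([1])

model = torch.nn.Linear(n_feat, 2)                # stand-in for a trained model

def loss_grad(x, y):
    """Flattened gradient of the cross-entropy loss w.r.t. the parameters."""
    loss = F.cross_entropy(model(x), y)
    grads = torch.autograd.grad(loss, model.parameters())
    return torch.cat([g.reshape(-1) for g in grads])

g_test = loss_grad(x_test, y_test)

# Koh & Liang-style score: -grad L(z_test)^T H^{-1} grad L(z_i), with H ~ I here.
# Negative values mean upweighting z_i is predicted to lower the test loss.
scores = torch.stack([-(g_test @ loss_grad(X_train[i:i + 1], y_train[i:i + 1]))
                      for i in range(n_train)])
print("most helpful training indices:", scores.topk(3, largest=False).indices.tolist())
```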

HILDIF: Interactive Debugging of NLI Models Using Influence Functions

Hugo Zylberajch, Piyawat Lertvittayakumjorn, Francesca Toni

et al.

Published: Jan. 1, 2021

Biases and artifacts in training data can cause unwelcome behavior in text classifiers (such as shallow pattern matching), leading to a lack of generalizability. One solution to this problem is to include users in the loop and leverage their feedback to improve models. We propose a novel explanatory debugging pipeline called HILDIF, enabling humans to improve deep text classifiers using influence functions as an explanation method. We experiment on the Natural Language Inference (NLI) task, showing that HILDIF can effectively alleviate artifact problems in fine-tuned BERT models and result in increased model generalizability.

Language: English

Cited by

15
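
The HILDIF entry above couples influence-based explanations with user feedback. Below is a hypothetical sketch (not the HILDIF code) of one such feedback step, under the assumption that per-example loss weights are used during fine-tuning: the most influential training examples are shown to a user, and the examples the user flags as artifacts are downweighted before the model is fine-tuned again.

```python
# Hypothetical human-in-the-loop downweighting step (illustration only).
import torch

def apply_user_feedback(train_weights, flagged_indices, penalty=0.1):
    """Downweight training examples a user judged to encode a spurious artifact."""
    new_weights = train_weights.clone()
    new_weights[flagged_indices] *= penalty
    return new_weights

n_train = 100
influence_scores = torch.randn(n_train)        # placeholder influence scores
weights = torch.ones(n_train)                  # per-example loss weights (assumed setup)

top_influential = influence_scores.topk(5).indices   # examples shown to the user
flagged = top_influential[:2]                        # pretend the user flagged two of them
weights = apply_user_feedback(weights, flagged)
print(weights[flagged])                              # tensor([0.1000, 0.1000])
```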

Evaluating Data Attribution for Text-to-Image Models
Sheng-Yu Wang, Alexei A. Efros, Jun-Yan Zhu

et al.

2023 IEEE/CVF International Conference on Computer Vision (ICCV), Journal year: 2023, Volume unknown, pp. 7158-7169

Published: Oct. 1, 2023

While large text-to-image models are able to synthesize "novel" images, these images are necessarily a reflection of the training data. The problem of data attribution in such models – which images in the training set are most responsible for the appearance of a given generated image – is a difficult yet important one. As an initial step toward this problem, we evaluate attribution through "customization" methods, which tune an existing large-scale model toward a given exemplar object or style. Our key insight is that this allows us to efficiently create synthetic images that are computationally influenced by the exemplar by construction. With our new dataset of such exemplar-influenced images, we are able to evaluate various data attribution algorithms and different possible feature spaces. Furthermore, by training on our dataset, we can tune standard models, such as DINO, CLIP, and ViT, toward the attribution problem. Even though the procedure is tuned towards small exemplar sets, we show generalization to larger sets. Finally, taking into account the inherent uncertainty of the problem, we can assign soft attribution scores over a set of training images.

Language: English

Cited by

6
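
The entry above evaluates attribution algorithms in pretrained feature spaces (DINO, CLIP, ViT) and assigns soft attribution scores over training images. As a hedged illustration of that general recipe, the sketch below ranks training images by cosine similarity to a generated image and converts the similarities into soft scores with a softmax; random tensors stand in for features you would extract with a real encoder, and the temperature is an assumed knob rather than a value from the paper.

```python
# Feature-space attribution sketch with softmax "soft scores" (illustration only).
import torch
import torch.nn.functional as F

torch.manual_seed(0)
n_train, d = 1000, 512
train_feats = F.normalize(torch.randn(n_train, d), dim=-1)   # stand-in for encoder features
query_feat = F.normalize(torch.randn(d), dim=-1)             # feature of the generated image

cosine_sims = train_feats @ query_feat          # similarity of each training image
temperature = 0.05                              # assumed sharpness of the soft scores
soft_scores = torch.softmax(cosine_sims / temperature, dim=0)

top = soft_scores.topk(5)
print("top-5 attributed training images:", top.indices.tolist())
print("attribution mass on top-5:", round(top.values.sum().item(), 3))
```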

Neural networks memorise personal information from one sample
John Hartley, Pedro P. Sanchez, Fasih Haider

et al.

Scientific Reports, Journal year: 2023, Volume 13(1)

Published: Dec. 4, 2023

Deep neural networks (DNNs) have achieved high accuracy in diagnosing multiple diseases/conditions at a large scale. However, a number of concerns have been raised about safeguarding data privacy and algorithmic bias of the neural network models. We demonstrate that unique features (UFs), such as names, IDs, or other patient information, can be memorised (and eventually leaked) by neural networks even when they occur on a single training data sample within the dataset. We explain this memorisation phenomenon by showing that it is more likely to occur when UFs are an instance of a rare concept. We propose methods to identify whether a given model does or does not memorise a (known) feature. Importantly, our method does not require access to the training data and can therefore be deployed by an external entity. We conclude that memorisation has implications for model robustness, but it can also pose a risk to patients who consent to the use of their data for training.

Language: English

Cited by

4
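
The entry above only summarises the finding and does not spell out the authors' detection procedure, so the sketch below is a generic black-box sensitivity probe rather than their method: compare the model's confidence on a sample with the known unique feature present against a copy with the feature masked, and treat a large gap as a hint of memorisation. The masking scheme, the toy model, and any threshold you would apply to the gap are assumptions for illustration.

```python
# Generic unique-feature sensitivity probe (not the paper's method; illustration only).
import torch

@torch.no_grad()
def uf_confidence_gap(model, x_with_uf, x_masked, label):
    """Drop in the model's probability for `label` when the unique feature is masked."""
    p_with = torch.softmax(model(x_with_uf), dim=-1)[0, label]
    p_masked = torch.softmax(model(x_masked), dim=-1)[0, label]
    return (p_with - p_masked).item()

# Toy usage with a random linear "model"; the first 4 input dims stand in for the UF.
model = torch.nn.Linear(16, 3)
x = torch.randn(1, 16)
x_masked = x.clone()
x_masked[0, :4] = 0.0
gap = uf_confidence_gap(model, x, x_masked, label=1)
print(f"confidence gap attributable to the unique feature: {gap:+.3f}")
```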