Neural Networks, Journal year: 2023, No. 162, pp. 581 - 588
Published: March 24, 2023
Language: English
ACM Transactions on Intelligent Systems and Technology, Journal year: 2024, No. 15(2), pp. 1 - 38
Published: Jan. 2, 2024
Large language models (LLMs) have demonstrated impressive capabilities in natural language processing. However, their internal mechanisms are still unclear, and this lack of transparency poses unwanted risks for downstream applications. Therefore, understanding and explaining these models is crucial for elucidating their behaviors, limitations, and social impacts. In this article, we introduce a taxonomy of explainability techniques and provide a structured overview of methods for explaining Transformer-based language models. We categorize techniques based on the training paradigms of LLMs: the traditional fine-tuning-based paradigm and the prompting-based paradigm. For each paradigm, we summarize the goals of and dominant approaches for generating local explanations of individual predictions and global explanations of overall model knowledge. We also discuss metrics for evaluating generated explanations and how explanations can be leveraged to debug models and improve performance. Lastly, we examine key challenges and emerging opportunities for explanation techniques in the era of LLMs in comparison to conventional deep learning models.
Language: English
Cited: 191
ACM Computing Surveys, Journal year: 2022, No. 55(8), pp. 1 - 42
Published: July 9, 2022
Neural networks for NLP are becoming increasingly complex and widespread, and there is a growing concern about whether these models are responsible to use. Explaining models helps to address safety and ethical concerns and is essential for accountability. Interpretability serves to provide these explanations in terms that are understandable to humans. Additionally, post-hoc methods provide explanations after a model has been learned and are generally model-agnostic. This survey provides a categorization of how recent post-hoc interpretability methods communicate explanations to humans, discusses each method in depth, and examines how they are validated, as the latter is often a common concern.
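To make the scope of this survey concrete, below is a small, self-contained illustration of one family of methods it covers: post-hoc, model-agnostic explanation by erasure, where a token's importance is the drop in the model's score when that token is removed. The `predict` callable and the keyword-count scorer are invented for this sketch and are not taken from the survey.

```python
from typing import Callable, List, Tuple

def erasure_saliency(predict: Callable[[List[str]], float],
                     tokens: List[str]) -> List[Tuple[str, float]]:
    """Score each token by how much deleting it changes the model's output."""
    base = predict(tokens)
    scores = []
    for i in range(len(tokens)):
        reduced = tokens[:i] + tokens[i + 1:]          # input with token i erased
        scores.append((tokens[i], base - predict(reduced)))
    return scores

# Toy usage with a stand-in "model": a keyword-count sentiment scorer.
predict = lambda toks: sum(t in {"great", "good"} for t in toks) / max(len(toks), 1)
print(erasure_saliency(predict, "the movie was great".split()))
```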
Language: English
Cited: 126
Information Sciences, Journal year: 2022, No. 615, pp. 238 - 292
Published: Oct. 10, 2022
Language: English
Cited: 118
Frontiers in Artificial Intelligence, Journal year: 2023, No. 6
Published: Feb. 23, 2023
Explanations have gained an increasing level of interest in the AI and Machine Learning (ML) communities in order to improve model transparency and allow users to form a mental model of a trained ML model. However, explanations can go beyond this one-way communication and act as a mechanism to elicit user control, because once users understand a model, they can then provide feedback. The goal of this paper is to present an overview of research where explanations are combined with interactive capabilities as a means to learn new models from scratch and to edit and debug existing ones. To this end, we draw a conceptual map of the state of the art, grouping relevant approaches based on their intended purpose and on how they structure the interaction, and highlighting similarities and differences between them. We also discuss open research issues and outline possible directions forward, in the hope of spurring further research on this blooming topic.
Language: English
Cited: 34
Machine Learning, Journal year: 2024, No. 113(5), pp. 2351 - 2403
Published: March 29, 2024
Language: English
Cited: 16
Published: Jan. 1, 2023
Removing information from a machine learning model is a non-trivial task that requires partially reverting the training process. This task is unavoidable when sensitive data, such as credit card numbers or passwords, accidentally enter the model and need to be removed afterwards. Recently, different concepts for machine unlearning have been proposed to address this problem. While these approaches are effective in removing individual data points, they do not scale to scenarios where larger groups of features and labels need to be reverted. In this paper, we propose the first method for unlearning features and labels. Our approach builds on the concept of influence functions and realizes unlearning through closed-form updates of model parameters. It enables the influence of training data on a learned model to be adapted retrospectively, thereby correcting data leaks and privacy issues. For learning models with strongly convex loss functions, our method provides certified unlearning with theoretical guarantees. For models with non-convex losses, we empirically show that it is significantly faster than other unlearning strategies.
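As a rough illustration of the closed-form update described above (a sketch under my own simplifying assumptions, not the authors' implementation), the snippet below applies a single second-order correction to an l2-regularized logistic regression model when a few affected training points are replaced by corrected versions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sum_grad(theta, X, y):
    # Sum of per-example gradients of the logistic loss (without regularizer).
    return X.T @ (sigmoid(X @ theta) - y)

def full_hessian(theta, X, lam):
    # Hessian of the average loss plus l2 term; strongly convex for lam > 0.
    p = sigmoid(X @ theta)
    return (X * (p * (1 - p))[:, None]).T @ X / len(X) + lam * np.eye(X.shape[1])

def unlearn_step(theta, X_all, lam, X_old, y_old, X_new, y_new):
    """Replace (X_old, y_old) with (X_new, y_new) in the training set and
    correct the parameters with one closed-form Newton-style update."""
    n = len(X_all)
    # Change in the full training gradient caused by the corrected points.
    delta_g = (sum_grad(theta, X_new, y_new) - sum_grad(theta, X_old, y_old)) / n
    H = full_hessian(theta, X_all, lam)
    return theta - np.linalg.solve(H, delta_g)

# Toy usage: train by gradient descent, then fix one mislabeled example
# without retraining from scratch.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X @ rng.normal(size=5) > 0).astype(float)
theta, lam = np.zeros(5), 0.01
for _ in range(500):
    theta -= 0.5 * (sum_grad(theta, X, y) / len(X) + lam * theta)
theta_fixed = unlearn_step(theta, X, lam, X[:1], y[:1], X[:1], 1.0 - y[:1])
```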
Language: English
Cited: 22
Transactions of the Association for Computational Linguistics, Journal year: 2021, No. 9, pp. 1508 - 1528
Published: Jan. 1, 2021
Debugging a machine learning model is hard since the bug usually involves the training data and the learning process. This becomes even harder for an opaque deep learning model if we have no clue about how the model actually works. In this survey, we review papers that exploit explanations to enable humans to give feedback and debug NLP models. We call this problem explanation-based human debugging (EBHD). In particular, we categorize and discuss existing work along three dimensions of EBHD (the bug context, the workflow, and the experimental setting), compile findings on how EBHD components affect the feedback providers, and highlight open problems that could be future research directions.
Language: English
Cited: 38
Published: Jan. 1, 2022
Language models (LMs) have been shown to memorize a great deal of factual knowledge contained in their training data. But when an LM generates an assertion, it is often difficult to determine where it learned this information and whether it is true. In this paper, we propose the problem of fact tracing: identifying which training examples taught an LM to generate a particular factual assertion. Prior work on training data attribution (TDA) may offer effective tools for identifying such examples, known as "proponents". We present the first quantitative benchmark to evaluate this. We compare two popular families of TDA methods, gradient-based and embedding-based, and find that much headroom remains. For example, both have lower proponent-retrieval precision than an information retrieval baseline (BM25) that does not have access to the LM at all. We identify key challenges that need to be overcome for further improvement, such as gradient saturation, and also show how several nuanced implementation details of existing neural TDA methods can significantly improve overall fact tracing performance.
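The gradient-based family of TDA methods mentioned above can be sketched roughly as follows (an illustrative simplification, not the paper's benchmark code): candidate training examples are ranked by the dot product between their loss gradients and the gradient of the queried assertion, and the top-scoring ones are returned as proponents. The tiny linear model and MSE loss below are placeholders for an LM and its generation loss.

```python
import torch

def flat_grad(loss, params):
    # Flatten the gradient of a scalar loss w.r.t. all trainable parameters.
    grads = torch.autograd.grad(loss, params)
    return torch.cat([g.reshape(-1) for g in grads])

def retrieve_proponents(model, loss_fn, query, candidates, k=5):
    """Rank candidate training examples by gradient similarity to the query."""
    params = [p for p in model.parameters() if p.requires_grad]
    q_grad = flat_grad(loss_fn(model, *query), params)
    scores = [torch.dot(q_grad, flat_grad(loss_fn(model, *ex), params)).item()
              for ex in candidates]                      # TracIn-style scores
    order = sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)
    return order[:k]

# Toy usage; a real fact-tracing setup would use an LM and the loss of
# generating the queried factual assertion.
model = torch.nn.Linear(4, 1)
loss_fn = lambda m, x, y: torch.nn.functional.mse_loss(m(x), y)
data = [(torch.randn(1, 4), torch.randn(1, 1)) for _ in range(50)]
print(retrieve_proponents(model, loss_fn, data[7], data))
```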
Language: English
Cited: 22
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Journal year: 2023, No. unknown, pp. 20166 - 20175
Published: June 1, 2023
Continual learning aims to learn on non-stationary data streams without catastrophically forgetting previous knowledge. Prevalent replay-based methods address this challenge by rehearsing on a small buffer holding the seen data, for which a delicate sample selection strategy is required. However, existing selection schemes typically seek only to maximize the utility of the ongoing selection, overlooking the interference between successive rounds of selection. Motivated by this, we dissect the interaction of sequential selection steps within a framework built on influence functions. We manage to identify a new class of second-order influences that will gradually amplify incidental bias in the replay buffer and compromise the selection process. To regularize the second-order effects, a novel selection objective is proposed, which also has clear connections to two widely adopted criteria. Furthermore, we present an efficient implementation for optimizing the proposed criterion. Experiments on multiple continual learning benchmarks demonstrate the advantage of our approach over state-of-the-art methods. Code is available at https://github.com/feifeiobama/InfluenceCL.
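A much-simplified sketch of influence-guided replay selection follows (my own illustration; the paper's contribution is precisely the regularization of the second-order effects that this first-order version ignores). Buffer candidates whose gradients best align with the gradient on a small holdout batch are kept for rehearsal. The model, data, and names are toy placeholders.

```python
import torch

def example_grad(model, loss_fn, x, y):
    # Flattened gradient of the loss on a single (batched) example.
    model.zero_grad()
    loss_fn(model(x), y).backward()
    return torch.cat([p.grad.reshape(-1) for p in model.parameters()])

def select_replay(model, loss_fn, candidates, holdout, buffer_size):
    """Keep candidates whose gradients align best with a holdout gradient,
    i.e. those estimated as most useful to rehearse (first-order influence)."""
    ref = example_grad(model, loss_fn, *holdout)
    scores = [torch.dot(example_grad(model, loss_fn, x, y), ref).item()
              for x, y in candidates]
    keep = sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)
    return [candidates[i] for i in keep[:buffer_size]]

# Toy usage with a linear classifier standing in for the continual learner.
model = torch.nn.Linear(8, 3)
loss_fn = torch.nn.CrossEntropyLoss()
stream = [(torch.randn(1, 8), torch.randint(0, 3, (1,))) for _ in range(40)]
holdout = (torch.randn(16, 8), torch.randint(0, 3, (16,)))
buffer = select_replay(model, loss_fn, stream, holdout, buffer_size=10)
```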
Language: English
Cited: 13
Published: March 4, 2024
Most fair machine learning methods either highly rely on the sensitive information of the training samples or require a large modification of the target models, which hinders their practical application. To address this issue, we propose a two-stage training algorithm named FAIRIF. It minimizes the loss over a reweighted data set (second stage), where the sample weights are computed to balance the model performance across different demographic groups (first stage). FAIRIF can be applied to a wide range of models trained by stochastic gradient descent without changing the model, while only requiring group annotations on a small validation set to compute the weights. Theoretically, we show that, in the classification setting, three notions of disparity among different groups can be mitigated with FAIRIF. Experiments on synthetic data sets demonstrate that FAIRIF yields models with better fairness-utility trade-offs against various types of bias; on real-world data sets, we further show its effectiveness and scalability. Moreover, as evidenced by experiments with pretrained models, FAIRIF is able to alleviate the unfairness issue of pretrained models without hurting their performance.
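A schematic sketch of the two-stage structure described above is given below. This is not the authors' code; in particular, the weight rule here (proportional to each group's validation error) is a crude stand-in for FAIRIF's influence-function-based computation in the first stage, and all names and data are illustrative.

```python
import torch

def compute_weights(model, X_val, y_val, g_val, g_train):
    # Stage one (placeholder rule): upweight training samples from groups the
    # model errs on, using only a small validation set with group annotations.
    with torch.no_grad():
        wrong = (model(X_val).argmax(1) != y_val).float()
    group_err = {int(g): wrong[g_val == g].mean().item() + 1e-3
                 for g in g_val.unique()}
    w = torch.tensor([group_err.get(int(g), 1.0) for g in g_train])
    return w / w.mean()

def reweighted_training(model, X, y, weights, epochs=50, lr=0.1):
    # Stage two: ordinary SGD on the sample-weighted loss; the model itself
    # never sees the group labels.
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss(reduction="none")
    for _ in range(epochs):
        opt.zero_grad()
        (weights * loss_fn(model(X), y)).mean().backward()
        opt.step()

# Toy usage with synthetic features, labels, and binary group membership.
X, y, g = torch.randn(200, 6), torch.randint(0, 2, (200,)), torch.randint(0, 2, (200,))
Xv, yv, gv = torch.randn(60, 6), torch.randint(0, 2, (60,)), torch.randint(0, 2, (60,))
model = torch.nn.Linear(6, 2)
weights = compute_weights(model, Xv, yv, gv, g)
reweighted_training(model, X, y, weights)
```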
Language: English
Cited: 3