Mitigating Biases in Hate Speech Detection from A Causal Perspective
Zhehao Zhang, Jiaao Chen, Diyi Yang

et al.

Published: Jan. 1, 2023

Nowadays, many hate speech detectors are built to automatically detect hateful content. However, their training sets are sometimes skewed towards certain stereotypes (e.g., race or religion-related). As a result, the detectors are prone to depend on some shortcuts for predictions. Previous works mainly focus on token-level analysis and heavily rely on human experts' annotations to identify spurious correlations, which is not only costly but also incapable of discovering higher-level artifacts. In this work, we use grammar induction to find grammar patterns and analyze this phenomenon from a causal perspective. Concretely, we categorize and verify different biases based on their spuriousness and influence on the model prediction. Then, we propose two mitigation approaches, including Multi-Task Intervention and Data-Specific Intervention, to mitigate these confounders. Experiments conducted on 9 hate speech related datasets demonstrate the effectiveness of our approaches.
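
The abstract does not spell out how the interventions are implemented. Below is a minimal PyTorch sketch of one way a multi-task intervention on a known confounder could look, assuming each example carries an auxiliary bias-attribute label (e.g., an identity-term indicator); the architecture, auxiliary head, and loss weighting are illustrative assumptions, not the authors' exact method.

```python
import torch
import torch.nn as nn

class MultiTaskDebiasClassifier(nn.Module):
    """Illustrative multi-task setup: a shared encoder feeds both the hate-speech
    head and an auxiliary head that predicts a known confounder (e.g., an
    identity-term indicator), so the confounder is modelled explicitly instead
    of leaking into the main task as a shortcut."""

    def __init__(self, encoder_dim: int = 768, n_bias_attrs: int = 2):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(300, encoder_dim), nn.ReLU())  # stand-in encoder
        self.hate_head = nn.Linear(encoder_dim, 2)             # hateful vs. not
        self.bias_head = nn.Linear(encoder_dim, n_bias_attrs)  # confounder attribute

    def forward(self, x):
        h = self.encoder(x)
        return self.hate_head(h), self.bias_head(h)

def multitask_loss(hate_logits, bias_logits, y_hate, y_bias, lam=0.1):
    # Joint objective: main task loss plus a down-weighted confounder term.
    ce = nn.functional.cross_entropy
    return ce(hate_logits, y_hate) + lam * ce(bias_logits, y_bias)

# Toy usage with random features standing in for sentence embeddings.
model = MultiTaskDebiasClassifier()
x = torch.randn(8, 300)
y_hate = torch.randint(0, 2, (8,))
y_bias = torch.randint(0, 2, (8,))
hate_logits, bias_logits = model(x)
loss = multitask_loss(hate_logits, bias_logits, y_hate, y_bias)
loss.backward()
```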

Language: English

Shortcut Learning of Large Language Models in Natural Language Understanding
Mengnan Du, Fengxiang He, Na Zou

et al.

Communications of the ACM, Journal Year: 2023, Volume and Issue: 67(1), P. 110 - 120

Published: Dec. 21, 2023

Shortcuts often hinder the robustness of large language models.

Language: English

Citations

35

SELFEXPLAIN: A Self-Explaining Architecture for Neural Text Classifiers

Dheeraj Rajagopal, Vidhisha Balachandran, Eduard Hovy

et al.

Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, Journal Year: 2021, Volume and Issue: unknown

Published: Jan. 1, 2021

We introduce SelfExplain, a novel self-explaining model that explains a text classifier's predictions using phrase-based concepts. SelfExplain augments existing neural classifiers by adding (1) a globally interpretable layer that identifies the most influential concepts in the training set for a given sample and (2) a locally interpretable layer that quantifies the contribution of each local input concept by computing a relevance score relative to the predicted label. Experiments across five text-classification datasets show that SelfExplain facilitates interpretability without sacrificing performance. Most importantly, explanations from SelfExplain show sufficiency and are perceived as adequate, trustworthy and understandable by human judges compared to widely-used baselines.
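
As a rough illustration of the locally interpretable layer described above, the sketch below scores each phrase concept by how much dropping its representation changes the logit of the predicted label; the mean pooling and leave-one-out scoring are assumptions for illustration, not the paper's exact architecture.

```python
import torch

def local_relevance_scores(phrase_reprs: torch.Tensor, classifier) -> torch.Tensor:
    """Illustrative local relevance: for each phrase, compare the predicted-label
    logit of the full (mean-pooled) input against the logit obtained when that
    phrase's representation is left out.

    phrase_reprs: (num_phrases, hidden_dim) phrase-level representations.
    classifier:   any module mapping (hidden_dim,) -> (num_labels,) logits.
    """
    full = phrase_reprs.mean(dim=0)
    full_logits = classifier(full)
    pred_label = full_logits.argmax()

    scores = []
    for i in range(phrase_reprs.size(0)):
        keep = torch.cat([phrase_reprs[:i], phrase_reprs[i + 1:]], dim=0)
        reduced_logits = classifier(keep.mean(dim=0))
        # Higher score = removing this phrase hurts the predicted label more.
        scores.append(full_logits[pred_label] - reduced_logits[pred_label])
    return torch.stack(scores)

# Toy usage: 4 phrases, 16-dim representations, a 2-way linear classifier.
clf = torch.nn.Linear(16, 2)
phrases = torch.randn(4, 16)
print(local_relevance_scores(phrases, clf))
```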

Language: English

Citations

38

“Will You Find These Shortcuts?” A Protocol for Evaluating the Faithfulness of Input Salience Methods for Text Classification
Jasmijn Bastings, Sebastian Ebert, Polina Zablotskaia

et al.

Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, Journal Year: 2022, Volume and Issue: unknown, P. 976 - 991

Published: Jan. 1, 2022

Feature attribution, a.k.a. input salience, methods which assign an importance score to a feature are abundant but may produce surprisingly different results for the same model on the same input. While differences are expected if disparate definitions of importance are assumed, most methods claim to provide faithful attributions and point at the features most relevant for a model's prediction. Existing work on faithfulness evaluation is not conclusive and does not provide a clear answer as to how different methods are to be compared. Focusing on text classification and the model debugging scenario, our main contribution is a protocol that makes use of partially synthetic data to obtain ground truth feature importance ranking. Following the protocol, we do an in-depth analysis of four standard salience method classes on a range of datasets and lexical shortcuts for BERT and LSTM models. We demonstrate that some of the most popular method configurations provide poor results even for simple shortcuts, while a method judged to be too simplistic works remarkably well for BERT.
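
A toy version of the partially synthetic data idea: plant a shortcut token whose presence determines the label, so the ground-truth most important feature is known by construction, and then check whether a salience method ranks that token first. The generator and precision-at-1 check below are illustrative stand-ins for the paper's full protocol.

```python
import random

SHORTCUT = "zeroa"  # planted token that fully determines the label

def make_synthetic_example(base_tokens, label):
    """Insert the shortcut token into positive examples only, so the ground-truth
    'most important feature' is known by construction."""
    tokens = list(base_tokens)
    if label == 1:
        tokens.insert(random.randrange(len(tokens) + 1), SHORTCUT)
    return tokens, label

def precision_at_1(salience_ranking, tokens):
    """1.0 if the top-ranked token index points at the planted shortcut, else 0.0."""
    top_idx = salience_ranking[0]
    return float(tokens[top_idx] == SHORTCUT)

# Toy usage with a made-up salience ranking (token indices sorted by importance).
tokens, label = make_synthetic_example(["this", "movie", "was", "fine"], label=1)
fake_ranking = sorted(range(len(tokens)), key=lambda i: tokens[i] == SHORTCUT, reverse=True)
print(tokens, precision_at_1(fake_ranking, tokens))
```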

Language: English

Citations

28

Heterogeneous graph neural networks with post-hoc explanations for multi-modal and explainable land use inference
Xuehao Zhai, Junqi Jiang, Adam Dejl

et al.

Information Fusion, Journal Year: 2025, Volume and Issue: unknown, P. 103057 - 103057

Published: March 1, 2025

Language: English

Citations

0

A Survey on Symbolic Knowledge Distillation of Large Language Models
K. Acharya, Alvaro Velasquez, Houbing Song

et al.

IEEE Transactions on Artificial Intelligence, Journal Year: 2024, Volume and Issue: unknown, P. 1 - 21

Published: Jan. 1, 2024

This survey paper delves into the emerging and critical area of symbolic knowledge distillation in Large Language Models (LLMs). As LLMs like Generative Pre-trained Transformer-3 (GPT-3) and Bidirectional Encoder Representations from Transformers (BERT) continue to expand in scale and complexity, the challenge of effectively harnessing their extensive knowledge becomes paramount. This survey concentrates on the process of distilling the intricate, often implicit knowledge contained within these models into a more symbolic, explicit form. This transformation is crucial for enhancing the interpretability, efficiency, and applicability of LLMs. We categorize existing research based on methodologies and applications, focusing on how symbolic knowledge distillation can be used to improve the transparency and functionality of smaller, more efficient Artificial Intelligence (AI) models. The survey discusses the core challenges, including maintaining the depth of knowledge in a comprehensible format, and explores the various approaches and techniques that have been developed in this field. We identify gaps in current research and potential opportunities for future advancements. This survey aims to provide a comprehensive overview of symbolic knowledge distillation in LLMs, spotlighting its significance in the progression towards more accessible and efficient AI systems.
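
As a small, hedged illustration of what distilling implicit knowledge into a symbolic, explicit form can mean in practice, the sketch below formats a prompt that asks a teacher LLM for commonsense triples and parses its free-text answer into structured facts; the prompt, triple format, and filtering are illustrative and not drawn from any specific approach covered by the survey.

```python
from dataclasses import dataclass

@dataclass
class Triple:
    head: str
    relation: str
    tail: str

# Hypothetical prompt template; the exact wording sent to the teacher LLM is an assumption.
DISTILLATION_PROMPT = """List commonsense facts about "{concept}" as triples,
one per line, in the form: head | relation | tail"""

def parse_triples(raw_output: str) -> list[Triple]:
    """Turn the teacher LLM's free-text output into symbolic triples,
    dropping malformed lines (a crude stand-in for a critic/filter step)."""
    triples = []
    for line in raw_output.splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) == 3 and all(parts):
            triples.append(Triple(*parts))
    return triples

# Toy usage with a hard-coded string standing in for the teacher's response.
prompt = DISTILLATION_PROMPT.format(concept="bread")  # would be sent to the teacher LLM
fake_teacher_output = """bread | is used for | making sandwiches
bread | is made from | flour
this line is malformed"""
for t in parse_triples(fake_teacher_output):
    print(t)
```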

Language: English

Citations

3

Influence Tuning: Demoting Spurious Correlations via Instance Attribution and Instance-Driven Updates

Xiaochuang Han, Yulia Tsvetkov

Published: Jan. 1, 2021

Among the most critical limitations of deep learning NLP models are their lack of interpretability and their reliance on spurious correlations. Prior work has proposed various approaches to interpreting black-box models in order to unveil spurious correlations, but that research was primarily used in human-computer interaction scenarios. It still remains underexplored whether or how such model interpretations can be used to automatically "unlearn" confounding features. In this work, we propose influence tuning, a procedure that leverages model interpretations to update the model parameters towards a plausible interpretation (rather than an interpretation that relies on spurious patterns in the data) in addition to learning to predict the task labels. We show that in a controlled setup, influence tuning can help deconfounding the model from spurious patterns in the data, significantly outperforming baseline methods that use adversarial training.
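
The abstract leaves the attribution machinery implicit. One common instance-attribution score is a TracIn-style gradient dot product between a query example and a training example, sketched below; how such scores are folded into the parameter update is the paper's contribution and is not reproduced here.

```python
import torch
import torch.nn as nn

def grad_vector(model: nn.Module, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """Flattened gradient of the loss on a single example w.r.t. model parameters."""
    model.zero_grad()
    loss = nn.functional.cross_entropy(model(x), y)
    grads = torch.autograd.grad(loss, [p for p in model.parameters() if p.requires_grad])
    return torch.cat([g.reshape(-1) for g in grads])

def influence_score(model, x_train, y_train, x_query, y_query) -> float:
    """TracIn-style instance attribution: dot product of per-example gradients.
    A large positive score suggests the training example pushes the model
    toward its current behavior on the query example."""
    g_train = grad_vector(model, x_train, y_train)
    g_query = grad_vector(model, x_query, y_query)
    return torch.dot(g_train, g_query).item()

# Toy usage with a linear probe over 32-dim features.
model = nn.Linear(32, 2)
x_tr, y_tr = torch.randn(1, 32), torch.tensor([0])
x_q, y_q = torch.randn(1, 32), torch.tensor([1])
print(influence_score(model, x_tr, y_tr, x_q, y_q))
```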

Language: English

Citations

14

Out-of-Distribution Generalization in Natural Language Processing: Past, Present, and Future
Linyi Yang, Yaoxian Song, Xuan Ren

et al.

Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, Journal Year: 2023, Volume and Issue: unknown, P. 4533 - 4559

Published: Jan. 1, 2023

Linyi Yang, Yaoxian Song, Xuan Ren, Chenyang Lyu, Yidong Wang, Jingming Zhuo, Lingqiao Liu, Jindong Wang, Jennifer Foster, Yue Zhang. Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. 2023.

Language: English

Citations

5

Measuring Inductive Biases of In-Context Learning with Underspecified Demonstrations

Chenglei Si, Dan Friedman, Nitish Joshi

et al.

Published: Jan. 1, 2023

In-context learning (ICL) is an important paradigm for adapting large language models (LLMs) to new tasks, but the generalization behavior of ICL remains poorly understood. We investigate the inductive biases of ICL from the perspective of feature bias: which feature ICL is more likely to use given a set of underspecified demonstrations in which two features are equally predictive of the labels. First, we characterize the feature biases of GPT-3 models by constructing underspecified demonstrations from a range of NLP datasets and feature combinations. We find that LLMs exhibit clear feature biases; for example, they demonstrate a strong bias to predict labels according to sentiment rather than shallow lexical features, like punctuation. Second, we evaluate the effect of different interventions that are designed to impose an inductive bias in favor of a particular feature, such as adding a natural language instruction or using semantically relevant label words. We find that, while many interventions can influence the learner to prefer a particular feature, it can be difficult to overcome strong prior biases. Overall, our results provide a broader picture of the types of features that ICL may be more likely to exploit and how to impose inductive biases that are better aligned with the intended task.
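
A minimal sketch of how an underspecified demonstration set might be constructed: in the demonstrations, sentiment and a shallow punctuation cue are perfectly correlated, while the test input pits them against each other. The prompt template and label words are illustrative assumptions.

```python
def build_underspecified_prompt(demos, test_input):
    """Demonstrations where feature A (sentiment) and feature B (trailing '!')
    both perfectly predict the label; the test input disambiguates which
    feature the in-context learner actually relies on."""
    lines = [f"Review: {text}\nLabel: {label}" for text, label in demos]
    lines.append(f"Review: {test_input}\nLabel:")
    return "\n\n".join(lines)

# In the demos, positive sentiment always co-occurs with '!', negative never does.
demos = [
    ("A wonderful, heartfelt film!", "positive"),
    ("Dull and far too long.", "negative"),
    ("An absolute joy to watch!", "positive"),
    ("A tedious, lifeless script.", "negative"),
]
# Test input: negative sentiment but ending in '!', so the two cues disagree.
print(build_underspecified_prompt(demos, "A boring, forgettable mess!"))
```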

Language: English

Citations

5

Rationale Aware Contrastive Learning Based Approach to Classify and Summarize Crisis-Related Microblogs
Thi Huyen Tram Nguyen, Koustav Rudra

Proceedings of the 31st ACM International Conference on Information & Knowledge Management, Journal Year: 2022, Volume and Issue: unknown, P. 1552 - 1562

Published: Oct. 16, 2022

The recent fashion of information propagation on Twitter makes the platform a crucial conduit for tactical data and emergency responses during disasters. However, real-time information about crises is immersed in a large volume of emotional and irrelevant posts. This brings the necessity to develop an automatic tool to identify disaster-related messages and summarize the information for consumption and situation planning. Besides, the explainability of methods is crucial in determining their applicability in real-life scenarios. Studies also highlight the importance of learning a good latent representation of tweets for several downstream tasks. In this paper, we take advantage of state-of-the-art methods, such as transformers and contrastive learning, to build an interpretable classifier. Our proposed model classifies tweets into different humanitarian categories and also extracts rationale snippets as supporting evidence for the output decisions. The contrastive learning framework helps to learn better representations by bringing related tweets closer in the embedding space. Furthermore, we employ the classification labels and rationales to efficiently generate summaries of crisis events. Extensive experiments over different crisis datasets show that (i) our classifier obtains the best performance-interpretability trade-off, and (ii) the proposed summarizer shows superior performance (1.4%-22% improvement) with significantly less computation cost than baseline models.
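
As a rough illustration of the contrastive component, the sketch below implements a supervised-contrastive-style loss that pulls tweets from the same humanitarian category together in embedding space; the temperature and batch construction are illustrative, not the paper's exact setup.

```python
import torch
import torch.nn.functional as F

def supervised_contrastive_loss(embeddings: torch.Tensor,
                                labels: torch.Tensor,
                                temperature: float = 0.1) -> torch.Tensor:
    """Pull same-category tweet embeddings together, push others apart.
    embeddings: (batch, dim); labels: (batch,) humanitarian category ids."""
    z = F.normalize(embeddings, dim=1)
    sim = z @ z.t() / temperature                      # pairwise similarities
    self_mask = torch.eye(len(z), dtype=torch.bool)
    sim = sim.masked_fill(self_mask, float("-inf"))    # exclude self-pairs
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)

    positives = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
    pos_counts = positives.sum(dim=1).clamp(min=1)
    # Mean negative log-probability over each anchor's positives (non-positives zeroed out).
    loss = -(log_prob.masked_fill(~positives, 0.0).sum(dim=1) / pos_counts)
    return loss[positives.any(dim=1)].mean()

# Toy usage: 6 tweet embeddings from 3 humanitarian categories.
emb = torch.randn(6, 128)
cats = torch.tensor([0, 0, 1, 1, 2, 2])
print(supervised_contrastive_loss(emb, cats))
```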

Language: English

Citations

8

InterroLang: Exploring NLP Models and Datasets through Dialogue-based Explanations
Nils Feldhus, Qianli Wang, Tatiana Anikina

et al.

Published: Jan. 1, 2023

While recently developed NLP explainability methods let us open the black box in various ways (Madsen et al., 2022), a missing ingredient in this endeavor is an interactive tool offering a conversational interface. Such a dialogue system can help users explore datasets and models with explanations in a contextualized manner, e.g. via clarification or follow-up questions, and through a natural language interface. We adapt the conversational explanation framework TalkToModel (Slack et al., 2022) to the NLP domain, add new NLP-specific operations such as free-text rationalization, and illustrate its generalizability on three NLP tasks (dialogue act classification, question answering, hate speech detection). To recognize user queries for explanations, we evaluate fine-tuned and few-shot prompting models and implement a novel adapter-based approach. We then conduct two user studies on (1) the perceived correctness and helpfulness of the dialogues, and (2) the simulatability, i.e. how objectively helpful dialogical explanations are for humans in figuring out the model's predicted label when it is not shown. We found that rationalization and feature attribution were helpful in explaining the model behavior. Moreover, users could more reliably predict the model outcome based on an explanation dialogue rather than one-off explanations.
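
To make the query-recognition step concrete, here is a hedged sketch of a few-shot prompt that maps a user's question to one of a small set of explanation operations; the operation names and template are invented for illustration and are not InterroLang's actual parser or adapter-based classifier.

```python
# Hypothetical operation names for illustration; InterroLang's real operation
# set and its fine-tuned / adapter-based recognizers are not reproduced here.
OPERATIONS = ["feature_attribution", "rationalize", "counterfactual", "show_data"]

FEW_SHOT_TEMPLATE = """Map the user's question to one explanation operation.
Operations: {ops}

Question: Which words made the model call this tweet hate speech?
Operation: feature_attribution

Question: Explain in plain language why the model predicted this label.
Operation: rationalize

Question: How would the prediction change if I removed this phrase?
Operation: counterfactual

Question: {question}
Operation:"""

def build_intent_prompt(question: str) -> str:
    """Fill the few-shot template for a new user question."""
    return FEW_SHOT_TEMPLATE.format(ops=", ".join(OPERATIONS), question=question)

# The returned string would be sent to an instruction-following LLM, and the
# first generated token(s) read off as the predicted operation.
print(build_intent_prompt("Show me a few training examples similar to this one."))
```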

Language: English

Citations

4