Understanding Users' Dissatisfaction with ChatGPT Responses: Types, Resolving Tactics, and the Effect of Knowledge Level
Yoonsu Kim, J. K.W. Lee, Seoyoung Kim

et al.

arXiv (Cornell University), Journal Year: 2023, Volume and Issue: unknown

Published: Jan. 1, 2023

Large language models (LLMs) with chat-based capabilities, such as ChatGPT, are widely used in various workflows. However, due to a limited understanding of these large-scale models, users struggle to use this technology and experience different kinds of dissatisfaction. Researchers have introduced several methods, such as prompt engineering, to improve model responses. However, they focus on enhancing the model's performance in specific tasks, and little has been investigated on how to deal with the user dissatisfaction resulting from the model's responses. Therefore, with ChatGPT as a case study, we examine users' dissatisfaction along with their strategies to address it. After organizing users' dissatisfaction with LLMs into seven categories based on a literature review, we collected 511 instances of dissatisfactory ChatGPT responses from 107 users, along with their detailed recollections of the dissatisfying experiences, which we released as a publicly accessible dataset. Our analysis reveals that users most frequently experience dissatisfaction when ChatGPT fails to grasp their intentions, while they rate the severity of dissatisfaction related to accuracy the highest. We also identified four tactics users employ to address their dissatisfaction and their effectiveness. We found that users often do not take any action to address their dissatisfaction, and even when tactics were used, 72% of dissatisfaction remained unresolved. Moreover, users with low knowledge of LLMs tend to face more dissatisfaction on accuracy while putting minimal effort into addressing it. Based on these findings, we propose design implications for minimizing user dissatisfaction and improving the usability of chat-based LLMs.

Language: English

Identifying open-texture in regulations using LLMs
Clement Guitton, Reto Gubelmann, Ghassen Karray

et al.

Artificial Intelligence and Law, Journal Year: 2025, Volume and Issue: unknown

Published: May 6, 2025

Language: English

Citations: 0

Generative Calibration for In-context Learning

Zhongtao Jiang, Yuanzhe Zhang, Liu Cao

et al.

Published: Jan. 1, 2023

As one of the most exciting features of large language models (LLMs), in-context learning is a mixed blessing. While it allows users to fast-prototype a task solver with only a few training examples, the performance is generally sensitive to various configurations of the prompt, such as the choice or order of the training examples. In this paper, we for the first time theoretically and empirically identify that such a paradox is mainly due to the label shift of the in-context model to the data distribution, in which LLMs shift the label marginal p(y) while having a good label conditional p(x|y). With this understanding, we can simply calibrate the predictive distribution by adjusting the label marginal, which is estimated via Monte-Carlo sampling over the in-context model, i.e., generation of the LLMs. We call our approach generative calibration. We conduct exhaustive experiments on 12 text classification tasks and LLMs scaling from 774M to 33B, and find that the proposed method greatly and consistently outperforms ICL as well as state-of-the-art calibration methods, by up to 27% absolute in macro-F1. Meanwhile, the proposed method is also stable under different prompt configurations.
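
Once the per-label probabilities are in hand, the calibration step described above reduces to a simple reweighting. The sketch below is a minimal illustration of that idea, assuming the predictive distributions for the real test inputs and for LLM-generated (Monte-Carlo sampled) inputs have already been collected; the function name and array shapes are our own, not from the paper.

```python
import numpy as np

def generative_calibration(p_test, p_sampled):
    """Adjust ICL predictions by dividing out an estimated label marginal.

    p_test:    (n_test, n_labels) predictive distributions p(y | x, prompt)
               for the real test inputs.
    p_sampled: (n_mc, n_labels) predictive distributions for inputs the LLM
               itself generated under the same prompt (the Monte-Carlo
               samples from the in-context model).
    """
    # Monte-Carlo estimate of the in-context label marginal p(y)
    marginal = p_sampled.mean(axis=0)                 # shape: (n_labels,)
    # Divide out the shifted marginal, then renormalize row-wise
    adjusted = p_test / marginal
    return adjusted / adjusted.sum(axis=1, keepdims=True)

# Toy usage: a prompt that makes the model over-predict label 0
p_test = np.array([[0.60, 0.40], [0.75, 0.25]])
p_sampled = np.array([[0.70, 0.30], [0.72, 0.28], [0.68, 0.32]])
print(generative_calibration(p_test, p_sampled))
```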

Language: English

Citations: 1

Comparable Demonstrations Are Important In In-Context Learning: A Novel Perspective On Demonstration Selection
Caoyun Fan, Jidong Tian, Yitian Li

et al.

ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Journal Year: 2024, Volume and Issue: unknown, P. 10436 - 10440

Published: March 18, 2024

In-Context Learning (ICL) is an important paradigm for adapting Large Language Models (LLMs) to downstream tasks through a few demonstrations. Despite the great success of ICL, the limited number of demonstrations may lead to demonstration bias, i.e., the input-label mapping induced by LLMs misunderstands the task's essence. Inspired by human experience, we attempt to mitigate such bias from the perspective of the inter-demonstration relationship. Specifically, we construct Comparable Demonstrations (CDs) by minimally editing the texts to flip the corresponding labels, in order to highlight the task's essence and eliminate potential spurious correlations through comparison. Through a series of experiments on CDs, we find that (1) demonstration bias does exist in LLMs, and CDs can significantly reduce such bias; (2) CDs exhibit good performance in ICL, especially in out-of-distribution scenarios. In summary, this study explores the ICL mechanisms from a novel perspective, providing deeper insight into the demonstration selection strategy for ICL.
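
To make the idea of comparable demonstrations concrete, here is a hypothetical sentiment-classification sketch: the two demonstrations differ by a single minimal edit that flips the label, which is the property CDs rely on. The texts and the prompt template are invented for illustration and do not come from the paper.

```python
# A comparable pair: the minimal edit ("engaging" -> "tedious") flips the
# label, foregrounding the task's essence over spurious surface features.
comparable_demos = [
    ("The plot was engaging from start to finish.", "positive"),
    ("The plot was tedious from start to finish.", "negative"),
]

def build_icl_prompt(demos, query):
    """Format demonstrations and a query into a plain ICL prompt."""
    blocks = [f"Review: {text}\nSentiment: {label}" for text, label in demos]
    blocks.append(f"Review: {query}\nSentiment:")
    return "\n\n".join(blocks)

print(build_icl_prompt(comparable_demos, "The acting felt wooden throughout."))
```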

Language: English

Citations: 0

Labeling Radiology Report With GPT-4 Prompt Engineering: Comparative Study of in-Context Prompting (Preprint)
Songsoo Kim, Donghyun Kim, Hyunjoo Shin

et al.

Published: March 15, 2024

BACKGROUND: Large language models, such as Generative Pre-trained Transformer-4 (GPT-4), utilize a method known as in-context learning, which enhances the model's responses by understanding the context provided within the input text.

OBJECTIVE: This study aims to assess the labeling efficacy of GPT-4 in radiology reports and to validate its performance enhancement through in-context learning.

METHODS: In this retrospective study, radiology reports were obtained from the Medical Information Mart for Intensive Care III (MIMIC-III) database and manually labeled by two radiologists for evaluation. Two experimental prompts were defined for comparison: a "Basic prompt," which included the sections "Task" and "Output," and an "In-context prompt," which added a "Context" section with additional information. Labeling experiments were conducted on head CT reports for multi-label classification with ten predefined labels (mass, hemorrhage, infarct, vascular, white matter, volume loss, hydrocephalus, pneumocephalus, foreign body, fracture) in Experiment 1, and on abdomen CT reports for actionable findings based on four different labels (gastrointestinal, genitourinary, musculoskeletal, vascular) in Experiment 2. Precision, recall, F1-scores, and accuracy were compared between the two prompting scenarios.

RESULTS: In Experiment 1, the In-context prompt demonstrated notable improvements in F1 scores (up to 0.658) and accuracy (up to 0.155) for most labels, except for the hemorrhage and pneumocephalus labels. Statistically significant differences were observed for several labels (vascular, mass, foreign body). In Experiment 2, the In-context prompt significantly enhanced F1 scores (by up to 0.306) and accuracy (by up to 0.107) across all labels compared with the Basic prompt.

CONCLUSIONS: Our study demonstrates that GPT-4 with in-context prompt engineering has commendable labeling performance on various tasks in real-world radiology reports. It offers a flexible, researcher-tailored approach to using LLMs.
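
The two prompt layouts compared in the study differ only in the presence of a "Context" section. The sketch below reconstructs that structure; the section wording and the JSON output format are assumptions for illustration, and only the Task/Output versus Task/Context/Output split follows the abstract.

```python
# Hypothetical section contents; only the section structure follows the study.
TASK = ("Label the following head CT report for each of: mass, hemorrhage, "
        "infarct, vascular, white matter, volume loss, hydrocephalus, "
        "pneumocephalus, foreign body, fracture.")
OUTPUT = "Return a JSON object mapping each label to 0 or 1."
CONTEXT = ("Assumed additional information, e.g. label definitions and edge "
           "cases: 'hemorrhage' includes subdural, epidural, subarachnoid, "
           "and intraparenchymal bleeds.")

def basic_prompt(report: str) -> str:
    # "Basic prompt": Task and Output sections only
    return f"Task: {TASK}\n\nOutput: {OUTPUT}\n\nReport: {report}"

def in_context_prompt(report: str) -> str:
    # "In-context prompt": adds a Context section with additional information
    return (f"Task: {TASK}\n\nContext: {CONTEXT}\n\n"
            f"Output: {OUTPUT}\n\nReport: {report}")
```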

Language: English

Citations: 0
