MEEP: Is this Engaging? Prompting Large Language Models for Dialogue Evaluation in Multilingual Settings

Amila Ferron,

Amber Shore,

Ekata Mitra

et al.

Published: Jan. 1, 2023

As dialogue systems become more popular, evaluation of their response quality gains importance. Engagingness highly correlates with overall quality and creates a sense of connection that gives human participants a more fulfilling experience. Although qualities like coherence and fluency are readily measured with well-worn automatic metrics, evaluating engagingness often relies on human assessment, which is a costly and time-consuming process. Existing automatic metrics evaluate the response without the conversation history, are designed for a single dataset, or have limited correlation with human annotations. Furthermore, they have been tested exclusively on English conversations. Given that dialogue systems are increasingly available in languages beyond English, multilingual evaluation capabilities are essential. We propose that large language models (LLMs) may be used for this evaluation through prompting, and ask how different prompt constructs and translated prompts compare in a multilingual setting. We provide a prompt-design taxonomy and find that using selected prompt elements with LLMs, including our comprehensive definition of engagingness, outperforms state-of-the-art methods across multiple languages.
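To make the prompting setup concrete, here is a minimal sketch of prompt-based engagingness scoring. The rubric text, the score_engagingness helper, and the generic llm callable are illustrative assumptions, not the paper's actual MEEP prompt or taxonomy.

```python
from typing import Callable

# Hypothetical engagingness rubric; the paper's exact wording and taxonomy differ.
ENGAGINGNESS_PROMPT = (
    "You are evaluating a dialogue system.\n"
    "Engagingness means the response invites further conversation, shows interest\n"
    "in the speaker, and creates a sense of connection.\n\n"
    "Conversation history:\n{history}\n\n"
    "Candidate response:\n{response}\n\n"
    "Rate the engagingness of the candidate response on a scale of 1 (not engaging)\n"
    "to 5 (very engaging). Reply with a single integer."
)

def score_engagingness(history: str, response: str,
                       llm: Callable[[str], str]) -> int:
    """Build the prompt, query the LLM, and parse the integer rating."""
    prompt = ENGAGINGNESS_PROMPT.format(history=history, response=response)
    raw = llm(prompt)
    # Keep only the first integer token found in the model's reply.
    digits = [tok.strip(".") for tok in raw.split() if tok.strip(".").isdigit()]
    return int(digits[0]) if digits else -1
```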

Language: English

Reinforcement Learning for Generative AI: State of the Art, Opportunities and Open Research Challenges
Giorgio Franceschelli, Mirco Musolesi

Journal of Artificial Intelligence Research, Journal year: 2024, Number 79, pp. 417 - 446

Published: Feb. 6, 2024

Generative Artificial Intelligence (AI) is one of the most exciting developments in Computer Science of the last decade. At the same time, Reinforcement Learning (RL) has emerged as a very successful paradigm for a variety of machine learning tasks. In this survey, we discuss the state of the art, opportunities, and open research questions in applying RL to generative AI. In particular, we will discuss three types of applications: RL as an alternative way of generation without specified objectives; as a way of generating outputs while concurrently maximizing an objective function; and, finally, as a way of embedding desired characteristics, which cannot be easily captured by means of an objective function, into the generative process. We conclude the survey with an in-depth discussion of the challenges in this fascinating emerging area.
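As a rough illustration of the second application type (maximizing an objective during generation), the sketch below shows a single REINFORCE-style update for a token-level policy. The policy and reward_fn objects are placeholders, and the code is not taken from the survey itself.

```python
import torch

def reinforce_step(policy, reward_fn, prompt, optimizer, max_len=20):
    """One REINFORCE update: sample a sequence, score it, push up its log-probability
    in proportion to the reward. `policy` maps a token-id list to next-token logits
    and `reward_fn` scores a full sequence; both are hypothetical stand-ins."""
    tokens, log_probs = [], []
    state = list(prompt)
    for _ in range(max_len):
        logits = policy(state)                      # unnormalized next-token scores
        dist = torch.distributions.Categorical(logits=logits)
        tok = dist.sample()
        log_probs.append(dist.log_prob(tok))
        tokens.append(tok.item())
        state.append(tok.item())
    reward = reward_fn(tokens)                      # scalar objective for the sample
    loss = -reward * torch.stack(log_probs).sum()   # maximize expected reward
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return reward
```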

Language: English

Cited by

18

Automatically Correcting Large Language Models: Surveying the Landscape of Diverse Automated Correction Strategies
Liangming Pan, Michael Saxon,

Wenda Xu

et al.

Transactions of the Association for Computational Linguistics, Journal year: 2024, Number 12, pp. 484 - 506

Published: Jan. 1, 2024

Abstract While large language models (LLMs) have shown remarkable effectiveness in various NLP tasks, they are still prone to issues such as hallucination, unfaithful reasoning, and toxicity. A promising approach to rectify these flaws is correcting LLMs with feedback, where the LLM itself is prompted or guided with feedback to fix problems in its own output. Techniques leveraging automated feedback, whether produced by the LLM itself (self-correction) or by some external system, are of particular interest as they make LLM-based solutions more practical and deployable with minimal human intervention. This paper provides an exhaustive review of recent advances in this direction, categorizing them into training-time, generation-time, and post-hoc approaches. We also identify potential challenges and future directions in this emerging field.
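A minimal sketch of one of the surveyed strategy families, post-hoc self-correction, is shown below. The prompts and the generic llm callable are placeholders rather than any specific method from the review.

```python
from typing import Callable

def post_hoc_self_correct(task: str, llm: Callable[[str], str],
                          rounds: int = 2) -> str:
    """Illustrative post-hoc loop: draft an answer, ask the model to critique it,
    then ask it to revise using its own feedback."""
    answer = llm(f"Answer the following task:\n{task}")
    for _ in range(rounds):
        feedback = llm(
            f"Task:\n{task}\n\nDraft answer:\n{answer}\n\n"
            "List any factual errors, unfaithful reasoning, or toxic content. "
            "If the draft is fine, reply exactly: NO ISSUES."
        )
        if "NO ISSUES" in feedback:
            break
        answer = llm(
            f"Task:\n{task}\n\nDraft answer:\n{answer}\n\n"
            f"Feedback:\n{feedback}\n\nRewrite the answer, fixing the issues above."
        )
    return answer
```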

Language: English

Cited by

17

Do LLMs Exhibit Human-like Response Biases? A Case Study in Survey Design

Lindia Tjuatja,

Valerie Chen, Tongshuang Wu

et al.

Transactions of the Association for Computational Linguistics, Journal year: 2024, Number 12, pp. 1011 - 1026

Published: Jan. 1, 2024

Abstract One widely cited barrier to the adoption of LLMs as proxies for humans in subjective tasks is their sensitivity to prompt wording, but interestingly, humans also display sensitivities to instruction changes in the form of response biases. We investigate the extent to which LLMs reflect human response biases, if at all. We look to survey design, where human response biases caused by changes in the wording of "prompts" have been extensively explored in the social psychology literature. Drawing from these works, we design a dataset and framework to evaluate whether LLMs exhibit human-like response biases in survey questionnaires. Our comprehensive evaluation of nine models shows that popular open and commercial LLMs generally fail to reflect human-like behavior, particularly in models that have undergone RLHF. Furthermore, even when a model shows a significant change in the same direction as humans, we find that it is also sensitive to perturbations that do not elicit significant changes in humans. These results highlight the pitfalls of using LLMs as human proxies and underscore the need for finer-grained characterizations of model behavior.
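The general evaluation idea can be sketched as follows: sample answers to an original and a perturbed survey question and measure how far the answer distributions move. The response_shift helper, the sampling budget, and the total-variation metric are illustrative assumptions, not the paper's exact protocol.

```python
from collections import Counter
from typing import Callable, List

def response_shift(question: str, perturbed: str, options: List[str],
                   llm: Callable[[str], str], n_samples: int = 50) -> float:
    """Rough bias check: how much does the answer distribution move when only
    the question wording changes?"""
    def answer_counts(q: str) -> Counter:
        prompt = q + "\nOptions: " + ", ".join(options) + "\nAnswer with one option."
        return Counter(llm(prompt).strip() for _ in range(n_samples))

    base, pert = answer_counts(question), answer_counts(perturbed)
    # Total variation distance between the two empirical answer distributions.
    return 0.5 * sum(abs(base[o] - pert[o]) / n_samples for o in options)
```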

Language: English

Cited by

11

Explainable AI: Bridging the Gap between Machine Learning Models and Human Understanding

Rajiv Avacharmal,

Ai Ml,

Risk Lead

et al.

Journal of Informatics Education and Research, Journal year: 2024, Number unknown

Published: Jan. 1, 2024

Explainable AI (XAI) is one of the key game-changing capabilities in machine learning, contributing to making models more transparent, regulated, and usable across different applications. In this paper, we consider four explanation methods (LIME, SHAP, Anchor, and Decision Tree-based Explanation) for disentangling the decision-making process of black-box models in different fields. In our experiments, we use datasets that cover several domains, for example health, finance, and image classification, and compare the accuracy, fidelity, coverage, precision, and human satisfaction of each method. Our work shows that the rule- and tree-based approach (Decision Tree-based Explanation) is mostly superior to the other model-agnostic methods, achieving higher coverage regardless of the classifier. In addition, respondents in our qualitative evaluation indicated that they were very content with the decision tree-based explanations because they are easy to understand, and found them the most intuitive and meaningful of the explanation types. These findings support the use of interpretable strategies to bridge the gap between machine learning models and human understanding, thereby advancing transparency and accountability in AI-driven decision-making.
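A small, self-contained sketch of how two of these explanation styles can be compared on one black-box classifier follows. The dataset, model, and fidelity measure are stand-ins chosen for brevity, not the paper's experimental setup.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

# Black-box model on a small tabular dataset (illustrative choice).
X, y = load_breast_cancer(return_X_y=True)
black_box = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# 1) Feature-attribution view (SHAP, if installed).
try:
    import shap
    shap_values = shap.TreeExplainer(black_box).shap_values(X[:100])
except ImportError:
    shap_values = None  # SHAP is optional for this sketch

# 2) Global surrogate view: a shallow decision tree mimicking the black box.
surrogate = DecisionTreeClassifier(max_depth=3, random_state=0)
surrogate.fit(X, black_box.predict(X))
fidelity = np.mean(surrogate.predict(X) == black_box.predict(X))
print(f"Surrogate fidelity to the black box: {fidelity:.2%}")
```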

Language: English

Cited by

10

The Devil Is in the Errors: Leveraging Large Language Models for Fine-grained Machine Translation Evaluation

Patrick Fernandes,

Daniel Deutsch,

Mara Finkelstein

et al.

Published: Jan. 1, 2023

Patrick Fernandes, Daniel Deutsch, Mara Finkelstein, Parker Riley, André Martins, Graham Neubig, Ankush Garg, Jonathan Clark, Markus Freitag, Orhan Firat. Proceedings of the Eighth Conference on Machine Translation. 2023.

Language: English

Cited by

16

RL4F: Generating Natural Language Feedback with Reinforcement Learning for Repairing Model Outputs

Afra Feyza Akyürek,

Ekin Akyürek,

Ashwin Kalyan

et al.

Published: Jan. 1, 2023

Afra Feyza Akyurek, Ekin Akyurek, Ashwin Kalyan, Peter Clark, Derry Tanti Wijaya, Niket Tandon. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2023.

Language: English

Cited by

12

DRESS: Instructing Large Vision-Language Models to Align and Interact with Humans via Natural Language Feedback
Yangyi Chen, Karan Sikka, Michael Cogswell

et al.

2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Journal year: 2024, Number 35, pp. 14239 - 14250

Published: June 16, 2024

Language: English

Cited by

4

Models of rational agency in human-centered AI: the realist and constructivist alternatives
Jacob Sparks, Ava Thomas Wright

AI and Ethics, Journal year: 2025, Number unknown

Published: Jan. 23, 2025

Language: English

Cited by

0

Research on Cross-National E-commerce User Behavior Analysis and Conversion Rate Improvement Based on the Improved XLSTM Algorithm
Jingbo Zhai,

Feihong Le

Applied Mathematics and Nonlinear Sciences, Journal year: 2025, Number 10(1)

Published: Jan. 1, 2025

Abstract The rapid expansion of cross-national e-commerce has brought significant opportunities and challenges in understanding diverse consumer behavior. This study introduces an innovative framework combining the XLSTM (Extended Long Short-Term Memory) model with K-means clustering to analyze user behavior and optimize conversion rates on global platforms. The XLSTM model extends traditional LSTM models by incorporating multi-dimensional cell states, attention mechanisms, and improved memory capabilities, enabling it to effectively capture complex temporal and cross-cultural patterns. Its integration with K-means enhances the clustering process by providing high-quality embeddings that lead to well-defined and stable clusters. Through comprehensive evaluations, the combined approach demonstrates superior performance across key metrics, including Silhouette Score, Davies-Bouldin Index (DBI), and Adjusted Rand Index (ARI), compared with standalone clustering algorithms and LSTM-based methods. Feature importance analysis further identifies coupon usage, visit frequency, and product category interest as the most influential factors in purchase decisions. The findings highlight the potential of this methodology to improve user engagement and marketing strategies for cross-national e-commerce platforms.
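To illustrate the embed-then-cluster pipeline and the reported cluster-quality metrics, here is a minimal sketch using scikit-learn. The random embeddings stand in for the XLSTM encoder output, which is not reproduced here.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score, davies_bouldin_score

# Placeholder user-behavior embeddings; in the paper these would come from the
# XLSTM encoder rather than from random numbers.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(1000, 64))

kmeans = KMeans(n_clusters=5, n_init=10, random_state=0)
labels = kmeans.fit_predict(embeddings)

# Two of the abstract's metrics: higher silhouette and lower DBI indicate better clusters.
print("Silhouette Score:", silhouette_score(embeddings, labels))
print("Davies-Bouldin Index:", davies_bouldin_score(embeddings, labels))
```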

Language: English

Cited by

0

A Survey of LLM Datasets: From Autoregressive Model to AI Chatbot
Fei Du,

Xin-Jian Ma,

Jingru Yang

et al.

Journal of Computer Science and Technology, Journal year: 2024, Number 39(3), pp. 542 - 566

Published: May 1, 2024

Language: English

Cited by

2