Improving Summarization with Human Edits
Zonghai Yao, Benjamin Schloss, Sai P. Selvaraj et al.

Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, Year: 2023, Issue: unknown, pp. 2604–2620

Published: Jan. 1, 2023

Recent work has shown the promise of learning with human feedback paradigms to produce human-determined high-quality text. Existing works use human feedback to train large language models (LLMs) in general domain abstractive summarization and have obtained summary quality exceeding traditional likelihood training. In this paper, we focus on a less explored form of human feedback – Human Edits. We propose Sequence Alignment (un)Likelihood Training (SALT), a novel technique to use both the human-edited and model-generated data together in the training loop. In addition, we demonstrate simulating Human Edits with ground truth summaries coming from existing training data – Imitation edits, along with the model-generated summaries obtained after training, to reduce the need for expensive human-edit data. In our experiments, we extend human feedback exploration from general domain summarization to medical domain summarization. Our results demonstrate the effectiveness of SALT in improving the summary quality with Human and Imitation Edits. Through additional experiments, we show that SALT outperforms the conventional RLHF method (designed for human preferences), DPO, when applied to human-edit data. We hope the evidence in our paper prompts researchers to explore, collect, and better use different human feedback approaches scalably.
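The mechanism named in the abstract is a token-level likelihood/unlikelihood objective over an alignment between a model draft and its human edit. The sketch below is a minimal illustration of that idea, not the authors' released code: `salt_loss`, the 0/1 `keep_mask` convention, and the `alpha` weight are assumptions, and a real implementation would derive the mask from a sequence alignment (e.g., via `difflib`) between the draft and the edited summary.

```python
import torch
import torch.nn.functional as F

def salt_loss(logits, draft_ids, keep_mask, alpha=1.0):
    """Hypothetical (un)likelihood loss over an edit-aligned draft.

    logits:    (T, V) model scores at each draft position.
    draft_ids: (T,)   tokens of the model-generated draft.
    keep_mask: (T,)   1.0 where the human edit kept the token, 0.0 where it changed it.
    """
    log_probs = F.log_softmax(logits, dim=-1)
    tok_logp = log_probs.gather(1, draft_ids.unsqueeze(1)).squeeze(1)  # (T,)
    # Likelihood term: raise the probability of tokens the editor kept.
    likelihood = -(keep_mask * tok_logp).sum()
    # Unlikelihood term: push down the probability of tokens the editor changed.
    p = tok_logp.exp().clamp(max=1.0 - 1e-6)
    unlikelihood = -((1.0 - keep_mask) * torch.log1p(-p)).sum()
    return likelihood + alpha * unlikelihood

# Toy usage: 5 draft tokens over a 100-word vocabulary.
logits = torch.randn(5, 100, requires_grad=True)
draft = torch.randint(0, 100, (5,))
kept = torch.tensor([1.0, 1.0, 0.0, 1.0, 0.0])
salt_loss(logits, draft, kept).backward()
```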

Language: English

Automatically Correcting Large Language Models: Surveying the Landscape of Diverse Automated Correction Strategies
Liangming Pan, Michael Saxon, Wenda Xu et al.

Transactions of the Association for Computational Linguistics, Year: 2024, Issue: 12, pp. 484–506

Published: Jan. 1, 2024

While large language models (LLMs) have shown remarkable effectiveness in various NLP tasks, they are still prone to issues such as hallucination, unfaithful reasoning, and toxicity. A promising approach to rectify these flaws is correcting LLMs with feedback, where the LLM itself is prompted or guided with feedback to fix problems in its own output. Techniques leveraging automated feedback, either produced by the LLM itself (self-correction) or by some external system, are of particular interest as they make LLM-based solutions more practical and deployable with minimal human intervention. This paper provides an exhaustive review of recent advances in this area, categorizing them into training-time, generation-time, and post-hoc approaches. We also identify potential challenges and future directions in this emerging field.

Language: English

Cited by: 17

Bridging the Gap: A Survey on Integrating (Human) Feedback for Natural Language Generation
Patrick Fernandes, Aman Madaan, Emmy Liu et al.

Transactions of the Association for Computational Linguistics, Year: 2023, Issue: 11, pp. 1643–1668

Published: Jan. 1, 2023

Natural language generation has witnessed significant advancements due to the training of large models on vast internet-scale datasets. Despite these advancements, there exists a critical challenge: these models can inadvertently generate content that is toxic, inaccurate, and unhelpful, and existing automatic evaluation metrics often fall short of identifying these shortcomings. As models become more capable, human feedback is an invaluable signal for evaluating and improving models. This survey aims to provide an overview of recent research that has leveraged human feedback to improve natural language generation. First, we introduce a taxonomy distilled from existing research to categorize and organize the varied forms of feedback. Next, we discuss how feedback can be described by its format and objective, and cover the two approaches proposed to use feedback (either for training or decoding): directly using feedback or training feedback models. We also discuss existing datasets for human-feedback data collection, and concerns surrounding feedback collection. Finally, we provide an overview of the nascent field of AI feedback, which uses large language models to make judgments based on a set of principles and minimize the need for human intervention. We release a website of this survey at feedback-gap-survey.info.

Language: English

Cited by: 25

Evaluating language models for mathematics through interactions
Katherine M. Collins, Albert Q. Jiang, Simon Frieder et al.

Proceedings of the National Academy of Sciences, Year: 2024, Issue: 121(24)

Published: June 3, 2024

There is much excitement about the opportunity to harness the power of large language models (LLMs) when building problem-solving assistants. However, the standard methodology of evaluating LLMs relies on static pairs of inputs and outputs; this is insufficient for making an informed decision about which LLMs are best to use in an interactive setting, and how that varies by setting. Static assessment therefore limits how we understand model capabilities. We introduce CheckMate, an adaptable prototype platform for humans to interact with and evaluate LLMs. We conduct a study with CheckMate on three language models (InstructGPT, ChatGPT, and GPT-4) as assistants in proving undergraduate-level mathematics, with a mixed cohort of participants from undergraduate students to professors of mathematics. We release the resulting interaction and rating dataset, MathConverse. By analyzing MathConverse, we derive a taxonomy of human query behaviors and uncover that, despite a generally positive correlation, there are notable instances of divergence between correctness and perceived helpfulness of LLM generations, among other findings. Further, we garner a more granular understanding of GPT-4 mathematical problem-solving through a series of case studies contributed by experienced mathematicians. We conclude with actionable takeaways for ML practitioners and mathematicians: models that communicate uncertainty, respond well to user corrections, and can provide a concise rationale for their recommendations may constitute better assistants. Humans should inspect LLM output carefully given the models' current shortcomings and potential for surprising fallibility.

Language: English

Cited by: 15

Learning from models beyond fine-tuning
Hongling Zheng, Li Shen, Anke Tang et al.

Nature Machine Intelligence, Year: 2025, Issue: unknown

Published: Jan. 16, 2025

Language: English

Cited by: 1

SummIt: Iterative Text Summarization via ChatGPT
Haopeng Zhang, Xiao Liu, Jiawei Zhang et al.

Published: Jan. 1, 2023

Existing text summarization systems have made significant progress in recent years, but typically generate summaries in a single step. The one-shot setting is sometimes inadequate, however, as the generated summary may contain hallucinations or overlook important details related to the reader's interests. In this paper, we address this limitation by proposing SummIt, an iterative text summarization framework based on large language models like ChatGPT. Our framework enables the model to refine the generated summary iteratively through self-evaluation and feedback, closely resembling the process humans undertake when drafting and revising summaries. Furthermore, we explore the potential benefits of integrating knowledge and topic extractors into the framework to enhance summary faithfulness and controllability. We evaluate the performance of our framework on three benchmark datasets through empirical and qualitative analyses. We also conduct a human evaluation to validate the effectiveness of the model's refinements and find an issue of over-correction.
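The abstract describes a draft, self-evaluate, revise loop. A minimal sketch of that control flow follows; the `llm` callable, the prompt wording, and the DONE stopping convention are illustrative stand-ins (any chat-completion wrapper would do), not the paper's actual prompts:

```python
def summit(llm, document, max_rounds=3):
    """Iteratively refine a summary using the model's own feedback."""
    summary = llm(f"Summarize the following document:\n{document}")
    for _ in range(max_rounds):
        feedback = llm(
            "Critique this summary for faithfulness and coverage. "
            "Reply with DONE if no issues remain.\n"
            f"Document:\n{document}\n\nSummary:\n{summary}"
        )
        if feedback.strip().upper().startswith("DONE"):
            break  # the self-evaluator is satisfied; stop refining
        summary = llm(
            f"Revise the summary to address this feedback:\n{feedback}\n\n"
            f"Document:\n{document}\n\nSummary:\n{summary}"
        )
    return summary
```

Capping `max_rounds` matters here: as the authors note, unconstrained refinement risks over-correction.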

Language: English

Cited by: 20

Generative AI in Fashion: Overview
Wei-Pei Shi, Wai Keung Wong, Xingxing Zou et al.

ACM Transactions on Intelligent Systems and Technology, Year: 2025, Issue: unknown

Published: Feb. 18, 2025

Generative Artificial Intelligence (GenAI) has recently gained immense popularity by offering various applications for generating high-quality and aesthetically pleasing content in image, 3D, and video data formats. Innovative GenAI solutions have shifted paradigms across design-related industries, particularly fashion. In this paper, we explore the incorporation of GenAI into fashion-related tasks and applications. Our examination encompasses a thorough review of more than 470 research papers and an in-depth analysis of over 300 applications, focusing on their contributions to the field. These are identified as 13 tasks within four categories, spanning multi-modal fashion understanding and content synthesis in image and dynamic (video and animatable 3D) formats. We delve into these methods, recognizing their potential to propel future endeavours toward achieving state-of-the-art (SOTA) performance. Furthermore, we present a comprehensive overview of 53 publicly available datasets suitable for training and benchmarking fashion-centric models, accompanied by relevant evaluation metrics. Finally, we discuss real-world applications, unveiling existing challenges and future directions. With this investigation and analysis, the paper is targeted to serve as a useful resource for understanding the current landscape of GenAI in fashion, paving the way for future innovations. Papers discussed, along with public code links, are listed at: https://github.com/wendashi/Cool-GenAI-Fashion-Papers/

Language: English

Cited by: 0

MAF: Multi-Aspect Feedback for Improving Reasoning in Large Language Models
Deepak Nathani, David Wang, Liangming Pan et al.

Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, Year: 2023, Issue: unknown, pp. 6591–6616

Published: Jan. 1, 2023

Language Models (LMs) have shown impressive performance in various natural language tasks. However, when it comes to reasoning, LMs still face challenges such as hallucination, generating incorrect intermediate reasoning steps, and making mathematical errors. Recent research has focused on enhancing LMs through *self-improvement* using feedback. Nevertheless, existing approaches relying on a single generic feedback source fail to address the diverse error types found in LM-generated reasoning chains. In this work, we propose **Multi-Aspect Feedback**, an iterative refinement framework that integrates multiple feedback modules, including frozen LMs and external tools, each focusing on a specific error category. Our experimental results demonstrate the efficacy of our approach in addressing several errors in the LM-generated reasoning chain and thus improving overall LM performance. We see an improvement of up to 20% in Mathematical Reasoning and 18% in Logical Entailment.
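The distinguishing step here is fanning a draft solution out to several aspect-specific critics and folding all of their complaints into one revision. A minimal sketch under stated assumptions (the `modules` dict of critic callables and the prompt strings are hypothetical, not the paper's interface):

```python
def maf_refine(llm, question, modules, max_rounds=2):
    """Iterative refinement with one feedback module per error category.

    modules: dict mapping an aspect name (e.g., "arithmetic", "logic") to a
    critic callable; per the abstract these may be frozen LMs or external
    tools. Critics are assumed to return "" when they find no issue.
    """
    answer = llm(f"Solve step by step:\n{question}")
    for _ in range(max_rounds):
        critiques = [
            f"[{name}] {note}"
            for name, critic in modules.items()
            if (note := critic(question, answer))
        ]
        if not critiques:
            break  # every aspect-specific critic is satisfied
        answer = llm(
            "Revise the solution to fix these issues:\n" + "\n".join(critiques)
            + f"\n\nQuestion:\n{question}\n\nSolution:\n{answer}"
        )
    return answer
```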

Language: English

Cited by: 3

Interpretable Enterprise Credit Rating via Reinforcement Learning
JingQiu Wang, Weiyu Guo

Lecture Notes in Computer Science, Year: 2024, Issue: unknown, pp. 292–303

Published: Jan. 1, 2024

Language: English

Cited by: 0

Machine Generated Explanations and Their Evaluation
Edward Richards

Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, Year: 2024, Issue: unknown, p. 3074

Published: July 10, 2024

Language: English

Cited by: 0

When Can LLMs Actually Correct Their Own Mistakes? A Critical Survey of Self-Correction of LLMs
Ryo Kamoi, Yusen Zhang, Nan Zhang et al.

Transactions of the Association for Computational Linguistics, Year: 2024, Issue: 12, pp. 1417–1440

Published: Jan. 1, 2024

Self-correction is an approach to improving responses from large language models (LLMs) by refining the responses using LLMs during inference. Prior work has proposed various self-correction frameworks using different sources of feedback, including self-evaluation and external feedback. However, there is still no consensus on the question of when LLMs can correct their own mistakes, as recent studies also report negative results. In this work, we critically survey a broad set of papers and discuss the conditions required for successful self-correction. We first find that prior studies often do not define their research questions in detail and involve impractical frameworks or unfair evaluations that over-evaluate self-correction. To tackle these issues, we categorize research questions in self-correction research and provide a checklist for designing appropriate experiments. Our critical survey based on the newly categorized research questions shows that (1) no prior work demonstrates successful self-correction with feedback from prompted LLMs, except for studies in tasks that are exceptionally suited for self-correction, (2) self-correction works well in tasks that can use reliable external feedback, and (3) large-scale fine-tuning enables self-correction.

Language: English

Cited by: 0