Can Large Language Models Transform Computational Social Science? DOI Creative Commons

Caleb Ziems,

William A. Held,

Omar Ahmed Shaikh

et al.

Computational Linguistics, Year: 2023, No. 50(1), pp. 237-291

Published: Dec. 12, 2023

Abstract: Large language models (LLMs) are capable of successfully performing many language processing tasks zero-shot (without training data). If zero-shot LLMs can also reliably classify and explain social phenomena like persuasiveness and political ideology, then LLMs could augment the computational social science (CSS) pipeline in important ways. This work provides a road map for using LLMs as CSS tools. Towards this end, we contribute a set of prompting best practices and an extensive evaluation pipeline to measure the zero-shot performance of 13 language models on 25 representative English CSS benchmarks. On taxonomic labeling tasks (classification), LLMs fail to outperform the best fine-tuned models but still achieve fair levels of agreement with humans. On free-form coding tasks (generation), LLMs produce explanations that often exceed the quality of crowdworkers’ gold references. We conclude that the performance of today’s LLMs can augment the CSS research pipeline in two ways: (1) serving as zero-shot data annotators on human annotation teams, and (2) bootstrapping challenging creative generation tasks (e.g., explaining the underlying attributes of a text). In summary, LLMs are posed to meaningfully participate in social science analysis in partnership with humans.

Language: English

Large language models in medicine DOI
Arun James Thirunavukarasu, Darren Shu Jeng Ting, Kabilan Elangovan

et al.

Nature Medicine, Year: 2023, No. 29(8), pp. 1930-1940

Published: July 17, 2023

Language: English

Cited by

1439

Large language models encode clinical knowledge DOI Creative Commons
Karan Singhal, Shekoofeh Azizi, Tao Tu

et al.

Nature, Year: 2023, No. 620(7972), pp. 172-180

Published: July 12, 2023

Abstract: Large language models (LLMs) have demonstrated impressive capabilities, but the bar for clinical applications is high. Attempts to assess the clinical knowledge of models typically rely on automated evaluations based on limited benchmarks. Here, to address these limitations, we present MultiMedQA, a benchmark combining six existing medical question answering datasets spanning professional medicine, research and consumer queries and a new dataset of medical questions searched online, HealthSearchQA. We propose a human evaluation framework for model answers along multiple axes including factuality, comprehension, reasoning, possible harm and bias. In addition, we evaluate Pathways Language Model (PaLM, a 540-billion parameter LLM) and its instruction-tuned variant, Flan-PaLM, on MultiMedQA. Using a combination of prompting strategies, Flan-PaLM achieves state-of-the-art accuracy on every MultiMedQA multiple-choice dataset (MedQA, MedMCQA, PubMedQA and Measuring Massive Multitask Language Understanding (MMLU) clinical topics), including 67.6% accuracy on MedQA (US Medical Licensing Exam-style questions), surpassing the prior state of the art by more than 17%. However, human evaluation reveals key gaps. To resolve this, we introduce instruction prompt tuning, a parameter-efficient approach for aligning LLMs to new domains using a few exemplars. The resulting model, Med-PaLM, performs encouragingly, but remains inferior to clinicians. We show that comprehension, knowledge recall and reasoning improve with model scale and instruction prompt tuning, suggesting the potential utility of LLMs in medicine. Our human evaluations reveal limitations of today’s models, reinforcing the importance of both evaluation frameworks and method development in creating safe, helpful LLMs for clinical applications.

Language: English

Cited by

1373

Learning to Prompt for Vision-Language Models DOI
Kaiyang Zhou, Jingkang Yang, Chen Change Loy

et al.

International Journal of Computer Vision, Year: 2022, No. 130(9), pp. 2337-2348

Published: July 31, 2022

Language: English

Cited by

1183

A Brief Overview of ChatGPT: The History, Status Quo and Potential Future Development DOI
Tianyu Wu, Shizhu He, Jingping Liu

et al.

IEEE/CAA Journal of Automatica Sinica, Year: 2023, No. 10(5), pp. 1122-1136

Published: May 1, 2023

ChatGPT, an artificial intelligence generated content (AIGC) model developed by OpenAI, has attracted world-wide attention for its capability of dealing with challenging language understanding and generation tasks in the form of conversations. This paper briefly provides an overview of the history, status quo and potential future development of ChatGPT, helping to provide an entry point to think about ChatGPT. Specifically, from the limited open-accessed resources, we conclude the core techniques of ChatGPT, mainly including large-scale language models, in-context learning, reinforcement learning from human feedback and the key technical steps for developing ChatGPT. We further analyze the pros and cons of ChatGPT and rethink the duality of ChatGPT in various fields. Although it has been widely acknowledged that ChatGPT brings plenty of opportunities for various fields, mankind should still treat and use ChatGPT properly to avoid the potential threat, e.g., academic integrity and safety challenge. Finally, we discuss several open problems as the potential development of ChatGPT.

Language: English

Cited by

738

A Survey on Evaluation of Large Language Models DOI Open Access
Yupeng Chang, Xu Wang, Jindong Wang

et al.

ACM Transactions on Intelligent Systems and Technology, Year: 2024, No. 15(3), pp. 1-45

Published: Jan. 23, 2024

Large language models (LLMs) are gaining increasing popularity in both academia and industry, owing to their unprecedented performance in various applications. As LLMs continue to play a vital role in both research and daily use, their evaluation becomes increasingly critical, not only at the task level, but also at the society level for better understanding of their potential risks. Over the past years, significant efforts have been made to examine LLMs from various perspectives. This paper presents a comprehensive review of these evaluation methods for LLMs, focusing on three key dimensions: what to evaluate, where to evaluate, and how to evaluate. Firstly, we provide an overview from the perspective of evaluation tasks, encompassing general natural language processing tasks, reasoning, medical usage, ethics, education, natural and social sciences, agent applications, and other areas. Secondly, we answer the ‘where’ and ‘how’ questions by diving into the evaluation methods and benchmarks, which serve as crucial components in assessing the performance of LLMs. Then, we summarize the success and failure cases of LLMs in different tasks. Finally, we shed light on several future challenges that lie ahead in LLMs evaluation. Our aim is to offer invaluable insights to researchers in the realm of LLMs evaluation, thereby aiding the development of more proficient LLMs. Our key point is that evaluation should be treated as an essential discipline to better assist the development of LLMs. We consistently maintain the related open-source materials at: https://github.com/MLGroupJLU/LLM-eval-survey

Language: English

Cited by

702

Conditional Prompt Learning for Vision-Language Models DOI
Kaiyang Zhou, Jingkang Yang, Chen Change Loy

et al.

2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Year: 2022, No. unknown

Published: June 1, 2022

With the rise of powerful pre-trained vision-language models like CLIP, it becomes essential to investigate ways to adapt these models to downstream datasets. A recently proposed method named Context Optimization (CoOp) introduces the concept of prompt learning (a recent trend in NLP) to the vision domain for adapting pre-trained vision-language models. Specifically, CoOp turns context words in a prompt into a set of learnable vectors and, with only a few labeled images for learning, can achieve huge improvements over intensively-tuned manual prompts. In our study we identify a critical problem of CoOp: the learned context is not generalizable to wider unseen classes within the same dataset, suggesting that CoOp overfits base classes observed during training. To address the problem, we propose Conditional Context Optimization (CoCoOp), which extends CoOp by further learning a lightweight neural network to generate for each image an input-conditional token (vector). Compared to CoOp's static prompts, our dynamic prompts adapt to each instance and are thus less sensitive to class shift. Extensive experiments show that CoCoOp generalizes much better than CoOp to unseen classes, even showing promising transferability beyond a single dataset; it yields stronger domain generalization performance as well. Code is available at https://github.com/KaiyangZhou/CoOp.

Language: English

Cited by

692

DINOv2: Learning Robust Visual Features without Supervision DOI Creative Commons

Maxime Oquab,

Timothée Darcet,

Théo Moutakanni

et al.

arXiv (Cornell University), Year: 2023, No. unknown

Published: Jan. 1, 2023

The recent breakthroughs in natural language processing for model pretraining on large quantities of data have opened the way for similar foundation models in computer vision. These models could greatly simplify the use of images in any system by producing all-purpose visual features, i.e., features that work across image distributions and tasks without finetuning. This work shows that existing pretraining methods, especially self-supervised methods, can produce such features if trained on enough curated data from diverse sources. We revisit existing approaches and combine different techniques to scale our pretraining in terms of data and model size. Most of the technical contributions aim at accelerating and stabilizing the training at scale. In terms of data, we propose an automatic pipeline to build a dedicated, diverse, and curated image dataset instead of uncurated data, as typically done in the self-supervised literature. In terms of models, we train a ViT model (Dosovitskiy et al., 2020) with 1B parameters and distill it into a series of smaller models that surpass the best available all-purpose features, OpenCLIP (Ilharco et al., 2021), on most of the benchmarks at image and pixel levels.

Language: English

Cited by

517

Multitask Prompted Training Enables Zero-Shot Task Generalization DOI Creative Commons

Victor Sanh,

Albert Webson,

Colin Raffel

et al.

arXiv (Cornell University), Year: 2021, No. unknown

Published: Jan. 1, 2021

Large language models have recently been shown to attain reasonable zero-shot generalization on a diverse set of tasks (Brown et al., 2020). It has been hypothesized that this is a consequence of implicit multitask learning in language models' pretraining (Radford et al., 2019). Can zero-shot generalization instead be directly induced by explicit multitask learning? To test this question at scale, we develop a system for easily mapping any natural language tasks into a human-readable prompted form. We convert a large set of supervised datasets, each with multiple prompts with diverse wording. These prompted datasets allow for benchmarking the ability of a model to perform completely held-out tasks. We fine-tune a pretrained encoder-decoder model (Raffel et al., 2020; Lester et al., 2021) on this multitask mixture covering a wide variety of tasks. The model attains strong zero-shot performance on several standard datasets, often outperforming models up to 16x its size. Further, our approach attains strong performance on a subset of tasks from the BIG-bench benchmark, outperforming models up to 6x its size. All trained models are available at https://github.com/bigscience-workshop/t-zero and all prompts are available at https://github.com/bigscience-workshop/promptsource.

Language: English

Cited by

465

Generative AI at Work DOI Open Access
Erik Brynjolfsson,

Danielle Li,

Lindsey Raymond

et al.

Published: April 1, 2023

We study the staggered introduction of a generative AI-based conversational assistant using data from 5,179 customer support agents. Access to the tool increases productivity, as measured by issues resolved per hour, by 14 percent on average, with the greatest impact on novice and low-skilled workers, and minimal impact on experienced and highly skilled workers. We provide suggestive evidence that the AI model disseminates the potentially tacit knowledge of more able workers and helps newer workers move down the experience curve. In addition, we show that AI assistance improves customer sentiment, reduces requests for managerial intervention, and improves employee retention.

Language: English

Cited by

451

Academic Integrity considerations of AI Large Language Models in the post-pandemic era: ChatGPT and beyond DOI Open Access
Mike Perkins

Journal of University Teaching and Learning Practice, Year: 2023, No. 20(2)

Published: Jan. 1, 2023

This paper explores the academic integrity considerations of students’ use of Artificial Intelligence (AI) tools using Large Language Models (LLMs) such as ChatGPT in formal assessments. We examine the evolution of these tools, and highlight the potential ways that LLMs can support the education of students in digital writing and beyond, including the teaching of writing and composition, the possibilities of co-creation between humans and AI, supporting EFL learners, and improving Automated Writing Evaluations (AWE). We describe and demonstrate the potential that these tools have in creating original, coherent text that can avoid detection by existing technological methods of detection and trained academic staff alike, demonstrating a major academic integrity concern related to the use of these tools by students. Analysing the various issues related to academic integrity that LLMs raise for both Higher Education Institutions (HEIs) and students, we conclude that it is not the student use of any AI tools that defines whether plagiarism or a breach of academic integrity has occurred, but whether any use is made clear by the student. Deciding whether any particular use of LLMs by students can be defined as academic misconduct is determined by the academic integrity policies of any given HEI, which must be updated to consider how these tools will be used in future educational environments.

Language: English

Cited by

406