Can Large Language Models Transform Computational Social Science? (Creative Commons)
Caleb Ziems, William A. Held, Omar Ahmed Shaikh

et al.

Computational Linguistics, Journal Year: 2023, Volume and Issue: 50(1), P. 237 - 291

Published: Dec. 12, 2023

Abstract Large language models (LLMs) are capable of successfully performing many language processing tasks zero-shot (without training data). If zero-shot LLMs can also reliably classify and explain social phenomena like persuasiveness and political ideology, then LLMs could augment the computational social science (CSS) pipeline in important ways. This work provides a road map for using LLMs as CSS tools. Towards this end, we contribute a set of prompting best practices and an extensive evaluation pipeline to measure the zero-shot performance of 13 language models on 25 representative English CSS benchmarks. On taxonomic labeling tasks (classification), LLMs fail to outperform the best fine-tuned models but still achieve fair levels of agreement with humans. On free-form coding tasks (generation), LLMs produce explanations that often exceed the quality of crowdworkers’ gold references. We conclude that the performance of today’s LLMs can augment the CSS research pipeline in two ways: (1) serving as zero-shot data annotators on human annotation teams, and (2) bootstrapping challenging creative generation tasks (e.g., explaining the underlying attributes of a text). In summary, LLMs are poised to meaningfully participate in social science analysis in partnership with humans.

Language: English

Large language models in medicine
Arun James Thirunavukarasu, Darren Shu Jeng Ting, Kabilan Elangovan

et al.

Nature Medicine, Journal Year: 2023, Volume and Issue: 29(8), P. 1930 - 1940

Published: July 17, 2023

Language: English

Citations

1439

Large language models encode clinical knowledge (Creative Commons)
Karan Singhal, Shekoofeh Azizi, Tao Tu

et al.

Nature, Journal Year: 2023, Volume and Issue: 620(7972), P. 172 - 180

Published: July 12, 2023

Abstract Large language models (LLMs) have demonstrated impressive capabilities, but the bar for clinical applications is high. Attempts to assess the clinical knowledge of models typically rely on automated evaluations based on limited benchmarks. Here, to address these limitations, we present MultiMedQA, a benchmark combining six existing medical question answering datasets spanning professional medicine, research and consumer queries and a new dataset of medical questions searched online, HealthSearchQA. We propose a human evaluation framework for model answers along multiple axes including factuality, comprehension, reasoning, possible harm and bias. In addition, we evaluate Pathways Language Model (PaLM, a 540-billion parameter LLM) and its instruction-tuned variant, Flan-PaLM, on MultiMedQA. Using a combination of prompting strategies, Flan-PaLM achieves state-of-the-art accuracy on every MultiMedQA multiple-choice dataset (MedQA, MedMCQA, PubMedQA and Measuring Massive Multitask Language Understanding (MMLU) clinical topics), including 67.6% accuracy on MedQA (US Medical Licensing Exam-style questions), surpassing the prior state of the art by more than 17%. However, human evaluation reveals key gaps. To resolve this, we introduce instruction prompt tuning, a parameter-efficient approach for aligning LLMs to new domains using a few exemplars. The resulting model, Med-PaLM, performs encouragingly, but remains inferior to clinicians. We show that comprehension, knowledge recall and reasoning improve with model scale and instruction prompt tuning, suggesting the potential utility of LLMs in medicine. Our human evaluations reveal limitations of today’s models, reinforcing the importance of both evaluation frameworks and method development in creating safe, helpful LLMs for clinical applications.

Language: English

Citations

1373

Learning to Prompt for Vision-Language Models
Kaiyang Zhou, Jingkang Yang, Chen Change Loy

et al.

International Journal of Computer Vision, Journal Year: 2022, Volume and Issue: 130(9), P. 2337 - 2348

Published: July 31, 2022

Language: English

Citations

1183

A Brief Overview of ChatGPT: The History, Status Quo and Potential Future Development
Tianyu Wu, Shizhu He, Jingping Liu

et al.

IEEE/CAA Journal of Automatica Sinica, Journal Year: 2023, Volume and Issue: 10(5), P. 1122 - 1136

Published: May 1, 2023

ChatGPT, an artificial intelligence generated content (AIGC) model developed by OpenAI, has attracted world-wide attention for its capability of dealing with challenging language understanding and generation tasks in the form of conversations. This paper briefly provides an overview of the history, status quo and potential future development of ChatGPT, helping to provide an entry point to think about ChatGPT. Specifically, from the limited open-accessed resources, we conclude the core techniques of ChatGPT, mainly including large-scale language models, in-context learning, reinforcement learning from human feedback and the key technical steps for developing ChatGPT. We further analyze the pros and cons of ChatGPT and rethink the duality of ChatGPT in various fields. Although it has been widely acknowledged that ChatGPT brings plenty of opportunities for various fields, mankind should still treat and use ChatGPT properly to avoid the potential threat, e.g., academic integrity and safety challenge. Finally, we discuss several open problems as the potential development of ChatGPT.

Language: English

Citations

738

A Survey on Evaluation of Large Language Models (Open Access)
Yupeng Chang, Xu Wang, Jindong Wang

et al.

ACM Transactions on Intelligent Systems and Technology, Journal Year: 2024, Volume and Issue: 15(3), P. 1 - 45

Published: Jan. 23, 2024

Large language models (LLMs) are gaining increasing popularity in both academia and industry, owing to their unprecedented performance in various applications. As LLMs continue to play a vital role in both research and daily use, their evaluation becomes increasingly critical, not only at the task level, but also at the society level for better understanding of their potential risks. Over the past years, significant efforts have been made to examine LLMs from various perspectives. This paper presents a comprehensive review of these evaluation methods for LLMs, focusing on three key dimensions: what to evaluate, where to evaluate, and how to evaluate. Firstly, we provide an overview from the perspective of evaluation tasks, encompassing general natural language processing tasks, reasoning, medical usage, ethics, education, natural and social sciences, agent applications, and other areas. Secondly, we answer the ‘where’ and ‘how’ questions by diving into the evaluation methods and benchmarks, which serve as crucial components in assessing the performance of LLMs. Then, we summarize the success and failure cases of LLMs in different tasks. Finally, we shed light on several future challenges that lie ahead in LLM evaluation. Our aim is to offer invaluable insights to researchers in the realm of LLM evaluation, thereby aiding the development of more proficient LLMs. Our key point is that evaluation should be treated as an essential discipline to better assist the development of LLMs. We consistently maintain the related open-source materials at: https://github.com/MLGroupJLU/LLM-eval-survey

Language: English

Citations

702

Conditional Prompt Learning for Vision-Language Models
Kaiyang Zhou, Jingkang Yang, Chen Change Loy

et al.

2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Journal Year: 2022, Volume and Issue: unknown

Published: June 1, 2022

With the rise of powerful pre-trained vision-language models like CLIP, it becomes essential to investigate ways to adapt these models to downstream datasets. A recently proposed method named Context Optimization (CoOp) introduces the concept of prompt learning (a recent trend in NLP) to the vision domain for adapting pre-trained vision-language models. Specifically, CoOp turns context words in a prompt into a set of learnable vectors and, with only a few labeled images for learning, can achieve huge improvements over intensively-tuned manual prompts. In our study we identify a critical problem of CoOp: the learned context is not generalizable to wider unseen classes within the same dataset, suggesting that CoOp overfits base classes observed during training. To address the problem, we propose Conditional Context Optimization (CoCoOp), which extends CoOp by further learning a lightweight neural network to generate for each image an input-conditional token (vector). Compared to CoOp's static prompts, our dynamic prompts adapt to each instance and are thus less sensitive to class shift. Extensive experiments show that CoCoOp generalizes much better than CoOp to unseen classes, even showing promising transferability beyond a single dataset; it also yields stronger domain generalization performance. Code is available at https://github.com/KaiyangZhou/CoOp.

Language: English

Citations

692

DINOv2: Learning Robust Visual Features without Supervision (Creative Commons)
Maxime Oquab, Timothée Darcet, Théo Moutakanni

et al.

arXiv (Cornell University), Journal Year: 2023, Volume and Issue: unknown

Published: Jan. 1, 2023

The recent breakthroughs in natural language processing for model pretraining on large quantities of data have opened the way for similar foundation models in computer vision. These models could greatly simplify the use of images in any system by producing all-purpose visual features, i.e., features that work across image distributions and tasks without finetuning. This work shows that existing pretraining methods, especially self-supervised methods, can produce such features if trained on enough curated data from diverse sources. We revisit existing approaches and combine different techniques to scale our pretraining in terms of data and model size. Most of the technical contributions aim at accelerating and stabilizing the training at scale. In terms of data, we propose an automatic pipeline to build a dedicated, diverse, and curated image dataset instead of uncurated data, as typically done in the self-supervised literature. In terms of models, we train a ViT model (Dosovitskiy et al., 2020) with 1B parameters and distill it into a series of smaller models that surpass the best available all-purpose features, OpenCLIP (Ilharco et al., 2021), on most of the benchmarks at image and pixel levels.

Language: English

Citations

517

Multitask Prompted Training Enables Zero-Shot Task Generalization (Creative Commons)
Victor Sanh, Albert Webson, Colin Raffel

et al.

arXiv (Cornell University), Journal Year: 2021, Volume and Issue: unknown

Published: Jan. 1, 2021

Large language models have recently been shown to attain reasonable zero-shot generalization on a diverse set of tasks (Brown et al., 2020). It has been hypothesized that this is a consequence of implicit multitask learning in language models' pretraining (Radford et al., 2019). Can zero-shot generalization instead be directly induced by explicit multitask learning? To test this question at scale, we develop a system for easily mapping any natural language task into a human-readable prompted form. We convert a large set of supervised datasets, each with multiple prompts with diverse wording. These prompted datasets allow for benchmarking the ability of a model to perform completely held-out tasks. We fine-tune a pretrained encoder-decoder model (Raffel et al., 2020; Lester et al., 2021) on this multitask mixture covering a wide variety of tasks. The model attains strong zero-shot performance on several standard datasets, often outperforming models up to 16x its size. Further, our approach attains strong performance on a subset of tasks from the BIG-bench benchmark, outperforming models up to 6x its size. All trained models are available at https://github.com/bigscience-workshop/t-zero and all prompts are available at https://github.com/bigscience-workshop/promptsource.

Language: English

Citations

465

Generative AI at Work (Open Access)
Erik Brynjolfsson, Danielle Li, Lindsey Raymond

et al.

Published: April 1, 2023

We study the staggered introduction of a generative AI-based conversational assistant using data from 5,179 customer support agents. Access to the tool increases productivity, as measured by issues resolved per hour, by 14 percent on average, with the greatest impact on novice and low-skilled workers, and minimal impact on experienced and highly skilled workers. We provide suggestive evidence that the AI model disseminates the potentially tacit knowledge of more able workers and helps newer workers move down the experience curve. In addition, we show that AI assistance improves customer sentiment, reduces requests for managerial intervention, and improves employee retention.

Language: English

Citations

451

Academic Integrity considerations of AI Large Language Models in the post-pandemic era: ChatGPT and beyond (Open Access)
Mike Perkins

Journal of University Teaching and Learning Practice, Journal Year: 2023, Volume and Issue: 20(2)

Published: Jan. 1, 2023

This paper explores the academic integrity considerations of students’ use of Artificial Intelligence (AI) tools using Large Language Models (LLMs) such as ChatGPT in formal assessments. We examine the evolution of these tools, and highlight the potential ways that LLMs can support the education of students in digital writing and beyond, including the teaching of writing and composition, the possibilities of co-creation between humans and AI, supporting EFL learners, and improving Automated Writing Evaluations (AWE). We describe and demonstrate the potential that these tools have in creating original, coherent text that can avoid detection by existing technological methods of detection and trained academic staff alike, demonstrating a major academic integrity concern related to the use of these tools by students. Analysing the various issues related to academic integrity that LLMs raise for both Higher Education Institutions (HEIs) and students, we conclude that it is not the student use of any AI tools that defines whether plagiarism or a breach of academic integrity has occurred, but whether any use is made clear by the student. Deciding whether any particular use of LLMs by students can be defined as academic misconduct is determined by the academic integrity policies of any given HEI, which must be updated to consider how these tools will be used in future educational environments.

Language: English

Citations

406