Studying and improving reasoning in humans and machines
Nicolas Yax, Hernán Anlló, Stefano Palminteri et al.

Communications Psychology, Journal Year: 2024, Volume and Issue: 2(1)

Published: June 3, 2024

In the present study, we investigate and compare reasoning in large language models (LLMs) and humans, using a selection of cognitive psychology tools traditionally dedicated to the study of (bounded) rationality. We presented human participants and an array of pretrained LLMs with new variants of classical experiments, and cross-compared their performances. Our results showed that most of the included models made errors akin to those frequently ascribed to error-prone, heuristic-based human reasoning. Notwithstanding this superficial similarity, an in-depth comparison between humans and LLMs indicated important differences with human-like reasoning, with the models' limitations disappearing almost entirely in more recent LLM releases. Moreover, we show that while it is possible to devise strategies that induce better performance, humans and machines are not equally responsive to the same prompting schemes. We conclude by discussing the epistemological implications and challenges of comparing human and machine behavior for both artificial intelligence and cognitive psychology.

Language: English

AI Psychometrics: Assessing the Psychological Profiles of Large Language Models Through Psychometric Inventories
Max Pellert, Clemens M. Lechner, Claudia Wagner et al.

Perspectives on Psychological Science, Journal Year: 2024, Volume and Issue: 19(5), P. 808 - 826

Published: Jan. 2, 2024

We illustrate how standard psychometric inventories originally designed for assessing noncognitive human traits can be repurposed as diagnostic tools to evaluate analogous traits in large language models (LLMs). We start from the assumption that LLMs, inadvertently yet inevitably, acquire psychological traits (metaphorically speaking) from the vast text corpora on which they are trained. Such corpora contain sediments of the personalities, values, beliefs, and biases of the countless human authors of these texts, which LLMs learn through a complex training process. The traits that LLMs acquire in such a way can potentially influence their behavior, that is, their outputs in the downstream tasks and applications in which they are employed, which in turn may have real-world consequences for individuals and social groups. By eliciting LLMs’ responses to language-based psychometric inventories, we can bring their traits to light. Psychometric profiling enables researchers to study and compare LLMs in terms of psychological characteristics, thereby providing a window into the personalities, values, beliefs, and biases these models exhibit (or mimic). We discuss the history of similar ideas and outline possible psychometric approaches for LLMs. We demonstrate one promising approach, zero-shot classification, for several LLMs and psychometric inventories. We conclude by highlighting open challenges and future avenues of research for AI Psychometrics.

Language: English

Citations: 22

Are the Futures Computable? Knightian Uncertainty and Artificial Intelligence
David M. Townsend, Richard A. Hunt, Judy Rady et al.

Academy of Management Review, Journal Year: 2024, Volume and Issue: unknown

Published: Jan. 5, 2024

The growing sophistication of artificial intelligence (AI) tools in entrepreneurship is transforming how new ventures identify, gather, analyze, and utilize information from their internal and external operating environments to automate critical choices, decisions, and tasks. For many startups and corporate ventures, prior research suggests that AI provides significant task performance advantages to entrepreneurs addressing the problem of uncertainty, in part, through enhanced predictive capabilities. What is less clear, however, is whether AI tools enable entrepreneurs to manage problems of "Knightian uncertainty," a fundamental type of uncertainty that manifests in a cascading set of four interrelated problems: actor ignorance, practical indeterminism, agentic novelty, and competitive recursion. In this study, we argue that the capabilities of AI tools are contingent upon the ability of these systems to grapple with Knightian uncertainty. We investigate the logic of this approach through an in-depth analysis of the limits of foundational and emerging types of AI tools to address these problems, identifying areas of computational irreducibility where the manifestation of Knightian uncertainty limits the use of AI in entrepreneurship.

Language: English

Citations: 19

Artificial intelligence and qualitative research: The promise and perils of large language model (LLM) ‘assistance’
John D. Roberts, Max Baker, Jane Andrew et al.

Critical Perspectives on Accounting, Journal Year: 2024, Volume and Issue: 99, P. 102722 - 102722

Published: Feb. 22, 2024

New large language models (LLMs) like ChatGPT have the potential to change qualitative research by contributing to every stage of the research process, from generating interview questions to structuring research publications. However, it is far from clear whether such 'assistance' will enable or deskill and eventually displace the qualitative researcher. This paper sets out to explore the implications for qualitative research of the recently emerged capabilities of LLMs: how they acquired their seemingly 'human-like' capabilities to 'converse' with us humans, and in what ways these capabilities are deceptive or misleading. Building on a comparison of the different 'trainings' of humans and LLMs, the paper first traces the human-like qualities of the LLM to the human proclivity to project communicative intent into or onto LLMs' purely imitative capacity to predict the structure of human communication. It then goes on to detail the ways in which such communication is misleading in relation to the absolute 'certainty' with which LLMs 'converse', their intrinsic tendencies to 'hallucination' and 'sycophancy', the narrow conception of 'artificial intelligence', LLMs' complete lack of ethical sensibility or responsibility, and finally the feared danger of an 'emergence' of 'human-competitive' or 'superhuman' LLM capabilities. The paper concludes by noting the dangers of the widespread use of LLMs as 'mediators' of human self-understanding and culture. A postscript offers a brief reflection on what only humans can do as qualitative researchers.

Language: English

Citations: 16

Machine culture
Levin Brinkmann, Fabian Baumann, Jean‐François Bonnefon et al.

Nature Human Behaviour, Journal Year: 2023, Volume and Issue: 7(11), P. 1855 - 1868

Published: Nov. 20, 2023

Language: English

Citations: 39

Language Model Behavior: A Comprehensive Survey

Tyler A. Chang, Benjamin Bergen

Computational Linguistics, Journal Year: 2023, Volume and Issue: 50(1), P. 293 - 350

Published: Nov. 15, 2023

Transformer language models have received widespread public attention, yet their generated text is often surprising even to NLP researchers. In this survey, we discuss over 250 recent studies of English language model behavior before task-specific fine-tuning. Language models possess basic capabilities in syntax, semantics, pragmatics, world knowledge, and reasoning, but these capabilities are sensitive to specific inputs and surface features. Despite dramatic increases in generated text quality as models scale to hundreds of billions of parameters, the models are still prone to unfactual responses, commonsense errors, memorized text, and social biases. Many of these weaknesses can be framed as over-generalizations or under-generalizations of learned patterns in text. We synthesize recent results to highlight what is currently known about large language model capabilities, thus providing a resource for applied work and for research in adjacent fields that use language models.

Language: English

Citations: 38

Creating a large language model of a philosopher
Eric Schwitzgebel, David Schwitzgebel, Anna Strasser et al.

Mind & Language, Journal Year: 2023, Volume and Issue: 39(2), P. 237 - 259

Published: July 12, 2023

Can large language models produce expert‐quality philosophical texts? To investigate this, we fine‐tuned GPT‐3 with the works of philosopher Daniel Dennett. To evaluate the model, we asked the real Dennett 10 philosophical questions and then posed the same questions to the language model, collecting four responses for each question without cherry‐picking. Experts on Dennett's work succeeded at distinguishing the Dennett‐generated and machine‐generated answers above chance but substantially short of our expectations. Philosophy blog readers performed similarly to the experts, while ordinary research participants were near chance in distinguishing GPT‐3's responses from those of an “actual human philosopher”.

Language: English

Citations: 36

Using ChatGPT and Other Large Language Model (LLM) Applications for Academic Paper Assignments
Andreas Jungherr

Published: March 24, 2023

Large language models (LLMs), like ChatGPT, GitHub Copilot, and Microsoft Copilot, present challenges in university education, particularly for paper assignments. These AI-driven tools enable students to (semi)automatically complete tasks that were previously considered evidence of skill acquisition, potentially affecting grading and skill development. However, the use of these tools is not legally plagiarism, and the tools are becoming increasingly integrated into various software solutions.

University education in the social sciences aims to develop students' abilities to make sense of the world, connect their observations with abstract structures, measure phenomena of interest, systematically test expectations, and integrate findings into structured accounts. These practices are learned through repeated performance of tasks, such as writing research papers. LLM applications like ChatGPT create conflicting incentives for students, who might rely on them to produce parts of their papers instead of engaging in the learning process.

While LLMs can be helpful in knowledge discovery, writing assistance, and coding, using them effectively and safely requires an understanding of their underlying mechanisms, their potential weaknesses, and enough domain knowledge to identify mistakes. This makes using LLMs challenging in the early stages of acquiring scientific skills and knowledge.

Educators must train students to use these new tools responsibly, reflecting on the tensions between their strengths and weaknesses in academic tasks. This working paper aims to provide guidelines for the responsible use of LLMs in academic contexts, specifically at the Chair of Governance of Complex and Innovative Technological Systems at the University of Bamberg. The paper discusses the function of written assignments and the skills necessary to complete them, and evaluates ChatGPT's potential in assisting with these tasks. It concludes with advice on how to maximize the benefits of LLMs while mitigating their risks by focusing on enabling student learning.

Language: English

Citations: 26

Large Language Models versus Natural Language Understanding and Generation
Nikitas N. Karanikolas, Eirini Manga, Nikoletta E. Samaridi et al.

Published: Nov. 24, 2023

In recent years, the process humans adopt to learn a foreign language has moved from the strict "Grammar–Translation" method, which is based mainly on grammar and syntax rules, to more innovative processes, resulting in the modern "Communicative approach". As its name states, this approach focuses on coherent communication with native speakers and the cultivation of oral skills, without taking into consideration, at least in the first stages, the rules that govern the language.

Language: English

Citations: 25

Embers of autoregression show how large language models are shaped by the problem they are trained to solve
R. Thomas McCoy, Shunyu Yao, Dan Friedman et al.

Proceedings of the National Academy of Sciences, Journal Year: 2024, Volume and Issue: 121(41)

Published: Oct. 4, 2024

The widespread adoption of large language models (LLMs) makes it important to recognize their strengths and limitations. We argue that in order to develop a holistic understanding of these systems, we must consider the problem that they were trained to solve: next-word prediction over Internet text. By recognizing the pressures that this task exerts, we can make predictions about the strategies that LLMs will adopt, allowing us to reason about when they will succeed or fail. Using this approach, which we call the teleological approach, we identify three factors that we hypothesize will influence LLM accuracy: the probability of the task to be performed, the probability of the target output, and the probability of the provided input. To test our predictions, we evaluate five LLMs (GPT-3.5, GPT-4, Claude 3, Llama 3, and Gemini 1.0) on 11 tasks, and we find robust evidence that LLMs are influenced by probability in the hypothesized ways. Many of the experiments reveal surprising failure modes. For instance, GPT-4’s accuracy at decoding a simple cipher is 51% when the output is a high-probability sentence but only 13% when it is low-probability, even though this task is a deterministic one for which probability should not matter. These results show that AI practitioners should be careful about using LLMs in low-probability situations. More broadly, we conclude that we should not evaluate LLMs as if they are humans but should instead treat them as a distinct type of system, one that has been shaped by its own particular set of pressures.

Language: English

Citations: 15

Exploring generative AI assisted feedback writing for students’ written responses to a physics conceptual question with prompt engineering and few-shot learning
Tong Wan, Zhongzhou Chen

Physical Review Physics Education Research, Journal Year: 2024, Volume and Issue: 20(1)

Published: June 13, 2024

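Instructor’s feedback plays a critical role in students’ development of conceptual understanding and reasoning skills. However, grading student written responses and providing personalized feedback can take a substantial amount of time, especially in large enrollment courses. In this study, we explore using GPT-3.5 to write feedback on student written responses to conceptual questions with prompt engineering and few-shot learning techniques. In stage I, we used a small portion (n=20) of the student responses to one conceptual question to iteratively train GPT to generate feedback. Four of the responses, paired with human-written feedback, were included in the prompt as examples for GPT. We tasked GPT to generate feedback for another 16 responses, and we refined the prompt through several iterations. In stage II, we gave four student researchers (one graduate and three undergraduate researchers) the 16 responses as well as two versions of feedback, one written by the authors and the other by GPT. Students were asked to rate the correctness and usefulness of each piece of feedback and to indicate which one was generated by GPT. The results showed that students tended to rate the human and GPT feedback equally on correctness, but they all rated GPT’s feedback as more useful. Additionally, the success rates of identifying GPT’s feedback were low, ranging from 0.1 to 0.6. In stage III, we tasked GPT to generate feedback for the rest of the student responses (n=65). The feedback messages were rated by four instructors based on the extent of modification needed if they were to give the feedback to students. All four instructors rated approximately 70% (ranging from 68% to 78%) of the feedback statements as needing only minor or no modification. This study demonstrated the feasibility of using generative artificial intelligence (AI) as an assistant for generating feedback on student written responses with only a relatively small number of examples in the prompt. An AI assistant can be one of the solutions to substantially reduce the time spent on grading student written responses. Published by the American Physical Society 2024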

Language: English

Citations: 12