AI & Society, Journal Year: 2023, Issue: 39(5), pp. 2499-2506
Published: July 12, 2023
Language: English
Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, Journal Year: 2024, Issue: 382(2270)
Published: Feb. 26, 2024
In this paper, we experimentally evaluate the zero-shot performance of GPT-4 against prior generations of GPT on the entire uniform bar examination (UBE), including not only the multiple-choice multistate bar examination (MBE), but also the open-ended multistate essay exam (MEE) and multistate performance test (MPT) components. On the MBE, GPT-4 significantly outperforms both human test-takers and prior models, demonstrating a 26% increase over ChatGPT and beating humans in five of seven subject areas. On the MEE and MPT, which have not previously been evaluated by scholars, GPT-4 scores an average of 4.2/6.0, compared with much lower scores for ChatGPT. Graded across all UBE components, in the manner a human test-taker would be, GPT-4 scores approximately 297 points, in excess of the passing threshold for all jurisdictions. These findings document not just the rapid and remarkable advance of large language model performance generally, but also the potential for such models to support the delivery of legal services in society. This article is part of the theme issue 'A complexity science approach to law and governance'.
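To make the zero-shot evaluation setup concrete, a minimal sketch of multiple-choice scoring against a chat model is shown below. This is not the authors' harness: the sample question, prompt wording, and the letter-extraction rule are illustrative assumptions.

```python
# Minimal sketch of zero-shot multiple-choice evaluation against a chat model.
# The sample item, prompt wording, and answer-extraction rule are hypothetical.
import re
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def answer_mbe_question(stem: str, options: dict[str, str], model: str = "gpt-4") -> str:
    """Ask the model one multiple-choice question and return its chosen letter."""
    prompt = (
        "Answer the following bar-exam question. Reply with the single letter "
        "of the best answer.\n\n"
        f"{stem}\n" + "\n".join(f"({k}) {v}" for k, v in options.items())
    )
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    text = resp.choices[0].message.content or ""
    match = re.search(r"[ABCD]", text)  # take the first answer letter mentioned
    return match.group(0) if match else ""

# Hypothetical item; real MBE questions are licensed and much longer.
choice = answer_mbe_question(
    "A landlord leases a warehouse 'as is'. Which claim is most likely to succeed?",
    {"A": "Breach of implied warranty", "B": "Negligence per se",
     "C": "Strict liability", "D": "No claim"},
)
print("Model chose:", choice)
```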
Language: English
Cited by: 78
Nature Reviews Neuroscience, Journal Year: 2024, Issue: 25(5), pp. 289-312
Published: April 12, 2024
Language: English
Cited by: 68
Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, Journal Year: 2023, Issue: 381(2251)
Published: June 4, 2023
Large language models (LLMs) are one of the most impressive achievements of artificial intelligence in recent years. However, their relevance to the study of language more broadly remains unclear. This article considers the potential of LLMs to serve as models of language understanding in humans. While debate on this question typically centres around models' performance on challenging language tasks, this article argues that the answer depends on the models' underlying competence, and thus that the focus of the debate should be on empirical work which seeks to characterize the representations and processing algorithms that underlie model behaviour. From this perspective, the article offers counterarguments to two commonly cited reasons why LLMs cannot serve as plausible models of language in humans: their lack of symbolic structure and their lack of grounding. For each, a case is made that recent empirical trends undermine the common assumptions about LLMs, and that it is therefore premature to draw conclusions about LLMs' ability (or lack thereof) to offer insights on human language representation and understanding. This article is part of a discussion meeting issue 'Cognitive artificial intelligence'.
Language: English
Cited by: 50
Nature, Journal Year: 2024, Issue: 630(8017), pp. 575-586
Published: June 19, 2024
Language: English
Cited by: 35
Mind & Language, Journal Year: 2023, Issue: 39(2), pp. 237-259
Published: July 12, 2023
Can large language models produce expert-quality philosophical texts? To investigate this, we fine-tuned GPT-3 with the works of philosopher Daniel Dennett. To evaluate the model, we asked the real Dennett 10 philosophical questions and then posed the same questions to the language model, collecting four responses for each question without cherry-picking. Experts on Dennett's work succeeded at distinguishing the Dennett-generated and machine-generated answers above chance, but fell substantially short of our expectations. Philosophy blog readers performed similarly to the experts, while ordinary research participants were near chance at distinguishing GPT-3's responses from those of an "actual human philosopher".
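For readers unfamiliar with the fine-tune-then-interview workflow, a rough sketch is given below. The study fine-tuned GPT-3 through the interface available at the time; this sketch instead uses the current OpenAI fine-tuning endpoint, a hypothetical file name (dennett_passages.jsonl), and placeholder training records, so it illustrates the general pattern rather than the authors' setup.

```python
# Rough sketch of a fine-tune-then-interview workflow (not the study's code).
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set

# 1) Training data: question/answer-style records drawn from the target corpus.
#    These records are placeholders, not the philosopher's actual text.
records = [
    {"messages": [
        {"role": "user", "content": "Do humans have free will?"},
        {"role": "assistant", "content": "<passage from the corpus would go here>"},
    ]},
]
with open("dennett_passages.jsonl", "w") as f:
    for rec in records:
        f.write(json.dumps(rec) + "\n")

# 2) Upload the file and launch a fine-tuning job.
training_file = client.files.create(
    file=open("dennett_passages.jsonl", "rb"), purpose="fine-tune"
)
job = client.fine_tuning.jobs.create(
    training_file=training_file.id, model="gpt-3.5-turbo"
)
print("fine-tune job:", job.id)

# 3) Once the job finishes, pose the same questions to the fine-tuned model
#    several times each, without cherry-picking, mirroring the study's protocol.
```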
Language: English
Cited by: 36
Cognitive Science, Journal Year: 2023, Issue: 47(11)
Published: Nov. 1, 2023
Abstract: Word co-occurrence patterns in language corpora contain a surprising amount of conceptual knowledge. Large language models (LLMs), trained to predict words in context, leverage these patterns to achieve impressive performance on diverse semantic tasks requiring world knowledge. An important but understudied question about LLMs' semantic abilities is whether they acquire generalized knowledge of common events. Here, we test whether five pretrained LLMs (from 2018's BERT to 2023's MPT) assign a higher likelihood to plausible descriptions of agent-patient interactions than to minimally different implausible versions of the same event. Using three curated sets of minimal sentence pairs (total n = 1,215), we found that pretrained LLMs possess substantial event knowledge, outperforming other distributional language models. In particular, they almost always assign a higher likelihood to possible versus impossible events (The teacher bought the laptop vs. The laptop bought the teacher). However, they show less consistent preferences for likely versus unlikely events (The nanny tutored the boy vs. The boy tutored the nanny). In follow-up analyses, we show that (i) LLM scores are driven by both plausibility and surface-level sentence features, (ii) LLM scores generalize well across syntactic variants (active vs. passive constructions) but less well across semantic variants (synonymous sentences), (iii) some LLM errors mirror human judgment ambiguity, and (iv) sentence plausibility serves as an organizing dimension in internal LLM representations. Overall, our results show that important aspects of event knowledge naturally emerge from distributional linguistic patterns, but also highlight a gap between representations of possible/impossible and likely/unlikely events.
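The minimal-pair comparison described here amounts to scoring each sentence's log-likelihood under a causal language model and checking which member of the pair is preferred. A small sketch of that idea follows; it uses GPT-2 purely for illustration, and the two example pairs are in the spirit of the stimuli rather than items from the actual dataset.

```python
# Minimal sketch of minimal-pair scoring: compare a causal LM's log-likelihood
# for a plausible sentence and its implausible counterpart (GPT-2 for illustration).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def sentence_logprob(sentence: str) -> float:
    """Total log-probability the model assigns to the sentence's tokens."""
    ids = tokenizer(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model(ids, labels=ids)
    # out.loss is the mean negative log-likelihood per predicted token.
    n_predicted = ids.shape[1] - 1
    return -out.loss.item() * n_predicted

pairs = [  # illustrative items, not the paper's curated stimuli
    ("The teacher bought the laptop.", "The laptop bought the teacher."),
    ("The nanny tutored the boy.", "The boy tutored the nanny."),
]
for plausible, implausible in pairs:
    lp_p, lp_i = sentence_logprob(plausible), sentence_logprob(implausible)
    print(f"{plausible!r} {lp_p:.2f} vs {implausible!r} {lp_i:.2f} ->",
          "plausible preferred" if lp_p > lp_i else "implausible preferred")
```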
Language: English
Cited by: 32
Lecture Notes in Computer Science, Journal Year: 2023, Issue: unknown, pp. 481-496
Published: Jan. 1, 2023
Language: English
Cited by: 27
Perspectives on Psychological Science, Journal Year: 2023, Issue: 19(5), pp. 874-883
Published: Oct. 26, 2023
Much discussion about large language models and language-and-vision models has focused on whether these models are intelligent agents. We present an alternative perspective. First, we argue that these artificial intelligence (AI) models are cultural technologies that enhance cultural transmission and are efficient and powerful imitation engines. Second, we explore what AI models can tell us about innovation and imitation by testing whether they can be used to discover new tools and novel causal structures, and by contrasting their responses with those of human children. Our work serves as a first step in determining which particular representations and competences, as well as which kinds of knowledge or skill, can be derived from particular learning techniques and data. In particular, we explore which kinds of cognitive capacities can be enabled by statistical analysis of large-scale linguistic data. Critically, our findings suggest that machines may need more than large-scale language and image data to allow the kinds of innovation that a small child can produce.
Language: English
Cited by: 24
Computational Brain & Behavior, Journal Year: 2024, Issue: unknown
Published: Sep. 27, 2024
Language: English
Cited by: 15
Proceedings of the National Academy of Sciences, Journal Year: 2024, Issue: 121(41)
Published: Oct. 4, 2024
The widespread adoption of large language models (LLMs) makes it important to recognize their strengths and limitations. We argue that to develop a holistic understanding of these systems, we must consider the problem that they were trained to solve: next-word prediction over Internet text. By recognizing the pressures that this task exerts, we can make predictions about the strategies LLMs will adopt, allowing us to reason about when they will succeed or fail. Using this approach, which we call the teleological approach, we identify three factors that we hypothesize will influence LLM accuracy: the probability of the task to be performed, the probability of the target output, and the probability of the provided input. To test our predictions, we evaluate five LLMs (GPT-3.5, GPT-4, Claude 3, Llama 3, and Gemini 1.0) on 11 tasks, and we find robust evidence that LLMs are influenced by probability in the hypothesized ways. Many of the experiments reveal surprising failure modes. For instance, GPT-4's accuracy at decoding a simple cipher is 51% when the output is a high-probability sentence but only 13% when it is low-probability, even though this is a deterministic task for which probability should not matter. These results show that AI practitioners should be careful about using LLMs in low-probability situations. More broadly, we conclude that we should not evaluate LLMs as if they are humans but should instead treat them as a distinct type of system, one that has been shaped by its own particular set of pressures.
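The cipher result mentioned in the abstract is easy to probe in miniature: encode a plaintext with rot-13, ask a chat model to decode it, and compare exact-match accuracy when the plaintext is a high-probability sentence versus a low-probability one. The sketch below does exactly that; the prompt wording and the two example sentences are illustrative assumptions, not the paper's materials.

```python
# Minimal sketch of a rot-13 decoding probe with high- vs. low-probability targets.
import codecs
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set

def decode_rot13_with_llm(ciphertext: str, model: str = "gpt-4") -> str:
    prompt = (
        "The following text is encoded with the rot-13 cipher. "
        f"Decode it and reply with only the decoded text.\n\n{ciphertext}"
    )
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return (resp.choices[0].message.content or "").strip()

targets = {
    "high-probability": "The cat sat on the mat.",
    "low-probability": "Mat the on sat cat the.",  # same words, unlikely order
}
for label, plaintext in targets.items():
    ciphertext = codecs.encode(plaintext, "rot_13")
    guess = decode_rot13_with_llm(ciphertext)
    print(f"{label}: correct={guess == plaintext}, model said {guess!r}")
```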
Language: English
Cited by: 15