AI & Society, Journal Year: 2023, Issue: 39(5), pp. 2499-2506
Published: July 12, 2023
Language: English
Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, Journal Year: 2024, Issue: 382(2270)
Published: Feb. 26, 2024
In this paper, we experimentally evaluate the zero-shot performance of GPT-4 against prior generations of GPT on the entire uniform bar examination (UBE), including not only the multiple-choice multistate bar examination (MBE), but also the open-ended multistate essay exam (MEE) and multistate performance test (MPT) components. On the MBE, GPT-4 significantly outperforms both human test-takers and prior models, demonstrating a 26% increase over ChatGPT and beating humans in five of seven subject areas. On the MEE and MPT, which have not previously been evaluated by scholars, GPT-4 scores an average of 4.2/6.0, compared with much lower scores for ChatGPT. Graded across all UBE components, in the manner a human test-taker would be, GPT-4 scores approximately 297 points, in excess of the passing threshold for all jurisdictions. These findings document not just the rapid and remarkable advance of large language model performance generally, but also the potential for such models to support the delivery of legal services in society. This article is part of the theme issue 'A complexity science approach to law and governance'.
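To make the zero-shot evaluation setup concrete, a minimal sketch of multiple-choice scoring against a chat model is shown below. This is not the authors' harness: the sample question, prompt wording, and the letter-extraction rule are illustrative assumptions.

```python
# Minimal sketch of zero-shot multiple-choice evaluation against a chat model.
# The sample item, prompt wording, and answer-extraction rule are hypothetical.
import re
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def answer_mbe_question(stem: str, options: dict[str, str], model: str = "gpt-4") -> str:
    """Ask the model one multiple-choice question and return its chosen letter."""
    prompt = (
        "Answer the following bar-exam question. Reply with the single letter "
        "of the best answer.\n\n"
        f"{stem}\n" + "\n".join(f"({k}) {v}" for k, v in options.items())
    )
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    text = resp.choices[0].message.content or ""
    match = re.search(r"[ABCD]", text)  # take the first answer letter mentioned
    return match.group(0) if match else ""

# Hypothetical item; real MBE questions are licensed and much longer.
choice = answer_mbe_question(
    "A landlord leases a warehouse 'as is'. Which claim is most likely to succeed?",
    {"A": "Breach of implied warranty", "B": "Negligence per se",
     "C": "Strict liability", "D": "No claim"},
)
print("Model chose:", choice)
```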
Language: English
Cited by: 78
Nature Reviews Neuroscience, Journal Year: 2024, Issue: 25(5), pp. 289-312
Published: April 12, 2024
Language: English
Cited by: 68
Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, Journal Year: 2023, Issue: 381(2251)
Published: June 4, 2023
Large language models (LLMs) are one of the most impressive achievements of artificial intelligence in recent years. However, their relevance to the study of language more broadly remains unclear. This article considers the potential of LLMs to serve as models of language understanding in humans. While debate on this question typically centres around models' performance on challenging language tasks, this article argues that the answer depends on the models' underlying competence, and thus that the focus of the debate should be on empirical work which seeks to characterize the representations and processing algorithms that underlie model behaviour. From this perspective, the article offers counterarguments to two commonly cited reasons why LLMs cannot serve as plausible models of language in humans: their lack of symbolic structure and their lack of grounding. For each, a case is made that recent empirical trends undermine the common assumptions about LLMs, and that it is therefore premature to draw conclusions about LLMs' ability (or lack thereof) to offer insights on human language representation and understanding. This article is part of a discussion meeting issue 'Cognitive artificial intelligence'.
Language: English
Cited by: 50
Nature, Journal Year: 2024, Issue: 630(8017), pp. 575-586
Published: June 19, 2024
Language: English
Cited by: 35
Mind & Language, Journal Year: 2023, Issue: 39(2), pp. 237-259
Published: July 12, 2023
Can large language models produce expert-quality philosophical texts? To investigate this, we fine-tuned GPT-3 with the works of philosopher Daniel Dennett. To evaluate the model, we asked the real Dennett 10 philosophical questions and then posed the same questions to the language model, collecting four responses for each question without cherry-picking. Experts on Dennett's work succeeded at distinguishing the Dennett-generated and machine-generated answers above chance, but fell substantially short of our expectations. Philosophy blog readers performed similarly to the experts, while ordinary research participants were near chance at distinguishing GPT-3's responses from those of an "actual human philosopher".
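For readers unfamiliar with the fine-tune-then-interview workflow, a rough sketch is given below. The study fine-tuned GPT-3 through the interface available at the time; this sketch instead uses the current OpenAI fine-tuning endpoint, a hypothetical file name (dennett_passages.jsonl), and placeholder training records, so it illustrates the general pattern rather than the authors' setup.

```python
# Rough sketch of a fine-tune-then-interview workflow (not the study's code).
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set

# 1) Training data: question/answer-style records drawn from the target corpus.
#    These records are placeholders, not the philosopher's actual text.
records = [
    {"messages": [
        {"role": "user", "content": "Do humans have free will?"},
        {"role": "assistant", "content": "<passage from the corpus would go here>"},
    ]},
]
with open("dennett_passages.jsonl", "w") as f:
    for rec in records:
        f.write(json.dumps(rec) + "\n")

# 2) Upload the file and launch a fine-tuning job.
training_file = client.files.create(
    file=open("dennett_passages.jsonl", "rb"), purpose="fine-tune"
)
job = client.fine_tuning.jobs.create(
    training_file=training_file.id, model="gpt-3.5-turbo"
)
print("fine-tune job:", job.id)

# 3) Once the job finishes, pose the same questions to the fine-tuned model
#    several times each, without cherry-picking, mirroring the study's protocol.
```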
Language: English
Cited by: 36
Cognitive Science, Journal Year: 2023, Issue: 47(11)
Published: Nov. 1, 2023
Abstract: Word co-occurrence patterns in language corpora contain a surprising amount of conceptual knowledge. Large language models (LLMs), trained to predict words in context, leverage these patterns to achieve impressive performance on diverse semantic tasks requiring world knowledge. An important but understudied question about LLMs' semantic abilities is whether they acquire generalized knowledge of common events. Here, we test whether five pretrained LLMs (from 2018's BERT to 2023's MPT) assign a higher likelihood to plausible descriptions of agent-patient interactions than to minimally different implausible versions of the same event. Using three curated sets of minimal sentence pairs (total n = 1,215), we found that pretrained LLMs possess substantial event knowledge, outperforming other distributional language models. In particular, they almost always assign a higher likelihood to possible versus impossible events (The teacher bought the laptop vs. The laptop bought the teacher). However, they show less consistent preferences for likely versus unlikely events (The nanny tutored the boy vs. The boy tutored the nanny). In follow-up analyses, we show that (i) LLM scores are driven by both plausibility and surface-level sentence features, (ii) LLM scores generalize well across syntactic variants (active vs. passive constructions) but less well across semantic variants (synonymous sentences), (iii) some LLM errors mirror human judgment ambiguity, and (iv) sentence plausibility serves as an organizing dimension in internal LLM representations. Overall, our results show that important aspects of event knowledge naturally emerge from distributional linguistic patterns, but also highlight a gap between representations of possible/impossible and likely/unlikely events.
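The minimal-pair comparison described here amounts to scoring each sentence's log-likelihood under a causal language model and checking which member of the pair is preferred. A small sketch of that idea follows; it uses GPT-2 purely for illustration, and the two example pairs are in the spirit of the stimuli rather than items from the actual dataset.

```python
# Minimal sketch of minimal-pair scoring: compare a causal LM's log-likelihood
# for a plausible sentence and its implausible counterpart (GPT-2 for illustration).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def sentence_logprob(sentence: str) -> float:
    """Total log-probability the model assigns to the sentence's tokens."""
    ids = tokenizer(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model(ids, labels=ids)
    # out.loss is the mean negative log-likelihood per predicted token.
    n_predicted = ids.shape[1] - 1
    return -out.loss.item() * n_predicted

pairs = [  # illustrative items, not the paper's curated stimuli
    ("The teacher bought the laptop.", "The laptop bought the teacher."),
    ("The nanny tutored the boy.", "The boy tutored the nanny."),
]
for plausible, implausible in pairs:
    lp_p, lp_i = sentence_logprob(plausible), sentence_logprob(implausible)
    print(f"{plausible!r} {lp_p:.2f} vs {implausible!r} {lp_i:.2f} ->",
          "plausible preferred" if lp_p > lp_i else "implausible preferred")
```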
Language: English
Cited by: 32
Lecture Notes in Computer Science, Journal Year: 2023, Issue: unknown, pp. 481-496
Published: Jan. 1, 2023
Language: English
Cited by: 27
Perspectives on Psychological Science, Journal Year: 2023, Issue: 19(5), pp. 874-883
Published: Oct. 26, 2023
Much discussion about large language models and language-and-vision models has focused on whether these models are intelligent agents. We present an alternative perspective. First, we argue that these artificial intelligence (AI) models are cultural technologies that enhance cultural transmission and are efficient and powerful imitation engines. Second, we explore what AI models can tell us about innovation and imitation by testing whether they can be used to discover new tools and novel causal structures, and by contrasting their responses with those of human children. Our work serves as a first step in determining which particular representations and competences, as well as which kinds of knowledge or skill, can be derived from particular learning techniques and data. In particular, we explore which kinds of cognitive capacities can be enabled by statistical analysis of large-scale linguistic data. Critically, our findings suggest that machines may need more than large-scale language and image data to allow the kinds of innovation that a small child can produce.
Language: English
Cited by: 24
Computational Brain & Behavior, Journal Year: 2024, Issue: unknown
Published: Sep. 27, 2024
Language: English
Cited by: 15
Proceedings of the National Academy of Sciences, Journal Year: 2024, Issue: 121(41)
Published: Oct. 4, 2024
The widespread adoption of large language models (LLMs) makes it important to recognize their strengths and limitations. We argue that to develop a holistic understanding of these systems, we must consider the problem that they were trained to solve: next-word prediction over Internet text. By recognizing the pressures that this task exerts, we can make predictions about the strategies LLMs will adopt, allowing us to reason about when they will succeed or fail. Using this approach, which we call the teleological approach, we identify three factors that we hypothesize will influence LLM accuracy: the probability of the task to be performed, the probability of the target output, and the probability of the provided input. To test our predictions, we evaluate five LLMs (GPT-3.5, GPT-4, Claude 3, Llama 3, and Gemini 1.0) on 11 tasks, and we find robust evidence that LLMs are influenced by probability in the hypothesized ways. Many of the experiments reveal surprising failure modes. For instance, GPT-4's accuracy at decoding a simple cipher is 51% when the output is a high-probability sentence but only 13% when it is low-probability, even though this is a deterministic task for which probability should not matter. These results show that AI practitioners should be careful about using LLMs in low-probability situations. More broadly, we conclude that we should not evaluate LLMs as if they are humans but should instead treat them as a distinct type of system, one that has been shaped by its own particular set of pressures.
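The cipher result mentioned in the abstract is easy to probe in miniature: encode a plaintext with rot-13, ask a chat model to decode it, and compare exact-match accuracy when the plaintext is a high-probability sentence versus a low-probability one. The sketch below does exactly that; the prompt wording and the two example sentences are illustrative assumptions, not the paper's materials.

```python
# Minimal sketch of a rot-13 decoding probe with high- vs. low-probability targets.
import codecs
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set

def decode_rot13_with_llm(ciphertext: str, model: str = "gpt-4") -> str:
    prompt = (
        "The following text is encoded with the rot-13 cipher. "
        f"Decode it and reply with only the decoded text.\n\n{ciphertext}"
    )
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return (resp.choices[0].message.content or "").strip()

targets = {
    "high-probability": "The cat sat on the mat.",
    "low-probability": "Mat the on sat cat the.",  # same words, unlikely order
}
for label, plaintext in targets.items():
    ciphertext = codecs.encode(plaintext, "rot_13")
    guess = decode_rot13_with_llm(ciphertext)
    print(f"{label}: correct={guess == plaintext}, model said {guess!r}")
```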
Language: English
Cited by: 15