Lecture Notes in Computer Science, Journal year: 2024, Issue: unknown, pp. 571-592
Published: Nov. 28, 2024
Language: English
Computational Linguistics, Journal year: 2023, Issue: 50(1), pp. 293-350
Published: Nov. 15, 2023
Abstract Transformer language models have received widespread public attention, yet their generated text is often surprising even to NLP researchers. In this survey, we discuss over 250 recent studies of English language model behavior before task-specific fine-tuning. Language models possess basic capabilities in syntax, semantics, pragmatics, world knowledge, and reasoning, but these capabilities are sensitive to specific inputs and surface features. Despite dramatic increases in generated text quality as models scale to hundreds of billions of parameters, the models are still prone to unfactual responses, commonsense errors, memorized text, and social biases. Many of these weaknesses can be framed as over-generalizations or under-generalizations of learned patterns in text. We synthesize recent results to highlight what is currently known about large language model capabilities, thus providing a resource for applied work and for research in adjacent fields that use language models.
Language: English
Cited: 41
Perspectives on Psychological Science, Journal year: 2023, Issue: 19(5), pp. 874-883
Published: Oct. 26, 2023
Much discussion about large language models and language-and-vision models has focused on whether these models are intelligent agents. We present an alternative perspective. First, we argue that these artificial intelligence (AI) models are cultural technologies that enhance cultural transmission and are efficient and powerful imitation engines. Second, we explore what AI models can tell us about innovation by testing whether they can be used to discover new tools and novel causal structures, and by contrasting their responses with those of human children. Our work serves as a first step in determining which particular representations and competences, as well as which kinds of knowledge or skill, can be derived from particular learning techniques and data. In particular, we explore which kinds of cognitive capacities can be enabled by statistical analysis of large-scale linguistic data. Critically, our findings suggest that machines may need more than large-scale language and image data to allow the kinds of innovation that a small child can produce.
Language: English
Cited: 27
Behavior Research Methods, Journal year: 2024, Issue: 56(6), pp. 6082-6100
Published: Jan. 23, 2024
Research on language and cognition relies extensively on psycholinguistic datasets or "norms". These datasets contain judgments of lexical properties like concreteness and age of acquisition, and can be used to norm experimental stimuli, discover empirical relationships in the lexicon, and stress-test computational models. However, collecting human judgments at scale is both time-consuming and expensive. This issue is compounded for multi-dimensional norms and those incorporating context. The current work asks whether large language models (LLMs) can be leveraged to augment the creation of large psycholinguistic datasets in English. I use GPT-4 to collect multiple kinds of semantic judgments (e.g., word similarity, contextualized sensorimotor associations, iconicity) for English words and compare these judgments against the human "gold standard". For each dataset, I find that GPT-4's judgments are positively correlated with human judgments, in some cases rivaling or even exceeding the average inter-annotator agreement displayed by humans. I then identify several ways in which LLM-generated norms differ from human-generated norms systematically. I also perform several "substitution analyses", which demonstrate that replacing human judgments with LLM-generated judgments in a statistical model does not change the sign of parameter estimates (though in select cases, there are significant changes to their magnitude). I conclude by discussing the considerations and limitations associated with LLM-generated norms in general, including concerns about data contamination, the choice of LLM, external validity, and construct validity. Additionally, all judgments (over 30,000 in total) are made available online for further analysis.
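To make the validation step concrete, here is a minimal Python sketch of the kind of comparison the abstract describes: correlating model-generated lexical ratings against human norms. The words, ratings, and 1-5 scale below are invented placeholders, not data from the paper.

```python
# Minimal sketch: correlate LLM-generated lexical judgments with
# human "gold standard" norms. All ratings are invented placeholders.
from scipy.stats import pearsonr, spearmanr

# Hypothetical concreteness ratings (1-5 scale) for the same word list.
human_norms = {"apple": 4.8, "justice": 1.4, "hammer": 4.9, "idea": 1.2, "river": 4.6}
llm_norms   = {"apple": 4.6, "justice": 1.9, "hammer": 4.7, "idea": 1.5, "river": 4.4}

words = sorted(human_norms)
h = [human_norms[w] for w in words]
m = [llm_norms[w] for w in words]

r, p = pearsonr(h, m)          # linear agreement
rho, p_rho = spearmanr(h, m)   # rank agreement, robust to scale differences
print(f"Pearson r = {r:.2f} (p = {p:.3f}); Spearman rho = {rho:.2f}")
```

In practice such correlations would be computed per dataset and compared against human inter-annotator agreement, as the abstract reports.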
Language: English
Cited: 10
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, Journal year: 2023, Issue: unknown
Published: Jan. 1, 2023
Large language models (LLMs) have been shown to perform well at a variety of syntactic, discourse, and reasoning tasks. While LLMs are increasingly deployed in many forms including conversational agents that interact with humans, we lack a grounded benchmark to measure how well LLMs understand social language. Here, we introduce a new theory-driven benchmark, SocKET, that contains 58 NLP tasks testing social knowledge which we group into five categories: humor & sarcasm, offensiveness, sentiment & emotion, social factors, and trustworthiness. In tests on the benchmark, we demonstrate that current models attain only moderate performance but reveal significant potential for task transfer among different types and categories of tasks, which were predicted from theory. Through zero-shot evaluations, we show that pretrained models already possess some innate but limited capabilities of social language understanding, and that training on one category of tasks can improve performance on others. Our benchmark provides a systematic way to analyze model performance on an important dimension of language and points to clear room for improvement to build more socially-aware LLMs. The associated resources are released at https://github.com/minjechoi/SOCKET.
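As a rough illustration of the kind of zero-shot evaluation such a benchmark supports (not SocKET's actual harness), the sketch below runs a prompt-based classifier over labeled items and scores accuracy. The `classify` function is a stub standing in for a real LLM call, and the example utterances and labels are invented.

```python
# Sketch of a zero-shot evaluation loop over a social-knowledge
# classification task. classify() is a placeholder for a real LLM call;
# the two demo items are invented, not taken from SocKET.
from typing import List, Tuple

LABELS = ["sarcastic", "not sarcastic"]

def classify(text: str) -> str:
    """Placeholder: build a zero-shot prompt and map the model's reply
    to one of LABELS. Here it returns a dummy label so the sketch runs."""
    prompt = (f"Is the following utterance sarcastic? "
              f"Answer with one of {LABELS}.\nUtterance: {text}")
    _ = prompt  # a real implementation would send this to a model
    return LABELS[0]

def accuracy(items: List[Tuple[str, str]]) -> float:
    """Fraction of items whose predicted label matches the gold label."""
    correct = sum(classify(text) == gold for text, gold in items)
    return correct / len(items)

demo = [("Oh great, another Monday.", "sarcastic"),
        ("The meeting starts at noon.", "not sarcastic")]
print(f"zero-shot accuracy: {accuracy(demo):.2f}")
```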
Language: English
Cited: 23
Language and Linguistics Compass, Journal year: 2025, Issue: 19(2)
Published: Feb. 3, 2025
ABSTRACT Large Language Models (LLMs) have dramatically transformed the AI landscape. They can produce remarkably fluent text and exhibit a range of natural language understanding and generation capabilities. This article explores how LLMs might be used for sociolinguistic research and, conversely, how sociolinguistics might contribute to the development of LLMs. It argues that both areas will benefit from thoughtful, engaging collaboration. Sociolinguists are not merely end users of LLMs; they have a crucial role to play in
Language: English
Cited: 0
Cognitive Research: Principles and Implications, Journal year: 2025, Issue: 10(1)
Published: April 5, 2025
Abstract Data visualizations play a crucial role in communicating patterns in quantitative data, making data visualization literacy a key target of STEM education. However, it is currently unclear to what degree different assessments measure the same underlying constructs. Here, we administered two widely used graph comprehension assessments (Galesic and Garcia-Retamero, Med Dec Mak 31:444–457, 2011; Lee et al., IEEE Trans Vis Comput Graph 23(1):551–560, 2016) to both a university-based convenience sample and a demographically representative sample of adult participants in the USA (N = 1,113). Our analysis of individual variability in test performance suggests that overall scores are correlated between the two assessments and associated with the amount of prior coursework in mathematics. Further exploration of error patterns on these assessments suggests that they probe somewhat distinct components of data visualization literacy, though we do not find evidence that these components correspond to the categories that guided the design of either assessment (e.g., questions that require retrieving values rather than making comparisons). Together, these findings suggest opportunities for the development of more comprehensive assessments, organized by components that better account for detailed behavioral patterns.
Language: English
Cited: 0
Cognitive Science, Journal year: 2025, Issue: 49(5)
Published: May 1, 2025
Abstract Humor is an essential aspect of human experience, yet surprisingly little is known about how we recognize and understand humorous utterances. Most theories of humor emphasize the role of incongruity detection and resolution (e.g., frame-shifting), as well as cognitive capacities like Theory of Mind and pragmatic reasoning. In multiple preregistered experiments, we ask whether and to what extent exposure to purely linguistic input can account for the ability to recognize one-line jokes and identify their entailments. We find that GPT-3, a large language model (LLM) trained on linguistic data only, exhibits above-chance performance in tasks designed to test its ability to detect, appreciate, and comprehend jokes. In exploratory work, we also assess the humor comprehension of several open-source LLMs, such as Llama-3 and Mixtral. Although all LLMs tested fall short of human performance, both humans and LLMs show a tendency to misclassify nonjokes with surprising endings as jokes. Results suggest that LLMs are remarkably adept at some tasks involving jokes, but also reveal key limitations of distributional approaches to meaning.
Language: English
Cited: 0
Published: July 10, 2023
This study examines the pragmatic abilities of OpenAI's ChatGPT, a conversational agent based on the multi-layer Transformer network GPT-3.5. To do so, we administered a language battery to assess expressive and receptive pragmatic skills and compared the results with human performance. ChatGPT's results were mostly human-like but revealed weaknesses in the domains of the Gricean maxim of quantity, text-based inferences, physical metaphors, and humor comprehension. On the one hand, these findings suggest that at least part of linguistic competence, as evaluated via current assessment tools, might be distributionally encoded in language. On the other hand, situated and meta-representational aspects of inferencing appear not yet to be fully accounted for by LLMs.
Language: English
Cited: 8
Published: Jan. 1, 2023
People rely heavily on context to enrich meaning beyond what is literally said, enabling concise but effective communication. To interact successfully and naturally with people, user-facing artificial intelligence systems will require similar skills in pragmatics: relying on various types of context, from shared linguistic goals and conventions to the visual and embodied world, to use language effectively. We survey existing grounded settings and pragmatic modeling approaches and analyze how the task goals, environmental contexts, and communicative affordances of each work enrich linguistic meaning. We present recommendations for future grounded task design to elicit pragmatic phenomena, and suggest research directions that focus on a broader range of communicative contexts and affordances.
Language: English
Cited: 6
Journal of Machine Learning for Modeling and Computing, Journal year: 2024, Issue: 5(2), pp. 1-44
Published: Jan. 1, 2024
The new polymath large language models (LLMs) can greatly speed up scientific reviews, possibly using more unbiased quantitative metrics, facilitating cross-disciplinary connections, and identifying emerging trends and research gaps by analyzing large volumes of data. However, at the present time, they lack the required deep understanding of complex methodologies, have difficulty in evaluating innovative claims, and are unable to assess ethical issues and conflicts of interest. Herein, we consider 13 geotechnical parrot tales (GPT)-related papers across different domains, reviewed by a human reviewer and by SciSpace, a large language model, with the reviews evaluated by three distinct types of evaluators, namely GPT-3.5, a crowd panel, and GPT-4. We found that 50% of SciSpace's responses to objective questions align with those of the human reviewer, with GPT-4 (the informed evaluator) often rating the human reviewer higher in accuracy and SciSpace higher in structure, clarity, and completeness. In subjective questions, the uninformed evaluators (GPT-3.5 and the crowd panel) showed varying preferences between the two sets of responses, with the crowd panel showing a preference for the human responses. GPT-4 rated them equally in accuracy and structure but favored SciSpace in completeness.
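For clarity, the 50% alignment figure quoted above is simply the fraction of objective questions on which the two reviewers give the same answer; a toy computation with invented answers follows.

```python
# Toy illustration of the objective-question alignment measure:
# the fraction of questions where two reviewers agree.
# These answer lists are invented, not data from the paper.
human    = ["yes", "no", "yes", "no"]
scispace = ["yes", "yes", "yes", "no"]
alignment = sum(h == s for h, s in zip(human, scispace)) / len(human)
print(f"alignment: {alignment:.0%}")  # 75% for these invented answers
```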
Language: English
Cited: 2