McGill BabyLM Shared Task Submission: The Effects of Data Formatting and Structural Biases

Ziling Cheng, Rahul Aralikatte, Ian Porada et al.

Published: Jan. 1, 2023

In this study, we describe our submission to the 2023 BabyLM shared task's strict-small track. Our findings demonstrate the feasibility of training high-performing models within the constraints of limited data, computational resources, and time. We provide evidence that formatting the input can significantly impact downstream performance. Furthermore, the induction of structural biases into the model through the use of part-of-speech trees yields modest benefits. Our most successful model achieves 79% on BLiMP evaluations and 72% on SuperGLUE evaluations.
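Since the abstract does not spell out the formatting scheme, the following minimal Python sketch illustrates one plausible way to inject part-of-speech structure into LM input; the bracketed format and tag set are assumptions for illustration, not the authors' actual preprocessing.

```python
# Hypothetical sketch of injecting part-of-speech structure into LM input
# (illustrative format; the paper's exact scheme may differ).
def format_with_pos(tagged_tokens):
    """Interleave POS tags with tokens so the LM sees structural cues."""
    return " ".join(f"({tag} {word})" for word, tag in tagged_tokens)

# Tags here are hand-supplied; a real pipeline would run a POS tagger first.
sentence = [("the", "DET"), ("dogs", "NOUN"), ("run", "VERB")]
print(format_with_pos(sentence))  # (DET the) (NOUN dogs) (VERB run)
```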

Language: English

Findings of the BabyLM Challenge: Sample-Efficient Pretraining on Developmentally Plausible Corpora
Alex Warstadt, Aaron Mueller, Leshem Choshen et al.

Published: Jan. 1, 2023

Alex Warstadt, Aaron Mueller, Leshem Choshen, Ethan Wilcox, Chengxu Zhuang, Juan Ciro, Rafael Mosquera, Bhargavi Paranjabe, Adina Williams, Tal Linzen, Ryan Cotterell. Proceedings of the BabyLM Challenge at the 27th Conference on Computational Natural Language Learning. 2023.

Language: English

Cited by: 46

Why large language models are poor theories of human linguistic cognition: A reply to Piantadosi
Roni Katzir

Biolinguistics, Journal year: 2023, Number 17

Published: Dec. 15, 2023

In a recent manuscript entitled “Modern language models refute Chomsky’s approach to language”, Steven Piantadosi proposes that large language models such as GPT-3 can serve as serious theories of human linguistic cognition. In fact, he maintains that these models are significantly better than proposals emerging from within generative linguistics. The present note explains why this claim is wrong.

Language: English

Cited by: 14

Modeling rapid language learning by distilling Bayesian priors into artificial neural networks
R. Thomas McCoy, Thomas L. Griffiths

Nature Communications, Journal year: 2025, Number 16(1)

Published: May 20, 2025

Humans can learn languages from remarkably little experience. Developing computational models that explain this ability has been a major challenge in cognitive science. Existing approaches have been successful at explaining how humans generalize rapidly in controlled settings but are usually too restrictive to tractably handle naturalistic data. We show that learning from limited data is possible with an approach that bridges the divide between two popular modeling traditions: Bayesian models and neural networks. This approach distills a Bayesian model's inductive biases (the factors that guide generalization) into a neural network with flexible representations. Like a Bayesian model, the resulting system can learn formal linguistic patterns from little data; like a neural network, it can also learn aspects of English syntax from naturally-occurring sentences. Thus, the model provides a single system that both generalizes rapidly from limited data and handles naturalistic input.
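As a rough illustration of the distillation idea described above, the sketch below samples toy "languages" from a stand-in prior and trains a network across them. The successor-rule prior, model size, and plain-SGD episodes are all illustrative assumptions; the paper itself uses a meta-learning procedure over linguistically motivated priors.

```python
# Minimal sketch (assumed details) of prior distillation: sample many small
# "languages" from a generative prior, then train across them so the
# network's weights absorb the prior's inductive bias.
import random
import torch
import torch.nn as nn

V = 20  # toy vocabulary size

def sample_language_from_prior():
    """Stand-in prior: most sampled languages map each token to its
    successor, so training across many samples distills that bias."""
    if random.random() < 0.8:
        rule = lambda w: (w + 1) % V
    else:
        rule = lambda w: random.randrange(V)
    return [(w, rule(w)) for w in random.sample(range(V), k=5)]

model = nn.Sequential(nn.Embedding(V, 16), nn.Linear(16, V))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

for episode in range(500):            # each episode = one sampled language
    xs, ys = zip(*sample_language_from_prior())
    logits = model(torch.tensor(xs))  # (5, V) predictions
    loss = loss_fn(logits, torch.tensor(ys))
    opt.zero_grad(); loss.backward(); opt.step()
```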

Language: English

Cited by: 0

Neural Networks as Cognitive Models of the Processing of Syntactic Constraints

Suhas Arehalli, Tal Linzen

Open Mind, Journal year: 2024, Number 8, pp. 558–614

Published: Jan. 1, 2024

Languages are governed by syntactic constraints: structural rules that determine which sentences are grammatical in the language. In English, one such constraint is subject-verb agreement, which dictates that the number of a verb must match the number of its corresponding subject: “the dogs run”, but “the dog runs”. While this constraint appears to be simple, in practice speakers make agreement errors, particularly when a noun phrase near the verb differs in number from the subject (for example, a speaker might produce the ungrammatical sentence “the key to the cabinets are rusty”). This phenomenon, referred to as agreement attraction, is sensitive to a wide range of properties of the sentence; no single existing model is able to generate predictions for the variety of materials studied in the human experimental literature. We explore the viability of neural network language models (broad-coverage systems trained to predict the next word in a corpus) as a framework for addressing this limitation. We analyze the agreement errors made by Long Short-Term Memory (LSTM) networks and compare them to those of humans. The models successfully simulated certain results, such as the so-called grammaticality asymmetry (the difference in attraction strength between grammatical and ungrammatical sentences), but failed to simulate others, such as the effect of distance or notional (conceptual) number. We further evaluate networks trained with explicit supervision, and find that this form of supervision does not always lead to more human-like behavior. Finally, we show that the corpus used to train a network significantly affects the pattern of errors produced by the network, and we discuss the strengths and limitations of neural networks as a tool for understanding human sentence processing.
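The measurement behind such simulations is typically surprisal at the verb. The sketch below shows that computation with an untrained toy LSTM and a tiny vocabulary, both stand-ins for the paper's corpus-trained models.

```python
# Sketch of the standard measurement: compare an LM's surprisal for
# singular vs. plural verbs after an attraction preamble. Model and
# vocabulary are illustrative stand-ins, not the paper's trained LSTMs.
import math
import torch
import torch.nn as nn

vocab = {w: i for i, w in enumerate(
    "<s> the key to cabinets is are rusty".split())}

class TinyLSTMLM(nn.Module):
    def __init__(self, v, d=32):
        super().__init__()
        self.emb = nn.Embedding(v, d)
        self.lstm = nn.LSTM(d, d, batch_first=True)
        self.out = nn.Linear(d, v)
    def forward(self, ids):
        h, _ = self.lstm(self.emb(ids))
        return self.out(h)

def surprisal(model, prefix, word):
    ids = torch.tensor([[vocab[w] for w in prefix]])
    logits = model(ids)[0, -1]                      # next-word distribution
    logp = torch.log_softmax(logits, dim=-1)[vocab[word]]
    return -logp.item() / math.log(2)               # bits

lm = TinyLSTMLM(len(vocab))  # in practice: a corpus-trained LSTM
prefix = "<s> the key to the cabinets".split()
print(surprisal(lm, prefix, "is"), surprisal(lm, prefix, "are"))
# Attraction shows up as unexpectedly low surprisal for the ungrammatical "are".
```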

Language: English

Cited by: 1

Cross-linguistically consistent semantic and syntactic annotation of child-directed speech

Ida Szubert, Omri Abend, Nathan Schneider et al.

Language Resources and Evaluation, Journal year: 2024, Number unknown

Published: May 15, 2024

Corpora of child speech and child-directed speech (CDS) have enabled major contributions to the study of language acquisition, yet semantic annotation for such corpora is still scarce and lacks a uniform standard. Semantic annotation of CDS is particularly important for understanding the nature of the input children receive and for developing computational models of language acquisition. For example, under the assumption that children are able to infer meaning representations for (at least some of) the utterances they hear, the acquisition task is to learn a grammar that can map novel adult utterances onto their corresponding meaning representations, in the face of noise and distraction by other contextually possible meanings. To study this problem and to develop computational models of it, we need corpora that provide both utterances and their meaning representations, ideally using an annotation standard that is consistent across a range of languages in order to facilitate cross-linguistic comparative studies. This paper proposes a methodology for constructing such corpora of CDS paired with sentential logical forms, and uses this method to create two corpora, in English and Hebrew. The approach enforces a cross-linguistically consistent representation, building on recent advances in dependency representation and parsing. Specifically, it involves two steps. First, we annotate the corpora with the Universal Dependencies (UD) scheme for syntactic annotation, which has been developed to apply consistently to a wide variety of domains and typologically diverse languages. Next, we further annotate these data by applying an automatic method for transducing logical forms (LFs) from UD structures. The UD and LF representations have complementary strengths: UD structures are language-neutral and support reliable annotation by multiple annotators, whereas LFs are neutral as to their derivation and transparently encode semantic relations. Using this approach, we annotate two corpora from CHILDES: Brown's Adam corpus (English; approximately 80% of its utterances) and all of Berman's Hagar corpus (Hebrew). We verify the quality of the UD annotation with an inter-annotator agreement study, and manually evaluate the transduced representations. We then demonstrate the utility of the compiled corpora through (1) a longitudinal study of the prevalence of different linguistic phenomena in CDS, and (2) applying an existing computational model of language acquisition to the corpora and briefly comparing the results across the two languages.
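To make the UD-to-LF transduction step concrete, here is a toy Python transducer over hand-written UD triples. The event-semantics output format and role mapping are simplified assumptions, far narrower than the paper's actual method.

```python
# Toy sketch of transducing a logical form from a UD parse
# (illustrative only; the paper's transducer covers far more phenomena).
def ud_to_lf(tokens):
    """tokens: (index, word, head_index, deprel), with head 0 = root."""
    root = next(t for t in tokens if t[2] == 0)
    args = []
    for idx, word, head, rel in tokens:
        if head == root[0] and rel in ("nsubj", "obj"):
            role = "agent" if rel == "nsubj" else "patient"
            args.append(f"{role}(e, {word})")
    return f"exists e. {root[1]}(e) & " + " & ".join(args)

# "Adam eats cookies": eats is the root, Adam its nsubj, cookies its obj.
parse = [(1, "Adam", 2, "nsubj"), (2, "eats", 0, "root"), (3, "cookies", 2, "obj")]
print(ud_to_lf(parse))
# exists e. eats(e) & agent(e, Adam) & patient(e, cookies)
```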

Language: English

Cited by: 1

The significance of structural rich club hubs for the processing of hierarchical stimuli
Falko Mecklenbrauck, Marius Gruber, Sophie Siestrup et al.

Human Brain Mapping, Journal year: 2023, Number 45(4)

Published: Dec. 8, 2023

The brain's structural network follows a hierarchy that is described as a rich club (RC) organization, with RC hubs forming the well-interconnected top of this hierarchy. In this study, we tested whether RC hubs are involved in processing hierarchically higher structures in stimulus sequences. Moreover, we explored the role of previously suggested cortical gradients along the anterior-posterior and medial-lateral axes throughout the frontal cortex. To this end, we conducted a functional magnetic resonance imaging (fMRI) experiment and presented participants with blocks of digit sequences that were structured on different nested levels. We additionally collected diffusion weighted imaging data from the same subjects to identify RC hubs. This classification then served as the basis for a region of interest analysis of the fMRI data. We also determined centrality measures for the areas found as activation clusters in the whole-brain analysis. Our findings support an anterior and medial shift for processing hierarchically higher structured stimuli. Additionally, hierarchically higher structure engages RC hubs more than lower structure does. Areas processing such structure are also more likely to be part of the RC and are furthermore more central within the structural network. In summary, our results highlight the potential role of the RC organization in shaping hierarchical processing.
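On the graph-analysis side, a rich-club coefficient can be computed with networkx as sketched below; the random graph, degree cutoff, and hub definition are illustrative stand-ins for the paper's diffusion-weighted connectomes.

```python
# Sketch of the graph-theoretic notion at issue: rich-club coefficients
# and a simple degree-based hub definition (toy random graph, not the
# paper's connectome data).
import networkx as nx

G = nx.erdos_renyi_graph(90, 0.15, seed=1)  # stand-in for a 90-region connectome
phi = nx.rich_club_coefficient(G, normalized=False)  # phi(k) per degree k
# A rising phi(k) at high k means high-degree nodes preferentially interconnect.
deg = dict(G.degree())
cutoff = sorted(deg.values())[int(0.85 * len(deg))]  # top-15% degree nodes
hubs = [n for n, d in deg.items() if d >= cutoff]
print({k: round(v, 2) for k, v in list(phi.items())[-5:]}, hubs[:5])
```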

Language: English

Cited by: 3

Can training neural language models on a curriculum with developmentally plausible data improve alignment with human reading behavior?

Aryaman Chobey, Oliver Smith, Anzi Wang et al.

Published: Jan. 1, 2023

The use of neural language models to model human behavior has met with mixed success. While some work has found that the surprisal estimates from these models can be used to predict a wide range of neural and behavioral responses, other work studying more complex syntactic phenomena has found that these models generate incorrect predictions. This paper explores the extent to which the misalignment between empirical and model-predicted behavior can be minimized by training models on developmentally plausible data, such as in the BabyLM Challenge. We trained teacher language models on the BabyLM "strict-small" dataset and used their sentence-level surprisal estimates to create a curriculum. We found tentative evidence that our curriculum made it easier for models to acquire linguistic knowledge from the training data: on the subset of tasks in the BabyLM challenge suite evaluating models' grammatical knowledge of English, models trained first on the curriculum and then on a few randomly ordered training epochs performed slightly better than models trained on randomly ordered epochs alone. This improved linguistic knowledge acquisition did not result in better alignment with human reading behavior, however: models trained on the BabyLM dataset (with or without the curriculum) generated predictions that were as misaligned with human behavior as models trained on larger, less curated datasets. This suggests that training on developmentally plausible datasets alone is likely insufficient to generate language models capable of accurately predicting human language processing.
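A minimal sketch of the curriculum construction as described, assuming a Hugging Face causal LM as the teacher (GPT-2 here is a stand-in; the paper trains its own teacher on the strict-small data): score each sentence by mean teacher surprisal and order the corpus easy-to-hard.

```python
# Sketch (assumed details): rank training sentences by a teacher LM's
# mean per-token surprisal to build an easy-to-hard curriculum.
import math
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tok = AutoTokenizer.from_pretrained("gpt2")               # stand-in teacher
teacher = AutoModelForCausalLM.from_pretrained("gpt2").eval()

@torch.no_grad()
def mean_surprisal(sentence: str) -> float:
    ids = tok(sentence, return_tensors="pt").input_ids
    logits = teacher(ids).logits[0, :-1]                  # predict tokens 1..n-1
    logp = torch.log_softmax(logits, -1)
    token_lp = logp[torch.arange(ids.size(1) - 1), ids[0, 1:]]
    return (-token_lp.mean() / math.log(2)).item()        # bits per token

corpus = ["The dog barked.", "Colorless green ideas sleep furiously."]
curriculum = sorted(corpus, key=mean_surprisal)           # easy-to-hard ordering
print(curriculum)
```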

Language: English

Cited by: 1

What language models can tell us about learning adjectives
Megan Gotowski, Forrest Davis

Proceedings of the Linguistic Society of America, Journal year: 2024, Number 9(1), pp. 5693–5693

Published: May 15, 2024

It has been argued that language models (LMs) can inform our knowledge of language acquisition. While LMs are claimed to replicate aspects of grammatical knowledge, it remains unclear how this translates to acquisition directly. We ask if a model trained specifically on child-directed speech (CDS) is able to capture adjectives. Ultimately, our results reveal that what the model is “learning” is how adjectives are distributed in CDS, and not the properties of different adjective classes. While highlighting the ability of LMs to learn distributional information, these findings suggest that distributional information alone cannot explain how children generalize beyond their input.
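The "distributional information" at issue can be made concrete with a toy count of adjective-noun co-occurrences, as below; the data and tag scheme are invented for illustration, not drawn from CHILDES.

```python
# Toy sketch of the kind of distributional statistic an LM can absorb:
# which nouns each adjective co-occurs with in CDS (invented data).
from collections import Counter, defaultdict

cds = [
    [("big", "ADJ"), ("dog", "NOUN")],
    [("big", "ADJ"), ("truck", "NOUN")],
    [("red", "ADJ"), ("truck", "NOUN")],
]
contexts = defaultdict(Counter)
for utterance in cds:
    for (w1, t1), (w2, t2) in zip(utterance, utterance[1:]):
        if t1 == "ADJ" and t2 == "NOUN":
            contexts[w1][w2] += 1

print(dict(contexts))
# {'big': Counter({'dog': 1, 'truck': 1}), 'red': Counter({'truck': 1})}
```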

Language: English

Cited by: 0

The richness of the stimulus: Constructional variation and development in child-directed speech
Bastian Bunzeck, Holger Diessel

First Language, Journal year: 2024, Number unknown

Published: Dec. 13, 2024

In a seminal study, Cameron-Faulkner et al. made two important observations about utterance-level constructions in English child-directed speech (CDS). First, they observed that canonical in/transitive sentences are surprisingly infrequent in child-directed speech (given that SVO word order is often thought to play a key role in the acquisition of syntax). Second, they found that many CDS utterances are introduced by a lexical frame (such as Let’s. . ., There. . ., or What do you. . .?). Using a much larger and more diverse dataset than Cameron-Faulkner et al., this study shows that utterance-level constructions vary with two factors: (1) the interactive situation and (2) children's age. While canonical in/transitive sentences are not particularly frequent in free toy-play sessions, they are predominant in other social situations (e.g., during mealtimes and shared book reading sessions) and increase in frequency as children get older. Furthermore, our data show that different utterance types occur with different types of lexical frames varying in length and structure. Many include short frames consisting of only one or two words, but questions are introduced by extensive frames that are formed from a small set of lexical items and follow a power-law distribution. Considering these findings, we argue that these structural properties of CDS likely facilitate the acquisition of grammar and, in particular, the acquisition of questions.
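The frame analysis can be approximated as below: count utterance-initial word sequences and inspect their rank-frequency profile. The three-word frame window and toy utterances are assumptions for illustration.

```python
# Sketch of a frame-frequency analysis: count utterance-initial frames
# and print their rank-frequency list (toy utterances, not real CDS).
from collections import Counter

utterances = [
    "What do you want?", "What do you see?", "Let's go outside.",
    "There you go.", "What do you think?", "Let's read a book.",
]
frames = Counter(" ".join(u.split()[:3]) for u in utterances)
ranked = sorted(frames.items(), key=lambda kv: -kv[1])
for rank, (frame, freq) in enumerate(ranked, start=1):
    print(rank, frame, freq)
# On real CDS, log(freq) falls roughly linearly in log(rank): a power law.
```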

Language: English

Cited by: 0

SemanticScape: A Distributional Model of Concepts Grounded in Distance Patterns between Objects
Andrea Gregor de Varda, Marco A. Petilli, Marco Marelli et al.

Published: Oct. 2, 2023

Data-driven models of concepts are gaining popularity in Psychology and Cognitive Science. Distributional semantic models represent word meanings as abstract co-occurrence patterns and excel at capturing human meaning intuitions about conceptual relationships; however, they lack the explicit links to the physical world that humans acquire through perception. Computer vision neural networks, on the other hand, can produce representations of visually-grounded concepts, but they do not support the extraction of information about the relationships between objects. To bridge the gap between distributional models and computer vision, we introduce SemanticScape, a model of concepts grounded in the distance patterns between visual objects in natural images. The model captures the latent statistics of the spatial organization of objects in the environment. Its implementation is based on the calculation of the summed Euclidean distances between all object pairs in visual scenes, which are then abstracted by means of dimensionality reduction. We validate our model against human judgments of semantic similarity and relatedness, analogical reasoning, and several implicit processing measurements. Our results show that SemanticScape explains variance in human responses in these tasks above and beyond what can be accounted for by standard convolutional networks, and that it is predictive of performance in perceptual tasks.
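The pipeline the abstract describes (summed pairwise Euclidean distances, then dimensionality reduction) can be sketched in a few lines of NumPy; the toy scenes, centroid coordinates, and SVD truncation are illustrative assumptions.

```python
# Minimal sketch of the described pipeline: sum Euclidean distances between
# object pairs across scenes, then reduce dimensionality (toy data; the
# published model is built from large annotated image sets).
import numpy as np

objects = ["cup", "table", "dog"]
idx = {o: i for i, o in enumerate(objects)}
# Each scene: object -> (x, y) centroid.
scenes = [
    {"cup": (0.2, 0.5), "table": (0.3, 0.6)},
    {"cup": (0.7, 0.1), "table": (0.6, 0.2), "dog": (0.1, 0.9)},
]
D = np.zeros((len(objects), len(objects)))
for scene in scenes:
    for a in scene:
        for b in scene:
            if a < b:  # each unordered pair once
                d = np.linalg.norm(np.subtract(scene[a], scene[b]))
                D[idx[a], idx[b]] += d
                D[idx[b], idx[a]] += d

# Dimensionality reduction via SVD yields the object embeddings.
U, S, _ = np.linalg.svd(D)
embeddings = U[:, :2] * S[:2]
print(embeddings.shape)  # (3, 2)
```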

Language: English

Cited by: 0