An Information-Theoretic Analysis of Targeted Regressions during Reading (Open Access)
Ethan Wilcox, Tiago Pimentel, Clara Meister et al.

Published: March 1, 2024

Regressions, or backward saccades, are common during reading, accounting for between 5% and 20% of all saccades. And yet, relatively little is known about what causes them. We provide an information-theoretic operationalization for two previous qualitative hypotheses about regressions, which we dub reactivation and reanalysis. We argue that these hypotheses make different predictions about the pointwise mutual information (pmi) between a regression's source and target. Intuitively, the pmi between two words measures how much more (or less) likely one word is to be present given the other. On the one hand, the reactivation hypothesis predicts that regressions occur between words that are associated, implying high positive values of pmi. On the other hand, the reanalysis hypothesis predicts that regressions should occur between words that are disassociated with each other, implying negative, low values of pmi. As a second theoretical contribution, we expand on previous theories by considering not only pmi but also expected values of pmi, E[pmi], where the expectation is taken over possible realizations of the regression's target. The rationale for this is that language processing involves making inferences under uncertainty, and readers may be uncertain about what they have read, especially if a previous word was skipped. To test both theories, we use contemporary language models to estimate pmi-based statistics over word pairs in three corpora of eye tracking data in English, as well as in six languages across three language families (Indo-European, Uralic, and Turkic). Our results are consistent across the languages and models tested: positive values of pmi and E[pmi] consistently help to predict the patterns of regressions during reading, whereas negative values of pmi and E[pmi] do not. Our information-theoretic interpretation increases the predictive scope of both theories, and our studies present the first systematic crosslinguistic analysis of regressions in the literature. Our results support the reactivation hypothesis and, more broadly, expand the number of language processing behaviors that can be linked to information-theoretic principles.
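
For readers unfamiliar with the measure, the standard textbook definition of pointwise mutual information between two outcomes x and y can be written as follows. This is the general definition only; the symbols x, y, and p are illustrative choices here, and the exact estimators used in the paper (including how E[pmi] is computed) are not given in this listing.

```latex
\mathrm{pmi}(x; y)
  = \log \frac{p(x, y)}{p(x)\, p(y)}
  = \log \frac{p(x \mid y)}{p(x)}
```

Under this definition, pmi is positive when observing y makes x more likely than its baseline probability, negative when it makes x less likely, and zero when the two are independent.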

Language: English

The Eye Movement Database of Passage Reading in Vertically Written Traditional Mongolian (Creative Commons)

Yaqian Bao, Xingshan Li, Victor Kuperman et al.

Scientific Data, Year: 2025, Volume: 12(1)

Published: March 25, 2025

This paper introduces an eye-tracking corpus of passage reading data in the vertical writing system of traditional Mongolian. The corpus extends the Multilingual Eye Movement Corpus (MECO) database and includes data from 66 native readers of the traditional Mongolian script reading 12 texts comprising 99 sentences and 2,592 words. MECO aims to address the research gap in eye-movement studies on understudied languages. As one of the very few actively used vertical writing systems, these data offer unique insights into the cognitive and visual processing demands of vertical reading. The paper provides reliability estimates for the data and reports lexical benchmark effects of word frequency and length. Additionally, the data offer a valuable opportunity for cross-linguistic comparisons of eye movement data, especially with horizontal writing systems, contributing to a better understanding of how writing direction influences reading processing.

Language: English

Cited by: 0

Influence of the surprisal power adjustment on spoken word duration in emotional speech in Serbian

Jelena M. Lazić, Sanja Vujnović

Computer Speech & Language, Year: 2025, Volume: unknown, p. 101803

Published: April 1, 2025

Language: English

Cited by: 0

Derivational morphology reveals analogical generalization in large language models (Creative Commons)
Valentin Hofmann, Leonie Weissweiler, David R. Mortensen et al.

Proceedings of the National Academy of Sciences, Year: 2025, Volume: 122(19)

Published: May 9, 2025

What mechanisms underlie linguistic generalization in large language models (LLMs)? This question has attracted considerable attention, with most studies analyzing the extent to which the language skills of LLMs resemble rules. As yet, it is not known whether linguistic generalization in LLMs could equally well be explained as the result of analogy. A key shortcoming of prior research is its focus on regular linguistic phenomena, for which rule-based and analogical approaches make the same predictions. Here, we instead examine derivational morphology, specifically English adjective nominalization, which displays notable variability. We introduce a method for investigating linguistic generalization in LLMs: focusing on GPT-J, we fit cognitive models that instantiate rule-based and analogical learning to the LLM training data and compare their predictions on a set of nonce adjectives with those of the LLM, allowing us to draw direct conclusions regarding the underlying mechanisms. As expected, rule-based and analogical models explain the predictions of GPT-J equally well for adjectives with regular nominalization patterns. However, for adjectives with variable nominalization patterns, the analogical model provides a much better match. Furthermore, GPT-J's behavior is sensitive to individual word frequencies, even for regular forms, a behavior that is consistent with an analogical account but not a rule-based one. These findings refute the hypothesis that GPT-J's linguistic generalization on adjective nominalization involves rules, suggesting analogy as the underlying mechanism. Overall, our study suggests that analogical processes play a bigger role in the linguistic generalization of LLMs than previously thought.

Language: English

Cited by: 0

Bigger is not always better: The importance of human-scale language modeling for psycholinguistics (Creative Commons)
Ethan Wilcox, Michael Y. Hu, Aaron Mueller et al.

Journal of Memory and Language, Year: 2025, Volume: 144, p. 104650

Published: May 23, 2025

Language: English

Cited by: 0

MulCogBench: a multi-modal cognitive benchmark dataset for evaluating Chinese and English computational language models
Yunhao Zhang, Xiaohan Zhang, Chong Li et al.

Language Resources and Evaluation, Year: 2025, Volume: unknown

Published: May 30, 2025

Language: English

Cited by: 0

An information-theoretic analysis of targeted regressions during reading (Creative Commons)
Ethan Wilcox, Tiago Pimentel, Clara Meister et al.

Cognition, Year: 2024, Volume: 249, p. 105765

Published: May 20, 2024

Regressions, or backward saccades, are common during reading, accounting for between 5% and 20% of all saccades. And yet, relatively little is known about what causes them. We provide an information-theoretic operationalization for two previous qualitative hypotheses about regressions, which we dub reactivation and reanalysis. We argue that these hypotheses make different predictions about the pointwise mutual information (pmi) between a regression's source and target. Intuitively, the pmi between two words measures how much more (or less) likely one word is to be present given the other. On the one hand, the reactivation hypothesis predicts that regressions occur between words that are associated, implying high positive values of pmi. On the other hand, the reanalysis hypothesis predicts that regressions should occur between words that are not associated with each other, implying negative, low values of pmi. As a second theoretical contribution, we expand on previous theories by considering not only pmi but also expected values of pmi, E[pmi], where the expectation is taken over possible realizations of the regression's target. The rationale for this is that language processing involves making inferences under uncertainty, and readers may be uncertain about what they have read, especially if a previous word was skipped. To test both theories, we use contemporary language models to estimate pmi-based statistics over word pairs in three corpora of eye tracking data in English, as well as in six languages across three language families (Indo-European, Uralic, and Turkic). Our results are consistent across the languages and models tested: positive values of pmi and E[pmi] consistently help to predict the patterns of regressions during reading, whereas negative values of pmi and E[pmi] do not. Our information-theoretic interpretation increases the predictive scope of both theories, and our studies present the first systematic crosslinguistic analysis of regressions in the literature. Our results support the reactivation hypothesis and, more broadly, expand the number of language processing behaviors that can be linked to information-theoretic principles.

Language: English

Cited by: 3

Cloze probability, predictability ratings, and computational estimates for 205 English sentences, aligned with existing EEG and reading time data (Creative Commons)

Andrea Gregor de Varda, Marco Marelli, Simona Amenta et al.

Behavior Research Methods, Year: 2023, Volume: 56(5), pp. 5190 - 5213

Published: Oct. 25, 2023

We release a database of cloze probability values, predictability ratings, and computational estimates for a sample of 205 English sentences (1726 words), aligned with previously released word-by-word reading time data (both self-paced reading and eye-movement records; Frank et al., Behavior Research Methods, 45(4), 1182-1190. 2013) and EEG responses (Frank et al., Brain and Language, 140, 1-11. 2015). Our analyses show that predictability ratings are the best predictors of the EEG signal (N400, P600, LAN), self-paced reading times, and eye movement patterns, when spillover effects are taken into account. The computational estimates are particularly effective at explaining variance in the eye-tracking data without spillover. Cloze probability estimates have decent overall psychometric accuracy and are the best predictors of early fixation patterns (first fixation duration). Our results indicate that the choice of the best measurement of word predictability in context critically depends on the processing index being considered.

Language: English

Cited by: 6

Words, Subwords, and Morphemes: What Really Matters in the Surprisal-Reading Time Relationship? (Creative Commons)
Sathvik Nair, Philip Resnik

Published: Jan. 1, 2023

An important assumption that comes with using LLMs on psycholinguistic data has gone unverified. LLM-based predictions are based on subword tokenization, not on decomposition of words into morphemes. Does that matter? We carefully test this by comparing surprisal estimates using orthographic, morphological, and BPE tokenization against reading time data. Our results replicate previous findings and provide evidence that *in the aggregate*, predictions using BPE tokenization do not suffer relative to morphological and orthographic segmentation. However, a finer-grained analysis points to potential issues with relying on BPE-based tokenization, as well as providing promising results involving morphologically-aware surprisal estimates and suggesting a new method for evaluating morphological prediction.
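
As a rough illustration of why tokenization can matter for surprisal, the sketch below shows one common way to obtain a word-level surprisal from subword pieces: by the chain rule, the word's probability is the product of the conditional probabilities of its pieces, so its surprisal is the sum of their surprisals. The function name `word_surprisal` and the probabilities are hypothetical and are not taken from the paper, whose actual procedure is not described in this listing.

```python
import math


def word_surprisal(piece_probs):
    """Surprisal of one word, in bits, from its subword pieces.

    piece_probs holds the conditional probability of each piece given
    everything before it. Summing per-piece surprisals is equivalent to
    taking -log2 of the product of those probabilities (chain rule).
    """
    return sum(-math.log2(p) for p in piece_probs)


# Hypothetical conditional probabilities for a word split into two pieces.
example_probs = [0.5, 0.25]
example_bits = word_surprisal(example_probs)
print(example_bits)
```

A different segmentation of the same word (for example, into morphemes) yields different pieces and therefore different conditional probabilities, which is the kind of difference the abstract sets out to test.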

Language: English

Cited by: 4

Prediction in reading: A review of predictability effects, their theoretical implications, and beyond (Creative Commons)
Roslyn Wong, Erik D. Reichle, Aaron Veldre et al.

Psychonomic Bulletin & Review, Year: 2024, Volume: unknown

Published: Oct. 31, 2024

Historically, prediction during reading has been considered an inefficient and cognitively expensive processing mechanism given the inherently generative nature of language, which allows upcoming text to unfold in an infinite number of possible ways. This article provides an accessible and comprehensive review of the psycholinguistic research that, over the past 40 or so years, has investigated whether readers are capable of generating predictions during reading, typically via experiments on the effects of predictability (i.e., how well a word can be predicted from its prior context). Five theoretically important issues are addressed: What is the best measure of predictability? What is the functional relationship between predictability and processing difficulty? What stage(s) of processing does predictability affect? Are predictability effects ubiquitous? What processes do predictability effects actually reflect? Insights from computational models of reading about how predictability manifests itself to facilitate the reading of text are also discussed. The review concludes by arguing that effects of predictability can, to a certain extent, be taken as evidence that prediction is an important but flexible component of real-time language comprehension, in line with broader predictive accounts of cognitive functioning. However, converging evidence, especially from concurrent eye-tracking and brain-imaging methods, is necessary to refine theories of prediction.

Language: English

Cited by: 1

Word Frequency and Predictability Dissociate in Naturalistic Reading (Open Access)
Cory Shain

Published: July 6, 2023

Many studies of human language processing have shown that readers slow down at less frequent or less predictable words, but there is debate about whether frequency and predictability effects reflect separable cognitive phenomena: are the cognitive operations that retrieve words from the mental lexicon based on sensory cues distinct from those that predict upcoming words based on context? Previous evidence for a frequency-predictability dissociation is mostly based on small samples (both for estimating predictability and frequency and for testing their effects on human behavior), artificial materials (e.g., isolated constructed sentences), and implausible modeling assumptions (discrete-time dynamics, linearity, additivity, constant variance, and invariance over time), which raises the question: do frequency and predictability dissociate in ordinary language comprehension, such as story reading? This study leverages recent progress in open data and computational modeling to address this question at scale. A large collection of naturalistic reading data (six datasets, >2.2M datapoints) is analyzed using nonlinear continuous-time regression, and frequency and predictability are estimated using statistical language models trained on more data than is currently typical in psycholinguistics. Despite the use of naturalistic data, strong predictability estimates, and flexible regression models, the results converge with earlier experimental studies in supporting dissociable and additive frequency and predictability effects.

Language: English

Cited by: 2