Cited by: Readers have to work harder to understand a badly translated text: an eye-tracking study into the effects of translation errors

Derivational morphology reveals analogical generalization in large language models

Valentin Hofmann, Leonie Weissweiler, David R. Mortensen

et al.

Proceedings of the National Academy of Sciences, Journal Year: 2025, Volume and Issue: 122(19)

Published: May 9, 2025

What mechanisms underlie linguistic generalization in large language models (LLMs)? This question has attracted considerable attention, with most studies analyzing the extent to which the language skills of LLMs resemble rules. As yet, it is not known whether linguistic generalization in LLMs could equally well be explained as the result of analogy. A key shortcoming of prior research is its focus on regular linguistic phenomena, for which rule-based and analogical approaches make the same predictions. Here, we instead examine derivational morphology, specifically English adjective nominalization, which displays notable variability. We introduce a method for investigating linguistic generalization in LLMs: Focusing on GPT-J, we fit cognitive models that instantiate rule-based and analogical learning to the LLM training data and compare their predictions on a set of nonce adjectives with those of the LLM, allowing us to draw direct conclusions regarding underlying mechanisms. As expected, rule-based and analogical models explain the predictions of GPT-J equally well for adjectives with regular nominalization patterns. However, for adjectives with variable nominalization patterns, the analogical model provides a much better match. Furthermore, GPT-J’s behavior is sensitive to individual word frequencies, even for regular forms, a behavior that is consistent with an analogical account but not a rule-based one. These findings refute the hypothesis that GPT-J’s linguistic generalization on adjective nominalization involves rules, suggesting analogy as the underlying mechanism. Overall, our study suggests that analogical processes play a bigger role in the linguistic generalization of LLMs than previously thought.

Language: English


Mouse Tracking for Reading (MoTR): A new naturalistic incremental processing measurement tool

Ethan Wilcox, Cui Ding, Mrinmaya Sachan

et al.

Journal of Memory and Language, Journal Year: 2024, Volume and Issue: 138, P. 104534

Published: May 25, 2024

We introduce Mouse Tracking for Reading (MoTR), a new incremental processing measurement tool that can be used to collect word-by-word reading times. In a MoTR trial, participants are presented with text, which is blurred, except for a small region around the tip of the mouse. Participants must move the mouse to reveal and read the text. Mouse movement is recorded, and, using a postprocessing pipeline we present, can be analyzed to produce scanpaths as well as word-by-word reading times. We validate MoTR in two suites of experiments. In the first experiment, we collect MoTR data for the English-language Provo Corpus (Luke and Christianson, 2018). We analyze scanpaths and show that participants interpolate between two types of reading strategies during a MoTR trial: sometimes they fixate on individual words, somewhat akin to eye-tracking, while at other times they make a more constant pass over the text, slowing down in response to processing difficulties. Taking these strategies into account, we show that the reading times produced by our analysis pipeline correlate with previously collected eye-tracking data for this corpus, and that the correlations are higher than those for SPR data, also collected for this corpus. Furthermore, we demonstrate that there is a linear relationship between by-word MoTR values and word-level surprisal values, as has been shown for eye-tracking data (Smith and Levy, 2013). In the second experiment, we assess whether MoTR can be used to study the sentence processing phenomena targeted in psycholinguistics. Using materials from Witzel et al. (2012), we show English speakers' preferences for low attachment in online comprehension. We argue that MoTR presents a compelling tradeoff between multiple experimental considerations: It is cheap to run and can be run in a browser, enabling data collection over the internet. It is more naturalistic than some alternative measures, allowing participants to skip words and regress to previous regions. Finally, it has good sensitivity, detecting signatures of psycholinguistic behaviors with a relatively small number of participants.

Language: English


Word Forms Reflect Trade‐Offs Between Speaker Effort and Robust Listener Recognition

Stephan C. Meylan, Thomas L. Griffiths

Cognitive Science, Journal Year: 2024, Volume and Issue: 48(7)

Published: July 1, 2024

How do cognitive pressures shape the lexicons of natural languages? Here, we reframe George Kingsley Zipf's proposed “law of abbreviation” within a more general framework that relates it to cognitive pressures that affect speakers and listeners. In this new framework, speakers' drive to reduce effort (Zipf's proposal) is counteracted by the need for low‐frequency words to have word forms that are sufficiently distinctive to allow for accurate recognition by listeners. To support this framework, we replicate and extend recent work using the prevalence of subword phonemic sequences (phonotactic probability) to measure speakers' production effort in place of Zipf's measure of length. Across languages and corpora, phonotactic probability is more strongly correlated with word frequency than word length is. We also show that this measure of ease of speech production is strongly correlated with a measure of perceptual difficulty that indexes the degree of competition from alternative interpretations in word recognition. This is consistent with the claim that there must be trade‐offs between these two factors, and is inconsistent with a recent proposal that phonotactic probability facilitates both perception and production. To our knowledge, this is the first work to offer an explanation of why long, phonotactically improbable word forms remain in the lexicons of natural languages.

Language: English


An information-theoretic analysis of targeted regressions during reading

Ethan Wilcox, Tiago Pimentel, Clara Meister

et al.

Cognition, Journal Year: 2024, Volume and Issue: 249, P. 105765

Published: May 20, 2024

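Regressions, or backward saccades, are common during reading, accounting for between 5% and 20% of all saccades. And yet, relatively little is known about what causes them. We provide an information-theoretic operationalization of two previous qualitative hypotheses about regressions, which we dub reactivation and reanalysis. We argue that these hypotheses make different predictions about the pointwise mutual information, or pmi, between a regression's source and target. Intuitively, the pmi between two words measures how much more (or less) likely one word is to be present given the other. On the one hand, the reactivation hypothesis predicts that regressions occur between words that are associated, implying high positive values of pmi. On the other hand, the reanalysis hypothesis predicts that regressions should occur between words that are not associated with each other, implying negative, low values of pmi. As a second theoretical contribution, we expand on previous theories by considering not only pmi but also expected values of pmi, E[pmi], where the expectation is taken over possible realizations of the regression's source and target. The rationale for this is that language processing involves making inferences under uncertainty, and readers may be uncertain about what they have read, especially if a previous word was skipped. To test both theories, we use contemporary language models to estimate pmi-based statistics over word pairs in three corpora of eye tracking data in English, as well as in six languages across three language families (Indo-European, Uralic, and Turkic). Our results are consistent with the reactivation hypothesis in the languages tested: Positive values of pmi and E[pmi] consistently help to predict the patterns of regressions during reading, whereas negative values of pmi and E[pmi] do not. Our information-theoretic interpretation of reactivation increases the predictive scope of the theory, and our studies present the first systematic crosslinguistic analysis of regressions in the literature. Our results support the reactivation hypothesis and, more broadly, expand the number of language processing behaviors that can be linked to information-theoretic principles.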
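
As a rough illustration of the pmi quantity this abstract refers to (a sketch only, not the authors' code), pointwise mutual information can be computed from a joint probability and two marginals. The function name `pmi` and the toy probabilities below are assumptions made for the example; the paper estimates such quantities with language models rather than fixed numbers.

```python
import math


def pmi(p_xy: float, p_x: float, p_y: float) -> float:
    """Pointwise mutual information in bits: log2( p(x, y) / (p(x) * p(y)) )."""
    return math.log2(p_xy / (p_x * p_y))


# Hypothetical toy probabilities, not taken from the paper.
associated = pmi(p_xy=0.5, p_x=0.5, p_y=0.5)     # co-occur more often than chance
independent = pmi(p_xy=0.25, p_x=0.5, p_y=0.5)   # co-occur exactly at chance
dissociated = pmi(p_xy=0.125, p_x=0.5, p_y=0.5)  # co-occur less often than chance

print(associated, independent, dissociated)
```

A positive value corresponds to the "associated" case that the reactivation hypothesis predicts, and a negative value to the case that the reanalysis hypothesis predicts.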

Language: English


Cloze probability, predictability ratings, and computational estimates for 205 English sentences, aligned with existing EEG and reading time data

Andrea Gregor de Varda, Marco Marelli, Simona Amenta

et al.

Behavior Research Methods, Journal Year: 2023, Volume and Issue: 56(5), P. 5190 - 5213

Published: Oct. 25, 2023

We release a database of cloze probability values, predictability ratings, and computational estimates for a sample of 205 English sentences (1726 words), aligned with previously released word-by-word reading time data (both self-paced reading and eye-movement records; Frank et al., Behavior Research Methods, 45(4), 1182-1190. 2013) and EEG responses (Frank et al., Brain and Language, 140, 1-11. 2015). Our analyses show that predictability ratings are the best predictors of the EEG signal (N400, P600, LAN), self-paced reading times, and eye movement patterns, when spillover effects are taken into account. The computational estimates are particularly effective at explaining variance in the eye-tracking data without spillover. Cloze probability estimates have decent overall psychometric accuracy and are the best predictors of early fixation patterns (first fixation duration). Our results indicate that the choice of the best measurement of word predictability in context critically depends on the processing index being considered.

Language: English


Words, Subwords, and Morphemes: What Really Matters in the Surprisal-Reading Time Relationship?

Sathvik Nair, Philip Resnik

Published: Jan. 1, 2023

An important assumption that comes with using LLMs on psycholinguistic data has gone unverified. LLM-based predictions are based on subword tokenization, not on decomposition of words into morphemes. Does that matter? We carefully test this by comparing surprisal estimates using orthographic, morphological, and BPE tokenization against reading time data. Our results replicate previous findings and provide evidence that *in the aggregate*, predictions using BPE tokenization do not suffer relative to morphological and orthographic segmentation. However, a finer-grained analysis points to potential issues with relying on BPE-based tokenization, as well as providing promising results involving morphologically-aware surprisal estimates and suggesting a new method for evaluating morphological prediction.

Language: English


Prediction in reading: A review of predictability effects, their theoretical implications, and beyond

Roslyn Wong, Erik D. Reichle, Aaron Veldre

et al.

Psychonomic Bulletin & Review, Journal Year: 2024, Volume and Issue: unknown

Published: Oct. 31, 2024

Historically, prediction during reading has been considered an inefficient and cognitively expensive processing mechanism given the inherently generative nature of language, which allows upcoming text to unfold in an infinite number of possible ways. This article provides an accessible and comprehensive review of the psycholinguistic research that, over the past 40 or so years, has investigated whether readers are capable of generating predictions during reading, typically via experiments on the effects of predictability (i.e., how well a word can be predicted from its prior context). Five theoretically important issues are addressed: What is the best measure of predictability? What is the functional relationship between predictability and processing difficulty? What stage(s) of processing does predictability affect? Are predictability effects ubiquitous? What processes do predictability effects actually reflect? Insights from computational models of reading about how predictability manifests itself to facilitate the reading of text are also discussed. This review concludes by arguing that effects of predictability can, to a certain extent, be taken as demonstrating evidence that prediction is an important but flexible component of real-time language comprehension, in line with broader predictive accounts of cognitive functioning. However, converging evidence, especially from concurrent eye-tracking and brain-imaging methods, is necessary to refine theories of prediction.

Language: English


Word Frequency and Predictability Dissociate in Naturalistic Reading

Cory Shain

Published: July 6, 2023

Many studies of human language processing have shown that readers slow down at less frequent or less predictable words, but there is debate about whether frequency and predictability effects reflect separable cognitive phenomena: are the cognitive operations that retrieve words from the mental lexicon based on sensory cues distinct from those that predict upcoming words based on context? Previous evidence for a frequency-predictability dissociation is mostly based on small samples (both for estimating predictability and frequency and for testing their effects on human behavior), artificial materials (e.g., isolated constructed sentences), and implausible modeling assumptions (discrete-time dynamics, linearity, additivity, constant variance, and invariance over time), which raises the question: do frequency and predictability dissociate in ordinary language comprehension, such as story reading? This study leverages recent progress in open data and computational modeling to address this question at scale. A large collection of naturalistic reading data (six datasets, >2.2M datapoints) is analyzed using nonlinear continuous-time regression, and frequency and predictability are estimated using statistical language models trained on more data than is currently typical in psycholinguistics. Despite the use of naturalistic data, strong predictability estimates, and flexible regression models, results converge with earlier experimental studies in supporting dissociable and additive frequency and predictability effects.

Language: English


Language Model Quality Correlates with Psychometric Predictive Power in Multiple Languages

Ethan Wilcox, Clara Meister, Ryan Cotterell

et al.

Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, Journal Year: 2023, Volume and Issue: unknown, P. 7503 - 7511

Published: Jan. 1, 2023

Surprisal theory (Hale, 2001; Levy, 2008) posits that a word's reading time is proportional to its surprisal (i.e., to its negative log probability given the preceding context). Since we are unable to access a word's ground-truth probability, surprisal theory has been empirically tested using surprisal estimates from language models (LMs). Under the premise that surprisal theory holds, we would expect that higher quality language models provide more powerful predictors of human reading behavior, a conjecture we dub the quality–power (QP) hypothesis. Unfortunately, empirical support for the QP hypothesis is mixed. Some studies in English have found correlations between LM quality and predictive power, but other studies using Japanese data, as well as using larger English LMs, find no such correlations. In this work, we conduct a systematic crosslinguistic assessment of the QP hypothesis. We train LMs from scratch on small- and medium-sized datasets from 13 languages (across five language families) and assess their ability to predict eye tracking data. We find correlations between LM quality and predictive power in eleven of these thirteen languages, suggesting that, within the range of model classes and sizes tested, better language models are indeed better predictors of human language processing behaviors.
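
To make the definition of surprisal in this abstract concrete, here is a minimal sketch (not the authors' pipeline) that converts in-context word probabilities into surprisal values in bits. The helper name `surprisal` and the probabilities in `word_probs` are hypothetical; in the studies listed here such probabilities come from trained language models.

```python
import math


def surprisal(prob: float) -> float:
    """Surprisal in bits: the negative log2 probability of a word in context."""
    if not 0.0 < prob <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -math.log2(prob)


# Hypothetical in-context probabilities, standing in for language model output.
word_probs = {"the": 0.5, "cat": 0.125, "sat": 0.25}

# Lower probability in context means higher surprisal.
surprisals = {word: surprisal(p) for word, p in word_probs.items()}

for word, value in surprisals.items():
    print(f"{word}: {value:.1f} bits")
```

Under surprisal theory as stated above, the less probable word ("cat" in this toy example) would be expected to take proportionally longer to read.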

Language: English


An Information-Theoretic Analysis of Targeted Regressions during Reading

Ethan Wilcox, Tiago Pimentel, Clara Meister

et al.

Published: March 1, 2024

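Regressions, or backward saccades, are common during reading, accounting for between 5% and 20% of all saccades. And yet, relatively little is known about what causes them. We provide an information-theoretic operationalization of two previous qualitative hypotheses about regressions, which we dub reactivation and reanalysis. We argue that these hypotheses make different predictions about the pointwise mutual information, or pmi, between a regression’s source and target. Intuitively, the pmi between two words measures how much more (or less) likely one word is to be present given the other. On the one hand, the reactivation hypothesis predicts that regressions occur between words that are associated, implying high positive values of pmi. On the other hand, the reanalysis hypothesis predicts that regressions should occur between words that are disassociated with each other, implying negative, low values of pmi. As a second theoretical contribution, we expand on previous theories by considering not only pmi but also expected values of pmi, E[pmi], where the expectation is taken over possible realizations of the regression’s source and target. The rationale for this is that language processing involves making inferences under uncertainty, and readers may be uncertain about what they have read, especially if a previous word was skipped. To test both theories, we use contemporary language models to estimate pmi-based statistics over word pairs in three corpora of eye tracking data in English, as well as in six languages across three language families (Indo-European, Uralic, and Turkic). Our results are consistent with the reactivation hypothesis in the languages tested: Positive values of pmi and E[pmi] consistently help to predict the patterns of regressions during reading, whereas negative values of pmi and E[pmi] do not. Our information-theoretic interpretation of reactivation increases the predictive scope of the theory, and our studies present the first systematic crosslinguistic analysis of regressions in the literature. Our results support the reactivation hypothesis and, more broadly, expand the number of language processing behaviors that can be linked to information-theoretic principles.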

Language: English
