Large corpora and large language models: a replicable method for automating grammatical annotation
Cameron Morin, Matti Marttinen Larsson

Linguistics Vanguard, Journal Year: 2025, Volume and Issue: unknown

Published: April 9, 2025

Abstract Much linguistic research relies on annotated datasets of features extracted from text corpora, but the rapid quantitative growth of these corpora has created practical difficulties for linguists to manually clean and annotate large data samples. In this paper, we present a method that leverages large language models for assisting the linguist in grammatical annotation through prompt engineering, training, and evaluation. We apply this methodological pipeline to the case study of formal variation in the English evaluative verb construction “consider X (as) (to be) Y”, based on the large language model Claude 3.5 Sonnet and corpus data from Davies’s NOW and Sketch Engine’s EnTenTen21 corpora. Overall, we reach a model accuracy of over 90 % on our held-out test samples with only a small amount of training data, validating the method for the annotation of very large quantities of tokens of the construction in the future. We discuss the generalizability of our results for a wider range of case studies of grammatical constructions and of grammatical variation and change, underlining the value of AI copilots as tools for future linguistic research, notwithstanding some important caveats.

Language: English

Acquiring constraints on filler-gap dependencies from structural collocations: Assessing a computational learning model of island-insensitivity in Norwegian
Anastasia Kobzeva, Dave Kush

Language Acquisition, Journal Year: 2025, Volume and Issue: unknown, P. 1 - 44

Published: March 13, 2025

Language: English

Citations

0
