Linguistics Vanguard, Journal year: 2025, Issue: unknown
Published: April 9, 2025
Abstract: Much linguistic research relies on annotated datasets of features extracted from text corpora, but the rapid quantitative growth of these corpora has made it practically difficult for linguists to manually clean and annotate large data samples. In this paper, we present a method that leverages language models to assist the linguist in grammatical annotation through prompt engineering, training, and evaluation. We apply this methodological pipeline to a case study of formal variation in the English evaluative verb construction “consider X (as) (to be) Y”, based on the model Claude 3.5 Sonnet and on Davies’s NOW and Sketch Engine’s EnTenTen21 corpora. Overall, we reach an accuracy of over 90 % on our held-out test samples with only a small amount of training data, validating the method for annotating very large quantities of tokens in the future. We discuss the generalizability of our results to a wider range of studies of constructions and change, underlining the value of AI copilots as tools for future research, notwithstanding some important caveats.
Language: English