Large corpora and large language models: a replicable method for automating grammatical annotation
Cameron Morin, Matti Marttinen Larsson

Linguistics Vanguard, Year: 2025, Issue: unknown

Published: April 9, 2025

Abstract: Much linguistic research relies on annotated datasets of features extracted from text corpora, but the rapid quantitative growth of these corpora has created practical difficulties for linguists who must manually clean and annotate large data samples. In this paper, we present a method that leverages language models to assist the linguist in grammatical annotation through prompt engineering, training, and evaluation. We apply this methodological pipeline to a case study of formal variation in the English evaluative verb construction “consider X (as) (to be) Y”, based on the model Claude 3.5 Sonnet and on Davies’s NOW and Sketch Engine’s EnTenTen21 corpora. Overall, we reach an accuracy of over 90% on our held-out test samples with only a small amount of training data, validating the method for annotating very large quantities of tokens in the future. We discuss the generalizability of our results to a wider range of studies of constructions and their change, underlining the value of AI copilots as tools for future research, notwithstanding some important caveats.

Language: English

Collaborative Growth: When Large Language Models Meet Sociolinguistics (Creative Commons)
Dong Nguyen

Language and Linguistics Compass, Year: 2025, Volume(Issue): 19(2)

Published: February 3, 2025

ABSTRACT Large Language Models (LLMs) have dramatically transformed the AI landscape. They can produce remarkably fluent text and exhibit a range of natural language understanding and generation capabilities. This article explores how LLMs might be used for sociolinguistic research and, conversely, how sociolinguistics can contribute to the development of LLMs. It argues that both areas will benefit from a thoughtful, engaging collaboration. Sociolinguists are not merely end users of LLMs; they have a crucial role to play in

Language: English

Cited by: 0