Large corpora and large language models: a replicable method for automating grammatical annotation
Linguistics Vanguard, Journal Year: 2025, Volume and Issue: unknown
Published: April 9, 2025
Abstract
Much linguistic research relies on annotated datasets of features extracted from text corpora, but the rapid quantitative growth of these corpora has created practical difficulties for linguists to manually clean and annotate large data samples. In this paper, we present a method that leverages large language models for assisting the linguist in grammatical annotation through prompt engineering, training, and evaluation. We apply a methodological pipeline to the case study of formal variation in the English evaluative verb construction “consider X (as) (to be) Y”, based on the large language model Claude 3.5 Sonnet and data from Davies’s NOW and Sketch Engine’s EnTenTen21 corpora. Overall, we reach a model accuracy of over 90% on our held-out test samples with only a small amount of training data, validating the method for the annotation of very large quantities of tokens of the construction in the future. We discuss the generalizability of our results for a wider range of case studies of grammatical constructions and grammatical variation and change, underlining the value of AI copilots as tools for future linguistic research, notwithstanding some important caveats.
Language: English
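The evaluation step mentioned in the abstract above (accuracy on held-out test samples) can be illustrated with a minimal Python sketch. This is not the authors' code: the label names `zero`, `as`, and `to_be` are hypothetical stand-ins for the three formal variants of “consider X (as) (to be) Y”, the functions `accuracy` and `confusion` are illustrative helpers, and the toy data is invented rather than taken from the paper.

```python
from collections import Counter


def accuracy(gold, predicted):
    """Share of held-out tokens whose predicted label matches the gold label."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("cannot score an empty test sample")
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)


def confusion(gold, predicted):
    """Count (gold, predicted) label pairs to show where the annotator errs."""
    return Counter(zip(gold, predicted))


# Invented toy data with hypothetical labels; these are NOT results from the paper.
gold = ["zero", "as", "to_be", "zero", "as", "zero", "to_be", "zero", "as", "zero"]
pred = ["zero", "as", "to_be", "zero", "zero", "zero", "to_be", "zero", "as", "zero"]

score = accuracy(gold, pred)
# Keep only the off-diagonal cells, i.e. the actual disagreements.
errors = {pair: n for pair, n in confusion(gold, pred).items() if pair[0] != pair[1]}

print(f"accuracy = {score:.2f}")
print(errors)
```

Listing the disagreeing label pairs alongside the overall score is one way for a linguist to see which variant the model confuses before trusting it on a larger sample.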
Constructing understanding: on the constructional information encoded in large language models
Language Resources and Evaluation, Journal Year: 2024, Volume and Issue: unknown
Published: Dec. 20, 2024
Abstract
We review research related to both Construction Grammar (CxG) and Natural Language Processing showing that recent advances in probing Large Language Models (LLMs) for certain types of linguistic knowledge align with the tenets of CxG. However, our survey leads us to hypothesize that LLM constructional information may be limited to constructions within the lower levels of postulated taxonomical “constructicons” enumerating a particular language’s constructions. Specifically, probing studies show that constructions at the lower levels of the taxonomy, which are more substantive constructions with fixed elements corresponding to frequently used words within that construction, are a type of linguistic information accessible to LLMs. In contrast, more general, abstract constructions with schematic slots that can be filled by a variety of different words are not included in the linguistic knowledge of LLMs. We test this hypothesis on a collection of 10 distinct constructions, each of which is exhibited in 50 or more corpus instances. Our experimental results strongly support our hypothesis and lead us to conclude that, in order for LLMs to generalize to the point where purely schematic constructions can be recognized regardless of the frequency of the instantiating words (as psycholinguistic experimentation has shown people can), additional semantic resources are needed to make explicit the semantic role of the schematic slot. To ensure transparency and reproducibility, we publicly release our experimental data, including the prompts used with the model.
Language: English
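The selection criterion stated in the abstract above (constructions exhibited in 50 or more corpus instances) can be sketched in Python as follows. This is an illustrative assumption about how such a filter might look, not the authors' released code: the function `eligible_constructions`, the `(construction, sentence)` input format, and the construction names in the toy data are all hypothetical.

```python
from collections import defaultdict

MIN_INSTANCES = 50  # threshold stated in the abstract


def eligible_constructions(instances, min_instances=MIN_INSTANCES):
    """Group (construction, sentence) pairs and keep well-attested constructions."""
    by_construction = defaultdict(list)
    for construction, sentence in instances:
        by_construction[construction].append(sentence)
    return {
        name: sentences
        for name, sentences in by_construction.items()
        if len(sentences) >= min_instances
    }


# Invented placeholder data; names and counts are NOT taken from the paper.
instances = [("comparative-correlative", f"sentence {i}") for i in range(60)]
instances += [("way-construction", f"sentence {i}") for i in range(12)]

kept = eligible_constructions(instances)
print(sorted(kept))
```

Applying the threshold after grouping keeps the attested sentences attached to each construction, so the same structure can feed a later probing step.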
The development of the theory of construction grammar in 1985–2024
OOO Zhurnal Voprosy Istorii, Journal Year: 2024, Volume and Issue: unknown, pp. 152-159
Published: Nov. 4, 2024
This article provides a detailed overview of the history of the development of the theory of construction grammar, which began to take shape in the late 1980s and reached its peak in the 21st century. The key stages of the development of the theory are also analyzed. The review aims to ensure further development of the theory, laying the groundwork for scientific progress.
Language: English