In this study, we describe our submission to the 2023 BabyLM shared task's strict-small track. Our findings demonstrate the feasibility of training high-performing models within the constraints of limited data, computational resources, and time. We provide evidence that the formatting of the input can significantly impact downstream performance. Furthermore, inducing structural biases into the model through the use of part-of-speech trees yields modest benefits. Our most successful model achieves 79% on the BLiMP evaluations and 72% on the SuperGLUE evaluations.
Alex Warstadt, Aaron Mueller, Leshem Choshen, Ethan Wilcox, Chengxu Zhuang, Juan Ciro, Rafael Mosquera, Bhargavi Paranjabe, Adina Williams, Tal Linzen, Ryan Cotterell. Proceedings of the BabyLM Challenge at the 27th Conference on Computational Natural Language Learning. 2023.
In a recent manuscript entitled “Modern language models refute Chomsky’s approach to language”, Steven Piantadosi proposes that large language models such as GPT-3 can serve as serious theories of human linguistic cognition. In fact, he maintains that these models are significantly better theories than the proposals emerging from within generative linguistics. The present note explains why this claim is wrong.
Nature Communications, Journal year: 2025, Issue: 16(1), Published: May 20, 2025
Humans can learn languages from remarkably little experience. Developing computational models that explain this ability has been a major challenge in cognitive science. Existing approaches have been successful at explaining how humans generalize rapidly in controlled settings, but they are usually too restrictive to tractably handle naturalistic data. We show that learning from limited data is possible with an approach that bridges the divide between two popular modeling traditions: Bayesian models and neural networks. This approach distills a Bayesian model's inductive biases (the factors that guide generalization) into a neural network that has flexible representations. Like the Bayesian model, the resulting system can learn formal linguistic patterns from limited data; like a neural network, it can also learn aspects of English syntax from naturally occurring sentences. Thus, the model provides a single framework that combines the strengths of both traditions.
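As a rough illustration of the distillation idea described above (and not the authors' actual implementation), the sketch below pre-trains a small neural language model on sequences sampled from a toy Bayesian prior over bigram grammars, so that the network absorbs the prior's inductive bias before seeing any naturalistic data. The vocabulary, the bigram prior, and the architecture are all assumptions made purely for illustration.

```python
# Hedged sketch of "prior distillation": pre-train a neural LM on data drawn from a
# toy Bayesian prior over grammars. Not the paper's method; an illustration only.
import torch
import torch.nn as nn

VOCAB = 10  # symbol ids 0..9; 0 doubles as the sequence-start symbol

def sample_from_prior(n_seqs=32, length=12):
    """Toy 'prior': draw a random bigram transition table, then sample sequences from it."""
    table = torch.distributions.Dirichlet(torch.ones(VOCAB)).sample((VOCAB,))
    seqs = torch.zeros(n_seqs, length, dtype=torch.long)
    for t in range(1, length):
        seqs[:, t] = torch.multinomial(table[seqs[:, t - 1]], 1).squeeze(-1)
    return seqs

class TinyLM(nn.Module):
    def __init__(self, hidden=64):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, hidden)
        self.rnn = nn.LSTM(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, VOCAB)
    def forward(self, x):
        h, _ = self.rnn(self.emb(x))
        return self.out(h)

model = TinyLM()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Each episode draws a fresh grammar from the prior, so the network is rewarded for
# generalizing the way the prior would, rather than memorizing any single grammar.
for episode in range(200):
    seqs = sample_from_prior()
    logits = model(seqs[:, :-1])
    loss = loss_fn(logits.reshape(-1, VOCAB), seqs[:, 1:].reshape(-1))
    opt.zero_grad(); loss.backward(); opt.step()
# The distilled model would then be fine-tuned on naturalistic text.
```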
Open Mind, Journal year: 2024, Issue: 8, pp. 558-614, Published: Jan. 1, 2024
Abstract
Languages are governed by syntactic constraints: structural rules that determine which sentences are grammatical in the language. In English, one such constraint is subject-verb agreement, which dictates that the number of a verb must match the number of its corresponding subject: “the dogs run”, but “the dog runs”. While this constraint appears to be simple, in practice speakers make agreement errors, particularly when a noun phrase near the verb differs in number from the subject (for example, a speaker might produce the ungrammatical sentence “the key to the cabinets are rusty”). This phenomenon, referred to as agreement attraction, is sensitive to a wide range of properties of the sentence; no single existing model is able to generate predictions for the variety of materials studied in the human experimental literature. We explore the viability of neural network language models (broad-coverage systems trained to predict the next word in a corpus) as a framework for addressing this limitation. We analyze the agreement errors made by Long Short-Term Memory (LSTM) networks and compare them to those of humans. The models successfully simulate certain results, such as the so-called asymmetry in attraction strength between grammatical and ungrammatical sentences, but failed to simulate others, such as the effect of distance or of notional (conceptual) number. We further evaluate networks trained with explicit supervision and find that this form of supervision does not always lead to more human-like behavior. Finally, we show that the corpus used to train a network significantly affects the pattern of agreement errors it produces, and we discuss the strengths and limitations of neural networks as a tool for understanding human language processing.
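To make the kind of analysis described above concrete, here is a minimal, hedged sketch of an agreement-attraction probe: comparing the probability a language model assigns to a singular versus a plural verb after an attraction-prone preamble. GPT-2 is used purely as a stand-in for the paper's LSTM models, and the items and candidate verbs are illustrative assumptions.

```python
# Hedged sketch of an agreement-attraction probe with a causal language model.
# Preferring the plural verb after "The key to the cabinets" would count as an
# attraction "error", mirroring the kind of comparison described in the abstract.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def verb_logprobs(preamble, verbs=(" is", " are")):
    """Log-probability of each candidate verb as the next token after the preamble."""
    ids = tok(preamble, return_tensors="pt").input_ids
    with torch.no_grad():
        next_logprobs = torch.log_softmax(model(ids).logits[0, -1], dim=-1)
    return {v: next_logprobs[tok.encode(v)[0]].item() for v in verbs}

# Attractor ("cabinets") mismatches the head noun ("key") in the first preamble.
print(verb_logprobs("The key to the cabinets"))
print(verb_logprobs("The key to the cabinet"))
```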
Language Resources and Evaluation, Journal year: 2024, Issue: unknown, Published: May 15, 2024
Abstract
Corpora of child speech and child-directed speech (CDS) have enabled major contributions to the study of language acquisition, yet semantic annotation for such corpora is still scarce and lacks a uniform standard. Semantic annotation of CDS is particularly important for understanding the nature of the input children receive and for developing computational models of language acquisition. For example, under the assumption that children are able to infer meaning representations for (at least some of) the utterances they hear, the acquisition task is to learn a grammar that can map novel adult utterances onto their corresponding meaning representations, in the face of noise and distraction by other contextually possible meanings. To study this problem and develop computational models of it, we need corpora that provide both utterances and their meaning representations, ideally using an annotation scheme that is consistent across a range of languages in order to facilitate cross-linguistic comparative studies. This paper proposes a methodology for constructing such corpora of CDS paired with sentential logical forms, and uses the method to create two corpora, one in English and one in Hebrew. The approach enforces a cross-linguistically consistent representation, building on recent advances in dependency representation and parsing. Specifically, it involves two steps. First, we annotate the corpora with the Universal Dependencies (UD) scheme for syntactic annotation, which has been developed to apply consistently to a wide variety of domains and typologically diverse languages. Next, we further annotate these data by applying an automatic method for transducing logical forms (LFs) from the UD structures. The UD and LF representations have complementary strengths: UD structures are language-neutral and support reliable annotation by multiple annotators, whereas LFs are neutral as to their syntactic derivation and transparently encode semantic relations. Using this approach, we annotate two corpora from CHILDES: Brown's Adam corpus (English; ≈ 80% of its utterances) and all of Berman's Hagar corpus (Hebrew). We verify the quality of the UD annotation with an inter-annotator agreement study, and we manually evaluate the transduced meaning representations. We then demonstrate the utility of the compiled corpora through (1) a longitudinal study of the prevalence of different linguistic phenomena in CDS and (2) applying an existing computational model of acquisition to the corpora and briefly comparing the results across the two languages.
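As a toy illustration of the two-step UD-to-logical-form idea (not the authors' actual transducer), the sketch below reads a hand-written CoNLL-U parse of a CDS-like utterance and emits a simple predicate-argument form for the root verb. The example sentence and the mapping rules are assumptions made for illustration only.

```python
# Toy UD -> logical form transduction: the root verb becomes the predicate and its
# core dependents (nsubj/obj/iobj) become arguments. Illustrative, not the paper's tool.
from conllu import parse

rows = [
    # ID   FORM     LEMMA    UPOS    XPOS FEATS HEAD DEPREL   DEPS MISC
    ("1", "you",   "you",   "PRON", "_", "_", "2", "nsubj", "_", "_"),
    ("2", "want",  "want",  "VERB", "_", "_", "0", "root",  "_", "_"),
    ("3", "juice", "juice", "NOUN", "_", "_", "2", "obj",   "_", "_"),
]
CONLLU = "# text = you want juice\n" + "\n".join("\t".join(r) for r in rows) + "\n\n"

def transduce(sentence):
    """Map a UD tree to a predicate(arg, ...) string using core dependency relations."""
    root = next(tok for tok in sentence if tok["deprel"] == "root")
    args = [tok["lemma"] for tok in sentence
            if tok["head"] == root["id"] and tok["deprel"] in ("nsubj", "obj", "iobj")]
    return f"{root['lemma']}({', '.join(args)})"

for sent in parse(CONLLU):
    print(transduce(sent))   # -> want(you, juice)
```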
Human Brain Mapping, Journal year: 2023, Issue: 45(4), Published: Dec. 8, 2023
Abstract
The brain's structural network follows a hierarchy that is described as a rich club (RC) organization, with RC hubs forming the well-interconnected top of this hierarchy. In this study, we tested whether RC hubs are involved in the processing of hierarchically higher structures in stimulus sequences. Moreover, we explored the role of previously suggested cortical gradients along the anterior-posterior and medial-lateral axes throughout the frontal cortex. To this end, we conducted a functional magnetic resonance imaging (fMRI) experiment in which we presented participants with blocks of digit sequences that were structured on different nested levels. We additionally collected diffusion-weighted imaging data from the same subjects to identify RC hubs. This classification then served as the basis for a region-of-interest analysis of the fMRI data. We also determined network centrality measures for the areas found in the activation clusters of the whole-brain analysis. Our findings support an anterior and medial shift of frontal activity for hierarchically higher-structured stimuli. Additionally, hierarchically higher structure engages RC hubs more than lower structure does, and the areas involved are also more likely to be part of the rich club and are furthermore more central in the structural network. In summary, our results highlight the potential role of the RC organization in shaping hierarchical sequence processing.
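For readers unfamiliar with the rich-club measure, the following hedged sketch computes a rich-club coefficient from a synthetic connectivity matrix with networkx and flags high-degree nodes as candidate hubs; it only illustrates the concept and is not the authors' diffusion-imaging pipeline. The edge threshold and hub cutoff are assumptions.

```python
# Hedged sketch: rich-club coefficient and candidate hub nodes from a synthetic
# (stand-in) structural connectivity matrix. Illustration only.
import numpy as np
import networkx as nx

rng = np.random.default_rng(0)
conn = rng.random((90, 90))          # stand-in for a DWI-derived connectivity matrix
conn = (conn + conn.T) / 2           # symmetrize
np.fill_diagonal(conn, 0)            # no self-connections

adj = conn > 0.7                     # threshold to a binary undirected graph (assumed)
G = nx.from_numpy_array(adj.astype(int))

# Rich-club coefficient phi(k): edge density among nodes with degree greater than k.
phi = nx.rich_club_coefficient(G, normalized=False)

# Flag candidate hubs, e.g. nodes in the top 15% by degree (cutoff is an assumption).
degrees = dict(G.degree())
cutoff = np.percentile(list(degrees.values()), 85)
hubs = [n for n, d in degrees.items() if d >= cutoff]
print(f"rich-club phi at k=10: {phi[10]:.3f}; {len(hubs)} candidate hub nodes")
```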
The use of neural language models to model human behavior has met with mixed success. While some work has found that the surprisal estimates from these models can be used to predict a wide range of neural and behavioral responses, other work studying more complex syntactic phenomena has found that they generate incorrect predictions. This paper explores the extent to which the misalignment between empirical and model-predicted behavior can be minimized by training models on developmentally plausible data, such as in the BabyLM Challenge. We trained teacher models on the "strict-small" dataset and used them to order the data at the sentence level to create a curriculum. We find tentative evidence that our curriculum made it easier for models to acquire linguistic knowledge from the data: on the subset of tasks in the challenge suite evaluating models' grammatical knowledge of English, models trained first on the curriculum and then for a few randomly ordered epochs performed slightly better than models trained on randomly ordered data alone. This improved linguistic knowledge acquisition did not result in better alignment with human reading behavior, however: models trained on this data (with or without the curriculum) generated predictions that were as misaligned with human behavior as models trained on larger, less curated datasets. This suggests that training on developmentally plausible datasets alone is likely insufficient to produce models capable of accurately predicting human language processing.
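Since the alignment question above hinges on surprisal, here is a minimal sketch of how per-token surprisal can be extracted from a causal language model; GPT-2 and its tokenization stand in for the paper's own models, so the numbers are illustrative only.

```python
# Hedged sketch: per-token surprisal (in bits) from a causal LM, the quantity that
# is typically regressed against human reading-time measures. Illustration only.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def token_surprisals(sentence):
    """Return (token, surprisal-in-bits) pairs for every token after the first."""
    ids = tok(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits
    logprobs = torch.log_softmax(logits[0, :-1], dim=-1)   # position t predicts token t+1
    targets = ids[0, 1:]
    nats = -logprobs[torch.arange(targets.numel()), targets]
    bits = nats / torch.log(torch.tensor(2.0))
    return list(zip(tok.convert_ids_to_tokens(targets.tolist()), bits.tolist()))

for token, s in token_surprisals("The key to the cabinets is rusty."):
    print(f"{token:>12s}  {s:6.2f} bits")
```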
Proceedings of the Linguistic Society of America, Journal year: 2024, Issue: 9(1), pp. 5693-5693, Published: May 15, 2024
It has been argued that language models (LMs) can inform our knowledge of language acquisition. While LMs are claimed to replicate aspects of grammatical knowledge, it remains unclear how this translates to acquisition directly. We ask if a model trained specifically on child-directed speech (CDS) is able to capture the behavior of different adjective classes. Ultimately, the results reveal that what the model is "learning" reflects how adjectives are distributed in CDS, and not the properties of the different adjective classes. While highlighting the ability of LMs to learn distributional information, these findings suggest that distributional information alone cannot explain how children generalize beyond their input.
In a seminal study, Cameron-Faulkner et al. made two important observations about utterance-level constructions in English child-directed speech (CDS). First, they observed that canonical in/transitive sentences are surprisingly infrequent in CDS (given that SVO word order is often thought to play a key role in the acquisition of syntax). Second, they found that many CDS utterances are introduced by a lexical frame (such as Let's …, There … or What do you …?). Using a much larger and more diverse dataset than Cameron-Faulkner et al., this study shows that both observations vary with two factors: (1) the interactive situation and (2) children's age. While canonical in/transitive sentences are not particularly frequent in free toy-play sessions, they are predominant in other social situations (e.g. during mealtimes and shared book-reading sessions) and increase in frequency as children get older. Furthermore, our data show that different situations occur with different types of lexical frames, which vary in length and structure. Many include short frames consisting of one or two words, but questions involve more extensive frames that are formed from a small set of items and follow a power-law distribution. Considering these findings, we argue that these structural properties likely facilitate the acquisition of grammar and, in particular, of questions.
Data-driven models of concepts are gaining popularity in Psychology and Cognitive Science. Distributional semantic models represent word meanings as abstract co-occurrence patterns and excel at capturing human intuitions about meaning and conceptual relationships; however, they lack the explicit links to the physical world that humans acquire through perception. Computer vision neural networks, on the other hand, can produce representations of visually grounded concepts, but they do not support the extraction of information about the relationships between objects. To bridge the gap between distributional semantics and computer vision, we introduce SemanticScape, a model of concepts grounded in the visual objects that co-occur in natural images. The model captures the latent statistics of the spatial organization of objects in the visual environment. Its implementation is based on the calculation of the summed Euclidean distances between all object pairs in visual scenes, which are then abstracted by means of dimensionality reduction. We validate our model against human similarity, relatedness, and analogical reasoning judgments, as well as several implicit processing measurements. Our results show that SemanticScape explains variance in human responses in these tasks above and beyond what can be accounted for by standard convolutional neural networks, and that it also shows predictive performance in perceptual tasks.
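Reading the implementation description above literally, a minimal sketch of the computation might look as follows: sum Euclidean distances between object centroids within each scene, accumulate an object-by-object matrix across scenes, and compress it with a low-rank decomposition. The object labels, scenes, and normalization choices are all assumptions for illustration and not the authors' code.

```python
# Hedged sketch of a SemanticScape-style computation: summed pairwise Euclidean
# distances between object centroids per scene, then dimensionality reduction.
import numpy as np

objects = ["dog", "ball", "tree", "car"]
idx = {o: i for i, o in enumerate(objects)}

# Each scene: object label -> (x, y) centroid of its annotated region (toy data).
scenes = [
    {"dog": (10, 12), "ball": (14, 13), "tree": (80, 40)},
    {"dog": (5, 60), "car": (70, 65), "tree": (30, 20)},
]

summed = np.zeros((len(objects), len(objects)))
counts = np.zeros_like(summed)
for scene in scenes:
    for a in scene:
        for b in scene:
            if a != b:
                d = np.linalg.norm(np.subtract(scene[a], scene[b]))
                summed[idx[a], idx[b]] += d
                counts[idx[a], idx[b]] += 1

# Average distance per object pair (zero where a pair never co-occurs; an assumption).
mean_dist = np.divide(summed, counts, out=np.zeros_like(summed), where=counts > 0)

# Dimensionality reduction: a low-rank SVD gives each object a compact embedding of
# its spatial co-organization with other objects.
U, S, _ = np.linalg.svd(mean_dist)
embeddings = U[:, :2] * S[:2]
print(dict(zip(objects, embeddings.round(2).tolist())))
```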