Pseudowords such as “knackets” or “spechy” – letter strings that are consistent with the orthotactical rules of a language but do not appear in its lexicon – are traditionally considered to be meaningless, and are employed as such in empirical studies. However, recent studies showing specific semantic patterns associated with these words, as well as effects on human pseudoword processing, have cast doubt on this view. While these findings suggest that pseudowords carry meanings, they provide only extremely limited insight into whether humans are able to ascribe explicit declarative content to unfamiliar word forms. In the present study, we used an exploratory-confirmatory study design to examine this question. The first, exploratory study started from a pre-existing dataset of (pseudo)words alongside human-generated definitions for these items. Employing 18 different models, we showed that the definitions actually produced for (pseudo)words were closer to their respective items than to other ones. Based on these initial results, we conducted a second, pre-registered, high-powered confirmatory study, collecting a new, controlled set of (pseudo)word interpretations. This second study confirmed the results of the first one. Taken together, these findings support the idea that meaning construction is supported by a flexible form-to-meaning mapping system, based on statistical regularities in the language environment, that can accommodate novel lexical entries as soon as they are encountered.
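As a concrete illustration of the similarity logic described above, the sketch below embeds a few (pseudo)words and candidate definitions in a shared vector space and checks whether each definition lies closer to its own item than to the others. The encoder name, items, and definitions are illustrative stand-ins; the study itself employed 18 different models and its own stimuli, and its exact procedure may differ.

# Hypothetical illustration: do definitions land closer to "their" (pseudo)word
# than to other items in a shared embedding space? A single off-the-shelf
# sentence encoder stands in for the 18 models used in the study.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

items = ["knackets", "spechy", "table"]                               # (pseudo)words
definitions = ["small wooden pegs", "light and airy", "furniture with a flat top"]

item_vecs = model.encode(items, normalize_embeddings=True)
def_vecs = model.encode(definitions, normalize_embeddings=True)

sims = def_vecs @ item_vecs.T          # cosine similarities (rows: definitions)
matched = np.diag(sims)                # each definition vs. its own item
mismatched = sims[~np.eye(len(items), dtype=bool)]

print(f"matched mean similarity:    {matched.mean():.3f}")
print(f"mismatched mean similarity: {mismatched.mean():.3f}")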
Trends in Cognitive Sciences, Journal Year: 2023, Volume and Issue: 27(11), P. 1032 - 1052, Published: Sept. 11, 2023
Prediction is often regarded as an integral aspect of incremental language comprehension, but little is known about the cognitive architectures and mechanisms that support it. We review studies showing that listeners and readers use all manner of contextual information to generate multifaceted predictions about upcoming input. The nature of these predictions may vary between individuals owing to differences in experience, among other factors. We then turn to unresolved questions that may guide the search for the underlying mechanisms. (i) Is prediction essential to language processing or an optional strategy? (ii) Are predictions generated from within the language system or by domain-general processes? (iii) What is the relationship between prediction and memory? (iv) Does prediction in comprehension require simulation via the production system? We discuss promising directions for making progress in answering these questions and for developing a mechanistic understanding of prediction in language.
Cognitive Science, Journal Year: 2023, Volume and Issue: 47(11), Published: Nov. 1, 2023
Abstract
Word co‐occurrence patterns in language corpora contain a surprising amount of conceptual knowledge. Large language models (LLMs), trained to predict words in context, leverage these patterns to achieve impressive performance on diverse semantic tasks requiring world knowledge. An important but understudied question about LLMs’ semantic abilities is whether they acquire generalized knowledge of common events. Here, we test whether five pretrained LLMs (from 2018's BERT to 2023's MPT) assign a higher likelihood to plausible descriptions of agent−patient interactions than to minimally different implausible versions of the same event. Using three curated sets of minimal sentence pairs (total n = 1,215), we found that pretrained LLMs possess substantial event knowledge, outperforming other distributional models. In particular, they almost always assign a higher likelihood to possible versus impossible events (The teacher bought the laptop vs. The laptop bought the teacher). However, they show less consistent preferences for likely versus unlikely events (The nanny tutored the boy vs. The boy tutored the nanny). In follow‐up analyses, we show that (i) LLM scores are driven by both plausibility and surface‐level features, (ii) LLM scores generalize well across syntactic variants (active vs. passive constructions) but less well across semantic variants (synonymous sentences), (iii) some LLM errors mirror human judgment ambiguity, and (iv) plausibility serves as an organizing dimension in the models' internal representations. Overall, our results show that important aspects of event knowledge naturally emerge from distributional linguistic patterns, but they also highlight a gap between representations of possible/impossible and likely/unlikely events.
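To make the scoring setup concrete, the sketch below compares the total log-probability that a pretrained causal language model assigns to a plausible event description and to its minimally different implausible counterpart. GPT-2 is used here only as a convenient stand-in for the BERT-to-MPT-era models actually tested, and the pair shown is one example rather than the curated benchmark.

# Minimal sketch: does an LM prefer a plausible event description over a
# minimally different implausible one? GPT-2 stands in for the tested models.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
lm = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def sentence_logprob(sentence: str) -> float:
    """Sum of token log-probabilities of the sentence under the LM."""
    ids = tok(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        out = lm(ids, labels=ids)
    # out.loss is the mean negative log-likelihood per predicted token
    n_predicted = ids.shape[1] - 1
    return -out.loss.item() * n_predicted

lp_plausible = sentence_logprob("The teacher bought the laptop.")
lp_implausible = sentence_logprob("The laptop bought the teacher.")
print(f"plausible:   {lp_plausible:.2f}")
print(f"implausible: {lp_implausible:.2f}")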
Transactions of the Association for Computational Linguistics, Journal Year: 2023, Volume and Issue: 11, P. 1451 - 1470, Published: Jan. 1, 2023
Abstract
Surprisal theory posits that less-predictable words should take more time to process, with word predictability quantified as surprisal, i.e., negative log probability in context. While evidence supporting the predictions of surprisal theory has been replicated widely, much of it has focused on a very narrow slice of data: native English speakers reading English texts. Indeed, no comprehensive multilingual analysis exists. We address this gap in the current literature by investigating the relationship between surprisal and reading times in eleven different languages, distributed across five language families. Deriving surprisal estimates from language models trained on monolingual corpora, we test three predictions associated with surprisal theory: (i) whether surprisal is predictive of reading times, (ii) whether expected surprisal, i.e., contextual entropy, is predictive of reading times, and (iii) whether the linking function between surprisal and reading times is linear. We find that all three predictions are borne out crosslinguistically. By focusing on a diverse set of languages, we argue that these results offer the most robust link to date between information theory and incremental language processing across languages.
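For reference, the quantities tested above can be written out explicitly; the notation here is generic rather than the paper's own: the surprisal of a word in context, contextual entropy as the expected surprisal at a position, and the linear linking function to reading time.

% surprisal of word w_t given its context
s(w_t) = -\log p(w_t \mid w_{<t})

% contextual entropy: the expected surprisal at position t
H_t = -\sum_{w \in V} p(w \mid w_{<t}) \, \log p(w \mid w_{<t})

% linear linking function assumed by surprisal theory
\mathbb{E}[\mathrm{RT}(w_t)] = \alpha + \beta \, s(w_t)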
bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2023, Volume and Issue: unknown, Published: April 16, 2023
Transformer models such as GPT generate human-like language and are highly predictive of human brain responses to language. Here, using fMRI-measured brain responses to 1,000 diverse sentences, we first show that a GPT-based encoding model can predict the magnitude of the brain response associated with each sentence. Then, we use the model to identify new sentences that are predicted to drive or suppress responses in the human language network. We show that these model-selected novel sentences indeed strongly drive and suppress the activity of human language areas in new individuals. A systematic analysis reveals that the surprisal and well-formedness of the linguistic input are key determinants of response strength. These results establish the ability of neural network models not only to mimic human language but also to noninvasively control neural activity in higher-level cortical areas, like the language network.
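The encoding-model step described above can be sketched as a regularized regression from LM-derived sentence features to per-sentence response magnitudes, evaluated on held-out data. The arrays below are random placeholders standing in for the study's features and fMRI measurements, and the study's actual model and evaluation pipeline may differ.

# Sketch of a sentence-level encoding model: map LM-derived sentence features to
# fMRI response magnitudes, then score held-out predictions. X and y are
# placeholders standing in for the 1,000-sentence data described above.
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 768))  # e.g., GPT hidden-state features, one row per sentence
y = X[:, :5].sum(axis=1) + rng.normal(scale=0.5, size=1000)  # toy response magnitudes

encoder = RidgeCV(alphas=np.logspace(-2, 4, 13))
predicted = cross_val_predict(encoder, X, y, cv=5)

# Correlation between predicted and observed response magnitudes on held-out folds
print(np.corrcoef(predicted, y)[0, 1])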
Open Mind, Journal Year: 2023, Volume and Issue: unknown, P. 1 - 42, Published: June 1, 2023
Words that are more surprising given context take longer to process. However, no incremental parsing algorithm has been shown to directly predict this phenomenon. In this work, we focus on a class of algorithms whose runtime does naturally scale in surprisal – those that involve repeatedly sampling from the prior. Our first contribution is to show, in simple examples, that the runtime of such algorithms increases superlinearly with surprisal, and also that the variance of runtimes increases. These two predictions stand in contrast to the literature on surprisal theory (Hale, 2001; Levy, 2008a), which assumes that expected processing cost increases linearly with surprisal and makes no prediction about variance. In the second part of the paper, we conduct an empirical study of the relationship between surprisal and reading time, using a collection of modern language models to estimate surprisal. We find that, with better language models, reading time increases superlinearly with surprisal. These results are consistent with sampling-based algorithms.
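To see why repeated sampling from the prior scales superlinearly, consider the simplest illustrative case (an assumption made here for exposition, not necessarily the paper's exact analysis): candidates are drawn independently from the prior until the target item, which has prior probability p, is sampled. The number of draws N is then geometrically distributed, and with surprisal s = -\log p,

\mathbb{E}[N] = \frac{1}{p} = e^{s}, \qquad \mathrm{Var}[N] = \frac{1-p}{p^{2}} \approx e^{2s} \quad \text{for small } p,

so both the expected number of draws and their variance grow much faster than linearly in surprisal.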
Prediction has been proposed as an overarching principle that explains human information processing in language and beyond. To what degree can the difficulty of syntactically complex sentences – one of the major concerns of psycholinguistics – be explained by predictability, as estimated using computational language models? A precise, quantitative test of this question requires a much larger scale data collection effort than has been done in the past. We present the Syntactic Ambiguity Processing Benchmark, a dataset of self-paced reading times from 2000 participants, who read a diverse set of English sentences. This dataset makes it possible to measure the processing difficulty associated with individual syntactic constructions, and even individual sentences, precisely enough to rigorously test the predictions of models of comprehension. We find that the predictions of two different language model architectures sharply diverge from the reading time data, dramatically underpredicting processing difficulty, failing to predict the relative difficulty of different ambiguous constructions, and only partially explaining item-wise variability. These findings suggest that prediction is most likely insufficient on its own to explain syntactic processing.
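Analyses of this kind typically relate per-word reading times to model-estimated surprisal while accounting for word-level covariates and participant variability. The sketch below shows one minimal way to set this up; the column names and toy values are hypothetical and the benchmark's actual model specification may differ.

# Sketch: mixed-effects regression of self-paced reading times on LM surprisal,
# with random intercepts per participant. `data` holds placeholder values only.
import pandas as pd
import statsmodels.formula.api as smf

data = pd.DataFrame({
    "rt":          [345, 512, 298, 401, 377, 620, 310, 450, 389, 502, 285, 430],
    "surprisal":   [4.1, 9.8, 2.5, 6.0, 5.2, 11.3, 3.0, 7.4, 5.8, 9.1, 2.2, 6.6],
    "word_length": [4, 9, 3, 6, 5, 10, 3, 7, 5, 8, 2, 6],
    "participant": ["p1", "p1", "p1", "p2", "p2", "p2",
                    "p3", "p3", "p3", "p4", "p4", "p4"],
})

model = smf.mixedlm("rt ~ surprisal + word_length", data, groups=data["participant"])
result = model.fit()
print(result.summary())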
Neural language models are increasingly valued in computational psycholinguistics, due to their ability to provide conditional probability distributions over the lexicon that are predictive of human processing times. Given the vast array of available models, it is of both theoretical and methodological importance to assess which features of a model influence its psychometric quality. In this work we focus on parameter size, showing that larger Transformer-based models generate probabilistic estimates that are less predictive of the early eye-tracking measurements reflecting lexical access and semantic integration. However, relatively bigger models show an advantage in capturing the late measurements that reflect the full syntactic integration of a word into its current context. Our results are supported by eye movement data in ten languages; we consider four models spanning from 564M to 4.5B parameters.
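One simple way to run the comparison described above is to correlate surprisal estimates from a smaller and a larger model with an early and a late eye-tracking measure. The column names and values below are placeholders for aligned per-word data, not the study's materials or analysis.

# Sketch: how well does surprisal from a small vs. a large model track an early
# and a late eye-tracking measure? DataFrame contents are placeholders.
import pandas as pd

df = pd.DataFrame({
    "surprisal_small": [5.1, 2.3, 7.8, 4.0, 6.2],
    "surprisal_large": [4.7, 1.9, 8.3, 3.6, 6.5],
    "first_fixation":  [210, 180, 260, 200, 245],  # early measure (ms)
    "total_time":      [310, 220, 480, 300, 410],  # late measure (ms)
})

for model in ["surprisal_small", "surprisal_large"]:
    for measure in ["first_fixation", "total_time"]:
        r = df[model].corr(df[measure])
        print(f"{model} vs {measure}: r = {r:.2f}")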
Open Mind, Journal Year: 2023, Volume and Issue: 7, P. 757 - 783, Published: Jan. 1, 2023
Abstract
In a typical text, readers look much longer at some words than at others, even skipping many altogether. Historically, researchers explained this variation via low-level visual or oculomotor factors, but today it is primarily explained by factors determining a word’s lexical processing ease, such as how well word identity can be predicted from context or discerned from parafoveal preview. While the existence of these effects has been established in controlled experiments, the relative importance of prediction, preview and low-level visual factors in natural reading remains unclear. Here, we address this question in three large naturalistic reading corpora (n = 104, 1.5 million words), using deep neural networks and Bayesian ideal observers to model linguistic prediction and parafoveal preview from moment to moment in natural reading. Strikingly, neither prediction nor preview was important for explaining word skipping – the vast majority of skipping was explained by a simple model using just fixation position and word length. For reading times, by contrast, we found strong and independent contributions of prediction and preview, with effect sizes matching those of controlled experiments. Together, these results challenge dominant models of eye movements in reading, and instead support alternative models that describe skipping (but not reading times) as largely autonomous from word identification, and as mostly determined by low-level visual information.
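The "simple model" referred to above can be illustrated with a bare-bones classifier that predicts skipping from word length and the incoming fixation's distance alone; the data below are synthetic placeholders, and a fuller comparison would add prediction and preview predictors alongside these low-level terms.

# Sketch: predict whether a word is skipped from its length and the distance of
# the previous fixation to the word (synthetic data, not the corpora used above).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 500
word_length = rng.integers(1, 12, size=n)
fixation_distance = rng.uniform(1, 10, size=n)  # characters from the prior fixation
# Toy generative rule: short, nearby words are skipped more often
p_skip = 1 / (1 + np.exp(0.6 * word_length + 0.3 * fixation_distance - 5))
skipped = rng.random(n) < p_skip

X = np.column_stack([word_length, fixation_distance])
clf = LogisticRegression().fit(X, skipped)
print(f"training accuracy of the simple oculomotor model: {clf.score(X, skipped):.2f}")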