In psycholinguistics, surprisal theory posits that the amount of online processing effort expended by a human comprehender per word positively correlates with the surprisal of that word given its preceding context. In addition to this overall correlation, more importantly, the specific quantitative form taken by the effort-surprisal function offers insights into the underlying cognitive mechanisms of language processing. Focusing on English, previous studies have looked into the linearity of surprisal effects on reading times. Here, we extend the investigation by examining eyetracking corpora of seven languages: Danish, Dutch, English, German, Japanese, Mandarin, and Russian. We find evidence for superlinearity in some languages, but the results are highly sensitive to which language model is used to estimate surprisal.
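As a rough illustration of how such a linearity test can be framed (not the method of any particular study listed here), one can regress reading times on surprisal with and without a quadratic term and compare the fits. The data and the 15 ms/bit slope below are synthetic placeholders.

```python
# Synthetic sketch of a linearity check: does adding a quadratic surprisal term
# improve a reading-time regression? All numbers are invented placeholders.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
surprisal = rng.gamma(shape=2.0, scale=2.0, size=5000)    # stand-in surprisal values (bits)
rt = 200 + 15 * surprisal + rng.normal(0, 30, size=5000)  # stand-in reading times (ms)

X_linear = sm.add_constant(surprisal)
X_quadratic = sm.add_constant(np.column_stack([surprisal, surprisal ** 2]))

fit_linear = sm.OLS(rt, X_linear).fit()
fit_quadratic = sm.OLS(rt, X_quadratic).fit()

# A reliably better quadratic fit (e.g., lower AIC) would point toward a
# superlinear linking function between surprisal and reading time.
print(fit_linear.aic, fit_quadratic.aic)
```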
Trends in Cognitive Sciences, Journal year: 2023, Issue: 27(11), pp. 1032-1052. Published: Sep. 11, 2023
Prediction is often regarded as an integral aspect of incremental language comprehension, but little is known about the cognitive architectures and mechanisms that support it. We review studies showing that listeners and readers use all manner of contextual information to generate multifaceted predictions about upcoming input. The nature of these predictions may vary between individuals owing to differences in experience, among other factors. We then turn to unresolved questions which may guide the search for the underlying mechanisms. (i) Is prediction essential to language processing or an optional strategy? (ii) Are predictions generated from within the language system or by domain-general processes? (iii) What is the relationship between prediction and memory? (iv) Does prediction in comprehension require simulation via the production system? We discuss promising directions for making progress in answering these questions and for developing a mechanistic understanding of prediction in language.
Cognitive Science, Journal year: 2023, Issue: 47(11). Published: Nov. 1, 2023
Abstract
Word co-occurrence patterns in language corpora contain a surprising amount of conceptual knowledge. Large language models (LLMs), trained to predict words in context, leverage these patterns to achieve impressive performance on diverse semantic tasks requiring world knowledge. An important but understudied question about LLMs' abilities is whether they acquire generalized knowledge of common events. Here, we test whether five pretrained LLMs (from 2018's BERT to 2023's MPT) assign a higher likelihood to plausible descriptions of agent−patient interactions than to minimally different implausible versions of the same event. Using three curated sets of minimal sentence pairs (total n = 1215), we found that pretrained LLMs possess substantial event knowledge, outperforming other distributional models. In particular, they almost always assign a higher likelihood to possible versus impossible events (The teacher bought the laptop vs. The laptop bought the teacher). However, LLMs show less consistent preferences for likely versus unlikely events (The nanny tutored the boy vs. The boy tutored the nanny). In follow-up analyses, we show that (i) LLM scores are driven by both plausibility and surface-level features, (ii) LLM scores generalize well across syntactic variants (active vs. passive constructions) but less well across semantic variants (synonymous sentences), (iii) some LLM errors mirror human judgment ambiguity, and (iv) plausibility serves as an organizing dimension in LLMs' internal representations. Overall, our results show that important aspects of event knowledge naturally emerge from distributional linguistic patterns, but also highlight a gap between representations of possible/impossible and likely/unlikely events.
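The core comparison described above, checking whether a model assigns higher likelihood to a plausible sentence than to its minimally different implausible counterpart, can be sketched roughly as follows. GPT-2 and the scoring helper are illustrative assumptions, not the models or code used in the study; the example pair is the one cited in the abstract.

```python
# Minimal-pair sketch: a model "prefers" the plausible event if it assigns it a
# higher total log probability than the role-reversed version.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tok = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def sentence_logprob(sentence: str) -> float:
    ids = tok(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model(ids, labels=ids)
    # out.loss is the mean negative log-likelihood per predicted token; undo the mean.
    return -out.loss.item() * (ids.shape[1] - 1)

plausible = "The teacher bought the laptop."
implausible = "The laptop bought the teacher."
print(sentence_logprob(plausible) > sentence_logprob(implausible))  # expected: True
```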
Transactions of the Association for Computational Linguistics, Journal year: 2023, Issue: 11, pp. 1451-1470. Published: Jan. 1, 2023
Abstract
Surprisal theory posits that less-predictable words should take more time to process, with word predictability quantified as surprisal, i.e., negative log probability in context. While evidence supporting the predictions of surprisal theory has been replicated widely, much of it has focused on a very narrow slice of data: native English speakers reading English texts. Indeed, no comprehensive multilingual analysis exists. We address this gap in the current literature by investigating the relationship between surprisal and reading times in eleven different languages, distributed across five language families. Deriving surprisal estimates from language models trained on monolingual corpora, we test three predictions associated with surprisal theory: (i) whether surprisal is predictive of reading times, (ii) whether expected surprisal, i.e., contextual entropy, is predictive of reading times, and (iii) whether the linking function between surprisal and reading times is linear. We find that all three predictions are borne out crosslinguistically. By focusing on a more diverse set of languages, we argue that these results offer the most robust link to date between information theory and incremental language processing across languages.
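For concreteness, the quantity defined above, the negative log probability of a word given its context, can be estimated token by token with any autoregressive language model. The sketch below uses GPT-2 via Hugging Face transformers as a stand-in for the paper's monolingual models.

```python
# Sketch of the surprisal definition: surprisal(w_t) = -log p(w_t | w_<t), in bits.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tok = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def token_surprisals_bits(text: str):
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits
    log_probs = torch.log_softmax(logits[:, :-1], dim=-1)   # predictions for tokens 2..T
    targets = ids[:, 1:]
    nats = -log_probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)[0]
    bits = nats / torch.log(torch.tensor(2.0))               # convert nats to bits
    return list(zip(tok.convert_ids_to_tokens(targets[0].tolist()), bits.tolist()))

print(token_surprisals_bits("The children went outside to play."))
```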
Open Mind, Journal year: 2023, Issue: unknown, pp. 1-42. Published: June 1, 2023
Words that are more surprising given their context take longer to process. However, no incremental parsing algorithm has been shown to directly predict this phenomenon. In this work, we focus on a class of algorithms whose runtime does naturally scale in surprisal: those that involve repeatedly sampling from the prior. Our first contribution is to show that, in simple examples of such algorithms, the expected runtime increases superlinearly with surprisal, and that its variance should also increase. These two predictions stand in contrast with the literature on surprisal theory (Hale, 2001; Levy, 2008a), which assumes that expected processing cost increases linearly with surprisal and makes no prediction about variance. In the second part of the paper, we conduct an empirical study of the relationship between surprisal and reading time, using a collection of modern language models to estimate surprisal. We find that, with better language models, reading time increases superlinearly in surprisal. These results are consistent with sampling-based algorithms.
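To see intuitively why repeated sampling from the prior yields superlinear scaling, note that a naive guess-and-check sampler needs a geometrically distributed number of draws with mean 1/p = e^(surprisal in nats), so both the expected runtime and its variance blow up in surprisal. The simulation below only illustrates that arithmetic; it is not the paper's algorithms.

```python
# Illustrative simulation: number of prior draws until the observed word comes up
# is geometric with mean 1/p, i.e., exponential (hence superlinear) in surprisal.
import numpy as np

rng = np.random.default_rng(0)

def draws_until_hit(p: float) -> int:
    n = 0
    while True:
        n += 1
        if rng.random() < p:   # a draw from the prior happens to match the observed word
            return n

for surprisal_nats in [1.0, 3.0, 5.0, 7.0]:
    p = np.exp(-surprisal_nats)
    samples = [draws_until_hit(p) for _ in range(2000)]
    # mean ≈ 1/p and variance ≈ (1 - p) / p**2, both growing faster than linearly.
    print(surprisal_nats, np.mean(samples), np.var(samples))
```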
bioRxiv (Cold Spring Harbor Laboratory), Journal year: 2023, Issue: unknown. Published: April 16, 2023
Transformer models such as GPT generate human-like language and are highly predictive of human brain responses to language. Here, using fMRI-measured brain responses to 1,000 diverse sentences, we first show that a GPT-based encoding model can predict the magnitude of the brain response associated with each sentence. Then, we use this model to identify new sentences that are predicted to drive or suppress responses in the human language network. We show that these model-selected novel sentences indeed strongly drive and suppress the activity of human language areas in new individuals. A systematic analysis reveals that surprisal and well-formedness of the linguistic input are key determinants of response strength in the language network. These results establish the ability of neural network models to not only mimic human language but also noninvasively control neural activity in higher-level cortical areas, like the language network.
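As a rough sketch of what a sentence-level encoding model of this kind involves, one maps language-model features to a measured response with regularized regression and evaluates held-out prediction accuracy. The arrays below are random placeholders, not GPT features or fMRI data from the study.

```python
# Placeholder sketch of a sentence-level encoding model: ridge regression from
# language-model features to a measured response, scored by cross-validated correlation.
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)
features = rng.normal(size=(1000, 768))                  # e.g., one LM feature vector per sentence
weights = rng.normal(size=768)
response = features @ weights + rng.normal(scale=5.0, size=1000)  # fake per-sentence "brain response"

encoder = RidgeCV(alphas=np.logspace(-2, 4, 13))
predicted = cross_val_predict(encoder, features, response, cv=5)
print(np.corrcoef(predicted, response)[0, 1])            # held-out prediction accuracy of the encoder
```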
Prediction has been proposed as an overarching principle that explains human information processing in language and beyond. To what degree can the difficulty of processing syntactically complex sentences, one of the major concerns of psycholinguistics, be explained by predictability, as estimated using computational language models? A precise, quantitative test of this question requires a much larger scale data collection effort than has been done in the past. We present the Syntactic Ambiguity Processing Benchmark, a dataset of self-paced reading times from 2000 participants, who read a diverse set of English sentences. This dataset makes it possible to measure the processing difficulty associated with individual syntactic constructions, and even individual sentences, precisely enough to rigorously test the predictions of computational models of language comprehension. We find that language models with two different architectures sharply diverge from the reading time data: they dramatically underpredict processing difficulty, fail to predict the relative difficulty of different ambiguous sentences, and only partially explain item-wise variability. These findings suggest that prediction is most likely insufficient on its own to explain human syntactic processing.
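A toy version of the comparison this benchmark enables (all numbers below are invented placeholders): estimate a construction-level effect as the observed reading-time difference between ambiguous and unambiguous items, and set it against the effect implied by a fitted surprisal-to-reading-time slope.

```python
# Hypothetical comparison of observed vs. surprisal-predicted disambiguation difficulty.
import numpy as np

rt_ambiguous = np.array([520.0, 548.0, 610.0])       # mean RTs (ms) at the disambiguating word
rt_unambiguous = np.array([402.0, 415.0, 430.0])
surprisal_ambiguous = np.array([9.1, 8.7, 10.2])     # model surprisal (bits) at the same word
surprisal_unambiguous = np.array([6.3, 6.0, 6.9])

ms_per_bit = 12.0                                    # slope from a fitted linear RT ~ surprisal model
observed_effect = (rt_ambiguous - rt_unambiguous).mean()
predicted_effect = ms_per_bit * (surprisal_ambiguous - surprisal_unambiguous).mean()
print(observed_effect, predicted_effect)             # underprediction appears as predicted << observed
```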
Neural language models are increasingly valued in computational psycholinguistics, due to their ability to provide conditional probability distributions over the lexicon that are predictive of human processing times. Given the vast array of available models, it is of both theoretical and methodological importance to assess what features of a model influence its psychometric quality. In this work we focus on parameter size, showing that larger Transformer-based models generate probabilistic estimates that are less predictive of early eye-tracking measurements reflecting lexical access and semantic integration. However, relatively bigger models show an advantage in capturing late measurements that reflect the full syntactic integration of a word into the current context. Our results are supported by eye movement data in ten languages and consider four model sizes, spanning from 564M to 4.5B parameters.
Transactions of the Association for Computational Linguistics, Journal year: 2023, Issue: 11, pp. 1624-1642. Published: Jan. 1, 2023
Abstract
Over the past two decades, numerous studies have demonstrated how less-predictable (i.e., higher surprisal) words take more time to read. In general, these studies have implicitly assumed that the reading process is purely responsive: readers observe a new word and allocate time to process it as required. We argue that prior results are also compatible with a reading process that is at least partially anticipatory: readers could make predictions about an upcoming word and allocate time to process it based on their expectation. In this work, we operationalize this anticipation as a word's contextual entropy. We assess the effect of anticipation by comparing how well surprisal and contextual entropy predict reading times on four naturalistic datasets: two self-paced and two eye-tracking. Experimentally, across datasets and analyses, we find substantial evidence for effects of contextual entropy over and above surprisal on a word's reading time (RT): in fact, entropy is sometimes better than surprisal in predicting a word's RT. Spillover effects, however, are generally not captured by entropy, but only by surprisal. Further, we hypothesize a number of cognitive mechanisms through which contextual entropy could impact RTs, three of which we are able to design experiments to analyze. Overall, our results support the view that the reading process is not just responsive, but also anticipatory.
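To make the contrast between the two predictors concrete: surprisal depends on the word that actually appears, whereas contextual entropy is the expectation of surprisal over the model's whole next-word distribution, so it is available before the word is seen. The sketch below uses GPT-2 and an example context as assumptions for illustration; the paper uses its own models and datasets.

```python
# Contextual entropy (anticipation, before the word) vs. surprisal (response, after).
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tok = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

context = "After dinner, the children went outside to"
ids = tok(context, return_tensors="pt").input_ids
with torch.no_grad():
    log_p = torch.log_softmax(model(ids).logits[0, -1], dim=-1)  # next-token distribution

entropy = -(log_p.exp() * log_p).sum()   # H(W_t | context), in nats: the anticipation term
next_id = tok(" play").input_ids[0]      # the word that actually appears
surprisal = -log_p[next_id]              # -log p("play" | context): the responsive term
print(float(entropy), float(surprisal))
```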