ACM Transactions on Management Information Systems, 2024, 15(2), P. 1 - 25. Published: March 6, 2024
Intellectual property (IP) theft is a growing problem. We build on prior work to deter IP theft by generating n fake versions of a technical document, so that a thief has to expend time and effort in identifying the correct document. Our new SbFAKE framework proposes, for the first time, a novel combination of language processing, optimization, and the psycholinguistic concept of surprisal to generate a set of such fakes. We start by combining psycholinguistic surprisal scores with optimization to define two bilevel surprisal optimization problems (an Explicit one and a simpler Implicit one) whose solutions correspond directly to the desired set of fakes. As bilevel problems are usually hard to solve, we then show that each of these can be reduced to an equivalent surprisal-based linear program. We performed detailed parameter tuning experiments and identified the best parameters for each of these algorithms. We then tested the two variants of SbFAKE (with their best parameter settings) against the best-performing prior work in the field. SbFAKE was able to generate convincing fakes more effectively than past work. In addition, replacing words in an original document with words having similar surprisal scores generates greater levels of deception.
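The closing finding suggests a simple heuristic worth illustrating. Below is a minimal sketch of surprisal-matched word replacement only, not the SbFAKE bilevel/linear-program optimization itself; the toy unigram model, the corpus, and the candidate list are stand-ins of my own for a real psycholinguistic surprisal estimator.

```python
import math
from collections import Counter

# Toy stand-in for a language model: unigram surprisal over a tiny corpus.
# SbFAKE itself uses surprisal scores inside bilevel/linear programs; this
# only illustrates the "replace with similar-surprisal words" heuristic.
corpus = ("the reactor uses a molten salt coolant loop "
          "the pump moves coolant through the heat exchanger "
          "a valve controls flow to the turbine").split()
counts = Counter(corpus)
total = sum(counts.values())

def surprisal(word: str) -> float:
    """Surprisal in bits under the unigram model (add-one smoothing)."""
    p = (counts[word] + 1) / (total + len(counts) + 1)
    return -math.log2(p)

def best_fake(original: str, candidates: list[str]) -> str:
    """Pick the candidate whose surprisal is closest to the original's,
    so the fake reads as 'unsurprising' as the real document."""
    target = surprisal(original)
    return min(candidates, key=lambda w: abs(surprisal(w) - target))

print(best_fake("coolant", ["water", "sodium", "gravel"]))
```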
Nature Communications, 2025, 16(1). Published: Feb. 26, 2025
Abstract
Communication, often grounded in shared expectations, faces challenges when a Sender and a Receiver lack a common linguistic background. Our study explores how people instinctively turn to the fundamental principles of the physical world to overcome such barriers. Specifically, through an experimental game in which Senders convey messages via trajectories, we investigate how they develop novel communicative strategies without relying on linguistic cues. We build a computational model based on the principle of expectancy violations and a set of universal priors derived from movement kinetics. The model replicates participant-designed trajectories with high accuracy, and its core variable, surprise, predicts the Receiver's physiological and neuronal responses in brain areas that process expectation violations. This work highlights the adaptability of human communication, showing that surprise can be a powerful tool for forming a new communicative language.
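As a rough illustration of the modeling idea (my own construction, not the authors' model), expectancy violation can be scored as the negative log-likelihood of a trajectory's accelerations under a Gaussian smooth-motion prior; the sigma parameter and the example trajectories below are assumptions.

```python
import numpy as np

def surprise(trajectory: np.ndarray, sigma: float = 1.0) -> float:
    """Surprise of a (T, 2) array of x,y positions sampled at equal
    intervals: negative log-likelihood of its accelerations under an
    independent Gaussian prior expecting smooth motion."""
    accel = np.diff(trajectory, n=2, axis=0)      # second differences
    nll = 0.5 * np.sum(accel**2) / sigma**2
    nll += accel.size * 0.5 * np.log(2 * np.pi * sigma**2)
    return nll

smooth = np.column_stack([np.linspace(0, 10, 50), np.linspace(0, 5, 50)])
jerky = smooth + np.random.default_rng(0).normal(0, 0.5, smooth.shape)
print(surprise(smooth), surprise(jerky))  # jerky motion violates the prior
```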
Trends in Cognitive Sciences, 2023, 27(11), P. 1032 - 1052. Published: Sept. 11, 2023
Prediction is often regarded as an integral aspect of incremental language comprehension, but little is known about the cognitive architectures and mechanisms that support it. We review studies showing that listeners and readers use all manner of contextual information to generate multifaceted predictions about upcoming input. The nature of these predictions may vary between individuals owing to differences in linguistic experience, among other factors. We then turn to unresolved questions which should guide the search for the underlying mechanisms: (i) Is prediction essential to language processing, or an optional strategy? (ii) Are predictions generated from within the language system, or by domain-general processes? (iii) What is the relationship between prediction and memory? (iv) Does prediction in comprehension require simulation via the production system? We discuss promising directions for making progress in answering these questions and for developing a mechanistic understanding of prediction in language.
Open Mind, 2023, volume and issue unknown, P. 1 - 42. Published: June 1, 2023
Words that are more surprising given their context take longer to process. However, no incremental parsing algorithm has been shown to directly predict this phenomenon. In this work, we focus on a class of algorithms whose runtime does naturally scale in surprisal: those that involve repeatedly sampling from the prior. Our first contribution is to show, in simple examples, that the runtime of such algorithms increases superlinearly with surprisal, and that its variance increases as well. These two predictions stand in contrast with the literature on surprisal theory (Hale, 2001; Levy, 2008a), which assumes that expected processing cost increases linearly with surprisal and makes no prediction about variance. In the second part of the paper, we conduct an empirical study of the relationship between surprisal and reading time, using a collection of modern language models to estimate surprisal. We find that with better language models, reading time increases superlinearly with surprisal. These results are consistent with sampling-based algorithms.
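The runtime claim has a simple instantiation. The sketch below (my simplification, not the paper's algorithms) treats recognizing a word as drawing from the prior until the target word appears: the number of draws is geometric with mean 1/p = 2^surprisal, so expected runtime grows exponentially in surprisal (hence superlinearly), and its variance grows even faster.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_runtime(p: float) -> int:
    """Number of prior samples until the target word (probability p) is
    drawn; geometrically distributed with mean 1/p and variance (1-p)/p**2."""
    return rng.geometric(p)

for surprisal_bits in [1, 2, 4, 8]:
    p = 2.0 ** -surprisal_bits
    runs = [sample_runtime(p) for _ in range(10_000)]
    print(f"surprisal={surprisal_bits} bits  mean={np.mean(runs):8.1f}  "
          f"var={np.var(runs):12.1f}")
```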
bioRxiv (Cold Spring Harbor Laboratory), 2023, volume and issue unknown. Published: April 16, 2023
Transformer models such as GPT generate human-like language and are highly predictive of human brain responses to language. Here, using fMRI-measured brain responses to 1,000 diverse sentences, we first show that a GPT-based encoding model can predict the magnitude of the brain response associated with each sentence. Then, we use the model to identify new sentences that are predicted to drive or suppress responses in the human language network. We show that these model-selected novel sentences indeed strongly drive and suppress the activity of human language areas in new individuals. A systematic analysis of the model-selected sentences reveals that surprisal and well-formedness of linguistic input are key determinants of response strength in the language network. These results establish the ability of neural network models not only to mimic human language but also to noninvasively control neural activity in higher-level cortical areas, like the language network.
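For a concrete picture of the encoding-model step, here is a minimal sketch on synthetic data; the embedding dimension, the ridge penalty grid, and the Gaussian stand-ins for LM embeddings and BOLD responses are all assumptions, not the study's actual features or data.

```python
import numpy as np
from sklearn.linear_model import RidgeCV

# Sketch of an encoding model: ridge regression from sentence embeddings
# to a (synthetic) language-network response magnitude.
rng = np.random.default_rng(0)
n_sentences, emb_dim = 1000, 256
X = rng.normal(size=(n_sentences, emb_dim))         # stand-in LM embeddings
true_w = rng.normal(size=emb_dim)
y = X @ true_w + rng.normal(scale=5.0, size=n_sentences)  # stand-in BOLD

train, test = slice(0, 800), slice(800, None)
model = RidgeCV(alphas=np.logspace(-2, 4, 13)).fit(X[train], y[train])
pred = model.predict(X[test])
print("held-out r =", np.corrcoef(pred, y[test])[0, 1])

# "Drive"/"suppress" selection: rank novel sentences by predicted response.
novel = rng.normal(size=(500, emb_dim))
scores = model.predict(novel)
drive, suppress = scores.argmax(), scores.argmin()
```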
Open Mind, 2024, 8, P. 177 - 201. Published: Jan. 1, 2024
Abstract
Many studies of human language processing have shown that readers slow down at less frequent or less predictable words, but there is debate about whether frequency and predictability effects reflect separable cognitive phenomena: are the operations that retrieve words from the mental lexicon based on sensory cues distinct from those that predict upcoming words from context? Previous evidence for a frequency-predictability dissociation mostly comes from small samples (both for estimating predictability and for testing its effect on behavior), artificial materials (e.g., isolated constructed sentences), and implausible modeling assumptions (discrete-time dynamics, linearity, additivity, constant variance, and invariance over time), which raises the question: do frequency and predictability dissociate in ordinary language comprehension, such as story reading? This study leverages recent progress in open data and computational modeling to address this question at scale. A large collection of naturalistic reading data (six datasets, >2.2M datapoints) is analyzed using nonlinear continuous-time regression, with predictability estimated by statistical language models trained on more data than is currently typical in psycholinguistics. Despite the use of naturalistic data, strong predictability estimates, and flexible regression models, results converge with earlier experimental work in supporting dissociable, additive frequency and predictability effects.
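A minimal version of the additive-effects analysis can be sketched as follows; the generative coefficients and distributions are invented for illustration, and ordinary least squares stands in for the paper's nonlinear continuous-time regression.

```python
import numpy as np

# Synthetic reading times with independent, additive contributions of
# word frequency and in-context predictability (surprisal).
rng = np.random.default_rng(1)
n = 5000
log_freq = rng.normal(-8, 2, n)        # corpus log-probability of the word
surprisal = rng.gamma(2.0, 2.0, n)     # in-context surprisal in bits
rt = 200 - 6 * log_freq + 12 * surprisal + rng.normal(0, 30, n)

# Ordinary least squares with both predictors entered additively.
X = np.column_stack([np.ones(n), log_freq, surprisal])
beta, *_ = np.linalg.lstsq(X, rt, rcond=None)
print("intercept, frequency, surprisal coefs:", beta.round(2))
```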
bioRxiv (Cold Spring Harbor Laboratory), 2024, volume and issue unknown. Published: June 22, 2024
Abstract
Human language comprehension is remarkably robust to ill-formed inputs (e.g., word transpositions). This robustness has led some to argue that syntactic parsing is largely an illusion, and that incremental comprehension is more heuristic, shallow, and semantics-based than is often assumed. However, the available data are also consistent with the possibility that humans always perform rule-like symbolic parsing and simply deploy error correction mechanisms to reconstruct ill-formed inputs when needed. We put these hypotheses to a new, stringent test by examining brain responses to a) stimuli that should pose a challenge for syntactic reconstruction but allow complex meanings to be built within local contexts through associative/shallow processing (sentences presented in backward order), and b) grammatically well-formed but semantically implausible sentences that should impede heuristic, semantics-based processing. Using a novel behavioral paradigm, we demonstrate that backward-presented sentences indeed impede the recovery of grammatical structure during comprehension. Critically, backward-presented sentences elicit a relatively low response in the language areas, as measured with fMRI. In contrast, the language areas respond to semantically implausible sentences with a magnitude similar to naturalistic (plausible) sentences. In other words, the ability to build syntactic structures is both necessary and sufficient to fully engage the language network. Taken together, these results provide the strongest support to date for a generalized reliance of human language comprehension on syntactic parsing.
Significance statement
Whether language comprehension relies predominantly on structural (syntactic) cues or on meaning-related (semantic) cues remains debated. We shed light on this question by examining the language areas' responses to stimuli where syntactic and semantic cues are pitted against each other, using fMRI. We find that the language areas respond weakly to stimuli that allow local semantic composition but cannot be parsed syntactically, as confirmed by a novel behavioral paradigm, and that they respond strongly to grammatical but semantically implausible sentences, like the famous 'Colorless green ideas sleep furiously' sentence. These findings challenge accounts that suggest that syntactic parsing can be foregone in favor of shallow semantic processing.
Computer Speech & Language, 2024, 89, P. 101700. Published: July 26, 2024
Evaluating students' textual responses is a common and critical task in language research and education practice. However, manual assessment can be tedious and may lack consistency, posing challenges for both scientific discovery and frontline teaching. Leveraging state-of-the-art large language models (LLMs), we aim to define and operationalize LLM-Surprisal, a numeric representation of the interplay between lexical diversity and syntactic complexity, and to demonstrate, empirically and theoretically, its relevance for the automatic assessment of Chinese L2 (second language) learners' English writing and its development. We developed an LLM-based natural language processing pipeline that automatically computes text Surprisal scores. By comparing Surprisal metrics with the classic indices widely used in L2 studies, we extended the usage of computational methods in L2 writing research. Our analyses suggested that LLM-Surprisals can distinguish L2 from L1 (first language) writing, index L2 development stages, and predict the scores provided by human professionals. This indicated that the Surprisal dimension may manifest itself in important aspects of L2 writing development. The relative advantages and disadvantages of these approaches were discussed in depth. We concluded that LLMs are promising tools for enhancing L2 research. This showcase paves the way for more nuanced computational approaches to assessing and understanding L2 writing. The pipelines and findings will inspire teachers, learners, and researchers in an innovative and accessible manner.
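A minimal sketch of the surprisal-computation step of such a pipeline appears below, assuming GPT-2 via the Hugging Face transformers library as the scoring model; the paper's exact operationalization of LLM-Surprisal may differ.

```python
import math
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

# Mean per-token surprisal of a text under GPT-2, a common proxy for
# LLM-based surprisal scoring in this literature.
tok = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def mean_surprisal(text: str) -> float:
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits
    # Surprisal of token t is -log2 p(token t | tokens < t).
    logp = torch.log_softmax(logits[0, :-1], dim=-1)
    tok_logp = logp[torch.arange(ids.shape[1] - 1), ids[0, 1:]]
    return (-tok_logp).mean().item() / math.log(2)

print(mean_surprisal("The student wrote a clear and simple essay."))
```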
Open Mind, 2023, 7, P. 757 - 783. Published: Jan. 1, 2023
Abstract
In a typical text, readers look much longer at some words than at others, even skipping many altogether. Historically, researchers explained this variation via low-level visual or oculomotor factors, but today it is primarily explained via factors determining a word's lexical processing ease, such as how well the word's identity can be predicted from context or discerned from parafoveal preview. While the existence of these effects has been established in controlled experiments, the relative importance of prediction, preview, and low-level factors in natural reading remains unclear. Here, we address this question in three large naturalistic reading corpora (n = 104, 1.5 million words), using deep neural networks and Bayesian ideal observers to model linguistic prediction and parafoveal preview from moment to moment during reading. Strikingly, neither prediction nor preview was important for explaining word skipping: the vast majority of skipping was explained by a simple model based on just fixation position and word length. For reading times, by contrast, we found strong and independent contributions of prediction and preview, with effect sizes matching those from controlled experiments. Together, these results challenge dominant models of eye movements in reading and instead support alternative models that describe skipping (but not reading times) as largely autonomous from word identification, and mostly determined by low-level oculomotor information.
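To make the "simple model" concrete, here is a sketch on synthetic data: a logistic regression predicting skipping from word length and launch distance alone, with no language-model predictors; the generative process and its coefficients are invented for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic skipping data: short words fixated from nearby positions are
# skipped more often (assumed generative process, not the corpora's).
rng = np.random.default_rng(2)
n = 20_000
word_len = rng.integers(1, 12, n)          # word length in letters
launch_dist = rng.uniform(1, 15, n)        # letters from the prior fixation
logit = 2.5 - 0.45 * word_len - 0.15 * launch_dist
skipped = rng.random(n) < 1 / (1 + np.exp(-logit))

# Oculomotor-only model: just word length and fixation position.
X = np.column_stack([word_len, launch_dist])
clf = LogisticRegression().fit(X, skipped)
print("accuracy of the oculomotor-only model:", round(clf.score(X, skipped), 3))
```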