Recent work has demonstrated impressive parallels between human visual representations and those found in deep neural networks. A new study by Wang et al. (2023) highlights what factors may determine this similarity. (commentary)
Perceived similarity offers a window into the mental representations underlying our ability to make sense of the visual world. Yet the collection of similarity judgments quickly becomes infeasible for larger datasets, limiting their generality. To address this challenge, here we introduce a computational approach that predicts perceived similarity from neural network activations through a set of 49 interpretable dimensions learned on 1.46 million triplet odd-one-out judgments. The approach allowed us to predict separate, independently-sampled similarity scores with an accuracy of up to 0.898. Combining the predictions with human ratings of the same images led to only small improvements, indicating that the model used similar information as humans in this task. Predicting within highly homogeneous image classes revealed that performance critically depends on the granularity of the training data. Our approach allowed us to improve the brain-behavior correspondence in a large-scale neuroimaging dataset and to visualize candidate image features that humans use for making similarity judgments, thus highlighting which image parts may carry behaviorally-relevant information. Together, these results demonstrate that current neural networks contain information sufficient for capturing broadly-sampled similarity scores, offering a pathway towards the automated collection of similarity judgments for natural images.
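The triplet odd-one-out task described above can be sketched in toy code: given interpretable embedding vectors, the predicted odd one out is the item left over after picking the most similar pair. This is a minimal illustration, not the authors' implementation; the three-dimensional embeddings and their labels are invented for the example (the paper's model uses 49 dimensions learned from 1.46 million judgments).

```python
import numpy as np

def predict_odd_one_out(emb, triplet):
    """Predict the odd-one-out in a triplet of items: the pair with the
    highest dot-product similarity stays together, and the remaining
    item is declared the odd one out."""
    i, j, k = triplet
    sims = {
        k: emb[i] @ emb[j],  # if (i, j) are the most similar pair, k is odd
        j: emb[i] @ emb[k],
        i: emb[j] @ emb[k],
    }
    return max(sims, key=sims.get)

# Toy embeddings on 3 hypothetical interpretable dimensions.
emb = np.array([
    [1.0, 0.1, 0.0],   # item 0: loads on an "animal-like" dimension
    [0.9, 0.2, 0.1],   # item 1: similar profile to item 0
    [0.0, 0.1, 1.0],   # item 2: loads on a "tool-like" dimension
])
print(predict_odd_one_out(emb, (0, 1, 2)))  # item 2 is the odd one out
```

Because each dimension is interpretable, the same dot products also yield a full pairwise similarity matrix that can be compared against independently collected human similarity scores.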
Abstract
A central challenge in cognitive neuroscience is understanding how the brain represents and predicts complex, multimodal experiences in naturalistic settings. Traditional neural encoding models, often based on unimodal or static features, fall short of capturing the rich, dynamic structure of real-world cognition. Here, we address this by introducing a video-text alignment framework that predicts whole-brain responses by integrating visual and linguistic features across time. Using a state-of-the-art deep learning model (VALOR), we achieve more accurate and generalizable predictions than unimodal (AlexNet, WordNet) and multimodal (CLIP) baselines. Beyond improving prediction, our framework automatically maps cortical semantic spaces, aligning with human-annotated dimensions without requiring manual labeling. We further uncover a hierarchical predictive coding gradient, in which different regions anticipate future events over distinct timescales, an organization that correlates with individual abilities. These findings provide new evidence that temporal integration is a core mechanism of brain function. Our results demonstrate that models aligned with naturalistic stimuli can reveal ecologically valid mechanisms, offering a powerful, scalable approach for investigating perception, semantics, and prediction in the human brain. This work advances neuroimaging by bridging it with computational modeling.
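A voxelwise encoding model of the kind referenced above regresses stimulus features onto each voxel's response, typically with ridge regression. The sketch below is a generic illustration on simulated data, not the VALOR-based pipeline; feature counts, voxel counts, and noise levels are arbitrary.

```python
import numpy as np

def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge regression; one weight vector per voxel (column of Y)."""
    n_feat = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_feat), X.T @ Y)

rng = np.random.default_rng(0)
n_vol, n_feat, n_vox = 200, 10, 5
X_train = rng.standard_normal((n_vol, n_feat))     # stimulus features per fMRI volume
W_true = rng.standard_normal((n_feat, n_vox))      # simulated ground-truth weights
Y_train = X_train @ W_true + 0.1 * rng.standard_normal((n_vol, n_vox))

W = fit_ridge(X_train, Y_train)

# Generalization: per-voxel correlation between predicted and held-out responses
X_test = rng.standard_normal((50, n_feat))
Y_test = X_test @ W_true
Y_pred = X_test @ W
r = [np.corrcoef(Y_pred[:, v], Y_test[:, v])[0, 1] for v in range(n_vox)]
print(np.round(r, 3))
```

In practice the feature matrix would hold time-aligned visual and linguistic embeddings, and model comparison (e.g., VALOR vs. AlexNet/WordNet/CLIP) reduces to comparing these held-out per-voxel correlations across feature spaces.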
PLoS Computational Biology, 2024, 20(5), e1012058. Published: May 6, 2024
A challenging goal of neural coding is to characterize the representations underlying visual perception. To this end, multi-unit activity (MUA) in macaque cortex was recorded in a passive fixation task upon presentation of faces and natural images. We analyzed the relationship between MUA and the latent representations of state-of-the-art deep generative models, including conventional and feature-disentangled generative adversarial networks (GANs) (i.e., the z- and w-latents of StyleGAN, respectively) and language-contrastive diffusion models (i.e., the CLIP-latents of Stable Diffusion). A mass univariate encoding analysis showed that the w-latents outperform both the z- and CLIP-latents in explaining neural responses. Further, the w-latent features were found to be positioned at the higher end of the complexity gradient, which indicates that they capture information relevant to high-level neural activity. Subsequently, multivariate decoding resulted in spatiotemporal reconstructions of the perceived images. Taken together, our results not only highlight the important role of feature-disentanglement in shaping perception-relevant representations but also serve as an important benchmark for future neural coding work.
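The "mass univariate encoding analysis" mentioned above fits a separate linear model per recording site and asks which latent space best explains its responses. A minimal sketch on simulated data follows; the "latents" here are random stand-ins, not real StyleGAN or CLIP features, and the in-sample scoring is a simplification of a cross-validated analysis.

```python
import numpy as np

def encoding_score(F, y):
    """Fit one unit's responses y from latent features F by least squares and
    return the correlation between fitted and observed responses (in-sample
    here for brevity; a real analysis would cross-validate)."""
    coef, *_ = np.linalg.lstsq(F, y, rcond=None)
    return np.corrcoef(F @ coef, y)[0, 1]

rng = np.random.default_rng(1)
n_stim = 300
w_latents = rng.standard_normal((n_stim, 8))   # stand-in for StyleGAN w-latents
z_latents = rng.standard_normal((n_stim, 8))   # stand-in for z-latents

# Simulate a "high-level" unit that is actually driven by the w-latent space
unit = w_latents @ rng.standard_normal(8) + 0.2 * rng.standard_normal(n_stim)

print(encoding_score(w_latents, unit) > encoding_score(z_latents, unit))
```

Repeating this comparison over every recorded site yields the per-latent-space performance maps on which the paper's conclusions rest.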
bioRxiv (Cold Spring Harbor Laboratory), 2024. Published: July 2, 2024
bioRxiv (Cold Spring Harbor Laboratory), 2024. Published: April 2, 2024
Summary
The functional role of visual activations in human pre-frontal cortex remains a deeply debated question. Its significance extends to fundamental issues of functional localization and to global theories of consciousness. Here we addressed this question by comparing, dynamically, the potential parallels between the relational structure of prefrontal responses and that of visually- and textually-trained deep neural networks (DNNs). Frontal relational structures were revealed in intra-cranial recordings of human patients, conducted for clinical purposes, while the patients viewed familiar images of faces and places. Our results reveal that these structures were, surprisingly, predicted by textually- and not visually-trained DNNs. Importantly, the temporal dynamics of these correlations showed striking differences, with a rapid decline over time for the visual component, but persistent dynamics, including a significant image-offset response, for the textual component. These results point to a dynamic, text-related function of visual responses in the prefrontal brain.
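Comparing the relational structure of neural responses with that of DNN embeddings is commonly done via representational similarity analysis (RSA): build a dissimilarity matrix per system and correlate their upper triangles. The sketch below uses simulated data and an assumed generic setup, not the authors' pipeline.

```python
import numpy as np

def rdm(responses):
    """Representational dissimilarity matrix: 1 - Pearson correlation between
    the response patterns (rows) for every pair of stimuli."""
    return 1.0 - np.corrcoef(responses)

def rsa(rdm_a, rdm_b):
    """Compare two RDMs by correlating their upper triangles (Pearson here
    for brevity; rank correlation is also common)."""
    iu = np.triu_indices_from(rdm_a, k=1)
    return np.corrcoef(rdm_a[iu], rdm_b[iu])[0, 1]

rng = np.random.default_rng(2)
emb_text = rng.standard_normal((20, 15))             # simulated text-DNN embeddings
neural = emb_text @ rng.standard_normal((15, 100))   # neural patterns sharing their geometry
emb_vis = rng.standard_normal((20, 15))              # unrelated vision-DNN embeddings

print(rsa(rdm(neural), rdm(emb_text)) > rsa(rdm(neural), rdm(emb_vis)))
```

Computing such RDM correlations within a sliding time window over the intracranial recordings is what exposes the temporal dynamics the abstract describes.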
bioRxiv (Cold Spring Harbor Laboratory), 2024. Published: Feb. 6, 2024
Abstract
In recent studies, researchers have used large language models (LLMs) to explore semantic representations in the brain; however, they typically assessed different levels of content, such as speech, objects, and stories, separately. In this study, we recorded brain activity using functional magnetic resonance imaging (fMRI) while participants viewed 8.3 hours of dramas and movies. We annotated these stimuli at multiple semantic levels, which enabled us to extract latent representations from LLMs for this content. Our findings demonstrate that LLMs predict human brain activity more accurately than traditional language models, particularly for complex background stories. Furthermore, we identify distinct brain regions associated with different semantic representations, including multi-modal vision-semantic representations, which highlights the importance of modeling multi-level semantic content simultaneously. We will make our fMRI dataset publicly available to facilitate further research on aligning LLMs with human brain function. Please check out our webpage at https://sites.google.com/view/llm-and-brain/ .
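Relating event-level annotations to fMRI requires resampling feature vectors onto the scanner's TR grid and accounting for hemodynamic lag. A minimal sketch follows: it averages features within each TR and applies a crude fixed-delay shift; real analyses typically convolve with a hemodynamic response function, and the "embeddings" here are stand-in numbers rather than actual LLM features.

```python
import numpy as np

def features_to_tr(times, feats, tr=2.0, n_tr=5, delay=1):
    """Average event-level feature vectors within each fMRI volume (TR),
    then shift forward by `delay` volumes as a crude hemodynamic lag."""
    out = np.zeros((n_tr, feats.shape[1]))
    for t in range(n_tr):
        mask = (times >= t * tr) & (times < (t + 1) * tr)
        if mask.any():
            out[t] = feats[mask].mean(axis=0)  # mean of events in this TR
    shifted = np.zeros_like(out)
    shifted[delay:] = out[:n_tr - delay]       # no wrap-around at the run edges
    return shifted

times = np.array([0.5, 1.5, 2.5, 3.5, 8.5])        # annotation onsets (seconds)
feats = np.arange(10, dtype=float).reshape(5, 2)   # stand-in embedding vectors
X = features_to_tr(times, feats)
print(X)
```

Once multiple annotation levels (speech, objects, stories) are resampled this way, their feature matrices can be concatenated and fed to a single encoding model, which is what makes the simultaneous multi-level comparison possible.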
bioRxiv (Cold Spring Harbor Laboratory), 2024. Published: May 10, 2024
Abstract
Each of us perceives the world differently. What may underlie such individual differences in perception? Here, we characterize the lateral prefrontal cortex's (LPFC's) role in vision using computational models, with a specific focus on individual differences. Using a 7T fMRI dataset, we found that encoding models relating visual features extracted from a deep neural network to brain responses to natural images robustly predict responses in patches of LPFC. We then explored the representational structures of these patches and screened for voxels with high predicted-versus-observed correspondence; we observed more substantial individual-specific coding schemes in LPFC compared to visual regions. Computational modeling suggests that these amplified individual differences could result from random projection between sensory and high-level regions, a motif underlying flexible working memory. Our study demonstrates an under-appreciated role of LPFC in visual processing and highlights how idiosyncrasies in its coding may shape how different individuals experience the world.
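The random-projection idea suggested above can be illustrated with a toy simulation: two simulated individuals share nearly identical sensory codes, but each projects them through its own random matrix, so their high-level representational geometries diverge more than their sensory ones. This is an illustrative sketch of the motif, not the study's actual model.

```python
import numpy as np

def rdm_vec(X):
    """Vectorized pairwise Euclidean distances between rows of X (upper triangle)."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    return d[np.triu_indices(len(X), k=1)]

rng = np.random.default_rng(3)
stimuli = rng.standard_normal((15, 30))

# Two individuals: nearly identical sensory codes...
sens_a = stimuli + 0.05 * rng.standard_normal(stimuli.shape)
sens_b = stimuli + 0.05 * rng.standard_normal(stimuli.shape)
# ...but individual-specific random projections into a high-level region
high_a = sens_a @ rng.standard_normal((30, 30))
high_b = sens_b @ rng.standard_normal((30, 30))

# Across-individual agreement of representational geometry at each stage
sens_sim = np.corrcoef(rdm_vec(sens_a), rdm_vec(sens_b))[0, 1]
high_sim = np.corrcoef(rdm_vec(high_a), rdm_vec(high_b))[0, 1]
print(sens_sim > high_sim)  # the projections amplify individual differences
```

The same logic scaled up to realistic dimensionalities is one way small sensory idiosyncrasies could produce the substantially individual-specific LPFC coding schemes the study reports.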