Recent work has demonstrated impressive parallels between human visual representations and those found in deep neural networks. A new study by Wang et al. (2023) highlights what factors may determine this similarity. (commentary)
Object recognition is an important human ability that relies on distinguishing between similar objects, for example, deciding which kitchen utensil(s) to use at different stages of meal preparation. Recent work describes the fine-grained organization of knowledge about manipulable objects by studying which constituent dimensions are most relevant to behavior: vision-, manipulation-, and function-based object properties. A logical extension of this work concerns whether or not these dimensions are uniquely human, or can be approximated by deep learning. Here, we show that behavioral judgments are well predicted by CLIP-ViT, a state-of-the-art multimodal network trained on a large and diverse set of image-text pairs, and that, in part, it can also generate good predictions of behavior toward previously unseen objects. Moreover, this model vastly outperforms comparison networks pre-trained on smaller, image-only training datasets. These results demonstrate the impressive capacity of deep learning to approximate human object knowledge. We discuss possible sources of this benefit relative to the other tested models (e.g., image-text vs. image-only pre-training, dataset size, architecture).
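The core analysis, predicting behavioral dimension scores from network embeddings and testing generalization to held-out objects, can be sketched as a cross-validated ridge regression. A minimal sketch in which random vectors stand in for the CLIP-ViT embeddings and the human behavioral scores (the real analysis would use actual model features and judgment data):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins: rows are objects, columns are embedding features
# (real analysis: CLIP-ViT image embeddings) and behavioral dimensions
# (real analysis: vision-, manipulation-, and function-based scores).
n_objects, n_features, n_dims = 200, 50, 4
X = rng.standard_normal((n_objects, n_features))
W_true = rng.standard_normal((n_features, n_dims))
Y = X @ W_true + 0.1 * rng.standard_normal((n_objects, n_dims))

def ridge_fit(X, Y, alpha=1.0):
    """Closed-form ridge regression: W = (X'X + alpha*I)^-1 X'Y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(d), X.T @ Y)

# Generalization test: fit on some objects, predict previously unseen ones.
train, test = np.arange(150), np.arange(150, 200)
W = ridge_fit(X[train], Y[train])
pred = X[test] @ W

# Per-dimension correlation between predicted and observed scores.
r = [float(np.corrcoef(pred[:, j], Y[test][:, j])[0, 1]) for j in range(n_dims)]
print([round(v, 2) for v in r])
```

The held-out split mirrors the paper's test on previously unseen objects: a model whose embeddings carry the relevant object properties should transfer to objects absent from the fitting set.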
bioRxiv (Cold Spring Harbor Laboratory), journal year: 2024, issue: unknown. Published: May 14, 2024
ABSTRACT
What underlies the emergence of cortex-aligned representations in deep neural network models of vision? The success of widely varied architectures has motivated the prevailing hypothesis that large-scale pre-training is the primary factor underlying the similarities between brains and networks. Here, we challenge this view by revealing the role of architectural inductive biases in networks with minimal training. We examined networks with varied architectures but no pre-training and quantified their ability to predict image-evoked responses in the visual cortices of both monkeys and humans. We found that cortex-aligned representations emerge in convolutional networks that combine two key manipulations of dimensionality: compression in the spatial domain and expansion in the feature domain. We further show that both manipulations are critical for obtaining these performance gains: dimensionality expansion was relatively ineffective under other targeted lesions of the architecture. Our findings suggest that the architectural constraints of convolutional networks are sufficiently close to those of biological vision to allow many aspects of cortical representation to emerge even before synaptic connections have been tuned through experience.
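The two dimensionality manipulations can be illustrated with a single untrained convolutional layer: a stride greater than one compresses the spatial domain, while a larger output-channel count expands the feature domain. A minimal numpy sketch with random, untrained weights (the layer sizes are illustrative, not those of the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def conv2d_random(x, out_channels, kernel=3, stride=2):
    """One untrained conv layer: random weights, stride-2 spatial
    compression, channel (feature-domain) expansion, ReLU."""
    in_channels, h, w = x.shape
    W = rng.standard_normal((out_channels, in_channels, kernel, kernel))
    W /= np.sqrt(in_channels * kernel * kernel)  # keep activations scaled
    oh = (h - kernel) // stride + 1
    ow = (w - kernel) // stride + 1
    out = np.zeros((out_channels, oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = x[:, i*stride:i*stride+kernel, j*stride:j*stride+kernel]
            out[:, i, j] = np.tensordot(W, patch, axes=3)
    return np.maximum(out, 0.0)  # ReLU

x = rng.standard_normal((3, 32, 32))      # toy RGB image
h1 = conv2d_random(x, out_channels=64)    # 3 -> 64 features, 32x32 -> 15x15
h2 = conv2d_random(h1, out_channels=256)  # 64 -> 256 features, 15x15 -> 7x7
print(h1.shape, h2.shape)
```

At each stage the spatial grid shrinks while the feature dimension grows, so responses such as the printed shapes (64, 15, 15) and (256, 7, 7) can be read off directly; in the paper's analyses, such untrained activations would then be regressed against cortical responses.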
bioRxiv (Cold Spring Harbor Laboratory), journal year: 2023, issue: unknown. Published: Aug. 24, 2023
Abstract
Object vision is commonly thought to involve a hierarchy of brain regions processing increasingly complex image features, with high-level visual cortex supporting object recognition and categorization. However, object vision supports diverse behavioral goals, suggesting basic limitations of this category-centric framework. To address these limitations, we mapped a series of dimensions derived from a large-scale analysis of human similarity judgments directly onto the brain. Our results reveal broadly distributed representations of behaviorally-relevant information, demonstrating selectivity to a wide variety of novel dimensions while capturing known selectivities for visual features and categories. Behavior-derived dimensions were superior to categories at predicting brain responses, yielding mixed selectivity in much of visual cortex alongside sparse category-selective clusters. This framework reconciles seemingly disparate findings regarding regional specialization, explaining category selectivity as a special case of sparse response profiles among representational dimensions, and suggests a more expansive view on visual processing.
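The claim that category selectivity is a special case of sparse response profiles can be made concrete with a sparseness index over a region's loadings on the behavior-derived dimensions. A toy sketch using the Hoyer sparseness measure (an illustrative choice, not necessarily the paper's metric), where a category-selective profile concentrates its response on one dimension:

```python
import numpy as np

def hoyer_sparseness(v):
    """Hoyer sparseness: ~0 for uniform profiles, ~1 for one-hot profiles."""
    n = v.size
    l1, l2 = np.abs(v).sum(), np.sqrt((v * v).sum())
    return (np.sqrt(n) - l1 / l2) / (np.sqrt(n) - 1)

# Toy response profiles over 10 hypothetical behavior-derived dimensions.
mixed = np.array([0.9, 0.7, 0.8, 0.6, 0.9, 0.7, 0.8, 0.5, 0.6, 0.7])
category_like = np.array([0.05, 0.02, 0.90, 0.03, 0.01,
                          0.04, 0.02, 0.03, 0.01, 0.02])

print(round(hoyer_sparseness(mixed), 2))          # low: mixed selectivity
print(round(hoyer_sparseness(category_like), 2))  # high: category-like tuning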
bioRxiv (Cold Spring Harbor Laboratory), journal year: 2023, issue: unknown. Published: April 28, 2023
Abstract
A challenging goal of neural coding is to characterize the representations underlying visual perception. To this end, multi-unit activity (MUA) in macaque visual cortex was recorded in a passive fixation task upon presentation of faces and natural images. We analyzed the relationship between MUA and the latent representations of state-of-the-art deep generative models, including the conventional and feature-disentangled representations of generative adversarial networks (GANs) (i.e., the z- and w-latents of StyleGAN, respectively) and the language-contrastive representations of latent diffusion models (i.e., the CLIP-latents of Stable Diffusion). A mass univariate neural encoding analysis showed that w-latents outperform both z- and CLIP-latents in explaining neural responses. Further, w-latent features were found to be positioned at the higher end of the complexity gradient, which indicates that they capture visual information relevant to high-level neural activity. Subsequently, multivariate neural decoding resulted in state-of-the-art spatiotemporal reconstructions of visual perception. Taken together, our results not only highlight the important role of feature disentanglement in shaping the neural representations underlying perception but also serve as an important benchmark for future neural coding research.
Author summary
Neural coding seeks to understand how the brain represents the world by modeling the relationship between stimuli and the brain's internal representations thereof. This field focuses on predicting brain responses to stimuli (neural encoding) and deciphering information about stimuli from brain responses (neural decoding). Recent advances in generative adversarial networks (GANs; a type of machine learning model) have enabled the creation of photorealistic images. Like the brain, GANs also carry internal representations of the images they create, referred to as “latents”. More recently, a new type of feature-disentangled “w-latent” has been developed that more effectively separates different image features (e.g., color; shape; texture). In this study, we presented such GAN-generated pictures to a macaque with cortical implants and found that the latents were accurate predictors of its neural activity. We then used these latents to reconstruct the perceived images with high fidelity. The remarkable similarities between our predictions and the actual targets indicate an alignment in how the GAN and the brain represent the same stimulus, even though the GAN was never optimized on neural data. This implies a general principle of shared encoding of visual phenomena, emphasizing the importance of feature disentanglement in deeper visual areas.
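A mass univariate encoding analysis of this kind fits one regularized regression per recording site and scores it on held-out stimuli. A minimal sketch with synthetic stand-ins for the latents and the MUA (the real analysis would use StyleGAN w-latents and the recorded responses, and would compare latent spaces by these per-site scores):

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-ins: latent codes of the presented images (real analysis:
# e.g. StyleGAN w-latents) and MUA responses at each recording site.
n_stim, n_latent, n_sites = 300, 32, 50
Z = rng.standard_normal((n_stim, n_latent))
B = rng.standard_normal((n_latent, n_sites))
mua = Z @ B + 0.5 * rng.standard_normal((n_stim, n_sites))

def site_encoding_scores(latents, responses, alpha=1.0, n_train=200):
    """Mass univariate encoding: one ridge model per site, scored by the
    correlation between predicted and held-out responses."""
    Xtr, Xte = latents[:n_train], latents[n_train:]
    d = Xtr.shape[1]
    W = np.linalg.solve(Xtr.T @ Xtr + alpha * np.eye(d),
                        Xtr.T @ responses[:n_train])
    pred = Xte @ W
    return np.array([np.corrcoef(pred[:, s], responses[n_train:, s])[0, 1]
                     for s in range(responses.shape[1])])

scores = site_encoding_scores(Z, mua)
print(round(float(scores.mean()), 2))
```

Running the same procedure with z-, w-, and CLIP-latents as the predictors and comparing the per-site score distributions is the shape of the comparison the abstract describes.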