bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown. Published: Nov. 28, 2024.
Abstract
Real-world choices often involve balancing decisions that are optimized for the short- vs. long-term. Here, we reason that apparently sub-optimal single-trial decisions in macaques may in fact reflect long-term, strategic planning. We demonstrate that macaques freely navigating in VR toward sequentially presented targets will strategically abort offers, forgoing more immediate rewards on individual trials to maximize session-long returns. This behavior is highly specific to the individual, demonstrating knowledge of their own long-run performance. Reinforcement-learning (RL) models suggest this is algorithmically supported by modular actor-critic networks, with a policy module that is not only optimizing long-term value functions, but is also informed of state-action values, allowing rapid policy optimization. The artificial network suggests that changes matched to the offer ought to be evident as soon as offers are made, even if aborting occurs much later. We confirm this prediction in units and population dynamics of macaque dorsolateral prefrontal cortex (dlPFC), but not parietal area 7a or the dorsomedial superior temporal area (MSTd), reflecting upcoming reward-maximizing decisions upon offer presentation. These results cast the dlPFC as a specialized policy module, and stand in contrast to recent work on the distributed and recurrent nature of belief-networks.
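A minimal tabular sketch, in the spirit of the modular actor-critic the abstract describes: the critic learns long-term state values V, while the policy module is additionally informed by state-action values Q through an advantage term. The toy MDP (a "good" and a "poor" offer state, its rewards, and all names) is an illustrative assumption, not the paper's model; in the toy, aborting the poor offer forgoes a small immediate reward but yields higher session-long returns.

```python
import numpy as np

rng = np.random.default_rng(0)

n_states, n_actions = 2, 2
# Hypothetical toy MDP: in state 1 (a poor offer), aborting (action 1)
# pays nothing now but returns the agent to the richer state 0.
R = np.array([[1.0, 0.0],   # state 0: accept pays 1.0, abort pays 0
              [0.2, 0.0]])  # state 1: accept pays only 0.2
T = np.array([[1, 0],       # next state for each (state, action)
              [1, 0]])

V = np.zeros(n_states)                   # critic: long-term state values
Q = np.zeros((n_states, n_actions))      # state-action values informing the actor
theta = np.zeros((n_states, n_actions))  # actor: policy logits
alpha, gamma = 0.1, 0.9

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

s = 0
for _ in range(5000):
    p = softmax(theta[s])
    a = rng.choice(n_actions, p=p)
    r, s_next = R[s, a], T[s, a]
    delta = r + gamma * V[s_next] - V[s]          # TD error drives the critic
    V[s] += alpha * delta
    Q[s, a] += alpha * (r + gamma * V[s_next] - Q[s, a])
    # Actor update is informed by state-action values via the advantage:
    adv = Q[s, a] - V[s]
    theta[s] += alpha * adv * ((np.arange(n_actions) == a) - p)
    s = s_next
```

With a discount of 0.9, repeatedly accepting the 0.2 offer in state 1 is worth less in the long run than aborting it and cycling back through the 1.0 offer, so the learned policy accepts in state 0 and aborts in state 1 despite the forgone immediate reward.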
bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown. Published: Aug. 22, 2024.
Abstract
How external/internal 'state' is represented in the brain is crucial, since an appropriate representation enables goal-directed behavior. Recent studies suggest that state representation and value can be simultaneously learnt through reinforcement learning (RL) using reward-prediction-error-based updates of a recurrent-neural-network (RNN) and its downstream weights. However, how such learning is neurally implemented remains unclear, because training of the RNN by the 'backpropagation' method requires the downstream weights, which are biologically unavailable at the upstream RNN. Here we show that random feedback instead of the downstream weights still works through 'feedback alignment', which was originally demonstrated for supervised learning. We further show that if the weights are constrained to be non-negative, learning occurs even without feedback alignment, because the non-negative constraint ensures loose alignment. These results suggest neural mechanisms for RL of state representation/value and the power of biological constraints.
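A minimal sketch of feedback alignment, shown here for supervised regression rather than the paper's RL setting (the toy task, network sizes, and all names are illustrative assumptions): the error signal for the upstream weights is routed through a fixed random matrix B instead of the transpose of the downstream weights W2, which is the biologically unavailable quantity the abstract refers to.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical toy task: learn the linear map y = M x with a two-layer
# network, feeding errors back through a FIXED RANDOM matrix B rather
# than through W2.T as backpropagation would require.
n_in, n_hid, n_out = 4, 16, 2
M = rng.normal(size=(n_out, n_in))

W1 = rng.normal(scale=0.3, size=(n_hid, n_in))   # upstream weights
W2 = rng.normal(scale=0.3, size=(n_out, n_hid))  # downstream weights
B = rng.normal(scale=0.3, size=(n_hid, n_out))   # fixed random feedback
lr = 0.02

def forward(x):
    h = np.tanh(W1 @ x)
    return h, W2 @ h

def mean_loss(xs):
    return np.mean([np.sum((forward(x)[1] - M @ x) ** 2) for x in xs])

xs_test = rng.normal(size=(200, n_in))
loss_before = mean_loss(xs_test)

for _ in range(5000):
    x = rng.normal(size=n_in)
    h, y = forward(x)
    e = y - M @ x                      # output error
    W2 -= lr * np.outer(e, h)          # downstream: standard delta rule
    dh = (B @ e) * (1.0 - h ** 2)      # error routed through B, not W2.T
    W1 -= lr * np.outer(dh, x)

loss_after = mean_loss(xs_test)
```

During training, W2 tends to come into loose alignment with B.T, so the random feedback carries useful gradient information and the loss falls even though the true backpropagated error is never computed.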
bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown. Published: Sept. 16, 2024.
ABSTRACT
Nucleus accumbens dopamine signaling is an important neural substrate for decision-making. Dominant theories generally discretize and homogenize decision-making, when it is in fact a continuous process, with evaluation and re-evaluation components that extend beyond simple outcome prediction into consideration of past and future value. Extensive work has examined mesolimbic dopamine in the context of reward prediction error, but major gaps persist in our understanding of how dopamine regulates volitional, self-guided decision-making. Moreover, there is little understanding of how individual differences in value processing may shape decision-making. Here, using an economic foraging task in mice, we found that dopamine dynamics in the nucleus accumbens core reflected decision confidence during decisions, as well as change-of-mind. Optogenetic manipulations of dopamine release selectively altered decisions in mice whose behavior…