Learning of state representation in recurrent network: the power of random feedback and biological constraints
Takayuki Tsurumi,
Ayaka Kato,
Arvind Kumar
et al.
Published: Jan. 14, 2025
How external/internal ‘state’ is represented in the brain is crucial, since appropriate representation enables goal-directed behavior. Recent studies suggest that state representation and value can be simultaneously learnt through reinforcement learning (RL) using reward-prediction-error in a recurrent neural network (RNN) and its downstream weights. However, how such learning is neurally implemented remains unclear, because training of the RNN by the ‘backpropagation’ method requires the downstream weights, which are biologically unavailable at the upstream RNN. Here we show that random feedback used instead of the downstream weights still works, owing to ‘feedback alignment’, which was originally demonstrated for supervised learning. We further show that if the weights are constrained to be non-negative, learning occurs even without feedback alignment, because the non-negative constraint itself ensures loose alignment. These results suggest neural mechanisms for the RL of state representation/value and the power of biological constraints.
Language: English
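The mechanism described in this abstract can be illustrated with a small toy sketch. The NumPy example below is not the authors' implementation: the single-cue fixed-delay reward task, network sizes, learning rates, and update-rule details are all assumptions. It trains downstream value weights from a TD reward-prediction error and trains the recurrent weights using either the value weights themselves (backpropagation-like feedback) or a fixed random feedback vector (feedback alignment), with an optional non-negativity constraint.

```python
# Toy sketch (not the authors' code): RNN state representation and downstream
# value weights trained from a TD reward-prediction error, with the RNN
# receiving either backprop-like feedback (the value weights) or a fixed
# random feedback vector ('feedback alignment').
import numpy as np

rng = np.random.default_rng(0)

n_in, n_rec = 3, 20        # cue dimension, number of recurrent units
T = 10                     # time steps per trial; reward arrives at the last step
gamma = 1.0                # discount factor
lr_v, lr_rnn = 0.05, 0.01  # learning rates (downstream weights, RNN weights)

W_in = rng.normal(0.0, 0.5, (n_rec, n_in))    # input weights (fixed here)
W_rec = rng.normal(0.0, 0.1, (n_rec, n_rec))  # recurrent weights (trained)
w_val = np.zeros(n_rec)                       # downstream value weights (trained)
B = rng.normal(0.0, 0.5, n_rec)               # fixed random feedback vector


def relu(x):
    return np.maximum(x, 0.0)


def run_trial(use_random_feedback=True, nonneg=False):
    """One trial: forward pass, then TD updates of w_val and W_rec."""
    global W_rec, w_val
    x_prev = np.zeros(n_rec)
    hist = []                              # (input activity, pre-activation, activity, value)
    for t in range(T):
        cue = np.zeros(n_in)
        if t == 0:
            cue[0] = 1.0                   # cue at trial start
        pre = W_in @ cue + W_rec @ x_prev
        x = relu(pre)
        hist.append((x_prev.copy(), pre, x, float(w_val @ x)))
        x_prev = x
    for t in range(T):
        x_in, pre, x, v = hist[t]
        r = 1.0 if t == T - 1 else 0.0     # reward at trial end
        v_next = 0.0 if t == T - 1 else hist[t + 1][3]
        delta = r + gamma * v_next - v     # reward-prediction error
        w_val += lr_v * delta * x          # update downstream value weights
        # Feedback for the RNN update: true downstream weights (backprop-like)
        # or the fixed random vector B (feedback alignment).
        fb = B if use_random_feedback else w_val
        W_rec += lr_rnn * delta * np.outer(fb * (pre > 0), x_in)
        if nonneg:                         # optional non-negativity constraint
            W_rec = np.maximum(W_rec, 0.0)
            w_val = np.maximum(w_val, 0.0)


for _ in range(500):
    run_trial(use_random_feedback=True)

# With random feedback, the downstream value weights tend to align with B over
# training, which is the signature of feedback alignment.
cos = (B @ w_val) / (np.linalg.norm(B) * np.linalg.norm(w_val) + 1e-12)
print(f"cosine(B, value weights) after training: {cos:.2f}")
```

For the biologically constrained case discussed in the abstract one would also draw B non-negative (e.g. from a uniform distribution) and call run_trial(nonneg=True); in that regime the sign constraint itself, rather than learned alignment, keeps the random feedback loosely aligned with the downstream weights.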
Reinforcement learning of state representation and value: the power of random feedback and biological constraints
Takayuki Tsurumi,
Ayaka Kato,
Arvind Kumar
et al.
bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown
Published: Aug. 22, 2024
Abstract
How external/internal ‘state’ is represented in the brain is crucial, since appropriate representation enables goal-directed behavior. Recent studies suggest that state representation and value can be simultaneously learnt through reinforcement learning (RL) using reward-prediction-error in a recurrent neural network (RNN) and its downstream weights. However, how such learning is neurally implemented remains unclear, because training of the RNN by the ‘backpropagation’ method requires the downstream weights, which are biologically unavailable at the upstream RNN. Here we show that random feedback used instead of the downstream weights still works, owing to ‘feedback alignment’, which was originally demonstrated for supervised learning. We further show that if the weights are constrained to be non-negative, learning occurs even without feedback alignment, because the non-negative constraint itself ensures loose alignment. These results suggest neural mechanisms for the RL of state representation/value and the power of biological constraints.
Language: English
Dopaminergic responses to identity prediction errors depend differently on the orbitofrontal cortex and hippocampus
bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown
Published: Dec. 17, 2024
Summary
Adaptive behavior depends on the ability to predict specific events, particularly those related to rewards. Armed with such associative information, we can infer the current value of predicted rewards based on changing circumstances and desires. To support this ability, neural systems must represent both the identity and value of predicted rewards, and these representations must be updated when they change. Here we tested whether prediction-error signaling by dopamine neurons depends on two areas known to represent the specifics of rewarding events, the hippocampus (HC) and orbitofrontal cortex (OFC). We monitored spiking activity in the rat VTA during changes in the number or flavor of expected rewards, designed to induce errors in the prediction of reward value or identity, respectively. In control animals, dopamine neurons registered both error types, transiently increasing their firing to additional reward drops and to changes in flavor. These canonical signatures were significantly disrupted in rats with ipsilateral neurotoxic lesions of either area.
Specifically, HC lesions caused a failure to register this type of error, whereas OFC lesions had persistent but much more subtle effects on these errors. These results demonstrate that the two areas contribute distinct types of information to the computation of the errors signaled by dopaminergic neurons.
Language: English