Dorsolateral prefrontal cortex drives strategic aborting by optimizing long-run policy extraction
Jean‐Paul Noel, Ruiyi Zhang, Xaq Pitkow, et al.

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown

Published: Nov. 28, 2024

Real-world choices often involve balancing decisions that are optimized for the short- versus the long-term. Here, we reason that apparently sub-optimal single-trial behavior in macaques may in fact reflect long-term, strategic planning. We demonstrate that monkeys freely navigating in virtual reality toward sequentially presented targets will strategically abort offers, forgoing more immediate rewards on individual trials to maximize session-long returns. This behavior is highly specific to the individual, demonstrating knowledge of their own long-run performance. Reinforcement-learning (RL) models suggest that this is algorithmically supported by modular actor-critic networks with a policy module that not only optimizes long-term value functions but is also informed of state-action values, allowing rapid policy optimization. The artificial network suggests that policy changes matched to the offer ought to be evident as soon as offers are made, even if aborting occurs much later. We confirm this prediction in units and population dynamics of macaque dorsolateral prefrontal cortex (dlPFC), but not parietal area 7a or the dorsomedial superior temporal area (MSTd), with dlPFC reflecting the upcoming reward-maximizing choice upon offer presentation. These results cast the dlPFC as a specialized module and stand in contrast to recent work emphasizing the distributed and recurrent nature of belief networks.
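
The modular actor-critic idea described in this abstract can be illustrated with a small tabular sketch: a critic module learns long-run state values, a separate module tracks state-action values, and the policy module is updated with the advantage, so it is informed of state-action values rather than relying on the TD error alone. The toy MDP, learning rates, and update form below are assumptions for illustration only, not the paper's network.

```python
import numpy as np

# Tabular actor-critic sketch. The critic learns state values V(s), a separate
# module tracks state-action values Q(s, a), and the policy module is updated
# with the advantage Q(s, a) - V(s). The toy MDP below is random and purely
# illustrative.

rng = np.random.default_rng(0)
n_states, n_actions = 6, 3
R = rng.normal(0.0, 1.0, (n_states, n_actions))        # toy reward table
T = rng.integers(0, n_states, (n_states, n_actions))   # toy deterministic transitions

V = np.zeros(n_states)                                 # critic: long-run state values
Q = np.zeros((n_states, n_actions))                    # state-action values
theta = np.zeros((n_states, n_actions))                # actor: policy preferences
alpha_v, alpha_q, alpha_pi, gamma = 0.1, 0.1, 0.05, 0.9

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

s = 0
for _ in range(50_000):
    pi = softmax(theta[s])
    a = rng.choice(n_actions, p=pi)
    r, s_next = R[s, a], int(T[s, a])

    td_error = r + gamma * V[s_next] - V[s]            # critic's prediction error
    V[s] += alpha_v * td_error
    Q[s, a] += alpha_q * (r + gamma * V[s_next] - Q[s, a])

    advantage = Q[s, a] - V[s]                         # state-action information
    grad_log_pi = -pi
    grad_log_pi[a] += 1.0
    theta[s] += alpha_pi * advantage * grad_log_pi     # policy-module update
    s = s_next

print("greedy action per state:", theta.argmax(axis=1))
```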

Language: English

The role of prospective contingency in the control of behavior and dopamine signals during associative learning
Lechen Qian, Mark Burrell, Jay A. Hennig, et al.

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown

Published: Feb. 6, 2024

Associative learning depends on contingency, the degree to which a stimulus predicts an outcome. Despite its importance, the neural mechanisms linking contingency and behavior remain elusive. Here we examined dopamine activity in the ventral striatum, a signal implicated in associative learning, in a Pavlovian contingency-degradation task in mice. We show that both anticipatory licking and dopamine responses to the conditioned stimulus decreased when additional rewards were delivered uncued, but remained unchanged if they were cued. These results conflict with contingency-based accounts using a traditional definition of contingency or a novel causal-learning model (ANCCR), but can be explained by temporal difference (TD) models equipped with an appropriate inter-trial-interval (ITI) state representation. Recurrent networks trained within the TD framework develop state representations like our best 'handcrafted' model. Our findings suggest that a TD-error measure describes dopaminergic activity.
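
To make the ITI-state idea concrete, here is a minimal TD(0) sketch (an assumption-laden toy, not the paper's fitted model) in which the inter-trial interval is an explicit state of random duration; uncued rewards delivered during the ITI raise its value, which in turn shrinks the TD error evoked by the cue.

```python
import numpy as np

# TD(0) with an explicit inter-trial-interval (ITI) state of random duration.
# "Contingency degradation" is modeled as uncued rewards that can arrive on any
# ITI step; these raise V(ITI) and shrink the TD error evoked by the cue.

ITI, CUE, REW = 0, 1, 2
gamma, alpha = 0.95, 0.02

def learn(p_uncued, q_cue=0.1, n_steps=200_000, seed=0):
    """Return learned values and the average cue-evoked TD error late in training."""
    rng = np.random.default_rng(seed)
    V = np.zeros(3)
    s = ITI
    cue_rpes = []
    for _ in range(n_steps):
        if s == ITI:
            r = 1.0 if rng.random() < p_uncued else 0.0    # uncued reward in the ITI
            s_next = CUE if rng.random() < q_cue else ITI  # geometric ITI duration
        elif s == CUE:
            r, s_next = 0.0, REW
        else:  # REW: the cued reward is delivered, then the ITI restarts
            r, s_next = 1.0, ITI
        V[s] += alpha * (r + gamma * V[s_next] - V[s])
        if s == ITI and s_next == CUE:
            cue_rpes.append(gamma * V[CUE] - V[ITI])       # cue-evoked component
        s = s_next
    return V, float(np.mean(cue_rpes[-1000:]))

for p in (0.0, 0.05):
    V, cue_rpe = learn(p)
    print(f"p(uncued reward per ITI step) = {p:.2f} -> cue-evoked TD error ~ {cue_rpe:.2f}")
```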

Language: English

Citations: 8

Learning to express reward prediction error-like dopaminergic activity requires plastic representations of time
Ian Cone, Claudia Clopath, Harel Z. Shouval, et al.

Nature Communications, Journal Year: 2024, Volume and Issue: 15(1)

Published: July 12, 2024

The dominant theoretical framework to account for reinforcement learning in the brain is temporal difference (TD) learning, whereby certain units signal reward prediction errors (RPEs). The TD algorithm has traditionally been mapped onto the dopaminergic system, as the firing properties of dopamine neurons can resemble RPEs. However, some of its predictions are inconsistent with experimental results, and previous implementations have made unscalable assumptions regarding stimulus-specific fixed temporal bases. We propose an alternate framework to describe dopamine signaling in the brain, FLEX (Flexibly Learned Errors in eXpected Reward). In FLEX, dopamine release is similar to, but not identical with, an RPE, leading to predictions that contrast with those of TD. While FLEX itself is a general framework, we propose a specific, biophysically plausible implementation, the results of which are consistent with a preponderance of both existing and reanalyzed data.

Language: English

Citations: 7

Inductive biases of neural network modularity in spatial navigation
Ruiyi Zhang, Xaq Pitkow, Dora E. Angelaki, et al.

Science Advances, Journal Year: 2024, Volume and Issue: 10(29)

Published: July 19, 2024

The brain may have evolved a modular architecture for daily tasks, with circuits featuring functionally specialized modules that match the task structure. We hypothesize that this architecture enables better learning and generalization than architectures with less specialized modules. To test this, we trained reinforcement learning agents with various neural architectures on a naturalistic navigation task. We found that the modular agent, whose architecture segregates the computations of state representation, value, and action into specialized modules, achieved better generalization. Its learned state representation combines prediction and observation, weighted by their relative uncertainty, akin to recursive Bayesian estimation. This agent's behavior also resembles macaques' behavior more closely. Our results shed light on a possible rationale for the brain's modularity and suggest that artificial systems can use this insight from neuroscience to improve learning and generalization in natural tasks.
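
The uncertainty-weighted combination of prediction and observation described here can be sketched with a one-dimensional recursive Bayesian (Kalman-style) estimator; the drift and noise parameters below are arbitrary illustrations, not the trained agent's.

```python
import numpy as np

# Minimal 1-D recursive Bayesian (Kalman-style) estimator: an internal
# prediction and a noisy observation are combined with weights set by their
# relative uncertainties.

def kalman_step(mu, var, obs, obs_var, drift_var):
    """One predict-then-update cycle for a scalar latent state."""
    # Predict: the latent state may have drifted, so uncertainty grows.
    mu_pred, var_pred = mu, var + drift_var
    # Update: the Kalman gain is the relative weight placed on the observation.
    gain = var_pred / (var_pred + obs_var)
    mu_new = mu_pred + gain * (obs - mu_pred)
    var_new = (1.0 - gain) * var_pred
    return mu_new, var_new

rng = np.random.default_rng(1)
true_pos, mu, var = 0.0, 0.0, 1.0
for t in range(20):
    true_pos += rng.normal(0.0, 0.1)          # latent state drifts
    obs = true_pos + rng.normal(0.0, 0.3)     # noisy observation
    mu, var = kalman_step(mu, var, obs, obs_var=0.3**2, drift_var=0.1**2)
print(f"estimate {mu:.2f} vs true {true_pos:.2f} (posterior var {var:.3f})")
```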

Language: English

Citations: 4

Learning of state representation in recurrent network: the power of random feedback and biological constraints

Takayuki Tsurumi, Ayaka Kato, Arvind Kumar, et al.

Published: Jan. 14, 2025

How the external/internal 'state' is represented in the brain is crucial, since an appropriate state representation enables goal-directed behavior. Recent studies suggest that state representation and state value can be simultaneously learnt through reinforcement learning (RL) using a reward-prediction-error signal in a recurrent neural network (RNN) and its downstream weights. However, how such learning is neurally implemented remains unclear, because training the RNN through the 'backpropagation' method requires the downstream weights, which are biologically unavailable at the upstream RNN. Here we show that training with random feedback instead of the downstream weights still works through the mechanism of 'feedback alignment', which was originally demonstrated for supervised learning. We further show that if the weights are constrained to be non-negative, learning occurs even without feedback alignment, because the non-negative constraint ensures loose alignment. These results suggest neural mechanisms for RL of state representation and value, and the power of biological constraints.
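
A minimal feedback-alignment sketch, in the spirit of (but not reproducing) the paper's RNN/RL setting, and closer to the supervised case in which feedback alignment was originally demonstrated: a hidden layer is trained with a fixed random feedback vector in place of the true readout weights, and the readout weights are clipped to be non-negative. Network sizes and the target function are arbitrary assumptions.

```python
import numpy as np

# One hidden layer trained with feedback alignment: the hidden weights W1 are
# updated using a fixed random feedback vector b instead of the readout weights
# w2, which would be "biologically unavailable" upstream. The readout w2 learns
# locally and, as in the non-negative variant, is clipped to be non-negative.

rng = np.random.default_rng(0)
n_in, n_hid = 10, 50
W1 = rng.normal(0, 0.3, (n_hid, n_in))      # hidden ("upstream") weights
w2 = np.abs(rng.normal(0, 0.1, n_hid))      # readout weights, kept non-negative
b = np.abs(rng.normal(0, 0.3, n_hid))       # fixed random feedback, non-negative
w_true = rng.normal(0, 1.0, n_in)           # target: a random linear function
lr = 0.01

for _ in range(30_000):
    x = rng.normal(0, 1.0, n_in)
    h = np.tanh(W1 @ x)
    err = (w2 @ h) - (w_true @ x)
    w2 = np.maximum(w2 - lr * err * h, 0.0)   # local delta rule + non-negativity
    delta_h = err * b * (1.0 - h**2)          # random feedback replaces w2
    W1 -= lr * np.outer(delta_h, x)

# Rough held-out check (illustrative only).
X = rng.normal(0, 1.0, (1000, n_in))
pred = np.tanh(X @ W1.T) @ w2
print("mean |error| ~", float(np.mean(np.abs(pred - X @ w_true))))
```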

Language: English

Citations: 0

The devilish details affecting TDRL models in dopamine research
Zhewei Zhang, Kauê Machado Costa, Angela J. Langdon, et al.

Trends in Cognitive Sciences, Journal Year: 2025, Volume and Issue: unknown

Published: Feb. 1, 2025

Language: English

Citations: 0

Prospective contingency explains behavior and dopamine signals during associative learning
Lechen Qian, Mark Burrell, Jay A. Hennig, et al.

Nature Neuroscience, Journal Year: 2025, Volume and Issue: unknown

Published: March 18, 2025

Language: English

Citations: 0

Hippocampal sequences span experience relative to rewards
Marielena Sosa, Mark Plitt, Lisa M. Giocomo, et al.

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2023, Volume and Issue: unknown

Published: Dec. 28, 2023

Hippocampal place cells fire in sequences that span spatial environments and non-spatial modalities, suggesting that hippocampal activity can anchor to the most behaviorally salient aspects of experience. As reward is a highly salient event, we hypothesized that hippocampal sequences anchor to rewards. To test this, we performed two-photon imaging of CA1 neurons as mice navigated virtual environments with changing hidden reward locations. When the reward moved, the firing fields of a subpopulation of cells moved to the same relative position with respect to the reward, constructing a sequence of reward-relative cells that spanned the entire task structure. The density of these cells increased with experience as additional neurons were recruited into the reward-relative population. Conversely, a largely separate subpopulation maintained a spatially-based place code. These findings thus reveal that hippocampal ensembles flexibly encode multiple reference frames, reflecting the structure of experience.

Language: English

Citations: 5

Expectancy-related changes in firing of dopamine neurons depend on hippocampus
Yuji K. Takahashi, Zhewei Zhang, Marlian Montesinos-Cartegena, et al.

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2023, Volume and Issue: unknown

Published: July 21, 2023

The orbitofrontal cortex (OFC) and hippocampus (HC) are both implicated in forming the cognitive or task maps that support flexible behavior. Previously, we used dopamine neurons as a sensor or tool to measure the functional effects of OFC lesions (Takahashi et al., 2011). We recorded midbrain dopamine neurons as rats performed an odor-based choice task, in which errors in the prediction of reward were induced by manipulating the number and timing of expected rewards across blocks of trials. We found that lesions ipsilateral to the recording electrodes caused the prediction-error signals to be degraded, consistent with a loss of resolution of the task states, particularly under conditions where hidden information was critical for sharpening the predictions. Here we have repeated this experiment, along with computational modeling of the results, in rats with HC lesions. The results show that the HC also shapes the map of our task; however, unlike the OFC, which provides information local to the trial, the HC appears necessary for estimating upper-level states based on information that is discontinuous or separated by longer timescales. The results contrast the respective roles of these regions in cognitive mapping and add to evidence that dopamine neurons access a rich set of information from distributed brain regions regarding the predictive structure of the environment, potentially enabling this powerful teaching signal to support complex learning.

Language: English

Citations: 4

Nucleus accumbens dopamine release reflects Bayesian inference during instrumental learning
Albert J. Qü, Lung-Hao Tai, Christopher D. Hall, et al.

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2023, Volume and Issue: unknown

Published: Nov. 13, 2023

Dopamine release in the nucleus accumbens has been hypothesized to signal reward prediction error, the difference between observed and predicted reward, suggesting a biological implementation of reinforcement learning. Rigorous tests of this hypothesis require assumptions about how the brain maps sensory signals to reward predictions, yet this mapping is still poorly understood. In particular, the mapping is non-trivial when sensory signals provide ambiguous information about the hidden state of the environment. Previous work using classical conditioning tasks has suggested that reward predictions are generated conditional on probabilistic beliefs about the hidden state, such that dopamine implicitly reflects these beliefs. Here we test this hypothesis in the context of an instrumental task (a two-armed bandit), in which the hidden state switches repeatedly. We measured choice behavior and recorded dLight signals reflecting dopamine release in the nucleus accumbens core. Model comparison among a wide set of cognitive models based on the behavioral data favored models that used Bayesian updating of probabilistic beliefs. These same models also quantitatively matched the dopamine measurements better than non-Bayesian alternatives. We conclude that Bayesian belief computation contributes to task performance in mice and is reflected in mesolimbic dopamine signaling.
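
A minimal sketch of Bayesian belief updating in a switching two-armed bandit follows; the reward probabilities, hazard rate, and greedy choice rule are assumptions for illustration, not the paper's fitted model.

```python
import numpy as np

# Bayesian belief updating in a switching two-armed bandit. The hidden state
# says which arm currently has the high reward probability; it switches with a
# small hazard rate, and the belief is updated from each choice and outcome.

p_high, p_low, hazard = 0.8, 0.2, 0.02

def update_belief(b, choice, reward):
    """b = P(hidden state: arm 0 is the high arm). Returns the updated belief."""
    p_r_given_arm = {True: p_high, False: p_low}
    # Likelihood of the observed outcome under each hidden state.
    lik_state0 = p_r_given_arm[choice == 0] if reward else 1 - p_r_given_arm[choice == 0]
    lik_state1 = p_r_given_arm[choice == 1] if reward else 1 - p_r_given_arm[choice == 1]
    post = b * lik_state0 / (b * lik_state0 + (1 - b) * lik_state1)
    # Account for a possible hidden-state switch before the next trial.
    return post * (1 - hazard) + (1 - post) * hazard

rng = np.random.default_rng(0)
belief, state = 0.5, 0                      # state 0: arm 0 is the high arm
for t in range(200):
    choice = 0 if belief > 0.5 else 1       # greedy choice on the belief
    p_r = p_high if choice == state else p_low
    reward = rng.random() < p_r
    belief = update_belief(belief, choice, reward)
    if rng.random() < hazard:
        state = 1 - state                   # hidden state occasionally switches
print(f"final belief P(arm 0 is high) = {belief:.2f}, true high arm = {state}")
```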

Language: English

Citations: 4