Abstract

Since its introduction, the reward prediction error theory of dopamine has explained a wealth of empirical phenomena, providing a unifying framework for understanding the representation of reward and value in the brain. According to the now canonical theory, reward predictions are represented as a single scalar quantity, which supports learning about the expectation, or mean, of stochastic outcomes. Here we propose an account of dopamine-based reinforcement learning inspired by recent artificial intelligence research on distributional reinforcement learning. We hypothesized that the brain represents possible future rewards not as a single mean, but instead as a probability distribution, effectively representing multiple future outcomes simultaneously and in parallel. This idea implies a set of empirical predictions, which we tested using single-unit recordings from mouse ventral tegmental area. Our findings provide strong evidence for a neural realization of distributional reinforcement learning.
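
As a rough illustration of the distinction the abstract draws, the sketch below implements a minimal distributional TD-style learner in Python. A population of value predictors, each scaling positive and negative prediction errors asymmetrically, converges to a set of expectiles that jointly characterize the reward distribution rather than its mean alone. Everything here (the bimodal reward distribution, the number of units, the learning rate) is an illustrative assumption, not the paper's actual model or data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical reward distribution for a single predictive cue
# (a bimodal mixture, chosen only for illustration).
def sample_reward():
    return rng.normal(2.0, 0.5) if rng.random() < 0.5 else rng.normal(8.0, 0.5)

n_units = 11                             # population of value predictors
taus = np.linspace(0.05, 0.95, n_units)  # per-unit asymmetry (assumed spacing)
values = np.zeros(n_units)               # learned predictions, one per unit
lr = 0.02                                # base learning rate (assumed)

for _ in range(20000):
    r = sample_reward()
    delta = r - values  # per-unit reward prediction errors
    # Asymmetric update: positive errors scaled by tau, negative by (1 - tau).
    # Each value converges to the tau-th expectile of the reward distribution,
    # so the population jointly encodes the distribution, not just its mean.
    alpha = np.where(delta > 0, taus, 1.0 - taus)
    values += lr * alpha * delta

print(np.round(values, 2))  # learned values span the bimodal support
```

A unit with tau = 0.5 weights positive and negative errors equally and recovers the classical mean prediction; units with tau above or below 0.5 settle on optimistic or pessimistic expectiles, which is the sense in which the population represents multiple future outcomes simultaneously and in parallel.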

Full Text Sources
Nature: https://www.nature.com/articles/s41586-019-1924-6
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7476215
DOI: http://dx.doi.org/10.1038/s41586-019-1924-6

Publication Analysis

Top Keywords

Keyword (occurrences):
reinforcement learning: 16
dopamine-based reinforcement: 8
distributional reinforcement: 8
learning: 5
distributional code: 4
code dopamine-based: 4
reinforcement: 4
learning introduction: 4
introduction reward: 4
reward prediction: 4