The Journal of Neurophysiology Vol. 80 No. 1 July 1998, pp. 1-27
Copyright ©1998 by the American Physiological Society
INVITED REVIEW
Institute of Physiology and Program in Neuroscience, University of Fribourg, CH-1700 Fribourg, Switzerland
ABSTRACT
Schultz, Wolfram. Predictive reward signal of dopamine neurons. J. Neurophysiol. 80: 1-27, 1998. The effects of lesions, receptor blocking, electrical self-stimulation, and drugs of abuse suggest that midbrain dopamine systems are involved in processing reward information and learning approach behavior. Most dopamine neurons show phasic activations after primary liquid and food rewards and after conditioned, reward-predicting visual and auditory stimuli. They show biphasic, activation-depression responses after stimuli that resemble reward-predicting stimuli or are novel or particularly salient. However, only a few phasic activations follow aversive stimuli. Thus dopamine neurons label environmental stimuli with appetitive value, predict and detect rewards, and signal alerting and motivating events. By failing to discriminate between different rewards, dopamine neurons appear to emit an alerting message about the surprising presence or absence of rewards. All responses to rewards and reward-predicting stimuli depend on event predictability. Dopamine neurons are activated by rewarding events that are better than predicted, remain uninfluenced by events that are as good as predicted, and are depressed by events that are worse than predicted. By signaling rewards according to a prediction error, dopamine responses have the formal characteristics of a teaching signal postulated by reinforcement learning theories. Dopamine responses transfer during learning from primary rewards to reward-predicting stimuli. This may contribute to neuronal mechanisms underlying the retrograde action of rewards, one of the main puzzles in reinforcement learning. The impulse response releases a short pulse of dopamine onto many dendrites, thus broadcasting a rather global reinforcement signal to postsynaptic neurons. This signal may improve approach behavior by providing advance reward information before the behavior occurs, and may contribute to learning by modifying synaptic transmission.
The dopamine reward signal is supplemented by activity in neurons in striatum, frontal cortex, and amygdala, which process specific reward information but do not emit a global reward prediction error signal. Cooperation between the different reward signals may ensure the use of specific rewards for selectively reinforcing behaviors. Among the other projection systems, noradrenaline neurons predominantly serve attentional mechanisms and nucleus basalis neurons code rewards heterogeneously. Cerebellar climbing fibers signal errors in motor performance or errors in the prediction of aversive events to cerebellar Purkinje cells. Most deficits following dopamine-depleting lesions are not easily explained by a defective reward signal but may reflect the absence of a general enabling function of tonic levels of extracellular dopamine. Thus dopamine systems may have two functions, the phasic transmission of reward information and the tonic enabling of postsynaptic neurons.
When multicellular organisms arose through the evolution of self-reproducing molecules, they developed endogenous, autoregulatory mechanisms assuring that their needs for welfare and survival were met. Subjects engage in various forms of approach behavior to obtain resources for maintaining homeostatic balance and to reproduce. One class of resources is called rewards, which elicit and reinforce approach behavior. The functions of rewards were developed further during the evolution of higher mammals to support more sophisticated forms of individual and social behavior. Thus biological and cognitive needs define the nature of rewards, and the availability of rewards determines some of the basic parameters of the subject's life conditions.
Functions of rewards
Certain objects and events in the environment are of particular motivational significance because of their effects on welfare, survival, and reproduction. According to the behavioral reactions elicited, the motivational value of environmental objects can be appetitive (rewarding) or aversive (punishing). (Note that "appetitive" is used synonymously with "rewarding" but not with "preparatory.") Appetitive objects have three separable basic functions. In their first function, rewards elicit approach and consummatory behavior. This is due to the objects being labeled with appetitive value through innate mechanisms or, in most cases, learning. In their second function, rewards increase the frequency and intensity of behavior leading to such objects (learning), and they maintain learned behavior by preventing extinction. Rewards serve as positive reinforcers of behavior in classical and instrumental conditioning procedures. In general incentive learning, environmental stimuli acquire appetitive value following classically conditioned stimulus-reward associations and induce approach behavior (Bindra 1968
Functions of predictions
Predictions provide advance information about future stimuli, events, or system states. Their basic advantage is that they gain time for behavioral reactions. Some forms of predictions attribute motivational values to environmental stimuli by association with particular outcomes, thus identifying objects of vital importance and discriminating them from less valuable objects. Other forms code physical parameters of predicted objects, such as spatial position, velocity, and weight. Predictions allow an organism to evaluate future events before they actually occur, permit the selection and preparation of behavioral reactions, and increase the likelihood of approaching or avoiding objects labeled with motivational values. For example, when objects repeatedly move in the same sequence, one can predict their forthcoming positions and prepare the next movement while still pursuing the present object. This reduces reaction time between individual targets, speeds up overall performance, and results in an earlier outcome. Predictive eye movements improve behavioral performance through advance focusing (Flowers and Downing 1978
Behavioral conditioning
Associative appetitive learning involves the repeated and contingent pairing of an arbitrary stimulus with a primary reward (Fig. 1). This results in increasingly frequent approach behavior induced by the now "conditioned" stimulus; this behavior partly resembles the approach behavior elicited by the primary reward and is also influenced by the nature of the conditioned stimulus. It appears that the conditioned stimulus serves as a predictor of reward and, often on the basis of an appropriate drive, sets an internal motivational state leading to the behavioral reaction. The similarity of approach reactions suggests that some of the general, preparatory components of the behavioral response are transferred from the primary reward to the earliest conditioned, reward-predicting stimulus. Thus the conditioned stimulus acts partly as a motivational substitute for the primary stimulus, probably through Pavlovian learning (Dickinson 1980
Cell bodies of dopamine neurons are located mostly in midbrain groups A8 (dorsal to lateral substantia nigra), A9 (pars compacta of substantia nigra), and A10 (ventral tegmental area medial to substantia nigra). These neurons release the neurotransmitter dopamine with nerve impulses from axonal varicosities in the striatum (caudate nucleus, putamen, and ventral striatum including nucleus accumbens) and frontal cortex, to name the most important sites. We record extracellularly the impulse activity from cell bodies of single dopamine neurons with movable microelectrodes during periods of 20-60 min while monkeys learn or perform behavioral tasks. Their characteristic polyphasic, relatively long impulses, discharged at low frequencies, make dopamine neurons easily distinguishable from other midbrain neurons. The behavioral paradigms employed include reaction time tasks, direct and delayed GO-NO GO tasks, spatial delayed response and alternation tasks, air puff and saline active avoidance tasks, operant and classically conditioned visual discrimination tasks, self-initiated movements, and unpredicted delivery of reward in the absence of any formal task. About 100-250 dopamine neurons are studied in each behavioral situation, and fractions of task-modulated neurons refer to these samples.
Activation by primary appetitive stimuli
About 75% of dopamine neurons show phasic activations when animals touch a small morsel of hidden food during exploratory movements in the absence of other phasic stimuli, without being activated by the movement itself (Romo and Schultz 1990
Unpredictability of reward
An important feature of dopamine responses is their dependence on event unpredictability. The activations following rewards do not occur when food and liquid rewards are preceded by phasic stimuli that have been conditioned to predict such rewards (Fig. 2, middle) (Ljungberg et al. 1992
Depression by omission of predicted reward
Dopamine neurons are depressed exactly at the time of the usual occurrence of reward when a fully predicted reward fails to occur, even in the absence of an immediately preceding stimulus (Fig. 2, bottom). This is observed when animals fail to obtain reward because of erroneous behavior, when liquid flow is stopped by the experimenter despite correct behavior, or when a valve opens audibly without delivering liquid (Hollerman and Schultz 1996
Activation by conditioned, reward-predicting stimuli
About 55-70% of dopamine neurons are activated by conditioned visual and auditory stimuli in the various classically or instrumentally conditioned tasks described earlier (Fig. 2, middle and bottom) (Hollerman and Schultz 1996
Transfer of activation
During the course of learning, dopamine neurons become gradually activated by conditioned, reward-predicting stimuli and progressively lose their responses to primary food or liquid rewards that become predicted (Hollerman and Schultz 1996
Unpredictability of conditioned stimuli
The activations after conditioned, reward-predicting stimuli do not occur when these stimuli themselves are preceded at a fixed interval by phasic conditioned stimuli in fully established behavioral situations. Thus with serial conditioned stimuli, dopamine neurons are activated by the earliest reward-predicting stimulus, whereas all stimuli and rewards following at predictable moments afterwards are ineffective (Fig. 3) (Schultz et al. 1993
Depression by omission of predicted conditioned stimuli
Preliminary data from a previous experiment (Schultz et al. 1993
Activation-depression with response generalization
Dopamine neurons also respond to stimuli that do not predict rewards but closely resemble reward-predicting stimuli occurring in the same context. These responses consist mostly of an activation followed by an immediate depression but may occasionally consist of pure activation or pure depression. The activations are smaller and less frequent than those following reward-predicting stimuli, and the depressions are observed in 30-60% of neurons. Dopamine neurons respond to visual stimuli that are not followed by reward but closely resemble reward-predicting stimuli, despite correct behavioral discrimination (Schultz and Romo 1990
Novelty responses
Novel stimuli elicit activations in dopamine neurons that often are followed by depressions and persist as long as behavioral orienting reactions occur (e.g., ocular saccades). Activations subside together with orienting reactions after several stimulus repetitions, depending on the physical impact of stimuli. Whereas small light-emitting diodes hardly elicit novelty responses, light flashes and the rapid visual and auditory opening of a small box elicit activations that decay gradually to baseline within <100 trials (Ljungberg et al. 1992
Homogeneous character of responses
The experiments performed so far have revealed that the majority of neurons in midbrain dopamine cell groups A8, A9, and A10 show very similar activations and depressions in a given behavioral situation, whereas the remaining dopamine neurons do not respond at all. Fractions of responding neurons tend to be higher in more medial regions of the midbrain, such as the ventral tegmental area and medial substantia nigra, than in more lateral regions; this difference occasionally reaches statistical significance (Schultz 1986
Summary 1: adaptive responses during learning episodes
The characteristics of dopamine responses to reward-related stimuli are best illustrated in learning episodes, during which rewards are particularly important for acquiring behavioral responses. The dopamine reward signal undergoes systematic changes during the progress of learning and occurs in response to the earliest phasic reward-related stimulus, this being either a primary reward or a reward-predicting stimulus (Ljungberg et al. 1992
Summary 2: effective stimuli for dopamine neurons
Dopamine responses are elicited by three categories of stimuli. The first category comprises primary rewards and stimuli that have become valid reward predictors through repeated and contingent pairing with rewards. These stimuli form a common class of explicit reward-predicting stimuli, as primary rewards themselves serve as predictors of vegetative rewarding effects. Effective stimuli apparently have an alerting component, as only stimuli with a clear onset are effective. Dopamine neurons show pure activations following explicit reward-predicting stimuli and are depressed when a predicted reward is omitted (Fig. 5, top).
Summary 3: the dopamine reward prediction error signal
The dopamine responses to explicit reward-related events can be best conceptualized and understood in terms of formal theories of learning. Dopamine neurons report rewards relative to their prediction rather than signaling primary rewards unconditionally (Fig. 2). The dopamine response is positive (activation) when primary rewards occur without being predicted. The response is nil when rewards occur as predicted. The response is negative (depression) when predicted rewards are omitted. Thus dopamine neurons report primary rewards according to the difference between the occurrence and the prediction of reward, which can be termed an error in the prediction of reward (Schultz et al. 1995b
INTRODUCTION
; Di Chiara 1995; Fibiger and Phillips 1986; Robbins and Everitt 1992; Robinson and Berridge 1993; Wise 1996; Wise and Hoffman 1992; Wise et al. 1978).
REWARDS AND PREDICTIONS
). In instrumental conditioning, rewards "reinforce" behaviors by strengthening associations between stimuli and behavioral responses (Law of Effect: Thorndike 1911
). This is the essence of "coming back for more" and is related to the common notion of rewards being obtained for having done something well. In an instrumental form of incentive learning, rewards are "incentives" and serve as goals of behavior following associations between behavioral responses and outcomes (Dickinson and Balleine 1994
). In their third function, rewards induce subjective feelings of pleasure (hedonia) and positive emotional states. Aversive stimuli function in the opposite direction. They induce withdrawal responses and act as negative reinforcers by increasing and maintaining avoidance behavior on repeated presentation, thereby reducing the impact of damaging events. Furthermore, they induce internal emotional states of anger, fear, and panic.
).
). For example, the "fly-by-wire" technique in modern aviation computes predictable forthcoming states of airplanes. Decisions for flying maneuvers take this information into account and help to avoid excessive strain on the mechanical components of the plane, thus reducing weight and increasing the range of operation.
).

FIG. 1.
Processing of appetitive stimuli during learning. An arbitrary stimulus becomes associated with a primary food or liquid reward through repeated, contingent pairing. This conditioned, reward-predicting stimulus induces an internal motivational state by evoking an expectation of the reward, often on the basis of a corresponding hunger or thirst drive, and elicits the behavioral reaction. This scheme replicates basic notions of incentive motivation theory developed by Bindra (1968) and Bolles (1972). It applies to classical conditioning, where reward is automatically delivered after the conditioned stimulus, and to instrumental (operant) conditioning, where reward delivery requires a reaction by the subject to the conditioned stimulus. The scheme also applies to aversive conditioning, which for brevity is not elaborated further.
), threonine (Hrupka et al. 1997; Wang et al. 1996), or methionine (Delaney and Gelperin 1986). A few primary rewards may be determined by innate instincts and support initial approach behavior and ingestion in early life, whereas the majority of rewards would be learned during the subsequent life experience of the subject. The physical appearance of rewards could then be used to predict their much slower vegetative effects. This would dramatically accelerate the detection of rewards and constitute a major advantage for survival. Learning of rewards also allows subjects to use a much larger variety of nutrients as effective rewards and thus increases their chance of survival in zones of scarce resources.
ADAPTIVE RESPONSES TO APPETITIVE STIMULI
; Schultz and Romo 1990; Schultz et al. 1983) or with mnemonic or spatial components of delayed response tasks (Schultz et al. 1993). By contrast, it was found that dopamine neurons were activated in a very distinctive manner by the rewarding characteristics of a wide range of somatosensory, visual, and auditory stimuli.
). The remaining dopamine neurons do not respond to any of the tested environmental stimuli. Dopamine neurons are also activated by a drop of liquid delivered at the mouth outside of any behavioral task or while learning such different paradigms as visual or auditory reaction time tasks, spatial delayed response or alternation, and visual discrimination, often in the same animal (Fig. 2, top) (Hollerman and Schultz 1996; Ljungberg et al. 1991, 1992; Mirenowicz and Schultz 1994; Schultz et al. 1993). The reward responses occur independently of a learning context. Thus dopamine neurons do not appear to discriminate between different food objects and liquid rewards. However, their responses distinguish rewards from nonreward objects (Romo and Schultz 1990). Only 14% of dopamine neurons show phasic activations when primary aversive stimuli are presented, such as an air puff to the hand or hypertonic saline to the mouth, and most of the activated neurons also respond to rewards (Mirenowicz and Schultz 1996). Although nonnoxious, these stimuli are aversive in that they disrupt behavior and induce active avoidance reactions. However, dopamine neurons are not entirely insensitive to aversive stimuli, as shown by slow depressions or occasional slow activations after pain pinch stimuli in anesthetized monkeys (Schultz and Romo 1987) and by increased striatal dopamine release after electric shock and tail pinch in awake rats (Abercrombie et al. 1989; Doherty and Gratton 1992; Louilot et al. 1986; Young et al. 1993). This suggests that the phasic responses of dopamine neurons preferentially report environmental stimuli with primary appetitive value, whereas aversive events may be signaled with a considerably slower time course.

FIG. 2.
Dopamine neurons report rewards according to an error in reward prediction. Top: drop of liquid occurs although no reward is predicted at this time. Occurrence of reward thus constitutes a positive error in the prediction of reward. Dopamine neuron is activated by the unpredicted occurrence of the liquid. Middle: conditioned stimulus predicts a reward, and the reward occurs according to the prediction, hence no error in the prediction of reward. Dopamine neuron fails to be activated by the predicted reward (right). It also shows an activation after the reward-predicting stimulus, which occurs irrespective of an error in the prediction of the later reward (left). Bottom: conditioned stimulus predicts a reward, but the reward fails to occur because of lack of reaction by the animal. Activity of the dopamine neuron is depressed exactly at the time when the reward would have occurred. Note the depression occurring >1 s after the conditioned stimulus without any intervening stimuli, revealing an internal process of reward expectation. Neuronal activity in the 3 graphs follows the equation $\text{Dopamine response (Reward)} = \text{Reward occurred} - \text{Reward predicted}$. CS, conditioned stimulus; R, primary reward. Reprinted from Schultz et al. (1997) with permission by American Association for the Advancement of Science.
; Mirenowicz and Schultz 1994; Romo and Schultz 1990). One crucial difference between learning and fully acquired behavior is the degree of reward unpredictability. Dopamine neurons are activated by rewards during the learning phase but stop responding after full acquisition of visual and auditory reaction time tasks (Ljungberg et al. 1992; Mirenowicz and Schultz 1994), spatial delayed response tasks (Schultz et al. 1993), and simultaneous visual discriminations (Hollerman and Schultz 1996). The loss of response is not due to a developing general insensitivity to rewards, as activations following rewards delivered outside of tasks do not decrement during several months of experimentation (Mirenowicz and Schultz 1994). The importance of unpredictability includes the time of reward, as demonstrated by transient activations following rewards that are suddenly delivered earlier or later than predicted (Hollerman and Schultz 1996). Taken together, the occurrence of reward, including its time, must be unpredicted to activate dopamine neurons.
; Ljungberg et al. 1991; Schultz et al. 1993). When reward delivery is delayed by 0.5 or 1.0 s, a depression of neuronal activity occurs at the regular time of the reward, and an activation follows the reward at the new time (Hollerman and Schultz 1996). Both responses occur only during a few repetitions until the new time of reward delivery becomes predicted again. By contrast, delivering reward earlier than usual results in an activation at the new time of reward but fails to induce a depression at the habitual time. This suggests that unusually early reward delivery cancels the reward prediction for the habitual time. Thus dopamine neurons monitor both the occurrence and the time of reward. Because no stimulus immediately precedes the omitted reward, the depressions do not constitute a simple neuronal response but reflect an expectation process based on an internal clock tracking the precise time of predicted reward.
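The timing observations above can be restated as a small rule table. The sketch below is only that: a hypothetical Python function (`timing_errors`, not from the source) that returns the sign of the expected response at each time point for a reward predicted at a habitual time. It assumes unit-sized responses and a 2-s CS-reward interval chosen purely for illustration, and it ignores the fading of both responses once the new reward time becomes predicted.

```python
from typing import Dict, Optional


def timing_errors(habitual_time: float, actual_time: Optional[float]) -> Dict[float, float]:
    """Hypothetical rule table: signed response (+1 activation, -1 depression)
    at each time point, in seconds after the conditioned stimulus.

    actual_time=None means the reward was omitted. Adaptation over repeated
    trials (the new reward time becoming predicted) is deliberately ignored.
    """
    errors: Dict[float, float] = {}
    if actual_time is None:
        errors[habitual_time] = -1.0   # omission: depression at the usual time
    elif actual_time > habitual_time:
        errors[habitual_time] = -1.0   # delay: depression at the usual time...
        errors[actual_time] = +1.0     # ...and activation at the new time
    elif actual_time < habitual_time:
        # Early reward: activation only; the early reward cancels the
        # prediction for the usual time, so no depression follows there.
        errors[actual_time] = +1.0
    # actual_time == habitual_time: fully predicted, no response
    return errors


print(timing_errors(2.0, 2.5))  # → {2.0: -1.0, 2.5: 1.0}
print(timing_errors(2.0, 1.5))  # → {1.5: 1.0}
```

The asymmetry between the delayed and the early case is the point of the table: only a prediction that is still "open" at the habitual time can produce a depression there.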
; Ljungberg et al. 1991, 1992; Mirenowicz and Schultz 1994; Schultz 1986; Schultz and Romo 1990; P. Waelti, J. Mirenowicz, and W. Schultz, unpublished data). The first dopamine responses to conditioned light were reported by Miller et al. (1981) in rats treated with haloperidol, which increased the incidence and spontaneous activity of dopamine neurons but resulted in more sustained responses than in undrugged animals. Although responses occur close to behavioral reactions (Nishino et al. 1987), they are unrelated to arm and eye movements themselves, as they also occur ipsilateral to the moving arm and in trials without arm or eye movements (Schultz and Romo 1990). Conditioned stimuli are somewhat less effective than primary rewards in terms of response magnitude and fractions of neurons activated. Dopamine neurons respond only to the onset of conditioned stimuli and not to their offset, even if stimulus offset predicts the reward (Schultz and Romo 1990). Dopamine neurons do not distinguish between visual and auditory modalities of conditioned appetitive stimuli. However, they discriminate between appetitive and neutral or aversive stimuli as long as these are physically sufficiently dissimilar (Ljungberg et al. 1992; P. Waelti, J. Mirenowicz, and W. Schultz, unpublished data). Only 11% of dopamine neurons, most of them with appetitive responses, also show the typical phasic activations in response to conditioned aversive visual or auditory stimuli in active avoidance tasks in which animals release a key to avoid an air puff or a drop of hypertonic saline (Mirenowicz and Schultz 1996), although such avoidance may be viewed as "rewarding." These few activations are not strong enough to produce an average population response. Thus the phasic responses of dopamine neurons preferentially report environmental stimuli with appetitive motivational value but without discriminating between different sensory modalities.
; Ljungberg et al. 1992; Mirenowicz and Schultz 1994) (Figs. 2 and 3). During a transient learning period, both rewards and conditioned stimuli elicit dopamine activations. This transfer from primary reward to the conditioned stimulus occurs instantaneously in single dopamine neurons tested in two well-learned tasks employing, respectively, unpredicted and predicted rewards (Romo and Schultz 1990).
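A transfer of this kind is what temporal-difference (TD) learning models produce. TD learning is a swapped-in model here, not a method described in this section. The sketch assumes tabular TD(0) with one value per time step after CS onset (a "complete serial compound" representation), a CS that arrives at unpredictable times so that the pre-CS value stays fixed at 0, no discounting, and arbitrary choices for the number of steps, reward step, and learning rate. The function name `simulate_transfer` is hypothetical.

```python
import numpy as np


def simulate_transfer(n_trials=200, n_steps=10, reward_step=6, alpha=0.2, gamma=1.0):
    """Tabular TD(0) sketch of response transfer from reward to CS.

    V[k] is the predicted future reward k time steps after CS onset;
    V[n_steps] is a terminal state fixed at 0. Returns the prediction
    error at CS onset and at the time of reward, one value per trial.
    """
    V = np.zeros(n_steps + 1)
    cs_errors, reward_errors = [], []
    for _ in range(n_trials):
        # CS onset: the intertrial state predicts nothing (value 0) because
        # the CS arrives at unpredictable times, so the error equals the
        # discounted value of the first post-CS state.
        cs_errors.append(gamma * V[0] - 0.0)
        for k in range(n_steps):
            r = 1.0 if k == reward_step else 0.0
            delta = r + gamma * V[k + 1] - V[k]  # TD prediction error
            if k == reward_step:
                reward_errors.append(delta)
            V[k] += alpha * delta  # move V[k] toward its TD target
    return np.array(cs_errors), np.array(reward_errors)


cs, rew = simulate_transfer()
for trial in (0, 5, 10, 20, 199):
    print(f"trial {trial:3d}: error at CS {cs[trial]:+.2f}, at reward {rew[trial]:+.2f}")
```

On the first trial the error appears only at the reward; after training it appears at the CS and is close to zero at the fully predicted reward. With the trained values, omitting the reward would give an error of about 0 + V[7] - V[6], roughly -1, at the usual reward time, which parallels the omission-induced depression.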

FIG. 3.
Dopamine response transfer to earliest predictive stimulus. Responses to unpredicted primary reward transfer to progressively earlier reward-predicting stimuli. All displays show population histograms obtained by averaging normalized perievent time histograms of all dopamine neurons recorded in the behavioral situations indicated, independent of the presence or absence of a response. Top: outside of any behavioral task, there is no population response in 44 neurons tested with a small light (data from Ljungberg et al. 1992), but an average response occurs in 35 neurons to a drop of liquid delivered at a spout in front of the animal's mouth (Mirenowicz and Schultz 1994). Middle: response to a reward-predicting trigger stimulus in a 2-choice spatial reaching task, but absence of response to reward delivered during established task performance in the same 23 neurons (Schultz et al. 1993). Bottom: response to an instruction cue preceding the reward-predicting trigger stimulus by a fixed interval of 1 s in an instructed spatial reaching task (19 neurons) (Schultz et al. 1993). Time base is split because of varying intervals between conditioned stimuli and reward. Reprinted from Schultz et al. (1995b) with permission by MIT Press.
). Only randomly spaced sequential stimuli elicit individual responses. Also, extensive overtraining with highly stereotyped task performance attenuates the responses to conditioned stimuli, probably because stimuli become predicted by events in the preceding trial (Ljungberg et al. 1992). This suggests that stimulus unpredictability is a common requirement for all stimuli activating dopamine neurons.
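The serial-stimulus result can be checked with a line of arithmetic using a temporal-difference style error (reward received, plus the prediction after an event, minus the prediction before it). The hypothetical helper `serial_event_errors` below assumes that learning has fully converged, that there is no discounting, that a reward of size 1 ends the sequence, and that the intertrial state predicts nothing because trials start at random times. The event names mirror the instruction-trigger-reward sequence of Fig. 3, and each name must be unique because it is used as a dictionary key.

```python
def serial_event_errors(events, reward_value=1.0):
    """Prediction error at each event of a fixed, fully learned sequence.

    Hypothetical assumptions: converged learning, no discounting, every
    state after the first stimulus already predicts the full reward, and
    the state before the first stimulus predicts nothing.
    """
    errors = {}
    value_before = 0.0  # intertrial interval: nothing predicted
    for name in events:
        received = reward_value if name == "reward" else 0.0
        value_after = 0.0 if name == "reward" else reward_value
        errors[name] = received + value_after - value_before
        value_before = value_after
    return errors


print(serial_event_errors(["instruction", "trigger", "reward"]))
# → {'instruction': 1.0, 'trigger': 0.0, 'reward': 0.0}
```

Only the earliest stimulus carries a nonzero error, because every later event merely confirms a prediction that is already in place.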
) suggest that dopamine neurons also are depressed when a conditioned, reward-predicting stimulus is predicted itself at a fixed time by a preceding stimulus but fails to occur because of an error of the animal. As with primary rewards, the depressions occur at the time of the usual occurrence of the conditioned stimulus, without being directly elicited by a preceding stimulus. Thus the omission-induced depression may be generalized to all appetitive events.
). Opening of an empty box fails to activate dopamine neurons but becomes effective in every trial as soon as the box occasionally contains food (Ljungberg et al. 1992; Schultz 1986; Schultz and Romo 1990) or when a neighboring, identical box always containing food opens in random alternation (Schultz and Romo 1990). The empty box elicits weaker activations than the baited box. Animals perform indiscriminate ocular orienting reactions to each box but approach only the baited box with their hand. During learning, dopamine neurons continue to respond to previously conditioned stimuli that lose their reward prediction when reward contingencies change (Schultz et al. 1993) or respond to new stimuli resembling previously conditioned stimuli (Hollerman and Schultz 1996). Responses occur even to aversive stimuli presented in random alternation with physically similar, conditioned appetitive stimuli of the same sensory modality, the aversive response being weaker than the appetitive one (Mirenowicz and Schultz 1996). Responses generalize even to behaviorally extinguished appetitive stimuli. Apparently, neuronal responses generalize to nonappetitive stimuli because of their physical resemblance to appetitive stimuli.
). Loud clicks or large pictures immediately in front of an animal elicit strong novelty responses that decay but still induce measurable activations even after >1,000 trials (Hollerman and Schultz 1996; Horvitz et al. 1997; Steinfels et al. 1983). Figure 4 shows schematically the different response magnitudes with novel stimuli of different physical salience. Responses decay gradually with repeated exposure but may persist at reduced magnitudes with very salient stimuli. Response magnitudes increase again when the same stimuli are appetitively conditioned. By contrast, responses to novel, even large, stimuli subside rapidly when the stimuli are used for conditioning active avoidance behavior (Mirenowicz and Schultz 1996). Very few neurons (<5%) respond for more than a few trials to conspicuous yet physically weak stimuli, such as crumpling of paper or gross hand movements of the experimenter.
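The qualitative time courses in Fig. 4 can be sketched with a simple habituation curve. The exponential form, the parameter values, and the function name `novelty_response` are all assumptions made for illustration; the source only states that responses decay gradually, that weak stimuli habituate within about 100 trials, and that very salient stimuli retain a reduced response beyond 1,000 trials. The sketch does not model the renewed increase after appetitive conditioning.

```python
import math


def novelty_response(trial: int, initial: float, floor: float, tau: float) -> float:
    """Hypothetical exponential habituation of a novelty response.

    trial:   0-based index of the stimulus repetition
    initial: magnitude on first exposure (assumed to grow with salience)
    floor:   residual magnitude after habituation (assumed > 0 only for
             very salient stimuli)
    tau:     decay constant in trials (arbitrary)
    """
    return floor + (initial - floor) * math.exp(-trial / tau)


# Illustrative parameters only: a weak stimulus that habituates fully within
# 100 trials, and a very salient one that keeps a residual response.
for t in (0, 10, 50, 99, 1000):
    weak = novelty_response(t, initial=0.5, floor=0.0, tau=5.0)
    salient = novelty_response(t, initial=2.0, floor=0.4, tau=20.0)
    print(f"trial {t:4d}: weak {weak:.3f}, salient {salient:.3f}")
```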

FIG. 4.
Time courses of activations of dopamine neurons to novel, alerting, and conditioned stimuli. Activations after novel stimuli decrease with repeated exposure over consecutive trials. Their magnitude depends on the physical salience of stimuli as stronger stimuli induce higher activations that occasionally exceed those after conditioned stimuli. Particularly salient stimuli continue to activate dopamine neurons with limited magnitude even after losing their novelty without being paired with primary rewards. Consistent activations appear again when stimuli become associated with primary rewards. This scheme was contributed by Jose Contreras-Vidal.
; Schultz et al. 1993). Response latencies (50-110 ms) and durations (<200 ms) are similar among primary rewards, conditioned stimuli, and novel stimuli. Thus the dopamine response constitutes a relatively homogeneous, scalar population signal. It is graded in magnitude by the responsiveness of individual neurons and by the fraction of responding neurons within the population.
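How the two grading factors combine can be made concrete with a population average that, like the population histograms of Fig. 3, includes every recorded neuron whether or not it responded. The sketch is a hypothetical illustration (`population_signal` is not from the source). The 75% responding fraction echoes the figure given for primary food rewards, and the response magnitudes are arbitrary normalized units.

```python
from statistics import mean


def population_signal(n_neurons: int, fraction_responding: float,
                      responder_magnitude: float) -> float:
    """Average normalized response over ALL recorded neurons.

    Non-responding neurons contribute 0, as in population histograms that
    average every neuron regardless of whether it responded.
    """
    n_resp = round(n_neurons * fraction_responding)
    responses = [responder_magnitude] * n_resp + [0.0] * (n_neurons - n_resp)
    return mean(responses)


print(population_signal(100, 0.75, 2.0))  # 1.5
print(population_signal(100, 0.50, 3.0))  # 1.5: fewer but stronger responders
```

In this averaging scheme the population value is simply the responding fraction times the mean responder magnitude, so the two factors trade off against each other in a single scalar.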
; Mirenowicz and Schultz 1994). During learning, novel, intrinsically neutral stimuli transiently induce responses that soon weaken and disappear (Fig. 4). Primary rewards occur unpredictably during initial pairing with such stimuli and elicit neuronal activations. With repeated pairing, rewards become predicted by conditioned stimuli. Activations after the reward decrease gradually and are transferred to the conditioned, reward-predicting stimulus. If, however, a predicted reward fails to occur because of an error of the animal, dopamine neurons are depressed at the time the reward would have occurred. During repeated learning of tasks (Schultz et al. 1993) or task components (Hollerman and Schultz 1996), the earliest conditioned stimuli activate dopamine neurons during all learning phases because of generalization to previously learned, similar stimuli, whereas subsequent conditioned stimuli and primary rewards activate dopamine neurons only transiently while they are uncertain and new contingencies are being established.

FIG. 5.
Schematic display of responses of dopamine neurons to 2 types of conditioned stimuli. Top: presentation of an explicit reward-predicting stimulus leads to activation after the stimulus, no response to the predicted reward, and depression when a predicted reward fails to occur. Bottom: presentation of a stimulus closely resembling a conditioned, reward-predicting stimulus leads to activation followed by depression, activation after the reward, and no response when no reward occurs. Activation after the stimulus probably reflects response generalization because of physical similarity. This stimulus does not explicitly predict a reward but is related to the reward via its similarity to the stimulus predicting the reward. In comparison with explicit reward-predicting stimuli, activations are lower and often are followed by depressions, thus discriminating between rewarded (CS+) and unrewarded (CS−) conditioned stimuli. This scheme summarizes results from previous and current experiments (Hollerman and Schultz 1996; Ljungberg et al. 1992; Mirenowicz and Schultz 1996; Schultz and Romo 1990; Schultz et al. 1993; P. Waelti and W. Schultz, unpublished results).
Novel stimuli are potentially appetitive. Novel or particularly salient stimuli induce activations that are frequently followed by depressions, similar to responses to generalizing stimuli.
, 1997) and is tentatively formalized as
$$\text{Dopamine response (Reward)} = \text{Reward occurred} - \text{Reward predicted} \qquad (1)$$
This suggestion can be extended to conditioned appetitive events that also are reported by dopamine neurons relative to prediction. Thus dopamine neurons may report an error in the prediction of all appetitive events, and Eq. 1 can be stated in the more general form
$$\text{Dopamine response (Appetitive event)} = \text{Appetitive event occurred} - \text{Appetitive event predicted} \qquad (2)$$
This generalization is compatible with the idea that most rewards actually are conditioned stimuli. With several consecutive, well-established reward-predicting events, only the first event is unpredictable and elicits the dopamine activation.
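The prediction-error formulation (dopamine response = event occurred − event predicted) maps onto the three qualitative outcomes described earlier: activation when an event is better than predicted, no response when it is as good as predicted, and depression when it is worse. A minimal Python sketch of that mapping follows; the function names and the unit magnitudes (1.0 = reward present or fully predicted) are hypothetical illustrations, not quantities from the review.

```python
def dopamine_response(occurred: float, predicted: float) -> float:
    """Signed prediction error: occurred minus predicted (arbitrary units)."""
    return occurred - predicted


def classify(error: float, tol: float = 1e-9) -> str:
    """Map the sign of the error onto the qualitative response categories."""
    if error > tol:
        return "activation"
    if error < -tol:
        return "depression"
    return "no response"


# Hypothetical magnitudes, chosen only to illustrate the three cases.
cases = {
    "unpredicted reward": (1.0, 0.0),
    "fully predicted reward": (1.0, 1.0),
    "predicted reward omitted": (0.0, 1.0),
}
for name, (occurred, predicted) in cases.items():
    err = dopamine_response(occurred, predicted)
    print(f"{name}: error {err:+.1f} -> {classify(err)}")
```

The same function applies unchanged to conditioned appetitive events in the general form, since only the labels of "occurred" and "predicted" change.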
CONNECTIVITY OF DOPAMINE NEURONS
Origin of the dopamine response
Which anatomic inputs could be responsible for the selectivity and polysensory nature of dopamine responses? Which input activity could lead to the coding of prediction errors, induce the adaptive response transfer to the earliest unpredicted appetitive event and estimate the time of reward?
DORSAL AND VENTRAL STRIATUM.
The GABAergic neurons in the striosomes (patches) of the striatum project in a broadly topographic and partly overlapping, interdigitating manner to dopamine neurons in nearly the entire pars compacta of substantia nigra, whereas neurons of the much larger striatal matrix contact predominantly the nondopamine neurons of pars reticulata of substantia nigra, besides their projection to globus pallidus (Gerfen 1984; Hedreen and DeLong 1991; Holstein et al. 1986; Jimenez-Castellanos and Graybiel 1989; Selemon and Goldman-Rakic 1990; Smith and Bolam 1991). Neurons in the ventral striatum project in a nontopographic manner to both pars compacta and pars reticulata of medial substantia nigra and to the ventral tegmental area (Berendse et al. 1992; Haber et al. 1990; Lynd-Balta and Haber 1994; Somogyi et al. 1981). The GABAergic striatonigral projection may exert two distinct influences on dopamine neurons, a direct inhibition and an indirect activation (Grace and Bunney 1985; Smith and Grace 1992; Tepper et al. 1995). The latter is mediated by striatal inhibition of pars reticulata neurons and subsequent GABAergic inhibition from local axon collaterals of pars reticulata output neurons onto dopamine neurons. This double inhibitory link results in net activation of dopamine neurons by the striatum. Thus striosomes and ventral striatum may monosynaptically inhibit dopamine neurons, whereas the matrix may indirectly activate them.
; Williams et al. 1993), responses to reward-predicting stimuli (Hollerman et al. 1994; Romo et al. 1992) and sustained activations during the expectation of reward-predicting stimuli and primary rewards (Apicella et al. 1992; Schultz et al. 1992). However, the positions of these neurons relative to striosomes and matrix are unknown, and striatal activations reflecting the time of expected reward have not yet been reported.
; Miller et al. 1993) could combine with rapid conduction to striatum and double inhibition of substantia nigra to induce the short dopamine response latencies of <100 ms. Whereas reward-related activity has not been reported for posterior association cortex, neurons in dorsolateral and orbital prefrontal cortex respond to primary rewards and reward-predicting stimuli and show sustained activations during reward expectation (Rolls et al. 1996; Thorpe et al. 1983; Tremblay and Schultz 1995; Watanabe 1996). Some reward responses in frontal cortex depend on reward unpredictability (Matsumoto et al. 1995; L. Tremblay and W. Schultz, unpublished results) or reflect behavioral errors or omitted rewards (Niki and Watanabe 1979; Watanabe 1989). The cortical influence on dopamine neurons would be even faster through a direct projection, which originates from prefrontal cortex in rats (Gariano and Groves 1988; Sesack and Pickel 1992; Tong et al. 1996) but is weak in monkeys (Künzle 1978).
NUCLEUS PEDUNCULOPONTINUS.
Short latencies of reward responses may be derived from adaptive, feature-processing mechanisms in the brain stem. Nucleus pedunculopontinus is an evolutionary precursor of substantia nigra. In nonmammalian vertebrates, it contains dopamine neurons and projects to the paleostriatum (Lohman and Van Woerden-Verkley 1978). In mammals, this nucleus sends strong excitatory, cholinergic, and glutamatergic influences to a high fraction of dopamine neurons with latencies of ~7 ms (Bolam et al. 1991; Clarke et al. 1987; Futami et al. 1995; Scarnati et al. 1986). Activation of pedunculopontine-dopamine projections induces circling behavior (Niijima and Yoshida 1988), suggesting a functional influence on dopamine neurons.
AMYGDALA.
A massive, probably excitatory input to dopamine neurons arises from different nuclei of the amygdala (Gonzalez and Chesselet 1990; Price and Amaral 1981). Amygdala neurons respond to primary rewards and reward-predicting visual and auditory stimuli. The neuronal responses known so far are independent of stimulus unpredictability and do not discriminate well between appetitive and aversive events (Nakamura et al. 1992; Nishijo et al. 1988). Most responses show latencies of 140-310 ms, longer than those of dopamine neurons, although a few responses occur at latencies of 60-100 ms.
DORSAL RAPHÉ.
The monosynaptic projection from dorsal raphé (Corvaja et al. 1993; Nedergaard et al. 1988) has a depressant influence on dopamine neurons (Fibiger et al. 1977; Trent and Tepper 1991). Raphé neurons show short-latency activations after high-intensity environmental stimuli (Heym et al. 1982), allowing them to contribute to dopamine responses after novel or particularly salient stimuli.
SYNTHESIS.
A few well-known input structures are the most likely candidates for mediating the dopamine responses, although additional inputs also may exist. Activations of dopamine neurons by primary rewards and reward-predicting stimuli could be mediated by double inhibitory, net activating input from the striatal matrix (for a simplified diagram, see Fig. 6). Activations also could arise from pedunculopontine nucleus or possibly from reward expectation-related activity in neurons of the subthalamic nucleus projecting to dopamine neurons (Hammond et al. 1983; Matsumura et al. 1992; Smith et al. 1990). The absence of activation with fully predicted rewards could result from monosynaptic inhibition from striosomes, cancelling out simultaneously activating matrix input. Depressions at the time of omitted reward could be mediated by inhibitory inputs from neurons in striatal striosomes (Houk et al. 1995) or globus pallidus (Haber et al. 1993; Hattori et al. 1975; Y. Smith and Bolam 1990, 1991). Convergence between different inputs before or at the level of dopamine neurons could result in the rather complex coding of reward prediction errors and the adaptive response transfer from primary rewards to reward-predicting stimuli.
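The proposed synthesis can be illustrated as a toy summation at the time of reward: net activating drive from the matrix (and possibly pedunculopontine nucleus) minus striosomal inhibition that is assumed, hypothetically, to be timed to the predicted reward. The Python sketch below is only that arithmetic; the function name, the unit magnitudes, and the assumption that the inputs simply add are illustrative choices, not a circuit model from the review.

```python
def net_drive(matrix_drive: float, striosome_inhibition: float,
              ppn_drive: float = 0.0) -> float:
    """Toy net drive onto dopamine neurons at the time of a reward.

    matrix_drive: net activation from the striatal matrix, relayed through
        the double inhibitory link via pars reticulata neurons.
    ppn_drive: optional direct excitation from nucleus pedunculopontinus.
    striosome_inhibition: monosynaptic inhibition from striosomes, assumed
        here (hypothetically) to be timed to the predicted reward.
    Inputs are assumed to combine linearly, which is a simplification.
    """
    return matrix_drive + ppn_drive - striosome_inhibition


# Hypothetical unit magnitudes: 1.0 = reward present / reward predicted.
scenarios = {
    "unpredicted reward": net_drive(matrix_drive=1.0, striosome_inhibition=0.0),
    "fully predicted reward": net_drive(matrix_drive=1.0, striosome_inhibition=1.0),
    "predicted reward omitted": net_drive(matrix_drive=0.0, striosome_inhibition=1.0),
}
for name, drive in scenarios.items():
    print(f"{name}: net drive {drive:+.1f}")
```

Under these assumptions the sign of the net drive reproduces activation, cancellation, and depression, which is why convergence of an activating and a timed inhibitory input is a plausible substrate for a prediction error; it does not address how the inhibition acquires its timing.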
Phasic dopamine influences on target structures
GLOBAL NATURE OF DOPAMINE SIGNAL.
Divergent projections. There are ~8,000 dopamine neurons in each substantia nigra of rats (Oorschot 1996) and 80,000-116,000 in macaque monkeys (German et al. 1988; Percheron et al. 1989). Each striatum contains ~2.8 million neurons in rats and 31 million in macaques, resulting in a nigrostriatal divergence factor of 300-400. Each dopamine axon ramifies abundantly in a limited terminal area in striatum and has ~500,000 striatal varicosities from which dopamine is released (Andén et al. 1966). This results in dopamine input to nearly every striatal neuron (Groves et al. 1995) and a moderately topographic nigrostriatal projection (Lynd-Balta and Haber 1994). The cortical dopamine innervation in monkeys is highest in areas 4 and 6, is still sizeable in frontal, parietal, and temporal lobes, and is lowest in the occipital lobe (Berger et al. 1988; Williams and Goldman-Rakic 1993). Cortical dopamine synapses are predominantly found in layers I and V-VI, contacting a large proportion of cortical neurons there. Together with the rather homogeneous response nature, these data suggest that the dopamine response advances as a simultaneous, parallel wave of activity from the midbrain to striatum and frontal cortex (Fig. 7).
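The divergence factor follows directly from the cell counts quoted above: striatal neurons per nigral dopamine neuron on one side. The short Python calculation below only divides those published counts; the function name is a hypothetical helper.

```python
def divergence(striatal_neurons: float, dopamine_neurons: float) -> float:
    """Number of striatal neurons per nigral dopamine neuron (one side)."""
    return striatal_neurons / dopamine_neurons


rat = divergence(2.8e6, 8_000)            # ~2.8 million striatal / ~8,000 nigral
macaque_low = divergence(31e6, 116_000)   # upper count of dopamine neurons
macaque_high = divergence(31e6, 80_000)   # lower count of dopamine neurons
print(f"rat: {rat:.0f}, macaque: {macaque_low:.0f}-{macaque_high:.0f}")
```

This gives 350 for rats and roughly 270-390 for macaques, so the quoted factor of 300-400 is a rounded summary across both species.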
; Gonon 1988). This nonlinearity is mainly due to the rapid saturation of the dopamine reuptake transporter, which clears the released dopamine from the extrasynaptic region (Chergui et al. 1994). The same effect is observed in nucleus accumbens (Wightman and Zimmerman 1990) and occurs even with longer impulse intervals because of sparser reuptake sites (Garris et al. 1994b; Marshall et al. 1990; Stamford et al. 1988). Dopamine release after an impulse burst of <300 ms is too brief to activate the autoreceptor-mediated reduction of release (Chergui et al. 1994) or the even slower enzymatic degradation (Michael et al. 1985). Thus a bursting dopamine response is particularly efficient for releasing dopamine.
; Kawagoe et al. 1992). At 40 µs after release onset, >90% of dopamine has left the synapse, some of the rest being later eliminated by synaptic reuptake (half onset time of 30-37 ms). At 3-9 ms after release onset, dopamine concentrations reach a peak of ~250 nM when all neighboring varicosities simultaneously release dopamine. Concentrations are homogeneous within a sphere of 4 µm diam (Gonon 1997), which is the average distance between varicosities (Doucet et al. 1986; Groves et al. 1995). Maximal diffusion is restricted to 12 µm by the reuptake transporter and is reached 75 ms after release onset (half transporter onset time of 30-37 ms). Concentrations would be slightly lower and less homogeneous in regions with fewer varicosities or when <100% of dopamine neurons are activated, but they are two to three times higher with impulse bursts. Thus the reward-induced, mildly synchronous, bursting activations in ~75% of dopamine neurons may lead to rather homogeneous concentration peaks on the order of 150-400 nM. Total increases of extracellular dopamine last 200 ms after a single impulse and 500-600 ms after multiple impulses of 20-100 ms intervals applied during 100-200 ms (Chergui et al. 1994; Dugast et al. 1994). The extrasynaptic reuptake transporter (Nirenberg et al. 1996) subsequently brings dopamine concentrations back to their baseline of 5-10 nM (Herrera-Marschitz et al. 1996). Thus, in contrast to classic, strictly synaptic neurotransmission, synaptically released dopamine diffuses rapidly into the immediate juxtasynaptic area and reaches short peaks of regionally homogeneous extracellular concentrations.
The remaining 20% of striatal dopamine receptors belong to the adenylyl cyclase-inhibiting D2 type, of which 10-20% are in the low-affinity state and 80-90% in the high-affinity state, with affinities similar to those of D1 receptors. Thus D1 receptors overall have an ~100 times lower affinity than D2 receptors. Striatal D1 receptors are located predominantly on neurons projecting to internal pallidum and substantia nigra pars reticulata, whereas striatal D2 receptors are located mostly on neurons projecting to external pallidum (Bergson et al. 1995; Gerfen et al. 1990; Hersch et al. 1995; Levey et al. 1993). However, the differences in receptor sensitivity may not play a role beyond signal transduction, thus reducing the differences in dopamine sensitivity between the two types of striatal output neurons.
DOPAMINE MEMBRANE ACTIONS.
Dopamine actions on striatal neurons depend on the type of receptor activated, are related to the depolarized versus hyperpolarized states of membrane potentials, and often involve glutamate receptors. Activation of D1 dopamine receptors enhances the excitation evoked by activation of N-methyl-D-aspartate (NMDA) receptors after cortical inputs via L-type Ca²⁺ channels when the membrane potential is in the depolarized state (Cepeda et al. 1993
DOPAMINE-DEPENDENT PLASTICITY.
Tetanic electrical stimulation of cortical or limbic inputs to striatum and nucleus accumbens induces posttetanic depressions lasting several tens of minutes in slices (Calabresi et al. 1992a
PROCESSING IN STRIATAL NEURONS.
An estimated 10,000 cortical terminals and 1,000 dopamine varicosities contact the dendritic spines of each striatal neuron (Doucet et al. 1986
Dopamine neurons appear to report appetitive events according to a prediction error (Eqs. 1 and 2). Current learning theories and neuronal models demonstrate the crucial importance of prediction errors for learning.
Learning theories
RESCORLA-WAGNER MODEL.
Behavioral learning theories formalize the acquisition of associations between arbitrary stimuli and primary motivating events in classical conditioning paradigms. Stimuli gain associative strength over consecutive trials by being repeatedly paired with a primary motivating event.
Synaptically released dopamine acts on postsynaptic dopamine receptors at four anatomically distinct sites in the striatum, namely inside dopamine synapses, immediately adjacent to dopamine synapses, inside corticostriatal glutamate synapses, and at extrasynaptic sites remote from release sites (Fig. 8) (Levey et al. 1993; Sesack et al. 1994; Yung et al. 1995). D1 receptors are localized mainly outside of dopamine synapses (Caillé et al. 1996). The high transient concentrations of dopamine after phasic impulse bursts would activate D1 receptors in the immediate vicinity of the active release sites and activate and even saturate D2 receptors everywhere. D2 receptors would remain partly activated when the ambient dopamine concentration returns to baseline after phasic increases.
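Why a ~100-fold affinity difference separates D1 and D2 activation can be illustrated with simple one-site equilibrium binding, occupancy = C / (C + K_d), evaluated at the baseline (5-10 nM) and phasic peak (~250 nM) concentrations given above. The K_d values in the Python sketch are hypothetical, chosen only to respect the 100-fold ratio stated in the text; they are not measured constants.

```python
def occupancy(conc_nM: float, kd_nM: float) -> float:
    """Fractional receptor occupancy at equilibrium for simple one-site binding."""
    return conc_nM / (conc_nM + kd_nM)


# Hypothetical dissociation constants that only respect the ~100-fold
# affinity difference mentioned in the text; they are not measured values.
KD_D2_NM = 10.0
KD_D1_NM = 1000.0

for label, conc in [("baseline (10 nM)", 10.0), ("phasic peak (250 nM)", 250.0)]:
    d1 = occupancy(conc, KD_D1_NM)
    d2 = occupancy(conc, KD_D2_NM)
    print(f"{label}: D1 occupancy {d1:.2f}, D2 occupancy {d2:.2f}")
```

With these assumed constants, D2 occupancy is about 0.5 at baseline and near saturation at the peak, while D1 occupancy rises from about 0.01 to 0.2. Equilibrium is a strong simplification for peaks lasting only tens to hundreds of milliseconds, where binding kinetics would also matter.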

FIG. 8.
Influences of dopamine release on typical medium spiny neurons in the dorsal and ventral striatum. Dopamine released by impulses from synaptic varicosities activates a few synaptic receptors (probably of D2 type in the low-affinity state) and diffuses rapidly out of the synapse to reach low-affinity D1 type receptors (D1?) that are located nearby, within corticostriatal synapses, or at a limited distance. Phasically increased dopamine activates nearby high-affinity D2 type receptors to saturation (D2?). D2 receptors remain partly activated by the ambient dopamine concentrations after the phasically increased release. Extrasynaptically released dopamine may get diluted by diffusion and activate high-affinity D2 receptors. It should be noted that, at variance with this schematic diagram, most D1 and D2 receptors are located on different neurons. Glutamate released from corticostriatal terminals reaches postsynaptic receptors located on the same dendritic spines as dopamine varicosities. Glutamate also reaches presynaptic dopamine varicosities where it controls dopamine release. Dopamine influences on spiny neurons in frontal cortex are comparable in many respects.
a; Giros et al. 1996; Suaud-Chagny et al. 1995). The enhancement would be particularly pronounced when rapid, burst-induced increases in dopamine concentration reach a peak before feedback regulation becomes effective. This mechanism would lead to a massively enhanced dopamine signal after primary rewards and reward-predicting stimuli. It also would increase the somewhat weaker dopamine signal after stimuli resembling rewards, novel stimuli, and particularly salient stimuli that might be frequent in everyday life. The enhancement by cocaine would let these nonrewarding stimuli appear as strong as, or even stronger than, natural rewards without cocaine. Postsynaptic neurons could misinterpret such a signal as a particularly prominent reward-related event and undergo long-term changes in synaptic transmission.
, 1998; Hernandez-Lopez et al. 1997; Kawaguchi et al. 1989). By contrast, D1 activation appears to reduce evoked excitations when the membrane potential is in the hyperpolarized state (Hernandez-Lopez et al. 1997). In vivo dopamine iontophoresis and axonal stimulation induce D1-mediated excitations lasting 100-500 ms beyond dopamine release (Gonon 1997; Williams and Millar 1991). Activation of D2 dopamine receptors reduces Na⁺ and N-type Ca²⁺ currents and attenuates excitations evoked by activation of NMDA or α-amino-3-hydroxy-5-methyl-4-isoxazolepropionic acid (AMPA) receptors at any membrane state (Cepeda et al. 1995; Yan et al. 1997). At the systems level, dopamine exerts a focusing effect whereby only the strongest inputs pass through striatum to external and internal pallidum, whereas weaker activity is lost (Brown and Arbuthnott 1983; Filion et al. 1988; Toan and Schultz 1985; Yim and Mogenson 1982). Thus the dopamine released by the dopamine response may lead to an immediate overall reduction in striatal activity, although a facilitatory effect on cortically evoked excitations may be mediated via D1 receptors. The following discussion will show that the effects of dopamine neurotransmission may not be limited to changes in membrane polarization.
; Lovinger et al. 1993; Pennartz et al. 1993; Walsh 1993; Wickens et al. 1996). This manipulation also enhances the excitability of corticostriatal terminals (Garcia-Munoz et al. 1992). Posttetanic potentiation of similar durations is observed in striatum and nucleus accumbens when postsynaptic depolarization is facilitated by removal of magnesium or application of γ-aminobutyric acid (GABA) antagonists (Boeijinga et al. 1993; Calabresi et al. 1992b; Pennartz et al. 1993). D1 or D2 dopamine receptor antagonists or D2 receptor knockout abolish posttetanic corticostriatal depression (Calabresi et al. 1992a; Calabresi et al. 1997; Garcia-Munoz et al. 1992) but do not affect potentiation in nucleus accumbens (Pennartz et al. 1993). Application of dopamine restores striatal posttetanic depression in slices from dopamine-lesioned rats (Calabresi et al. 1992a) but fails to modify posttetanic potentiation (Pennartz et al. 1993). Short pulses of dopamine (5-20 ms) induce long-term potentiation in striatal slices when applied simultaneously with tetanic corticostriatal stimulation and postsynaptic depolarization, complying with a three-factor reinforcement learning rule (Wickens et al. 1996).
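A three-factor rule of the kind invoked here can be written as a Hebbian coincidence term (presynaptic activity × postsynaptic depolarization) that changes the weight only when multiplied by a dopamine signal. The Python sketch below is a generic multiplicative form under that assumption; the function name, learning rate, and binary activity values are hypothetical and are not taken from the cited slice experiments.

```python
def three_factor_update(w: float, pre: float, post: float, dopamine: float,
                        lr: float = 0.1) -> float:
    """Return the new weight after one step of a multiplicative three-factor rule.

    The Hebbian term (pre * post) changes the weight only when it is
    multiplied by a nonzero dopamine (third-factor) signal.
    """
    return w + lr * pre * post * dopamine


w0 = 0.5
# Cortical activity, postsynaptic depolarization, and a dopamine pulse coincide.
w_all = three_factor_update(w0, pre=1.0, post=1.0, dopamine=1.0)
# Tetanus plus depolarization without dopamine: no change under this rule.
w_no_da = three_factor_update(w0, pre=1.0, post=1.0, dopamine=0.0)
# Dopamine without postsynaptic depolarization: no change under this rule.
w_no_post = three_factor_update(w0, pre=1.0, post=0.0, dopamine=1.0)
print(w_all, w_no_da, w_no_post)
```

Only the full coincidence potentiates the weight, mirroring the requirement for simultaneous dopamine, tetanic stimulation, and depolarization. Letting a negative dopamine term (a depression) weaken synapses would be a further modeling assumption not tested here.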
and impaired by D1 and D2 receptor blockade (Frey et al. 1990). Burst-contingent but not burst-noncontingent local applications of dopamine and dopamine agonists increase neuronal bursting in hippocampal slices (Stein et al. 1994). In fish retina, activation of D2 dopamine receptors induces movements of photoreceptors in or out of the pigment epithelium (Rogawski 1987). Posttrial injections of amphetamine and dopamine agonists into rat caudate nucleus improve performance in memory tasks (Packard and White 1991). Dopamine denervations in the striatum reduce the number of dendritic spines (Arbuthnott and Ingham 1993; Anglade et al. 1996; Ingham et al. 1993), suggesting that the dopamine innervation has persistent effects on corticostriatal synapses.
; Groves et al. 1995; Wilson 1995). The dense dopamine innervation becomes visible as baskets outlining individual perikarya in pigeon paleostriatum (Wynne and Güntürkün 1995). Dopamine varicosities form synapses on the same dendritic spines of striatal neurons that are contacted by cortical glutamate afferents (Fig. 8) (Bouyer et al. 1984; Freund et al. 1984; Pickel et al. 1981; Smith et al. 1994), and some dopamine receptors are located inside corticostriatal synapses (Levey et al. 1993; Yung et al. 1995). The high number of cortical inputs to striatal neurons, the convergence between dopamine and glutamate inputs at the spines of striatal neurons, and the largely homogeneous dopamine signal reaching probably all striatal neurons are ideal substrates for dopamine-dependent synaptic changes at the spines of striatal neurons. This also may hold for the cortex, where dendritic spines are contacted by synaptic inputs from both dopamine and cortical neurons (Goldman-Rakic et al. 1989), although dopamine probably does not influence every cortical neuron.
Many inputs from functionally heterogeneous cortical areas to the striatum are organized in segregated, parallel channels, as are the outputs from internal pallidum directed to different motor cortical areas (Alexander et al. 1986; Hoover and Strick 1993). However, afferents from functionally related but anatomically different cortical areas may converge on striatal neurons. For example, somatotopically related areas of primary somatosensory and motor cortex project to common striatal regions (Flaherty and Graybiel 1993, 1994). Corticostriatal projections diverge into separate striatal "matrisomes" and reconverge in the pallidum, thus increasing the synaptic "surface" for modulatory interactions and associations (Graybiel et al. 1994). This anatomic arrangement would allow the dopamine signal to determine the efficacy of highly structured, task-specific cortical inputs to striatal neurons and exert a widespread influence on forebrain centers involved in the control of behavioral action.
USING THE DOPAMINE REWARD PREDICTION ERROR SIGNAL
$$\Delta V = \alpha\beta(\lambda - V) \qquad (3)$$
where V is current associative strength of the stimulus, λ is maximum associative strength possibly sustained by the primary motivating event, and α and β are constants reflecting the salience of conditioned and unconditioned stimuli, respectively (Dickinson 1980; Mackintosh 1975; Pearce and Hall 1980; Rescorla and Wagner 1972). The (λ − V) term indicates the degree to which the primary motivating event occurs unpredictably and represents an error in the prediction of reinforcement. It determines the rate of learning, as associative strength increases when the error term is positive and the conditioned stimulus does not fully predict the reinforcement. When V = λ, the conditioned stimulus fully predicts the reinforcer, and V will not increase further. Thus learning occurs only when the primary motivating event is not fully predicted by a conditioned stimulus. This interpretation is suggested by the blocking phenomenon, according to which a stimulus fails to gain associative strength when presented together with another stimulus that by itself fully predicts the reinforcer (Kamin 1969). The (λ − V) error term becomes negative when a predicted reinforcer fails to occur, leading to a loss of associative strength of the conditioned stimulus (extinction). Note that these models use the term "reinforcement" in the broad sense of increasing the frequency and intensity of specific behavior and do not refer to any particular type of learning.
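The three consequences of the Rescorla-Wagner update ΔV = αβ(λ − V) discussed above (acquisition, blocking, extinction) can be reproduced in a few lines of Python. For compound trials the sketch uses the summed strength of all presented stimuli as the prediction, as in the original Rescorla-Wagner formulation; the parameter values (α = 0.3, β = 1, λ = 1, 50 trials per phase) and the function names are hypothetical choices for illustration.

```python
def rw_trial(V, present, lam, alpha, beta):
    """Apply one Rescorla-Wagner trial in place and return the prediction error.

    V: dict mapping stimulus name -> associative strength
    present: names of the stimuli presented on this trial
    lam: asymptote supported by the reinforcer (0.0 when it is omitted)
    alpha: dict of CS saliences; beta: US salience
    """
    prediction = sum(V[s] for s in present)  # summed prediction of present CSs
    error = lam - prediction                 # the (lambda - V) error term
    for s in present:
        V[s] += alpha[s] * beta * error
    return error


def blocking_experiment(n=50, alpha=0.3, beta=1.0, lam=1.0):
    """Acquisition of A, then reinforced compound AB, then extinction of A."""
    V = {"A": 0.0, "B": 0.0}
    a = {"A": alpha, "B": alpha}
    for _ in range(n):                       # phase 1: A -> reward
        rw_trial(V, ["A"], lam, a, beta)
    v_a_acquired = V["A"]
    for _ in range(n):                       # phase 2: AB -> reward
        rw_trial(V, ["A", "B"], lam, a, beta)
    v_b_blocked = V["B"]
    for _ in range(n):                       # phase 3: A -> reward omitted
        rw_trial(V, ["A"], 0.0, a, beta)
    return v_a_acquired, v_b_blocked, V["A"]


def compound_control(n=50, alpha=0.3, beta=1.0, lam=1.0):
    """Control: AB reinforced from the start, without pretraining of A."""
    V = {"A": 0.0, "B": 0.0}
    a = {"A": alpha, "B": alpha}
    for _ in range(n):
        rw_trial(V, ["A", "B"], lam, a, beta)
    return V["B"]


acquired, blocked, extinguished = blocking_experiment()
control = compound_control()
print(f"A after acquisition: {acquired:.3f}")
print(f"B after blocking: {blocked:.3f} (control without pretraining: {control:.3f})")
print(f"A after extinction: {extinguished:.3f}")
```

Because A already predicts the reinforcer when B is introduced, the error term is near zero and B gains almost nothing, whereas B reaches about half of λ when the compound is trained from scratch. Omitting the reinforcer makes the error negative and drives V for A back toward zero.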
DELTA RULE.
The Rescorla-Wagner model relates to the general principle of learning driven by errors between the desired and the actual output, such as the least mean square error procedure (Kalman 1960; Widrow and Stearns 1985). This principle has been applied to neuronal network models in the Delta rule, according to which synaptic weights (ω) are adjusted by
$$\Delta\omega = \eta(t - a)x \qquad (4)$$
where t is the desired output, a is the actual output, and η and x are learning rate and input activation, respectively (Rumelhart et al. 1986). The actual output (a) is analogous to the prediction modified during learning (V), and the delta