Kresge Hearing Research Institute, University of Michigan, Ann Arbor, Michigan 48109-0506
Submitted 29 October 2002; accepted in final form 5 February 2003
| ABSTRACT |
|---|
α-chloralose-anesthetized cats. We assessed spatial sensitivity by examining the dependence of spike count and response latency on stimulus location. In addition, we used an artificial neural network (ANN) to assess the information about stimulus location carried by spike patterns of single units and of ensembles of 2–32 units. The results indicate increased spatial sensitivity, more uniform distributions of preferred locations, and greater tolerance to changes in stimulus intensity among PAF units relative to A1 units. Compared to A1 units, PAF units responded at significantly longer latencies, and latencies varied more strongly with stimulus location. ANN analysis revealed significantly greater information transmission by spike patterns of PAF than A1 units, primarily reflecting the information transmitted by latency variation in PAF. Finally, information rates grew more rapidly with the number of units included in neural ensembles for PAF than A1. The latter finding suggests more accurate population coding of space in PAF, made possible by a more diverse population of neural response types.
| INTRODUCTION |
|---|
Previous work in our laboratory has used statistical pattern-recognition algorithms to demonstrate that the temporal firing patterns of individual neurons carry significant amounts of information about the locations of sound sources throughout 360° of azimuth (Middlebrooks et al. 1998) and similarly broad ranges of elevation (Xu et al. 1998). The results suggest an alternative to the view that the representation of space in auditory cortex relies upon a topographic organization of sharply tuned neurons. Specifically, the cortex appears to employ a distributed representation by panoramic neurons, whereby each neuron is involved in representing many different locations and each single location in space is represented by the coordinated responses of many different cortical neurons (Furukawa et al. 2000). Panoramic coding by individual neurons involves stimulus-related changes in both the firing rates and temporal patterns of spikes, with first-spike latencies carrying much of the relevant information (Furukawa and Middlebrooks 2001; Middlebrooks et al. 1998).
Given the behavioral importance of sound localization, it is surprising that no cortical area has yet been distinguished as qualitatively specialized for localization. Certain auditory fields show connections with spatial structures [for example, field anterior ectosylvian sulcus (AES) projects strongly to the superior colliculus (Meredith and Clemo 1989)] but little physiological specificity for sound-source location. Rauschecker (Rauschecker 1998; Rauschecker and Tian 2000) has proposed a hypothesis, based on analogy with the primate visual system, that the primate auditory system might contain separate cortical processing "streams" specialized for auditory object identification [the "what" stream, involving fields rostral to primary auditory cortex (A1)] and spatial processing (the "where" stream, involving caudal fields). This view has received some quantitative support from physiological studies in macaques, demonstrating an increased prevalence of direction-selective neurons in the caudolateral (Tian et al. 2001) and caudomedial (Recanzone et al. 2000) belt areas of auditory cortex. Nevertheless, no evidence has yet been shown for obvious qualitative differences in spatial sensitivity indicative of a higher level of spatial processing in particular cortical fields.
A number of studies in our laboratory have examined the spatial sensitivity of auditory cortical neurons in the cat (Furukawa and Middlebrooks 2001; Furukawa et al. 2000; Mickey and Middlebrooks 2001; Middlebrooks et al. 1998; Xu et al. 1998), focusing on non-tonotopic field A2, field AES, and A1. The results of these studies reveal that the general characteristics of spatial sensitivity are quite similar between cortical fields, although minor quantitative differences in tuning and panoramic location coding exist between fields. The lack of fundamental differences between cortical fields suggests that perhaps no area specialized for sound localization exists within the cortex. One possible explanation for this is that the processing of spatial information is essentially complete within the auditory brain stem (Middlebrooks et al. 2002). According to that view, the role of cortex in spatial processing is limited to the distribution of pre-processed information to appropriate perceptual, memory, and motor systems rather than the actual computation of source location. That model accounts for the current lack of evidence for specialization for spatial processing. Another alternative, however, is that specialized location coding exists in one of several cortical areas in which spatial sensitivity has yet to be examined. The present study considers one of those areas, the posterior auditory field (PAF), located posterior to A1 along the caudal bank of the posterior ectosylvian sulcus of the cat cortex.
PAF is a reasonable candidate for spatial coding, as several aspects of PAF responses suggest increased sensitivity to stimulus features that covary with location. First, neurons in PAF show an increased prevalence of complex frequency-tuning characteristics (Heil and Irvine 1998; Loftus and Sutter 2001). Frequency-response areas in PAF often possess multiple excitatory and inhibitory domains arranged in frequency and level. Because location-specific spectral cues (characterized by the directional transfer function, or DTF) contribute significantly to sound localization, sensitivity to spectral shape could be a significant element of spatial sensitivity.
Second, PAF contains a high proportion of neurons that exhibit nonmonotonic responses to increases in sound pressure level (SPL). While some have suggested a role for nonmonotonic rate-level functions in intensity coding (Kitzes and Hollrigel 1996; Phillips and Orman 1984), complex sound processing (Phillips et al. 1995), and tracking of amplitude transients (Heil and Irvine 1998), an alternative view is that nonmonotonicity helps to compensate for the effects of increasing SPL on other tuning properties. In A1, nonmonotonic units tend to show sharper spatial tuning and tuning that is more resistant to increasing sound level than neurons with monotonic rate-level functions (Barone et al. 1996; Imig et al. 1990). From this, one might expect to find more stable spatial tuning widths in PAF than in other cortical fields.
Finally, response latencies among PAF units are prolonged in comparison to other fields (Phillips and Orman 1984; Phillips et al. 1995) and are sensitive to stimulus features including tone frequency (Loftus and Sutter 2001). Overall, first-spike latencies in PAF are most commonly 20–30 ms (ranging up to 80 ms or more), compared to 10–12 ms in A1. Although the origin of delayed latencies in PAF is not currently known, it is possible that PAF units could use them for temporal encoding of spatial location or other stimulus features.
Anatomically, PAF receives its principal thalamocortical projections from the ventral division of the medial geniculate body (MGB); it also receives input from the MGB's medial and dorsal divisions (Huang and Winer 2000; Morel and Imig 1987). Reciprocal corticocortical projections exist between PAF and ipsilateral A1, secondary auditory field (A2), anterior auditory field (AAF), and ventral-posterior auditory field (VPAF), along with contralateral fields PAF and VPAF (Rouiller et al. 1991). PAF additionally projects to limbic structures including the cingulate and parahippocampal cortices and the claustrum (Rouiller et al. 1990).
In the present study, we recorded responses of PAF and A1 units to sounds whose locations differed in both azimuth and elevation. We focussed on estimating the spatial sensitivity of neurons in the two cortical fields based on measures of response rate and latency. In addition, we analyzed the stimulus-related information conveyed by temporal firing patterns using a pattern-recognition algorithm based on artificial neural networks. The results of the study complement previous studies of azimuth and elevation tuning in areas A2 and AES (Furukawa and Middlebrooks 2001; Furukawa et al. 2000; Middlebrooks et al. 1998; Xu et al. 1998) and reveal somewhat increased spatial sensitivity in PAF relative to other studied cortical fields, along with the appearance of an enhanced latency code for sound-source location.
| MATERIALS AND METHODS |
|---|
Six purpose-bred male (4) and female (2) cats, weighing between 3.4 and 7.0 kg, were used in this study. The two female cats were previously trained to detect acoustic stimuli in a behavioral study. Male cats participated only in this terminal experiment. All procedures complied with guidelines of the University of Michigan Committee on Use and Care of Animals and were essentially identical to those described previously (Middlebrooks et al. 1998). Briefly, surgical anesthesia was induced and maintained with isoflurane (2–3%) in nitrous oxide (2 l/min) and oxygen (1 l/min). After surgery, cats were transferred to intravenous α-chloralose (1.5 mg/ml) in Ringer solution for unit recording. Dosage was 3 mg · kg⁻¹ · h⁻¹ and was adjusted to maintain an areflexive state. Atropine sulfate (0.1–0.2 ml im) was administered at regular intervals throughout the experiment to suppress mucosal secretions. After partial removal of the scalp and right temporalis muscle, a craniotomy of 1-cm diam exposed the right middle ectosylvian gyrus and posterior ectosylvian sulcus (PES). The animal was positioned in the center of the sound chamber with its head held by a bar attached to the skull fixture and its body suspended in a fabric sling. Thin wire supports maintained symmetric pinna placement throughout the experiment. A warm-water heating pad maintained body temperature at 37°C. An esophageal stethoscope fitted with a thermometer was used to continuously monitor the cat's temperature, heart rate, and respiration. Experiments lasted from 2 to 4 days, after which the cats were euthanized. The right cortical hemisphere was then removed and immersed in buffered formalin for later visual confirmation of the region of cortex recorded.
Experimental apparatus and stimulus generation
The experimental apparatus and procedures for stimulus generation were essentially identical to those detailed previously (Middlebrooks et al. 1998; Xu et al. 1998). Recordings were made in a 2.6 × 2.6 × 2.5-m sound-attenuating chamber, the surfaces of which were lined with sound-absorbing foam to suppress reflections. Sounds were presented one at a time from calibrated loudspeakers located 1.2 m from the cat's head. A circular hoop held 18 loudspeakers in the horizontal plane that spanned 360° of azimuth in 20° increments. The speaker location directly in front of the cat was labeled 0° with positive azimuths to the cat's right side (ipsilateral to the recording site) and negative azimuths to the left. A second circular hoop in the vertical median plane held 14 loudspeakers that spanned 260° of elevation in increments of 20°, from 60° below the frontal horizon (-60°) up and over the head to 20° below the rear horizon (+200°). Experiments were controlled by a personal computer, and acoustic stimuli were synthesized digitally using equipment from Tucker-Davis Technologies (TDT; Gainesville, FL). All stimuli were generated with 16-bit precision at a 100-kHz sampling rate. A computer-controlled multiplexer permitted any one loudspeaker to be activated at a time. Stimuli were either 80-ms Gaussian noise bursts with abrupt onsets and offsets or 80-ms pure tones with 5-ms raised-cosine onset/offset ramps.
Data acquisition and spike sorting
Extracellular unit activity was recorded using multichannel silicon-substrate microprobes. These probes, provided by the University of Michigan Center for Neural Communication Technology (Anderson et al. 1989), permit simultaneous recording from up to 16 cortical sites and are fabricated in several formats. The data presented here were obtained using single-shanked probes with linear arrays of either 8 recording sites spaced every 200 µm or 16 sites spaced every 100 or 150 µm. Impedances were between 1 and 4 MΩ on 16-channel probes (site area: 177 µm²) and 340–360 kΩ on 8-channel probes (site area: 1,250 µm²). In eight instances, two such probes were placed simultaneously in different cortical areas, one in PAF and one in A1, and we recorded from eight sites on each probe. Otherwise, a single probe was used and we recorded from all 8 or 16 sites at a time. Activity at each site was amplified and digitized with a TDT model RA16 biological amplifier/multichannel DSP system. Signals were sampled at 25 kHz, bandpass filtered (0.2–4 kHz), resampled at 12.5 kHz, and stored on a computer disk for offline analysis. Spikes were monitored online using custom software to estimate the thresholds and frequency tuning of units prior to data collection.
Off-line spike sorting was performed using custom software based on principal component analysis. This approach, based on that used by Furukawa et al. (2000), involved three steps. First, the signals (interpolated and resampled at 50 kHz) were "denoised" using a multi-channel array-processing technique (Bierer and Anderson 1999). This procedure eliminates signal components that are correlated across multiple recording channels. Because many noise sources (e.g., instrumentation noise and background neural activity) are highly correlated across the recording array, denoising acts to improve the effective signal-to-noise ratio of the recorded waveforms. Second, candidate spikes were identified as waveform peaks that exceeded a criterion level proportional to the background root mean square (RMS) level recorded prior to stimulation. Third, candidate spike waveforms were sorted using principal component (PC) analysis with eight PCs. Clusters in PC space were defined either by hand or by statistical cluster analysis to identify neural units. In 2% of cases, two discrete clusters corresponding to reliably discriminable single units were identified from a single recording channel. Otherwise, at most one cluster was defined per channel. Poststimulus times of spikes accepted in the clustering procedure were stored with 20-µs resolution.
In contrast to previous studies (Furukawa and Middlebrooks 2001), we chose to record from as many sites as possible per penetration rather than to obtain recordings from clearly isolated single neurons. The spike-sorting procedure described in the preceding text was used to obtain the best possible isolation of neural signals; however, a reasonable concern with multi-channel recordings is that estimates of neural responses may be distorted by the presence of signals from multiple neurons on individual recording channels. Using relatively strict criteria for waveshape separability and interspike timing (Furukawa and Middlebrooks 2001), 19 of the current recordings (5%) could be identified, with certainty, as isolated single neurons. The remainder were either clusters of two or more neighboring neurons having similar spike waveshapes or single units whose spikes varied in shape due to low signal-to-noise ratio. Consistent with our previous experience, we observed no systematic differences between the stimulus-tuning properties of well-isolated neurons and those with more limited isolation. Thus we do not distinguish between them in this report; the term "unit" is used in reference to both. Correspondingly, the term "single unit" is used to describe one such unit, in contrast to "ensembles" of (≥2) such units. Please note that this terminology differs from that of some studies using single-unit recordings in which relatively accurate isolation of single neurons is assumed. When necessary, we use the term "single neuron" to identify well-isolated single neurons; data for these are presented in several figures to allow comparison to the larger population of recorded units.
Units that responded with less than one spike per trial, on average, to their most effective stimulus were rejected from further analysis as were units whose average response across all stimuli varied by more than a factor of two between the first and second halves of trials in a recording session. This screening procedure was carried out independently for responses to stimuli varying in azimuth and elevation (see Experimental procedure) and resulted in a total of 117 units recorded from 13 penetrations in A1 and 267 units recorded from 31 penetrations in PAF. Of the 384 total units recorded, 349 were successfully screened for azimuth responses and 324 for elevation responses.
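The two screening criteria above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `passes_screening` and the dictionary layout (stimulus label mapped to per-trial spike counts in presentation order, at least two trials per stimulus) are hypothetical and are not taken from the study's software.

```python
def passes_screening(counts_by_stimulus):
    """Apply both screening criteria to one unit.

    counts_by_stimulus maps a stimulus label to that stimulus's per-trial
    spike counts in presentation order (hypothetical data layout; each
    stimulus is assumed to have at least two trials).
    """
    # Criterion 1: the most effective stimulus must elicit at least one
    # spike per trial on average.
    best_mean = max(sum(c) / len(c) for c in counts_by_stimulus.values())
    if best_mean < 1.0:
        return False

    # Criterion 2: the mean response across all stimuli must not change by
    # more than a factor of two between the first and second halves of trials.
    first, second = [], []
    for counts in counts_by_stimulus.values():
        half = len(counts) // 2
        first.extend(counts[:half])
        second.extend(counts[half:])
    m1 = sum(first) / len(first)
    m2 = sum(second) / len(second)
    if min(m1, m2) == 0:
        return False
    return max(m1, m2) / min(m1, m2) <= 2.0
```

In this sketch the function would be called once for the azimuth responses and once for the elevation responses of each unit, mirroring the independent screening described in the text.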
Experimental procedure
Recordings in this study focussed on cortical areas PAF and A1, which were identified initially by the cortical sulcal pattern and secondarily by their responsiveness to pure-tone stimulation, tonotopic organization, and response latencies. Penetrations in area PAF proceeded in the dorsoventral or lateromedial direction along the caudal bank of the PES. In no case did we distinguish a clear reversal of tonotopy proceeding ventrally through PAF, indicative of entry into field VPAF (Reale and Imig 1980). However, the most ventral penetrations tended to reveal broad regions of tuning to higher frequencies (>12 kHz), suggesting that some penetrations may have been located in the transition region between ventral PAF and dorsal VPAF. Penetrations in A1 passed obliquely into the middle ectosylvian gyrus, generally proceeding in a rostrocaudal direction. Search stimuli, consisting of broadband noise bursts and 0.5- to 30-kHz pure tones, were presented from loudspeakers located at 0 or -40° (contralateral) azimuth or +80° elevation (10° from overhead). The penetration depth was adjusted to maximize the number of active recording sites, with typically 10–14 sites showing unit responses.
Study of the units in each penetration began by estimating their thresholds to noise bursts tested in 5-dB increments of SPL. The stimuli were presented from a location at which units responded reliably, most often from loudspeakers at azimuths of 0 or -40° in the horizontal plane or in the midsagittal plane at +80° elevation. Typically, unit thresholds varied by <10 dB across sites on a single penetration, and the modal threshold was adopted as the representative threshold for the penetration. Responses to pure-tone stimuli were tested using tone frequencies varying in 1/3- or 1/6-octave steps from 1 to 30 kHz; tone levels varied in 10-dB steps, typically from 0 to 50 dB SPL. Pure tones were always presented from 80° elevation; this overhead location was chosen because the spectrum of the cats' DTF tended to be flattest there, minimizing the effects of filtering by the pinna on the units' responses (Xu and Middlebrooks 2000). Next, we measured the units' spatial sensitivities using 80-ms noise bursts 20, 30, and 40 dB above threshold, presented from 18 locations in the horizontal plane (-180 to +160° in +20° steps) and 14 locations in the midsagittal plane (-60 to +200°). Stimuli were presented in pseudorandom order such that each combination of SPL and location was presented once before all combinations were repeated in a different random order; 40 repetitions were completed for each penetration. In some cases, locations in azimuth and elevation were tested in separate blocks; in others, all 30 locations were intermixed (0 and +180° appear in both sets). Neural activity was recorded from 20–50 ms before to 80–200 ms after the stimulus onset. Measurement of spatial sensitivity was often followed by presentations of additional stimuli related to other research questions so that study at each penetration lasted from 2 to 10 h. Experiments yielded data from 5 to 13 (median = 9) penetrations.
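The stimulus set and the blocked pseudorandom ordering described above can be illustrated as follows. This is a minimal sketch for the intermixed case, not the experimental control software: the names `unique_locations` and `blocked_random_order` and the `seed` argument are hypothetical, and the rear location is written as azimuth -180° to match the stated azimuth range.

```python
import random

AZIMUTHS = list(range(-180, 180, 20))   # 18 horizontal-plane locations (deg)
ELEVATIONS = list(range(-60, 201, 20))  # 14 median-plane locations (deg)
LEVELS_DB = [20, 30, 40]                # dB above threshold

def unique_locations():
    """Return (plane, angle) pairs, counting front and rear only once."""
    locations = [("az", a) for a in AZIMUTHS]
    # Elevation 0 coincides with azimuth 0 (front); elevation +180 coincides
    # with the rear azimuth location (written here as -180).
    locations += [("el", e) for e in ELEVATIONS if e not in (0, 180)]
    return locations

def blocked_random_order(n_reps, seed=0):
    """Present every location/level combination once per block, reshuffled each block."""
    rng = random.Random(seed)
    combos = [(loc, level) for loc in unique_locations() for level in LEVELS_DB]
    schedule = []
    for _ in range(n_reps):
        block = combos[:]
        rng.shuffle(block)
        schedule.extend(block)
    return schedule
```

With 30 unique locations and 3 levels, each block holds 90 trials, so 40 repetitions give 3,600 trials in this intermixed arrangement.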
Data analysis
SPATIAL SENSITIVITY ASSESSED BY ANALYSIS OF SPIKE COUNT AND RESPONSE LATENCY. After spike sorting, spike times were stored as latencies relative to the onset of sound at the loudspeaker. Arrival of sound at the cat's head followed a delay of approximately 3.5 ms due to acoustical travel time. Spatial sensitivity was assessed by analyzing spike rates, response latencies, and the amount of stimulus-related information conveyed by spike patterns. We defined $c_t$ and $l_t$ as the spike count (number of spikes recorded) and the latency of the first spike, respectively, for a single trial t. For each unit, we calculated the stimulus-specific spike count C as the arithmetic mean of $c_t$ for trials matching a given combination of location and level. Similarly, the stimulus-specific response latency L was defined as the geometric mean of first-spike latencies $l_t$ for each stimulus. Trials that failed to elicit at least one spike were omitted from the calculation of L (following Furukawa and Middlebrooks 2001). Examples of C and L calculated for single units are shown in Fig. 3.
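As an illustration of these two definitions, the sketch below computes C and L for the trials of one stimulus. The function name and the input layout (one list of spike times in milliseconds per trial, all times positive) are assumptions made for the example.

```python
import math

def spike_count_and_latency(trials):
    """Return (C, L) for one stimulus.

    trials: list of per-trial spike-time lists, in ms relative to sound onset
    (hypothetical layout; spike times are assumed to be positive).
    """
    # C: arithmetic mean of the per-trial spike counts.
    counts = [len(trial) for trial in trials]
    C = sum(counts) / len(counts)

    # L: geometric mean of first-spike latencies, omitting trials with no spikes.
    first_spikes = [min(trial) for trial in trials if trial]
    if first_spikes:
        L = math.exp(sum(math.log(x) for x in first_spikes) / len(first_spikes))
    else:
        L = float("nan")
    return C, L
```

For example, three trials with spikes at (10, 15), none, and (40) ms give C = 1 spike per trial and L = 20 ms, the geometric mean of 10 and 40 ms.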
To facilitate the calculation of spatial tuning statistics, C was normalized to give the proportion of maximum response across location, ranging from 0 (at locations yielding no spikes) to 1 (at the location yielding the maximum number of spikes). Similarly, normalizing L by the minimum latency across location ($L_{min}$) and inverting the result gives us

$$\lambda = \frac{L_{min}}{L}$$

$\lambda$ ranges from a minimum of 0 (when $L \to \infty$) to a maximum of 1 (when $L = L_{min}$). Values of C or $\lambda$ near 1 indicate effective stimulus locations, whereas values near 0 indicate stimuli that were ineffective at driving the unit. The common form of C and $\lambda$ facilitates the computation of several statistics of spatial sensitivity based on either the spike counts or response latencies of each unit. Spike-count modulation depth, tuning width, and spatial centroid (all defined in the following text) were computed from C as in previous studies (Furukawa et al. 2000); the corresponding latency statistics were computed with $\lambda$ in place of C. Note that the normalization of C and $\lambda$ across location was performed separately for azimuth and elevation at each stimulus level as was the computation of spatial statistics described in the following text. Where appropriate, subscripts are used to indicate the type of statistic (e.g., $\Delta L_{az}$ or $\Delta L_{el}$).
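The two normalizations can be written compactly as below. This is a sketch under the definitions in the text (count divided by its maximum across location; minimum latency divided by latency); the function names are hypothetical, and the inputs are assumed to be lists over location at one stimulus level, with at least one nonzero count and strictly positive latencies.

```python
def normalize_count(C):
    """Scale spike counts across location to the range 0..1 (1 = maximum response)."""
    c_max = max(C)
    return [c / c_max for c in C]

def normalize_latency(L):
    """Invert and scale latencies across location: 1 at the shortest latency,
    approaching 0 as latency grows without bound."""
    l_min = min(L)
    return [l_min / l for l in L]
```

A latency profile of 20, 40, and 25 ms, for instance, maps to 1.0, 0.5, and 0.8, so that short latencies and high counts both appear as values near 1.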
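Depth of response modulation by location ($\Delta$). The depth (or range) of response modulation is the degree to which response latencies or spike counts vary across space. It is designated by $\Delta C$ for modulation of spike counts and $\Delta L$ for modulation of response latency, and computed as the range of variation in C or L

$$\Delta C = \max_{\theta} C(\theta) - \min_{\theta} C(\theta) \tag{1}$$

$$\Delta L = \max_{\theta} L(\theta) - \min_{\theta} L(\theta) \tag{2}$$

where $\theta$ indexes stimulus location. $\Delta L$ is based on L, not on the normalized inverse latency $\lambda$, and has units of milliseconds. $\Delta C_{az}$ and $\Delta L_{az}$ designate the modulation depth across azimuth; $\Delta C_{el}$ and $\Delta L_{el}$ correspond to variation in elevation.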
Spatial tuning width (W). Spatial tuning width characterizes the range of locations that were effective in eliciting a strong or rapid response from a given unit. For each unit, values of C or of the normalized inverse latency $\lambda$ were linearly interpolated between locations at a resolution of 0.2°. Tuning width $W_C$ or $W_L$ was defined as the range of locations (not necessarily contiguous) associated with values of C > 0.5 (Middlebrooks et al. 1998) or $\lambda$ > 0.75. The stricter criterion was adopted for $\lambda$ because $\lambda$ tended to be modulated somewhat less than C overall. W has units of degrees and is further identified by subscripts for count or latency and azimuth or elevation: $W_{C,az}$, $W_{L,el}$, etc.
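A possible implementation of the tuning-width calculation is sketched below. It assumes the normalized response is given at sorted, evenly spaced locations; the function names and the `wrap` flag (used to close the circle for azimuth, where the last speaker is followed by the first) are hypothetical. Each interpolated grid point is treated as representing an interval of one step, which is a simplification chosen for the example.

```python
def interpolated_values(angles, values, step=0.2, wrap=False):
    """Linearly interpolate values sampled at sorted angles (deg) onto a grid of `step` deg."""
    points = list(zip(angles, values))
    if wrap:
        # Azimuth covers a full circle: repeat the first location 360 deg later.
        points.append((angles[0] + 360.0, values[0]))
    out = []
    for (a0, v0), (a1, v1) in zip(points, points[1:]):
        n = round((a1 - a0) / step)
        out.extend(v0 + (v1 - v0) * k / n for k in range(n))
    return out

def tuning_width(angles, values, criterion, step=0.2, wrap=False):
    """Total extent (deg) of interpolated locations whose value exceeds the criterion."""
    grid = interpolated_values(angles, values, step, wrap)
    return step * sum(1 for v in grid if v > criterion)
```

Under the criteria given in the text, the call would use `criterion=0.5` for normalized spike counts and `criterion=0.75` for normalized inverse latencies. Because the above-criterion locations are simply counted, the result does not require them to be contiguous.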
Spatial centroid ($\xi$). Following Middlebrooks et al. (1998), we calculated a spatial centroid for each unit. The centroid is the spatial center of mass of a unit's "peak response." As for the calculation of W, locations were interpolated to a resolution of 0.2°. The peak response was then defined as the group of contiguous locations with C or normalized inverse latency $\lambda$ > 0.75 and including the overall maximum C or $\lambda$. A further requirement for the calculation of spatial centroid was that the response fall to <0.75 at some locations (i.e., some locations were not included in the peak). Units that were modulated by less than this amount were classified as having no centroid ("NC" in Figs. 4 and 8). Otherwise, the spatial centroid, designated by $\xi_C$ or $\xi_L$, was computed by generating a set of vectors $\vec{v}_i$ whose angles were the (interpolated) stimulus locations $\theta_i$ included in the peak response and whose lengths were the values of C or $\lambda$ at those locations. The centroid $\xi$ was defined as the angle of the resultant

$$\vec{R} = \sum_i \vec{v}_i, \qquad \vec{v}_i = C(\theta_i)\,(\cos\theta_i,\ \sin\theta_i) \tag{3}$$

$$\xi = \operatorname{atan2}\Big(\sum_i C(\theta_i)\sin\theta_i,\ \sum_i C(\theta_i)\cos\theta_i\Big) \tag{4}$$

with $\lambda(\theta_i)$ replacing $C(\theta_i)$ for the latency-based centroid. $\xi$ is further denoted by subscripts for location type; e.g., $\xi_{C,az}$ and $\xi_{C,el}$ for azimuth and elevation, respectively.
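The centroid calculation can be sketched as a vector sum over the peak response. The example below is a simplified illustration: it operates on whatever grid it is given (the 0.2° interpolation is assumed to have been done beforehand), and the function name and `wrap` flag are hypothetical. Angles are returned in the range -180° to +180°, so rear elevations beyond +180° would need to be re-expressed by the caller.

```python
import math

def spatial_centroid(angles, values, criterion=0.75, wrap=False):
    """Angle (deg) of the vector sum over the peak response, or None if no centroid."""
    n = len(values)
    # No centroid if the response never falls below the criterion.
    if min(values) >= criterion:
        return None

    # Grow the peak outward from the overall maximum while values stay above criterion.
    i_max = max(range(n), key=lambda i: values[i])
    peak = [i_max]
    for step in (1, -1):
        i = i_max + step
        while True:
            if wrap:
                i %= n          # azimuth: neighbours wrap around the circle
            elif i < 0 or i >= n:
                break           # elevation: stop at the ends of the arc
            if i in peak or values[i] <= criterion:
                break
            peak.append(i)
            i += step

    # Sum vectors whose angles are the locations and whose lengths are the values.
    x = sum(values[i] * math.cos(math.radians(angles[i])) for i in peak)
    y = sum(values[i] * math.sin(math.radians(angles[i])) for i in peak)
    return math.degrees(math.atan2(y, x))
```

For a response that peaks at -40° azimuth with equal flanks at -60° and -20°, the sketch returns -40°, as expected from symmetry; a response that stays above criterion everywhere returns `None`, corresponding to the "NC" class.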
SPATIAL SENSITIVITY ASSESSED BY NETWORK ANALYSIS OF SPIKE PATTERNS. We estimated the spatial sensitivity afforded by temporal patterns of neural response using a pattern-recognition algorithm based on artificial neural networks (ANNs). The approach was similar to that described previously (Furukawa and Middlebrooks 2001). Briefly, we sorted the neural response patterns into two sets obtained from even- and odd-numbered trials. One set ("training") was used for setting ANN parameters; the other set ("test") was used for testing the accuracy of the obtained ANN solution. The separation of training and test sets in this way provided for cross-validation of the pattern-recognition scheme. Times of spikes recorded on each trial were expressed with 100-µs precision. Next, average ("bootstrapped") response patterns were formed from samples of spike patterns on eight trials, drawn randomly with replacement from the training or test set of 20 responses to each combination of stimulus location and sound level. Twenty such average response patterns were generated from the training set, and another 20 from the test set, for each unique stimulus combination. Average response patterns were then convolved with a Gaussian impulse (σ = 1 ms) and resampled at a resolution of 2 ms to produce spike density functions (SDFs). The bootstrapping and convolution operation served to low-pass filter the spike patterns (<137 Hz) and smooth out the otherwise-sparse SDFs used as input to the ANN. We refer to the resulting SDFs as "single-unit spike patterns"; they represent the smoothed responses recorded at one recording site. To analyze the combined information in responses from multiple cortical sites, we concatenated single-unit spike patterns for each stimulus to form long vectors, referred to as "ensemble spike patterns."
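One way to picture the bootstrapping and smoothing step is the sketch below, which draws eight trials with replacement, averages them, and evaluates the Gaussian-smoothed average on a 2-ms grid. The function name, the `t_max` analysis window, and the explicit random generator argument are assumptions for the example; evaluating the sum of Gaussians at the grid times is used here as a stand-in for convolution followed by resampling.

```python
import math
import random

def bootstrapped_sdf(trials, rng, n_draw=8, sigma=1.0, dt=2.0, t_max=100.0):
    """Build one smoothed average response pattern (spike density function).

    trials: list of per-trial spike-time lists (ms) for one stimulus.
    rng:    a random.Random instance.
    t_max:  end of the analysis window in ms (hypothetical value).
    """
    # Draw n_draw trials with replacement and pool their spikes.
    sample = [rng.choice(trials) for _ in range(n_draw)]
    spikes = [s for trial in sample for s in trial]

    # Each spike contributes a unit-area Gaussian; dividing by n_draw averages over trials.
    norm = 1.0 / (sigma * math.sqrt(2.0 * math.pi) * n_draw)
    times = [k * dt for k in range(int(t_max / dt) + 1)]
    return [
        norm * sum(math.exp(-0.5 * ((t - s) / sigma) ** 2) for s in spikes)
        for t in times
    ]
```

Calling the function 20 times on the training trials and 20 times on the test trials of a stimulus would yield the two sets of average patterns described above; concatenating the outputs for several units gives an ensemble pattern.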
In addition to these full spike patterns, we also generated test and training sets for two "reduced" input types. For these, the input to the network consisted of a single numerical value corresponding to the spike count or response latency associated with each set of bootstrapped trials. The latency input was computed as the geometric mean, across the eight bootstrapped patterns, of first-spike latencies. Trials that elicited no spikes were excluded from the computation. The spike count input was defined as the (arithmetic) mean number of spikes elicited during the eight trials chosen for bootstrapping. Both input types were normalized for presentation to the network such that a value of 1 corresponded to the maximum latency or spike count, across the entire stimulus ensemble, and 0 corresponded to the minimum. Aside from normalization, generation of network inputs and targets was identical to that used by Furukawa and Middlebrooks (2001).
Previous studies employed ANN architectures based on the feed-forward multi-layer perceptron for localization (Furukawa et al. 2000; Middlebrooks et al. 1998; Xu et al. 1998) or learning vector quantization (LVQ) for classification (Furukawa and Middlebrooks 2001). Here, a fixed architecture based on radial basis function (RBF) networks (Ghosh and Nag 2001; Wasserman 1993) is used for classification. The approach is similar to the LVQ networks used by Furukawa and Middlebrooks (2001) in that one classifier "unit" is assigned to each stimulus location, and SDFs are classified based on their similarity to these units' input weight vectors. In both cases, weight vectors were initially set to the mean of training-set SDFs at each location. Furukawa and Middlebrooks (2001) used the LVQ algorithm to further optimize the weight vectors, whereas in this case, weight vectors remained fixed at the location-conditional mean SDFs. Each classifier unit outputs a scalar value inversely related to the distance (dot product) between its weight vector (the mean SDF for its location) and the input SDF. Normalized, these outputs can be interpreted as posterior probabilities, providing for a number of useful analyses. In the interest of comparing our results directly to earlier studies (Furukawa and Middlebrooks 2001), however, we converted the output probabilities to a single classification by selecting the unit with maximum activation (i.e., the most likely stimulus location) as the network's response for each input.
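The fixed-template classifier described above can be illustrated with a small sketch. It is not the MATLAB implementation used in the study: the function names, the `width` parameter, and in particular the choice of squared Euclidean distance with a Gaussian activation are assumptions made for the example, since the text does not specify the exact similarity measure.

```python
import math

def fit_templates(train):
    """Set each location's weight vector to the mean of its training SDFs.

    train: dict mapping location -> list of equal-length SDF vectors.
    """
    return {
        loc: [sum(column) / len(sdfs) for column in zip(*sdfs)]
        for loc, sdfs in train.items()
    }

def classify(sdf, templates, width=1.0):
    """Return (most likely location, normalized activations) for one input SDF."""
    # Log-activation falls with squared Euclidean distance (assumed measure).
    log_act = {
        loc: -sum((a - b) ** 2 for a, b in zip(sdf, w)) / (2.0 * width ** 2)
        for loc, w in templates.items()
    }
    # Subtract the maximum before exponentiating to avoid numerical underflow.
    top = max(log_act.values())
    act = {loc: math.exp(v - top) for loc, v in log_act.items()}
    total = sum(act.values())
    probs = {loc: a / total for loc, a in act.items()}
    return max(probs, key=probs.get), probs
```

Because the templates are simply the location-conditional means, no iterative training occurs; tallying the classifier's responses against the true locations of the test SDFs gives the confusion matrix used in the information analysis.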
Although there are precise correspondences between the classification approach used here and previous efforts, some may object to our use of the term "artificial neural network" in reference to this classifier because it lacks a number of traditional characteristics of ANNs, notably optimization of parameters and non-linear classification rules. Empirical verification of this method showed little benefit of introducing more complex architectures or network training with this dataset, so we adopted the simplest approach in this case; it is by no means guaranteed to generalize to other neural populations or stimulus conditions. While we recognize the limitations of the approach, we refer to the classifier as an ANN in the interest of relating to past and future work employing more general architectures.
Networks were designed and simulated using a customized version of the MATLAB Neural Network Toolbox (The Mathworks, Natick MA). ANN estimates of stimulus locations were expressed as joint stimulus-response probability matrices (confusion matrices), from which we calculated the total stimulus-related (TSR) transmitted information, in bits, to assess the accuracy of ANN performance. The computation of transmitted information was identical to that described previously (Furukawa and Middlebrooks 2001). Transmitted information (mutual information) reflects the amount of reduction in uncertainty about stimulus location given the set of network responses. One bit of transmitted information implies perfect discrimination of two regions of space (e.g., left vs. right) or more continuous discrimination with some error. Perfect identification of 18 locations corresponds to 4.17 bits. For the present study, we calculated the transmitted information from network classifications based on single-unit spike-patterns (TSRS), ensemble spike patterns (TSRE), and reduced spike-patterns consisting of only spike counts (TSRC) or response latencies (TSRL) obtained from single-unit responses.
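Transmitted information can be computed from a confusion matrix with the standard mutual-information formula, as in the sketch below. The function name and the matrix layout (rows are actual stimulus locations, columns are network responses, entries are counts) are assumptions for the example.

```python
import math

def transmitted_information(confusion):
    """Mutual information (bits) between stimulus (rows) and response (columns)."""
    total = sum(sum(row) for row in confusion)
    p_stim = [sum(row) / total for row in confusion]
    p_resp = [sum(col) / total for col in zip(*confusion)]
    info = 0.0
    for i, row in enumerate(confusion):
        for j, count in enumerate(row):
            if count:  # empty cells contribute nothing
                p = count / total
                info += p * math.log2(p / (p_stim[i] * p_resp[j]))
    return info
```

A perfectly diagonal 18 × 18 matrix yields log2(18) ≈ 4.17 bits, a perfect two-way discrimination yields 1 bit, and a matrix with identical rows yields 0 bits.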
Interpreting the TSR information rates estimated in this manner requires consideration of two potential sources of bias: overestimation due to finite samples and underestimation due to suboptimal ANN performance. First, because there are a limited number of SDFs to classify, the probabilities expressed in the confusion matrix are not exactly uniform; this results in a positively biased information estimate. We studied the effects of this bias for each unit by randomly permuting the stimulus labels assigned to each SDF and recomputing TSR. The process excludes systematic stimulus-related sources of information from the TSR calculation. We computed the average of 100 such permutations to estimate bias for each unit and found the biases to be sufficiently small that any effects on the current analyses were negligible (for azimuth, the mean and maximum bias across units were 0.025 and 0.057 bits, respectively; for elevation, they were 0.017 and 0.042). Second, it is rather unlikely that the ANN classifier used in this study could capitalize on all potential sources of information contained in the SDFs. The network architecture was designed for simplicity and computational efficiency rather than statistical optimality. Different (non-linear) architectures, approaches to parameter optimization, or methods of data representation would likely have resulted in somewhat different levels of classification performance.1 Hence information estimates reported here represent lower bounds on the total stimulus-related information contained in the SDFs. Each of these factors (bias and suboptimality) certainly had an effect on the absolute magnitude of TSR estimates, although neither was sufficient to justify modification of the method. More importantly, our focus in this report is on comparing information rates between cortical fields rather than accurately estimating them in absolute terms. 
Because all information estimates are based on the same set of methods, and assuming that neither bias nor ANN performance differs between the neural populations being compared, these effects should not alter the interpretation of the current results.
Response patterns of auditory cortical neurons vary with stimulus level, potentially confounding location-related changes in neural responses. Because the ANN analysis depends upon recognition of stimulus-related response patterns, networks trained at one stimulus level cannot perform accurately when tested at another level. Networks trained with responses to sounds that vary in level, however, learn to recognize level-invariant features of the response patterns and thus to appropriately classify responses to sounds varying over a similar range of levels (Middlebrooks et al. 1998
). In this case, the ANN's fixed architecture forces it to recognize such features because each classifier unit is responsible for recognition of all SDFs corresponding to a single location, regardless of level. Without level-invariant features to distinguish stimuli of different locations, the network will fail to correctly classify some of the stimuli. Except where stated otherwise, ANN analyses in the present study were performed using varying levels, 20-40 dB above the unit's threshold in 10-dB steps; estimates of transmitted information reflect the network's dependence on features invariant over this range of levels.
STATISTICAL TESTS OF HYPOTHESES. Non-parametric permutation tests were used to compare distributions of various spatial statistics among cortical fields, stimulus levels, etc. Under the null hypothesis that there are no differences between the distributions, labels identifying the category membership (e.g., cortical field) can be reassigned freely without affecting any computable statistic of the distributions. We estimated the sampling distributions of statistics of interest (unless stated otherwise, the difference between two medians) under the null hypothesis by randomly reassigning category labels in 5,000 different permutations. The proportion of these exceeding (or falling below) the value computed with the original labelling gives the probability of type I error (P value). Unless otherwise noted, P values given in the text refer to this method. We adopted a fixed criterion of P < 0.05 for statements of statistical significance; in reporting the results of permutation tests, however, P values refer to the computed proportion of type I errors rounded to one significant digit. The permutation test has sensitivity limited by the number of permutations employed. Here, the maximum sensitivity is 0.0002 (1 type I error in 5,000 permutations); P < 0.0002 indicates that the actual value was more extreme than any obtained by random permutation. Other standard statistical tests (e.g., ANCOVA, linear regression) used the MATLAB statistics toolbox (The Mathworks).
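The permutation procedure described above can be sketched as follows. This is a minimal one-sided version for the difference between two medians; the function name, the fixed random seed, and the use of "greater than or equal to" when counting extreme permutations are assumptions of this example rather than details given in the text.

```python
import random
import statistics

def permutation_test_median_diff(a, b, n_perm=5000, seed=0):
    """Return (observed median difference, one-sided P value).

    P is the proportion of random relabelings whose median difference is
    at least as large as the difference computed with the original labels.
    """
    rng = random.Random(seed)  # seeded only to make the sketch repeatable
    observed = statistics.median(a) - statistics.median(b)
    pooled = list(a) + list(b)
    n_a = len(a)
    n_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign category labels at random
        diff = statistics.median(pooled[:n_a]) - statistics.median(pooled[n_a:])
        if diff >= observed:
            n_extreme += 1
    return observed, n_extreme / n_perm
```

A returned proportion of zero corresponds to the case reported in the text as P < 0.0002, that is, fewer than 1 in 5,000 permutations.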
| RESULTS |
|---|
|
|
|---|
Consistent with previous studies, we found a higher proportion of units with nonmonotonic rate-level functions in PAF than A1. To quantify this proportion, we computed the monotonicity ratio (Sutter and Schreiner 1995
) for each unit: the ratio of mean spike count for stimuli presented at the highest tested level to the maximum mean spike count across stimulus levels. Ratios of 1 are obtained for units with monotonic RLFs, whereas ratios less than 1 indicate some degree of nonmonotonicity. Based on responses to broadband noise, and adopting a criterion of 0.5, we found only 10/117 (9%) nonmonotonic units in A1, but 86/267 (32%) nonmonotonic units in PAF. This proportion is notably smaller than the 70-90% reported in previous studies of PAF using pure-tone stimulation and pentobarbital or ketamine anesthesia (Heil and Irvine 1998
; Kitzes and Hollrigel 1996
; Phillips and Orman 1984
; Phillips et al. 1995
).
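The monotonicity ratio defined above reduces to a short calculation, sketched here. The function names are hypothetical, the input is assumed to be a list of mean spike counts ordered from the lowest to the highest tested level, and treating ratios strictly below the 0.5 criterion as nonmonotonic is an assumption of this example.

```python
def monotonicity_ratio(mean_counts_by_level):
    """Mean spike count at the highest level divided by the maximum mean count.

    mean_counts_by_level must be ordered from lowest to highest tested level.
    """
    peak = max(mean_counts_by_level)
    if peak <= 0:
        raise ValueError("unit did not respond at any tested level")
    return mean_counts_by_level[-1] / peak

def is_nonmonotonic(mean_counts_by_level, criterion=0.5):
    """Classify a unit as nonmonotonic when its ratio falls below the criterion."""
    return monotonicity_ratio(mean_counts_by_level) < criterion
```

A unit whose mean counts rise steadily with level has a ratio of 1, whereas a unit whose response at the highest level is one quarter of its peak response has a ratio of 0.25 and is classified as nonmonotonic.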
PAF units generally responded better to tones than to noise stimuli, although PAF units in this study did not entirely fail to respond to noise as did those reported by Phillips et al. (1995
). Some A1 units showed a similar preference for tones, but overall, A1 units were relatively more responsive to noise. To quantify this difference, we calculated the noise/tone ratio for each unit. Briefly, we recorded unit responses to 80-ms noise bursts presented from a frontal or overhead location and varying over a range of 60 dB in level. Similarly, we recorded responses to 80-ms pure tones (rise/fall times were 5 ms) varying over 5 octaves in frequency (roughly 1-30 kHz) and 60 dB in level. The noise/tone ratio was defined as the ratio of mean spike counts elicited by the most effective noise and pure-tone stimuli. Distributions of this ratio for the two fields are shown in Fig. 1A. Distributions were computed by kernel density estimation (KDE), the result of convolving the data with a rectangular window of width 0.2; plotted values are equivalent to those of a histogram with continuously varying bin centers. Overall, PAF units tended to favor optimal tones over optimal noises, whereas many A1 units responded to noise at least as well as they did to tones. However, the majority of PAF units responded to broadband noise with no less than half the spike count elicited by the best tonal stimulus. Figure 1B plots the monotonicity ratio against the noise/tone ratio for each unit, demonstrating a weak positive correlation between the two measures. Nonmonotonic units (falling below the dashed line) showed weaker responses to noise, whereas monotonic units (above the dashed line) showed a broader range of noise response. That there are fewer nonmonotonic units in A1 than PAF at least partially explains the difference in noise/tone ratio between the cortical fields. Both of these findings, that PAF shows a higher proportion of nonmonotonic units than A1 and that nonmonotonic units in PAF do not respond as well to broadband as to narrowband stimuli, are consistent with the results of previous studies (Heil and Irvine 1998
; Phillips et al. 1995
).
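The rectangular-window density estimate used for Fig. 1A can be illustrated with the following sketch, which evaluates a sliding histogram at arbitrary bin centers. The function name and the normalization (proportion of units per unit of ratio) are assumptions of this example; the text specifies only the window shape and its width of 0.2.

```python
def boxcar_density(values, centers, width=0.2):
    """Rectangular-kernel density estimate evaluated at each center.

    For each center, counts the values lying within +/- width/2 and scales
    the count so that the estimate integrates to 1 (assumed normalization).
    """
    half = width / 2.0
    n = len(values)
    return [
        sum(1 for v in values if abs(v - c) <= half) / (n * width)
        for c in centers
    ]
```

Evaluating the function on a fine grid of centers gives the continuously varying histogram described above; a coarse grid spaced by the window width reproduces an ordinary histogram.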
Also in agreement with previous studies (Heil and Irvine 1998
; Kitzes and Hollrigel 1996
; Loftus and Sutter 2001
; Phillips and Orman 1984
; Phillips et al. 1995
), we found PAF units to respond with longer latency than A1 units. Distributions of overall response latency (median L across azimuth) are shown in Fig. 2 for PAF (solid lines), A1 (shaded region), and an additional population of 40 units previously recorded in A2 (dashed lines; unpublished observations from Furukawa and Middlebrooks 2001
). Median overall latency in PAF (33.2 and 29.5 ms at 20 and 40 dB above threshold) was significantly longer than in A1 (19.1 and 17.2 ms) or A2 (22.0 and 20.6 ms).
Example responses of PAF and A1 neurons to stimuli varying in spatial location are presented in Fig. 3. Generally, units in both areas were most responsive to contralateral azimuths (A, D, E, G, and H), although some responded best to ipsilateral (B and C) or midline azimuths (not shown). Units in both areas tended to be fairly non-selective to variation in stimulus elevation (J and L); units that did show a preference tended to favor elevations around +40° (I and K), presumably on the acoustic axis of the pinna. Compared to A1, units in PAF had longer response latencies that tended to vary more with stimulus location. In general, response latencies and spike counts appeared to vary in inverse proportion to one another (e.g., Fig. 3A), although this was not always the case in PAF (Fig. 3C).
Stimulus-dependent variations in spike count
Azimuth and elevation centroids based on spike counts (see METHODS) were calculated for each unit at 20 and 40 dB above threshold. Distributions of centroids in each cortical field appear in Fig. 4. A proportion of units were so broadly tuned that no centroid could be determined; these are indicated as "NC" in the figure. In all cases, A1 showed higher proportions of such units than did PAF. The majority of units in both cortical fields had azimuth centroids located in the contralateral hemifield (negative azimuths). Elevation centroids were distributed more uniformly, except at low sound levels, at which 40% of A1 units had centroids between +10 and +50°. We computed the χ² "goodness of fit" statistic, using 20° bins, as an index of each distribution's non-uniformity; we used standard permutation tests (see METHODS) to compare these. At 20 dB above threshold, the distribution of azimuth centroids was significantly more uniform in PAF than A1 (P < 0.0006, goodness of fit: χ²(17 df) = 99.90 in PAF, 135.29 in A1). Distributions of elevation centroids at the same level were not significantly different in uniformity (P > 0.05), although PAF and A1 distributions were significantly different from one another (χ² contingency test: χ²(13 df) = 37.95, P < 0.05). Distributions of spike-count centroids in PAF were similar at 20 and 40 dB above threshold, whereas in A1 the distributions flattened somewhat at the higher level. As a result, centroid distributions 40 dB above threshold did not differ significantly between the two cortical fields (χ² contingency test, azimuth: χ²(17 df) = 25.25, P > 0.05; elevation: χ²(13 df) = 14.57, P > 0.05).
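The goodness-of-fit index of non-uniformity can be sketched as follows for azimuth centroids. The function name, the azimuth range of -180 to +180°, and the handling of a centroid falling exactly on the upper edge are assumptions of this example; only the 20° bin width and the resulting 17 degrees of freedom come from the text.

```python
def chi_square_uniform(centroids_deg, lo=-180.0, hi=180.0, bin_width=20.0):
    """Chi-square statistic of binned centroids against a uniform distribution.

    Returns (chi-square value, degrees of freedom). Larger values indicate a
    less uniform distribution of centroids.
    """
    n_bins = int(round((hi - lo) / bin_width))
    counts = [0] * n_bins
    for c in centroids_deg:
        k = int((c - lo) // bin_width)
        k = min(max(k, 0), n_bins - 1)  # place edge values in the nearest bin
        counts[k] += 1
    expected = len(centroids_deg) / n_bins  # equal expected count in every bin
    chi2 = sum((observed - expected) ** 2 / expected for observed in counts)
    return chi2, n_bins - 1
```

Units for which no centroid could be computed (NC) would need to be excluded before calling the function.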
Distributions of the depth of spike-count modulation across space are plotted in Fig. 5A. In both cortical fields, spike-rate modulation was shallower for stimuli presented at 40 dB than 20 dB above threshold (P < 0.0002), indicating a reduction of spatial selectivity at high stimulus levels. At the higher level, however, distributions of modulation depths across both azimuth and elevation differed significantly between PAF and A1 (elevation: P < 0.0002; azimuth: P < 0.0006), with PAF showing deeper modulation of spike count across location. There was no significant difference between the two fields at the lower level (P > 0.05).
Figure 5B plots distributions of tuning width, WC. The format is the same as in Fig. 5A, and the results are comparable: tuning widths in both azimuth and elevation increased with stimulus level in both cortical fields. Also as in Fig. 5A, there were no significant differences between azimuth tuning width in PAF and A1 at the lower stimulus level (P > 0.05), although the fields did differ in elevation tuning width (P < 0.02). At the higher level, PAF units exhibited narrower tuning along both dimensions (azimuth: P < 0.01; elevation: P < 0.0002). Thus spatial sensitivity was more resistant to sound-level increases in PAF than in A1. Additionally, nonmonotonic units were more sharply tuned than monotonic units, regardless of level, in PAF (azimuth: P < 0.001, elevation: P < 0.01). Similar differences were observed in A1, although the small number of nonmonotonic A1 units resulted in low statistical power and no significant differences except for azimuth tuning widths measured 40 dB above threshold (P < 0.01).
Stimulus-dependent variations in response latency
In addition to spike counts, we assessed spatial sensitivity by relating changes in response latency to stimulus locations. By analogy to the modulation of spike count (Fig. 5A), we characterized units by their range of variation in response latency across stimulus locations. As shown in Fig. 2, overall response latencies were significantly (20 ms) longer in PAF than in A1 or A2. Of particular relevance to this study, however, these increased latencies do not result simply from delayed overall response times in PAF but are accompanied by increased stimulus-related variation. Figure 6 plots distributions of this latency range for PAF, A1, and A2 at 20 and 40 dB above threshold. In all cases, PAF units showed significantly more spatial variation of response latency than A1 (P < 0.0002 at all tested levels). Incidentally, identical results were obtained when the range of latency variation was expressed as a ratio relative to each unit's median latency (P < 0.0002 at both levels). The majority of A1 and A2 units had response latencies that varied <10 ms across azimuth (median azimuth latency range in A1: 3.5 ms, A2: 8.6 ms), while a majority of PAF units had latencies varying by twice these amounts (median azimuth latency range: 19.3 ms). Increasing stimulus level had the effect of reducing the latency range in both areas (P < 0.0002), but even at 40 dB above threshold, a significant number of PAF units demonstrated ranges of 10 ms or more.
Two issues arise with respect to stimulus-related variation in response latency. The first is whether response latencies vary independently of spike counts. To assess this, we computed, for each unit, the correlation between L and C across location. Negative values indicate that locations eliciting many spikes also elicit short-latency responses; this is generally to be expected, as high spike count and short response latency are both general indicators of effective stimulation. Distributions across azimuth of the correlation of L with C, 20 dB above threshold, are plotted for PAF and A1 in Fig. 7A. Nearly all A1 units had strong negative correlations, most with values falling between -0.5 and -1. Most PAF units also showed negative correlations, but the distribution of values in PAF was centered nearer to zero (P < 0.002); a fair number of PAF units even showed positive correlations, indicating high spike counts at locations eliciting long-latency responses. Consistent with this result, the two fields differed in the proportion of units whose azimuth centroids fell in opposite hemifields when computed by count or latency (Fig. 7B). Together, these findings suggest that in PAF, and not in A1, there exist a number of units (e.g., Fig. 3C) whose response latencies and spike counts vary independently and may therefore constitute independent spatial codes.
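For a single unit, the correlation between latency and spike count across locations can be computed as in the sketch below. The text does not state which correlation coefficient was used; the Pearson coefficient is assumed here, and the function name and example values are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical unit: locations that elicit more spikes also elicit
# shorter latencies, giving a strong negative correlation.
counts_by_location = [1, 2, 3, 4]
latencies_by_location_ms = [40, 30, 20, 10]
r = pearson_r(latencies_by_location_ms, counts_by_location)
```

A value of r near -1 corresponds to the pattern seen in most A1 units, whereas values near zero or above indicate that latency and count vary independently across locations.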
A second issue is whether latencies vary specifically with stimulus features related to space or rather simply as a result of differences in the effective levels of stimuli located near or away from the pinna's acoustic axis. Many auditory units show response latencies that vary systematically with stimulus level, and effective levels are subject to the directional acoustics of the cat's pinnae (Middlebrooks and Pettigrew 1981
). We can predict the effects on response latency of these changes in level by computing the effective level for each stimulus position. To do so, we positioned an insert microphone in the ear canal contralateral to the recording site in one cat and measured directional impulse responses for each loudspeaker position. From these, we calculated the RMS gain for each stimulus location (corrected for the loudspeaker response). We thus calculated the effective levels of individual stimuli presented to each cat, for stimuli varying in azimuth and elevation as well as a separate set of stimuli varying in level but not location. Generally, the latter were presented from either straight ahead (0°) or nearly overhead (+80° elevation). By interpolating the latencies of level-varying stimuli according to the effective levels of location-varying stimuli, we calculated the response latency predicted by effective level for each stimulus. The corresponding predicted range of latency variation was computed in the same way as the observed range (Eq. 2). Overall, effective level predicted a smaller range of latency variation than was observed (permutation test on observed vs. predicted range, P < 0.0002 for azimuth and elevation at all tested levels). This was true for both PAF and A1 and suggests that variation of response latency across space is not predicted by purely monaural effects.
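The prediction of latency from effective level can be sketched as an interpolation along a unit's latency-level function. In this example the interpolation is assumed to be piecewise linear and clamped at the ends of the tested range, and the range of predicted latencies is taken as maximum minus minimum; neither detail is specified in the text (Eq. 2 may define the range differently), and all names and values are hypothetical.

```python
def interpolate(x, xs, ys):
    """Piecewise-linear interpolation of ys over ascending xs, clamped at the ends."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    for x0, y0, x1, y1 in zip(xs, ys, xs[1:], ys[1:]):
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

def predicted_latency_range(effective_levels, levels, latencies):
    """Predict a latency for each location from its effective level.

    levels and latencies describe responses to stimuli varying in level only;
    effective_levels holds one effective level per stimulus location.
    Returns (predicted latencies, max minus min of those latencies).
    """
    predicted = [interpolate(e, levels, latencies) for e in effective_levels]
    return predicted, max(predicted) - min(predicted)

# Hypothetical unit: latency shortens as level rises from 20 to 40 dB.
predicted, spread = predicted_latency_range([25, 35, 40], [20, 30, 40], [30, 24, 20])
```

Comparing the predicted spread with the spread actually observed across locations indicates how much of the latency variation can be attributed to level alone.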
Figure 8 plots the distributions of spatial centroids based on latency. Across both azimuth and elevation, and at both tested levels, the proportion of units for which no centroid could be calculated (NC or "untuned" units) was higher in A1 than in PAF, consistent with the reduced latency variation in A1 (Fig. 6). Among tuned units, however, the distributions of centroids did not differ between the two fields (χ² contingency test, azimuth: χ²(17 df) = 18.4 at 20 dB, 11.8 at 40 dB, P > 0.05; elevation: χ²(13 df) = 18.0 at 20 dB, 10.5 at 40 dB, P > 0.05). These generally followed the basic acoustics of the pinna with shortest latencies observed in response to contralateral azimuths and around +60° in elevation. Distributions broadened somewhat and included more NC units 40 dB above threshold, consistent with the level effects apparent in Figs. 3 and 6. Additionally, distributions of azimuth centroids at the higher level appear shifted toward the midline, away from eccentric contralateral locations.
Distributions of tuning width WL are plotted in Fig. 9. Tuning widths were narrower in PAF than A1 (P < 0.0002 at all tested levels, azimuth and elevation) and were less affected by increasing SPL (permutation tests on WL (20 dB) - WL (40 dB), P < 0.0002 for azimuth and elevation).
ANN analysis of transmitted information
We used a pattern-recognition algorithm based on ANNs to analyze the mutual or transmitted information between stimulus locations and network estimates of location based on neural spike patterns. The TSR information provides an indication of the accuracy with which spike patterns serve to encode stimulus locations. The TSR information estimate served to quantify the units' overall spatial sensitivity. Distributions of the single-unit transmitted information (TSRS) provided by each unit's full spike pattern are plotted in Fig. 10. The results for networks trained on stimuli varying in azimuth compare favorably to those obtained in area A2 by Furukawa and Middlebrooks (2001
), with the majority of units in PAF and A1 providing 0.5-1 bits of information. PAF units provided significantly more information than did A1 units (median TSRS in PAF: 0.70 bits, A1: 0.58 bits, P < 0.0002). A similar result was obtained when networks were trained to classify stimuli varying in elevation (median TSRS in PAF: 0.47 bits, A1: 0.41 bits, P < 0.0002).
When networks were trained using reduced inputs conveying only spike counts or response latencies, information rates decreased (Fig. 11). The spike counts of PAF and A1 units transmitted similar amounts of TSRC information regarding stimulus azimuth (left, P > 0.05). In contrast, the two fields differed significantly in TSRL, estimated from response latencies (right, P < 0.001). The latter finding is consistent with the increased azimuth-dependent variation of response latency observed in PAF (Fig. 2) and strongly suggests that the difference in TSRS between PAF and A1 units (Fig. 10) is mediated by differences in latency coding.
We used stepwise regression to explore the relationship between information transmitted by full spike patterns (TSRS) and information transmitted by count or latency (TSRC or TSRL). Stepwise regression provides a method for computing the proportion of variance explained by one set of predictor variables, independent of that already explained by another set. The method involves computing differences in R² for models using different sets of predictor variables and is often used to test the significance of each variable's contribution to model fit. Here, our interest is in partitioning the total TSRS variance (across units in each area) according to the independent contributions of count and latency. Altogether, count and latency accounted for 84.9% of variance in TSRS across units in PAF and 80.9% in A1. This amount can be split into three independent classes: variance explained by count alone, variance explained by latency alone, and variance explained by both count and latency. Just over 51% of total variance in TSRS (in both areas) fell into the last category, suggesting that count and latency encode space more or less redundantly for a majority of units. In PAF, 26.7% of total variance was accounted for by latency (TSRL) independent of count, whereas only 6.8% was explained by count (TSRC) independent of latency. We conclude from this that the improved spatial coding in PAF is largely a result of information transmitted by latency. That is, PAF units that show a high degree of spatial coding make effective use of latency codes, even those units that also display effective rate coding; very few units are effective rate coders but ineffective latency coders. In contrast, A1 appears to contain similar numbers of units adopting either strategy alone, with 11.7 and 18.0% of variance independently explained by latency and