Innovative Methodology
1Center for Visual Science, Department of Brain and Cognitive Sciences, and 2Department of Neurobiology and Anatomy, and Center for Navigation and Communication Sciences, University of Rochester, Rochester, New York 14627
Submitted 14 November 2003; accepted in final form 8 January 2004
| ABSTRACT |
|---|
| INTRODUCTION |
|---|
To understand the response properties of neurons that are selective for complex stimuli, one would like to determine which features of the sensory stimulus drive the neural responses. An understanding of these features could provide important insights into the perception of complex stimuli. Although a number of studies have addressed this issue, it remains problematic for several reasons. First, stimuli used to define receptive fields in primary sensory areas, including pure tones and long white noise sequences, often do not produce reliable responses in higher-order sensory areas. For example, cortical neurons located in the lateral belt auditory region are often unresponsive to pure tones but can be driven with band-passed noise and species-specific vocalizations (Rauschecker 1998a,b; Rauschecker and Tian 2000; Rauschecker et al. 1995). Furthermore, auditory neurons located in the macaque ventrolateral prefrontal cortex (vlPFC), which respond well to human and monkey vocalizations, often show no response to short sequences of white noise or pure tones (Romanski and Goldman-Rakic 2002). Second, the selectivity of neurons in higher-order sensory areas for complex stimuli implies that the responses are strongly nonlinear functions of the sensory inputs (Bar-Yosef et al. 2002; Lau et al. 2002; Mechler et al. 2002; Rauschecker et al. 1995; Sahani and Linden 2002; Salinas et al. 2000; Tanaka et al. 1991). Therefore reverse correlation techniques (Marmarelis and Marmarelis 1978), which in practice are limited to approximating arbitrary nonlinearities using first- or second-order polynomial expansions, may not be effective in approximating the real nonlinearities. Furthermore, accurate estimation of the terms in even a second-order expansion requires stimulus sequences that are prohibitively long and problematic to use in awake, behaving animal experiments. Thus defining receptive fields, or the features to which these neurons are responsive, is particularly challenging.
An alternative approach is to carry out feature elimination. In this approach, one first searches through a set of complex stimuli to find a stimulus to which the neuron responds strongly. After the stimulus to which the neuron has the strongest response has been identified, "features" are removed from this preferred stimulus, and the effect on the neural response is assessed. If the neuron responds as strongly to the reduced stimulus, it is concluded that the neuron was responding to the remaining features. In the auditory system, feature elimination has been done by high- or low-pass filtering of the stimulus, or by eliminating the spectral structure while retaining the temporal envelope of the power (Rauschecker et al. 1995). In the visual system, this has been done by replacing images with simplified versions composed of oriented bars (Tanaka et al. 1991), by high- or low-pass filtering stimuli such as faces (Rolls et al. 1987), or by finding the simplest geometrical shape that could drive a neural response (Kayaert et al. 2003).
Although feature elimination is somewhat ad hoc, behavioral studies can provide insight into the dimensions or features that are relevant to neurons in higher-order cortical areas. Assuming that as we move up the sensory hierarchy the responses of neurons are more closely related to reportable percepts (Crick and Koch 1995; Sheinberg and Logothetis 1997), we might expect auditory neurons in vlPFC to be related to the animal's perception of communication calls. The acoustic features that are relevant in human speech perception have been studied intensely (Pickett 1999). These studies have found that the prominent spectral features of phonemes, known as formants, play an important role in speech perception (Nearey 1989). In other studies, researchers who have studied the electronic representation and transmission of speech have found that the relative phase between the frequency components in a vocalization can carry much of the information of the original sound (Oppenheim and Lim 1981).
Guided by these behavioral insights, as well as theoretical considerations described in the DISCUSSION, we have developed a feature-elimination approach for exploring the features to which auditory neurons in the prefrontal cortex might respond. We have extracted principal components (PCs) and independent components (ICs) from a set of rhesus macaque vocalizations, which have been shown to be effective at driving neurons in the lateral belt auditory cortex (Rauschecker 1998a; Rauschecker et al. 1995; Tian et al. 2001) and vlPFC (Romanski and Goldman-Rakic 2002; Romanski et al. 2003). The PCs and ICs extracted from each stimulus allow us to define features based on the second-order (in the case of PCs) and higher-order (in the case of ICs) statistics of each vocalization. Each PC or IC corresponds to a feature of the vocalization and, because of the way the components are defined, the complete set of PCs or ICs corresponds to all of the features present in the stimuli. By selecting a subset of the components, we can create a filtered version of the vocalization, which is composed of only those features that correspond to the components retained. In this paper, our goal is to understand and illustrate the features extracted by each technique, and the statistics of the stimuli produced by filtering with subsets of the PCs and ICs. We will also illustrate the use of the approach to examine the selectivity of an auditory cortical neuron recorded in vlPFC. Examination of the features extracted by each technique will show that the PCs correspond closely to the main Fourier features of the sounds, which are related to the formants of the vocalizations. Conversely, the ICs correspond to features that preserve the relative phase across a set of frequencies (Bell and Sejnowski 1996). Because the features extracted by the 2 techniques can be characterized well, we can directly relate neural responses to PC- and IC-filtered stimuli to specific behavioral (Nearey 1989; Nossair and Zahorian 1991) and theoretical (Lewicki 2002; Linsker 1988) hypotheses. Finally, although this technique was developed and is illustrated on macaque vocalizations, it could easily be applied to visual images for studying the visual system.
| METHODS |
|---|
We will use a number of signal processing techniques, along with principal-component analysis (PCA) and independent-component analysis (ICA), to examine the statistical features extracted by each technique. To begin the analyses, all calls were preprocessed by filtering and down-sampling to 20 kHz, which retained most of the information in most of the calls. Time-frequency distributions were estimated by calculating a windowed spectrogram (Cohen 1995), with a window width of 256 samples. These spectrograms were smoothed with a symmetric Gaussian window, with a SD of 5 samples.
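The spectrogram step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: NumPy is assumed, the Hann taper, the hop size of 128 samples, and the choice to smooth separably along both axes are not specified in the text, and the function names `spectrogram` and `gaussian_smooth` are hypothetical. A synthetic 1-kHz tone stands in for a call.

```python
import numpy as np

def spectrogram(x, width=256, hop=128):
    """Power spectrogram: rows are frequency bins, columns are time frames."""
    window = np.hanning(width)  # assumed taper; the text does not name one
    n_frames = 1 + (len(x) - width) // hop  # hop size is also an assumption
    frames = np.stack([x[i * hop:i * hop + width] for i in range(n_frames)])
    return (np.abs(np.fft.rfft(frames * window, axis=1)) ** 2).T

def gaussian_smooth(spec, sd=5.0):
    """Separable Gaussian smoothing along both axes (SD given in samples)."""
    half = int(4 * sd)
    k = np.exp(-0.5 * (np.arange(-half, half + 1) / sd) ** 2)
    k /= k.sum()
    smooth = lambda v: np.convolve(v, k, mode="same")
    out = np.apply_along_axis(smooth, 0, spec)   # along frequency
    return np.apply_along_axis(smooth, 1, out)   # along time

fs = 20_000                              # sampling rate after down-sampling
t = np.arange(fs) / fs                   # 1 s of signal
tone = np.sin(2 * np.pi * 1000 * t)      # synthetic 1-kHz tone, not a real call
spec = gaussian_smooth(spectrogram(tone))
freqs = np.fft.rfftfreq(256, d=1 / fs)   # bin spacing = 20000 / 256, about 78 Hz
peak_hz = freqs[spec.mean(axis=1).argmax()]
print(spec.shape, peak_hz)
```

With a 256-sample window at 20 kHz, the frequency bins are about 78 Hz apart, so the peak of the tone is located only to within one bin.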
We will also use bispectra to analyze the vocalizations. Although bispectra have not been used commonly to analyze auditory data, they provide insight into characteristics of the sounds beyond that which can be shown by spectrograms. For example, the spectrograms are only a representation of the second-order statistics of the sounds, whereas the bispectra are a measure of the third-order statistics. To understand the features to which the bispectrum is sensitive, note that when complex numbers are multiplied, as in the case of the Fourier components $F(\omega_1)$ and $F(\omega_2)$, their product has a phase equal to the sum of the phases of each component. When this product is multiplied by the complex conjugate of the component at the frequency equal to the sum of the 2 frequencies, $F^*(\omega_1 + \omega_2)$, the phase of the result is the difference between the phase of the product and the phase of the component at the sum frequency. If this phase difference is consistent across the segments of the vocalization, the frequencies are said to be phase coupled. Therefore if the relation

$$\phi(\omega_1) + \phi(\omega_2) - \phi(\omega_1 + \omega_2) = C \tag{1}$$

holds, where $\phi(\omega)$ indicates the phase at the indicated frequency and $C$ is a constant, there will be a peak in the bispectrum at $(\omega_1, \omega_2)$. Thus the bispectrum indicates the consistency of the phase relationship between a pair of frequencies and their sum. This fact can be used to illustrate a key difference between the spectrogram and the bispectrum. As an example, consider 2 signals, both of which have power at 5, 10, and 15 Hz. The first signal has a constant relative phase between the frequencies as a function of time, whereas the second signal has a random relative phase between the frequencies as a function of time. Both signals will have the same spectrograms, but only the first signal will show a peak in its bispectrum.
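The 2-signal example can be checked numerically. The Python sketch below (NumPy assumed) builds segments with equal power at 5, 10, and 15 Hz and averages the triple product across segments. The sampling rate, segment length, number of segments, and the names `make_segments`, `phase_consistency`, and `mean_power` are illustrative assumptions, and the normalized index used here is a simple stand-in for inspecting the bispectral peak, not the estimator used for the figures.

```python
import numpy as np

rng = np.random.default_rng(0)
fs, seg_len, n_seg = 100, 100, 200   # 1-s segments, so FFT bin k is k Hz
t = np.arange(seg_len) / fs

def make_segments(coupled):
    """Segments with equal power at 5, 10, and 15 Hz; only the phases differ."""
    segs = np.empty((n_seg, seg_len))
    for i in range(n_seg):
        p5, p10 = rng.uniform(0, 2 * np.pi, size=2)
        # Coupled: phase(15) = phase(5) + phase(10), so the constant C is 0.
        p15 = p5 + p10 if coupled else rng.uniform(0, 2 * np.pi)
        segs[i] = (np.cos(2 * np.pi * 5 * t + p5)
                   + np.cos(2 * np.pi * 10 * t + p10)
                   + np.cos(2 * np.pi * 15 * t + p15))
    return segs

def phase_consistency(segs, k1, k2):
    """|mean triple product| / mean |triple product|, between 0 and 1."""
    F = np.fft.rfft(segs, axis=1)
    triple = F[:, k1] * F[:, k2] * np.conj(F[:, k1 + k2])
    return np.abs(triple.mean()) / np.abs(triple).mean()

def mean_power(segs):
    """Power spectrum averaged across segments."""
    return np.mean(np.abs(np.fft.rfft(segs, axis=1)) ** 2, axis=0)

coupled, random_phase = make_segments(True), make_segments(False)
c_coupled = phase_consistency(coupled, 5, 10)
c_random = phase_consistency(random_phase, 5, 10)
same_power = np.allclose(mean_power(coupled), mean_power(random_phase))
print(c_coupled, c_random, same_power)
```

The 2 sets of segments have matching power spectra, yet only the phase-coupled set gives an index near 1 at the (5 Hz, 10 Hz) pair; for the random-phase set the triple products point in random directions and largely cancel in the average.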
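The bispectra were estimated by calculating the expected value of the triple product for pairs of frequencies and their sum (Nikias and Mendel 1993). The equation for estimation of the bispectra is

$$B(\omega_1, \omega_2) = E\left[F(\omega_1)\,F(\omega_2)\,F^*(\omega_1 + \omega_2)\right] \tag{2}$$

where $F(\omega)$ is the Fourier component at frequency $\omega$, $^*$ denotes complex conjugation, and the expectation is taken across segments of the vocalization.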
All results presented in the paper will be presented in terms of the bicoherence, which is directly derived from the bispectrum. The bicoherence normalizes the power at each pair of frequencies in the bispectrum, by the power in the spectrum at each frequency, and is defined as

$$b(\omega_1, \omega_2) = \frac{\left|B(\omega_1, \omega_2)\right|}{\sqrt{P(\omega_1)\,P(\omega_2)\,P(\omega_1 + \omega_2)}} \tag{3}$$

where $B(\omega_1, \omega_2)$ is the bispectrum and $P(\omega)$ is the power at frequency $\omega$. The bicoherence reveals consistent phase relations between frequencies, independent of the power at those frequencies, whereas the power in the bispectrum is a function of both the phase and the power at the relevant frequencies. Therefore the bicoherence more clearly shows the higher harmonics, which are not as strong in the bispectra, because there is less power at high frequencies. As we will see in the following text, the representation extracted by the ICs is most clearly seen in the bicoherence plots.
Principal and independent components
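Both PCA and ICA have been treated extensively in the literature (Hyvarinen et al. 2001; Johnson and Wichern 1998). Both models assume that the observed variables to be modeled are the result of linear mixing among latent variables, which can be written as

$$\mathbf{s} = \mathbf{W}\mathbf{v} \tag{4}$$

where $\mathbf{s}$ is the vector of observed variables, $\mathbf{v}$ is the vector of latent variables, and the columns of $\mathbf{W}$ are the components.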
Both models were fit to a data matrix compiled by selecting a filter order, which established the frequency resolution of the analysis, and building trials out of time-shifted samples from the vocalizations. We used a filter order of 512, equivalent to 25.6 ms, which resolved the important features of the vocalizations. Each row of the matrix represents a separate random variable, and each column represents a trial. Therefore each column of the data matrix consisted of 512 samples from the stimulus, and subsequent trials were extracted from the vocalization by shifting one time step to the right. Specifically, the data matrix was given by
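$$\mathbf{S} = \begin{bmatrix} s(1) & s(2) & \cdots & s(T-511) \\ s(2) & s(3) & \cdots & s(T-510) \\ \vdots & \vdots & \ddots & \vdots \\ s(512) & s(513) & \cdots & s(T) \end{bmatrix} \tag{5}$$

where $s(t)$ is the sample of the vocalization at time step $t$ and $T$ is the total number of samples.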
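The construction of the data matrix from time-shifted windows can be sketched as follows. This is a minimal illustration assuming NumPy 1.20 or later (for `sliding_window_view`); the function name `build_data_matrix` and the ascending time order within each column are assumptions, and a ramp is used in place of a vocalization so that the shifts are easy to read.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def build_data_matrix(s, order=512):
    """Return an (order, len(s) - order + 1) matrix of time-shifted windows.

    Each row is one random variable (a lag within the window) and each
    column is one trial, shifted by one sample from the previous column.
    """
    return sliding_window_view(s, order).T

fs = 20_000
s = np.arange(1000.0)          # placeholder ramp so the shifts are easy to read
S = build_data_matrix(s)
window_ms = 1000 * 512 / fs    # duration of one column in ms
print(S.shape, S[:3, :3], window_ms)
```

Because the result is a strided view of the original samples, no copy of the signal is made until the matrix is used in a computation, which keeps memory use modest even for long calls.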
|
![]() | (6) |
![]() | (7) |
![]() | (8) |
![]() | (9) |
![]() | (10) |
![]() | (11) |
Because the ICA algorithm is subject to local maxima, it was run 4 times on each sound, and the best run for each sound was retained, with best defined below. Also, unlike the eigenvectors and their associated eigenvalues in the PC analysis, the ICs are not generated with any predefined rank. Because our goal was to use the ICs to extract the higher-order features from the sounds, we defined a cost function to measure the quality of the ICs extracted from the sounds. Our measure assessed the amount of power in the original bispectrum, which was preserved in the IC features selected. The cost function was
![]() | (12) |
Filtering with the principal and independent components
We had to develop a technique for filtering our original sounds using the PC and IC features. We did this by first projecting each column of our original data matrix, S, given in Eq. 5, into a subspace defined by the corresponding algorithm. The subspace is defined by placing each of the retained components into the columns of a smaller matrix Ws, where the number of columns of Ws is equal to the number of components retained, and the number of rows is equal to the filter order, in our case 512. In other words, Ws contains a subset of the columns of W. This is equivalent to setting the value of the latent variable v, given in Eq. 4, to zero, for some components. The projection is a matrix multiplication, where the data matrix given in Eq. 5 is projected into the column space of Ws. This is given explicitly by
![]() | (13) |
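The projection step can be sketched in Python as follows. This is an illustration only, assuming NumPy and assuming that the projection is the orthogonal (least-squares) projection onto the column space of Ws; the function name `project_onto_components`, the toy matrix sizes, and the random orthonormal components are hypothetical, and the step that converts the filtered matrix back into a waveform is not covered.

```python
import numpy as np

def project_onto_components(S, Ws):
    """Orthogonally project each column of S onto the column space of Ws."""
    # Least-squares latent values for the retained components ...
    v, *_ = np.linalg.lstsq(Ws, S, rcond=None)
    # ... mapped back to the original space; dropped components contribute 0.
    return Ws @ v

rng = np.random.default_rng(0)
S = rng.normal(size=(16, 40))                   # toy data matrix (16 lags, 40 trials)
W, _ = np.linalg.qr(rng.normal(size=(16, 16)))  # toy orthonormal components
Ws = W[:, :4]                                   # retain 4 of the 16 components
S_filt = project_onto_components(S, Ws)
print(S_filt.shape)
```

Solving a least-squares problem avoids forming an explicit inverse and gives the same result as multiplying by the transpose of Ws when the retained components are orthonormal, as they are for PCs. For components that are not orthogonal, an orthogonal projection and setting latent variables to zero are not identical operations, so this sketch should be read with that caveat.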
Electrophysiological recording methods
We recorded extracellular neuronal activity from the frontal lobe of awake, behaving macaque monkeys (Macaca mulatta) in response to auditory stimuli, which included species-specific vocalizations. Single-unit and multiunit activity were recorded from chronically implanted recording chambers centered over the vlPFC auditory region, which had been identified in previous anatomical and physiological studies (Romanski and Goldman-Rakic 2002; Romanski et al. 1999). All surgical, behavioral, and electrophysiological procedures were in accordance with National Institutes of Health guidelines and with those of the University of Rochester Committee on Animal Resources (UCAR) and were described previously (Romanski and Goldman-Rakic 2002). Neuronal activity was acquired and digitized during an active listening task in which monkeys fixated a central point on a monitor while vocalization and nonvocalization stimuli were presented from speakers (Audix, PH-5vs) located 30 in. in front of the monkeys. Sounds were presented at 60-75 dB SPL measured at the level of the monkey's ears. Neurons that were responsive to monkey calls were tested with a subset of calls, which included normal, PC-filtered, and IC-filtered versions. An example of a vlPFC neuronal response to the normal and filtered call versions is presented in this study to illustrate the utility of PC- and IC-filtered sounds for auditory physiological analysis. Detailed analyses of vlPFC neuronal responses to PC- and IC-filtered sounds will be presented in a future publication.
| RESULTS |
|---|
Second- and third-order statistics of vocalizations in the frequency domain
We began by estimating the spectral and bispectral statistics of the set of vocalizations. In the first analysis, we explored the spectrograms of the vocalizations, an approach that has been applied extensively in the auditory domain. In Fig. 1, the spectrograms of 3 example vocalizations (a coo, a girney, and a grunt) are shown. The coo and the girney have the clearest higher-order harmonics, which can be seen as parallel horizontal lines in the spectrogram plots. The girney is characterized by more energy in the higher-order harmonics than the coo. The grunt, in contrast, is noisier. The harmonic structure of these calls is common in mammalian vocalizations and is attributed to the anatomy and physiology of the vocal apparatus (Fant 1960). The oscillation of the larynx during the production of voiced phonemes, like the coo and the girney, produces a series of air pressure pulses similar to a sawtooth function. This sequence produces the harmonic structure, with the spacing between the harmonics controlled by the fundamental frequency of the oscillation of the larynx. This basic feature of the calls is responsible for many of the aspects of the vocalizations explored below, given that it leads to phase locking across harmonically related frequencies. The grunt is more similar to an unvoiced phoneme, and thus has a less-distinct harmonic structure.
In Fig. 2, we show bicoherence plots for the same 3 calls shown in Fig. 1. The bicoherence plots show power where there is phase coupling across frequencies (see METHODS), and therefore they are a measure of the consistency of the phase across the harmonics seen in Fig. 1. The bicoherence plots of the coo and the girney show a grid of power at all multiples of the fundamental frequency. This is due to the strong phase locking across the harmonics. Other calls have a less-regular structure. For example, the grunt shows phase locking among a number of higher frequencies, but not the regular pattern of the coo. The girneys, as a class of calls, are more variable than the coos. Some show the strong harmonic regularity evident in this example whereas others are less regular. The coos on the other hand consistently have the regular harmonic structure.
Figure 3 shows the first 8 PC filters (i.e., those that describe the most variance), derived from the coo. For a stationary Gaussian process, and a sufficiently long filter window, the PCs would be equal to the Fourier components (Fuller 1996). We see a close correspondence to the Fourier components in the PCs of the coo. Most of the power is concentrated at a single frequency, and the PC filters are matched 90° phase-shifted (sine/cosine or derivative) pairs. Deviation between the PC filters and Fourier components is a consequence of nonstationarities in the calls, as well as the truncation of the filter length at 512, although it can be seen that each PC filter is dominated by a single frequency.
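The correspondence between PCs and Fourier components can be illustrated on a synthetic stationary signal. In the Python sketch below (NumPy assumed), the signal is the sum of 2 sinusoids rather than a real coo, the filter order is shortened to 64 for speed, and all variable names are illustrative. The PCs are taken as eigenvectors of the covariance of the time-shifted data matrix.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

order = 64                                   # short filter for illustration
n = np.arange(64 * order + order - 1)        # gives 4096 windows
# Two stationary sinusoids with 4 and 10 cycles per window; the first is stronger.
s = np.cos(2 * np.pi * 4 * n / order) + 0.5 * np.cos(2 * np.pi * 10 * n / order)
S = sliding_window_view(s, order).T          # rows: lags, columns: trials

evals, evecs = np.linalg.eigh(np.cov(S))     # eigenvalues in ascending order
pcs = evecs[:, ::-1][:, :4]                  # top 4 PCs, largest variance first

spectra = np.abs(np.fft.rfft(pcs, axis=0))   # spectrum of each PC filter
dominant_bins = spectra.argmax(axis=0)
purity = (spectra ** 2).max(axis=0) / (spectra ** 2).sum(axis=0)
# Phase difference within the first PC pair, measured at its dominant bin.
F = np.fft.rfft(pcs[:, :2], axis=0)[4]
phase_diff = np.angle(F[0] / F[1])
print(dominant_bins, purity, phase_diff)
```

In this idealized case each PC has essentially all of its power at a single frequency, the PCs come in pairs at the same frequency, and the members of the first pair differ in phase by 90°. Real calls are nonstationary, so their PCs would be expected to deviate from this pattern.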
The manipulations were developed for isolating essential features of macaque vocalizations, to which auditory neurons might be responsive. As an example of the application of this approach, Fig. 11 shows neural responses to unfiltered and filtered versions of a shrill bark vocalization, as well as the spectrograms for the calls. This example should not be considered representative of the population because individual neurons in vlPFC differ widely in their response to complex auditory stimuli. The spectrograms show that the 2 filtering techniques extract different features from the calls. The neural response to the IC-filtered call is almost the same as the neural response to the unfiltered call, whereas the neural response to the PC-filtered call is smaller. This is the case for this particular single unit, even though the PCs conserve more of the total variance of the original call (see Fig. 10A, Shrill Bark).
| DISCUSSION |
|---|
Behavioral studies in audition, as well as vision, have explored the ability of subjects to make perceptual discriminations of stimuli that have been altered to preserve either their phase (Oppenheim and Lim 1981) or their spectral power (Nearey 1997). As we have shown above, the sounds generated by filtering with the PCs retain the prominent spectral features of the unfiltered sounds, which are often referred to as the formants (Fant 1960) in human language studies. Formants, defined as the dominant peaks in the power spectra of phonemes, have been shown to be strong perceptual cues to the identification of phonemes and syllables (Nearey 1989, 1997). Therefore use of PC-filtered vocalizations in auditory neurophysiology can help determine whether formants, which appear to be important behaviorally for sound recognition, are the essential features driving vocalization-responsive neurons. A related approach, which has been used in auditory neurophysiology, involves replacing a sound with pure tones at the dominant frequencies (Bar-Yosef et al. 2002, who also used a number of other manipulations). However, this approach also disrupts the relative phase between the formants. Although the PCs do not preserve the global phase structure of the vocalizations, they do preserve the relative phase of the frequencies that they extract. Therefore they can be used to test the hypothesis that it is not only the power at the formants, but also the relative phase at the formants, that is important.
Another feature that might be important for auditory neurons is the relative phase between frequencies. In visual processing, phase has been shown to be important for object recognition (Oppenheim and Lim 1981; Piotrowski and Campbell 1982), although only a crude phase resolution was found to be important. We have shown that the relative phase structure across frequencies can be preserved by the ICs. Neural responses in cortical auditory areas, including A1, are often nonlinear (Rauschecker et al. 1995; Sahani and Linden 2002) and therefore can be sensitive to the higher-order statistical structure of the vocalizations. The ICs are sensitive to this structure as well, as can be seen by their ability to preserve third-order correlations, measured with the bicoherence, in the vocalizations. This makes ICs plausible candidates for higher-order neural representations of these sounds.
It is important to point out that the PC- and IC-filtered vocalizations differ considerably from vocalizations that could be produced by either scrambling the phase, or whitening the power spectra while retaining the relative phase. These manipulations would test a more specific hypothesis, that it was only the phase, or only the power spectrum, that was essential. As we have shown, the PCs do not scramble the phase, but rather retain only the relative phase at the frequencies corresponding to the PCs retained. Furthermore, the ICs do not whiten the power spectra while retaining the phase structure, given that the ICs do not introduce power at frequencies that were not present in the original stimuli. Although some aspects of these techniques are novel, others are similar to previously used approaches. For example, it is possible to design a zero-phase linear filter that would produce calls similar to those produced by the retention of a set of PCs, although this is not exactly true, given that the PCs do not correspond exactly to Fourier components. We have mainly developed the similarity between PCA and the Fourier domain to help interpretation of the features extracted by these techniques. It is more difficult to draw a direct analog between the ICs and previously used analyses because a linear filter cannot produce the same filtered calls that the ICs produce. Furthermore, both the PC and IC approaches provide not only a means of filtering out "interesting" features from the calls but also a means of defining, in a consistent way, what is interesting.
The PC- and IC-filtered vocalizations can also be related to reverse correlation methods used in the study of sensory processing. Linear filters, or the first-order component of the Volterra series, are not sensitive to relative phase between frequencies, but only to the amount of power in the frequencies that they pass. However, second-order filters, which also correspond to spectral temporal receptive fields (Aertsen and Johannesma 1981), can be sensitive to the relative phase between frequencies (i.e., the relative timing of power at various frequencies is as important as the amount of power). Therefore a cell's relative sensitivity to either the preservation of the principal spectral or phase components of a call is related to the order of the nonlinearity of the cell's receptive field.
As a final motivation for the use of these techniques to explore sensory responses, principal and independent components (Bell and Sejnowski 1997; Linsker 1988), as well as related methods known under the general heading of generative models, have been used to model sensory processing. These models have been developed under the assumption that the goal of sensory processing is to maximize the mutual information between the peripheral sensory processors and the central representations of the sensory input (Nadal and Parga 1994). These models have also been developed under the related assumption that feedback connections in sensory cortices are used to model the sensory input (Hinton and Ghahramani 1997; Mumford 1994). Previously, these approaches have been applied to data sets consisting of members of an entire class of sensory stimuli, for example, natural images or sounds. When they are applied in this way, the filters generated by these approaches have been presented as models for the receptive fields of early stages of sensory processing.
The analyses presented also provide a useful description of the higher-order statistics of the vocalizations. Bispectra as well as the bicoherence are insensitive to Gaussian noise because Gaussian signals have no cumulants beyond second order, and the bispectra can be computed from the third-order cumulant (Papoulis 1991). Therefore the features of the calls that are represented by the bispectra can be easily detected even in colored Gaussian noise. It has been shown that neural responses to background Gaussian noise can be suppressed by tones that are played on top of this noise (Nelken et al. 1999). This could be important for extracting information from the acoustic biotope, given that many of the unimportant sounds may simply add together to form a Gaussian background, as expected from the central limit theorem. As we have seen, however, nonhuman primate vocalizations contain rich higher-order structure, and therefore could be easily filtered out of the background by focusing on the higher-order aspects of the sounds. This of course assumes that the structure evident in the bicoherence can be used to differentiate the calls, a subject of ongoing work.
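The insensitivity of third-order statistics to a Gaussian background can be illustrated with the zero-lag third-order cumulant. The Python sketch below (NumPy assumed) is a toy example: the exponential source is an arbitrary stand-in for a signal with higher-order structure, not a model of a vocalization, and the function name `third_cumulant` is hypothetical.

```python
import numpy as np

def third_cumulant(x):
    """Zero-lag third-order cumulant, i.e., the third central moment."""
    xc = x - x.mean()
    return np.mean(xc ** 3)

rng = np.random.default_rng(1)
gauss = rng.normal(size=200_000)         # Gaussian background
skewed = rng.exponential(size=200_000)   # stand-in for a non-Gaussian source
c3_gauss = third_cumulant(gauss)
c3_skewed = third_cumulant(skewed)       # theoretical value for this source is 2
c3_mix = third_cumulant(gauss + skewed)  # cumulants of independent sources add
print(c3_gauss, c3_skewed, c3_mix)
```

The estimate for the Gaussian background is close to zero, whereas the mixture keeps approximately the third-order cumulant of the non-Gaussian source, which is the property that makes bispectral features detectable in Gaussian noise.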
Defining the essential features of a stimulus that drive neuronal responses is an important goal in sensory neurophysiology. Assuming that some neurons in higher-order cortical processing regions are selectively responsive to vocalizations, we can, in principle, restrict our search to only the dominant features of the vocalizations. PCA allows us to constrict our search to the portion of the distribution of sounds spanned by the prominent second-order statistics of the vocalizations, and ICA allows us to constrict our search to the portion of the distribution of sounds spanned by the prominent higher-order statistics of the sounds. By filtering the sounds with these features we can test neurons for residual responses to the sounds defined by these features. These techniques provide powerful tools for exploring the feature space to which high-level auditory or visual sensory neurons are responsive.
| ACKNOWLEDGMENTS |
|---|
| FOOTNOTES |
|---|
Address for reprint requests and other correspondence: Bruno B. Averbeck, Center for Visual Science, Department of Brain and Cognitive Sciences, University of Rochester, Rochester, NY 14627 (E-mail: baverbeck{at}cvs.rochester.edu).
| REFERENCES |
|---|
Bar-Yosef O, Rotman Y, and Nelken I. Responses of neurons in cat primary auditory cortex to bird chirps: effects of temporal and spectral context. J Neurosci 22: 8619-8632, 2002.
Baylis GC, Rolls ET, and Leonard CM. Selectivity between faces in the responses of a population of neurons in the cortex in the superior temporal sulcus of the monkey. Brain Res 342: 91-102, 1985.
Bell AJ and Sejnowski TJ. Learning the higher-order structure of a natural sound. Netw Comput Neural Syst 7: 261-266, 1996.
Bell AJ and Sejnowski TJ. The "independent components" of natural scenes are edge filters. Vision Res 37: 3327-3338, 1997.
Cohen L. Time-Frequency Analysis. Upper Saddle River, NJ: Prentice-Hall, 1995.
Coifman RR and Donoho DL. Translation-Invariant De-Noising. Stanford, CA: Department of Statistics, Stanford University, 1995.
Crick F and Koch C. Are we aware of neural activity in primary visual cortex? Nature 375: 121-123, 1995.
Davis MH and Johnsrude IS. Hierarchical processing in spoken language comprehension. J Neurosci 23: 3423-3431, 2003.
Desimone R, Albright TD, Gross CG, and Bruce C. Stimulus-selective properties of inferior temporal neurons in the macaque. J Neurosci 4: 2051-2062, 1984.
Fant G. Acoustic Theory of Speech Production. The Hague: Mouton, 1960.
Fuller WA. Introduction to Statistical Time Series. New York: Wiley, 1996.
Gross CG, Rocha-Miranda CE, and Bender DB. Visual properties of neurons in inferotemporal cortex of the macaque. J Neurophysiol 35: 96-111, 1972.
Hinton GE and Ghahramani Z. Generative models for discovering sparse distributed representations. Philos Trans R Soc Lond B Biol Sci 352: 1177-1190, 1997.
Hyvarinen A, Karhunen J, and Oja E. Independent Component Analysis. New York: Wiley, 2001.
Hyvarinen A and Oja E. A fast fixed-point algorithm for independent component analysis. Neural Comput 9: 1483-1492, 1997.
Johnson RA and Wichern DW. Applied Multivariate Statistical Analysis. Upper Saddle River, NJ: Prentice Hall, 1998.
Jones JP and Palmer LA. The two-dimensional spatial structure of simple receptive fields in cat striate cortex. J Neurophysiol 58: 1187-1211, 1987.
Kanwisher N, McDermott J, and Chun MM. The fusiform face area: a module in human extrastriate cortex specialized for face perception. J Neurosci 17: 4302-4311, 1997.
Kayaert G, Biederman I, and Vogels R. Shape tuning in macaque inferior temporal cortex. J Neurosci 23: 3016-3027, 2003.
Kobatake E and Tanaka K. Neuronal selectivities to complex object features in the ventral visual pathway of the macaque cerebral cortex. J Neurophysiol 71: 856-867, 1994.
Lau B, Stanley GB, and Dan Y. Computational subunits of visual cortical neurons revealed by artificial neural networks. Proc Natl Acad Sci USA 99: 8974-8979, 2002.
Lewicki MS. Efficient coding of natural sounds. Nat Neurosci 5: 356-363, 2002.
Linsker R. Self-organization in a perceptual network. IEEE Computer 21: 105-117, 1988.
Marmarelis P and Marmarelis V. Analysis of Physiological Systems: The White-Noise Approach. New York: Plenum Press, 1978.
Mechler F, Reich DS, and Victor JD. Detection and discrimination of relative spatial phase by V1 neurons. J Neurosci 22: 6129-6157, 2002.
Mumford DB. Neuronal architectures for pattern-theoretic problems. In: Large Scale Neuronal Theories of the Brain, edited by Koch C and Davis J. Boston, MA: MIT Press, 1994, p. 125-152.
Nadal JP and Parga N. Nonlinear neurons in the low-noise limit: a factorial code maximizes information transfer. Network 5: 565-581, 1994.
Nearey TM. Static, dynamic, and relational properties in vowel perception. J Acoust Soc Am 85: 2088-2113, 1989.
Nearey TM. Speech perception as pattern recognition. J Acoust Soc Am 101: 3241-3254, 1997.
Nelken I, Rotman Y, and Bar Yosef O. Responses of auditory-cortex neurons to structural features of natural sounds. Nature 397: 154-157, 1999.
Nikias CL and Mendel JM. Signal processing with higher-order spectra. IEEE Signal Process Mag 10: 10-37, 1993.
Nossair ZB and Zahorian SA. Dynamic spectral shape features as acoustic correlates for initial stop consonants. J Acoust Soc Am 89: 2978-2991, 1991.
Oppenheim AV and Lim JS. The importance of phase in signals. Proc IEEE 69: 529-541, 1981.
Papoulis A. Probability, Random Variables and Stochastic Processes. New York: McGraw-Hill Higher Education, 1991.
Pickett JM. The Acoustics of Speech Communication. Fundamentals, Speech Perception Theory, and Technology. Boston, MA: Allyn & Bacon, 1999.
Piotrowski LN and Campbell FW. A demonstration of the visual importance and flexibility of spatial-frequency amplitude and phase. Perception 11: 337-346, 1982.
Puce A, Allison T, Gore JC, and McCarthy G. Face-sensitive regions in human extrastriate cortex studied by functional MRI. J Neurophysiol 74: 1192-1199, 1995.
Rauschecker JP. Cortical processing of complex sounds. Curr Opin Neurobiol 8: 516-521, 1998a.
Rauschecker JP. Parallel processing in the auditory cortex of primates. Audiol Neurootol 3: 86-103, 1998b.
Rauschecker JP and Tian B. Mechanisms and streams for processing of "what" and "where" in auditory cortex. Proc Natl Acad Sci USA 97: 11800-11806, 2000.
Rauschecker JP, Tian B, and Hauser M. Processing of complex sounds in the macaque nonprimary auditory cortex. Science 268: 111-114, 1995.
Rolls ET, Baylis GC, and Hasselmo ME. The responses of neurons in the cortex in the superior temporal sulcus of the monkey to band-pass spatial frequency filtered faces. Vision Res 27: 311-326, 1987.
Romanski LM and Goldman-Rakic PS. An auditory domain in primate prefrontal cortex. Nat Neurosci 5: 15-16, 2002.
Romanski LM, Hauser MD, and Averbeck BB. Auditory neurons in ventrolateral prefrontal cortex respond to behavioral relevance and caller identity of vocalizations. Soc Neurosci Abstr 33: 722.13, 2003.
Romanski LM, Tian B, Fritz J, Mishkin M, Goldman-Rakic PS, and Rauschecker JP. Dual streams of auditory afferents target multiple domains in the primate prefrontal cortex. Nat Neurosci 2: 1131-1136, 1999.
Roweis S and Ghahramani Z. A unifying review of linear Gaussian models. Neural Comput 11: 305-345, 1999.
Sahani M and Linden JF. How linear are auditory cortical responses? Advances in Neural Information Processing Systems (NIPS) 15: 109-116, 2003.
Salinas E, Hernandez A, Zainos A, and Romo R. Periodicity and firing rate as candidate neural codes for the frequency of vibrotactile stimuli. J Neurosci 20: 5503-5515, 2000.
Sheinberg DL and Logothetis NK. The role of temporal cortical areas in perceptual organization. Proc Natl Acad Sci USA 94: 3408-3413, 1997.
Tanaka K, Saito H, Fukada Y, and Moriya M. Coding visual images of objects in the inferotemporal cortex of the macaque monkey. J Neurophysiol 66: 170-189, 1991.
Tian B, Reser D, Durham A, Kustov A, and Rauschecker JP. Functional specialization in rhesus monkey auditory cortex. Science 292: 290-293, 2001.
Vouloumanos A, Kiehl KA, Werker JF, and Liddle PF. Detection of sounds in the auditory stream: event-related fMRI evidence for differential activation to speech and nonspeech. J Cogn Neurosci 13: 994-1005, 2001.