1Eaton-Peabody Laboratory, Massachusetts Eye and Ear Infirmary, Boston; 2Harvard-Massachusetts Institute of Technology Division of Health Sciences and Technology, Speech and Hearing Bioscience and Technology Program, Cambridge; and 3Department of Otology and Laryngology, Harvard Medical School, Boston, Massachusetts
Submitted 12 July 2004; accepted in final form 30 July 2004
| ABSTRACT |
|---|
| INTRODUCTION |
|---|
As a step toward understanding the significance of the neural activity patterns underlying the wide range of fMRI waveshapes, we sought to establish which physical sound features most strongly determine fMRI response waveshape in auditory cortex. In our previous study of waveshape, stimulus sound-time fraction (STF) and rate covaried, so either parameter could have determined waveshape (Harms and Melcher 2002). Both our previous study and that of Giraud et al. (2000) considered only broadband stimuli at one sound level, leaving open the possibility that either sound bandwidth or level has a substantial influence on waveshape.
We performed 2 complementary sets of experiments that used different strategies to handle the acoustic noise produced by the scanner gradient coils during fMRI (Ravicz et al. 2000) while maintaining the temporal resolution (about 2 s) needed to capture fMRI response waveshape. The main set examined multiple stimuli per imaging session and handled the noise through reduced spatial coverage of auditory cortex. These experiments imaged a single slice that sampled posterior auditory cortex including primary areas on Heschl's gyrus (HG) and nonprimary areas of the immediately lateral superior temporal gyrus (STG). In these "single-slice" experiments, we first examined response waveshape for a wide range of prolonged (30-s) sounds (e.g., speech, music, burst trains) to understand the general types of sounds that produce phasic versus sustained responses. We then examined fMRI waveshape while systematically varying fundamental sound features: temporal envelope characteristics (rate, STF), bandwidth, and level. Our findings indicate that sound temporal envelope, rather than level or bandwidth, is the primary determinant of fMRI response waveshape in posterior auditory cortex. The results also show that the sensitivity to sound temporal envelope holds in both primary and nonprimary areas of posterior auditory cortex, but nonprimary areas show more pronounced phasic responses for some stimuli (i.e., higher-rate trains and continuous noise), indicating more prominent neural activity at sound onset and offset.
The second set of experiments tested whether the strong dependency of fMRI waveshape on sound temporal envelope occurs in regions of auditory cortex beyond the posterior extent sampled in the single-slice experiments. This second set of experiments used a reduced number of stimuli in exchange for wider spatial coverage (achieved by imaging multiple slices). The acoustic noise was handled by acquiring all of the imaged slices in a brief cluster and leaving a long interval between successive clusters (Edmister et al. 1999; Hall et al. 1999). Temporal resolution was restored by varying the timing of the imaging relative to the stimuli (which limited the number of stimuli that could be studied per session). These "multislice" experiments showed that sound temporal envelope strongly influences fMRI waveshape anteriorly, as well as posteriorly, in auditory cortex.
Overall, the present study establishes sound temporal envelope as a crucial variable controlling the waveshape of fMRI responses in auditory cortex and, correspondingly, the underlying multisecond time patterns of neural activity.
| METHODS |
|---|
26 yr). Seventeen of the subjects were male and 22 were right-handed. Subjects had no known audiological or neurological disorders. Most of the imaging sessions (25) were conducted expressly for the present study. However, some were part of previous investigations of repetition rate (5 sessions; Harms and Melcher 2002
Acoustic stimulation paradigm
Stimuli were presented binaurally during 30-s "on" periods, alternated with 30-s "off" periods during which no auditory stimulus was presented. The stimulus in each "on" period was a train of noise or tone bursts, a train of clicks, continuous noise, orchestral music, or running speech. Four or 5 alternations between the "on" and "off" periods constituted a single scanning "run" (total duration 240 or 300 s). Three to 6 stimuli were studied in each single-slice experiment. Each stimulus in a session was presented an equal number of times (7 to 13), except for music, which was often presented just 4 times. Two stimuli, a high-rate noise burst train (32 to 40 presentations) and music (8 presentations), were studied in each multislice experiment. Fewer music presentations were used in both the single-slice and multislice experiments because music generally evokes robust responses, so fewer presentations were necessary.
Stimuli
TRAINS OF BROADBAND NOISE BURSTS. Bursts of white noise were presented in 30-s trains at repetition rates of 2/s, 10/s, and 35/s. Each burst had a rise and fall time of 2.5 ms and was usually 25 ms in duration (measured at half-maximum), yielding sound-time fractions (STFs) of 5, 25, and 88% for the 2/s, 10/s, and 35/s trains, respectively. Other rate/STF combinations included 2/s trains with an STF of 50% (250-ms bursts) and 35/s trains with STFs of 25 and 50% (7.1- and 14.3-ms bursts). The bursts were identical within a train (i.e., "frozen"), but differed across trains.
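The relationship between repetition rate, burst duration, and STF quoted above can be checked with a short calculation. The sketch below assumes that STF is simply rate times burst duration (measured at half-maximum), expressed as a percentage; the function name is illustrative only.

```python
def sound_time_fraction(rate_per_s, burst_ms):
    """Percentage of time that sound is present in a burst train.

    Assumes STF = rate * duration; (ms / 1000) * 100% reduces to a division by 10.
    """
    return rate_per_s * burst_ms / 10.0


# Rate (bursts/s) and burst duration (ms) pairs quoted in the text.
trains = [(2, 25.0), (10, 25.0), (35, 25.0), (2, 250.0), (35, 7.1), (35, 14.3)]
stfs = {(rate, dur): sound_time_fraction(rate, dur) for rate, dur in trains}

for (rate, dur), stf in stfs.items():
    print(f"{rate}/s train, {dur} ms bursts: STF = {stf:.1f}%")
```

Rounded to the nearest percent, these values reproduce the STFs of 5, 25, 88, 50, 25, and 50% given for the six trains.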
TRAINS OF NARROWBAND BURSTS. Tone bursts and narrowband (third octave) noise bursts were presented in 30-s trains at either 2/s or 35/s (burst center frequency: 500 Hz or 4 kHz; rise and fall time: 2.5 ms; duration: 25 ms, yielding STFs of 5 and 88% for the 2/s and 35/s trains, respectively). The repeated narrowband noise bursts within a train were identical, but differed across trains (and runs).
CONTINUOUS NOISE. The continuous noise was white and uncorrelated across its entire 30-s duration. Thus there was no repetition in the temporal fine structure.
TRAINS OF CLICKS. Clicks were presented in 30-s trains at a rate of either 35/s or 100/s (click duration: about 100 µs).
RUNNING SPEECH. A 30-s speech stimulus was created by concatenating "conversational" sentences spoken by a professional male speaker (Harvard IEEE Corpus; IEEE 1969). The amplitude envelope of the speech was low-pass, with a power spectral density 10 dB down at 5 Hz relative to its peak at 1.3 Hz.
ORCHESTRAL MUSIC. The music stimulus was the first 30 s of the fourth movement in Beethoven Symphony No. 7. The maximum power in the music amplitude envelope occurred at 0.69 Hz, with harmonics at 1.2, 2.5, and 4.9 Hz that were all within 10 dB of the power at 0.69 Hz.
Stimulus level
Sound levels were always about 55 dB above threshold (SL), except in sessions that explicitly examined the effects of sound level. Sound levels were determined separately for each ear and stimulus (to within 5 dB) in the scanner room immediately before the imaging session. The resulting sound pressure levels (at the ear) ranged from about 60 to 90 dB SPL. During threshold determination (and functional imaging), there was an ongoing low-frequency background noise produced by the scanner coolant pump. However, the acoustic noise produced by the scanner gradient coils during imaging was not present.
In sessions that examined the effects of stimulus level, the level ranged from 30 to 75 dB SL (about 60 to 100 dB SPL). The stimuli were 1) 35/s (88% STF) noise burst trains (studied at 40, 55, and 70 dB SL; 2 sessions), 2) orchestral music (30, 50, and 60 dB SL; 5 sessions), and 3) continuous noise (35, 45, 55, 65, and 75 dB SL; 5 sessions; 3 to 4 levels were studied in any given session).
Task
Subjects were instructed to listen "attentively" to the stimuli. Subjects were monitored to ensure that they remained alert throughout an experiment (typically by a nonverbal signal from the subject at the end of each imaging run in response to a question from the experimenter).
In sessions from our previous investigation of sound level (using music and continuous noise), subjects performed an additional task. Specifically, at the beginning and end of each 30-s stimulus "on" period, subjects controlled a knob to turn on or off an array of lights (Melcher et al. 2000). Because of this additional task, data from these sound level experiments are compared only with each other (see Waveshape in posterior auditory cortex: insensitivity to sound level in RESULTS).
Sound delivery
Stimuli were produced by a D/A board (controlled by LabVIEW), amplified, and fed to a pair of piezoelectric transducers. The transducers were either 1) incorporated directly into sound attenuating earmuffs placed over the subject's ears (sound system I; custom built by GEC Marconi; used for the 10 sessions from our previous investigation of sound level), or 2) housed in a shielded box adjacent to the scanner and coupled to earmuffs by air-filled tubes (system II; all remaining sessions). The frequency response of both systems, measured at the subject's ears, was low-pass with a cutoff frequency of 10 kHz (system I) or 6 kHz (system II).
The earmuffs of the sound delivery systems attenuated the acoustic noise produced by the scanner coolant pump and gradient coils (Ravicz and Melcher 2001; Ravicz et al. 2000). At the ear, pump noise levels were about 65 dB SPL, and peak gradient noise levels were 70 to 95 dB SPL at about 1.0 kHz (1.5-T scanners) or about 1.4 kHz (3.0 T).
Imaging
Subjects were imaged using whole-body scanners and standard head coils while resting supine. To reduce head motion, a bite bar was custom molded to the subject's teeth and mounted to the head coil, or pillow and foam were packed snugly around the head. Each imaging session lasted about 2 h. Imaging sessions began with the acquisition of contiguous sagittal images of the whole head as a reference for functional slice selection. Also common to all sessions was the acquisition of T1-weighted anatomical image(s) (in-plane resolution = 1.6 x 1.6 mm, thickness = 7 mm) of the functionally imaged slice(s).
SINGLE-SLICE EXPERIMENTS. The single-slice experiments were conducted on several different scanners: a 1.5- or 3.0-T General Electric scanner retrofitted for high-speed, echo-planar imaging (by Advanced NMR Systems; 1.5 T: 10 sessions; 3.0 T: 6 sessions), a 1.5-T General Electric Signa Horizon scanner (10 sessions), a 1.5-T Siemens Sonata scanner (5 sessions), and a 3.0-T Siemens Allegra scanner (6 sessions). No obvious differences in response waveshape were observed between imaging systems (Harms 2002).
The near-coronal slice selected for functional imaging intersected the inferior colliculi and the posteromedial aspect of Heschl's gyri. When multiple transverse temporal gyri were present, the anterior one was intersected. Given this slice placement, primary auditory cortex was likely sampled in all subjects (Rademacher et al. 1993, 2001).
Functional images of the selected slice were acquired using a blood oxygenation level-dependent (BOLD) sequence (1.5 T: asymmetric spin echo, TE = 70 ms, offset = 25 ms, flip = 90°; 3.0 T: gradient echo, TE = 30 ms, except one session used 40 ms and another used 50 ms, flip = 60 or 90°). Slice thickness was 7 mm with an in-plane resolution of 3.1 x 3.1 mm. The beginning of each functional run included 4 discarded images to ensure that image signal level had approached a steady state.
Although the present paper focuses on responses from auditory cortex, our experiments imaging a single slice were also designed to examine the inferior colliculus. Therefore functional images were generally collected using cardiac gating, which increases the detectability of activation in the inferior colliculus (Guimaraes et al. 1998). Image acquisitions were synchronized to every other QRS complex in the subject's electrocardiogram, resulting in an average interimage interval (TR) of about 2 s. In the 2 sessions that did not use cardiac gating, the TR was 2 s.
MULTISLICE EXPERIMENTS. The multislice experiments were conducted on the 3.0-T General Electric scanner (gradient echo, TE = 30 ms, flip = 60°). The functional image volume consisted of 10 contiguous, near-coronal slices (in-plane resolution: 3.1 x 3.1 mm; slice thickness: 7 mm), one of which matched the slice plane used in the single-slice experiments. All slices of the functional volume were acquired in a brief interval (<1 s) once every 8 s (TR). The onset of the stimulus relative to the volume acquisitions was staggered by 2-s increments from run to run. Thus across the multiple runs for a given stimulus, the functional data in toto included samples acquired every 2 s relative to the stimulus. Waveshapes measured this way did not differ systematically from those obtained in the single-slice experiments [e.g., compare Fig. 10 waveshapes (multislice data) to those for the same stimuli in Fig. 2 (single-slice data)].
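The way staggered stimulus onsets restore 2-s temporal resolution can be sketched as follows. This is a simplified illustration, not the acquisition code: it considers only one 60-s on/off cycle, assumes the onset offsets were exactly 0, 2, 4, and 6 s, and uses variable names chosen here for illustration.

```python
TR_S = 8         # interval between volume acquisitions (from the text)
STAGGER_S = 2    # shift of stimulus onset from run to run (from the text)
CYCLE_S = 60     # one 30-s "on" period plus one 30-s "off" period


def sample_times(offset_s, tr_s=TR_S, cycle_s=CYCLE_S):
    """Acquisition times (s, relative to stimulus onset) within one cycle of one run."""
    return list(range(offset_s, cycle_s, tr_s))


# Assumed offsets between stimulus onset and the first acquisition: 0, 2, 4, 6 s.
offsets = list(range(0, TR_S, STAGGER_S))

# Pool the sample times across the staggered runs.
combined = sorted(t for off in offsets for t in sample_times(off))

print("offsets:", offsets)
print("pooled sample times:", combined)
```

Each run on its own samples the response only every 8 s, but the pooled times fall on every multiple of 2 s within the cycle.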
PREPROCESSING. Before response detection, the following preprocessing steps were performed. For the single-slice experiments using cardiac gating, the image signal was retrospectively corrected to account for the image-to-image variations in signal strength (i.e., T1 effects) that result from fluctuations in heart rate (Guimaraes et al. 1998). For all experiments, the images for each scanning run were then corrected for head movements that may have occurred (SPM95 software package; without spin history correction; Friston et al. 1995a, 1996). In the single-slice experiments these corrections for motion were necessarily limited to adjustments within the imaging plane. Finally, for experiments using cardiac gating, the time series for each imaging "run" and voxel was linearly interpolated to a consistent 2-s interval between images.
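The final interpolation step can be illustrated with a minimal sketch. The acquisition times and signal values below are made up for illustration, and the function is a generic linear resampler rather than the routine actually used.

```python
from bisect import bisect_right


def resample_linear(times, values, step=2.0):
    """Linearly interpolate irregularly timed samples onto a regular grid."""
    # Build the regular grid, starting at the first sample time.
    grid = []
    t = times[0]
    while t <= times[-1]:
        grid.append(t)
        t += step

    resampled = []
    for g in grid:
        j = bisect_right(times, g)      # index of first sample later than g
        if j >= len(times):             # g coincides with the last sample
            resampled.append(values[-1])
            continue
        t0, t1 = times[j - 1], times[j]
        w = (g - t0) / (t1 - t0)        # fractional position between neighbors
        resampled.append(values[j - 1] + w * (values[j] - values[j - 1]))
    return grid, resampled


# Hypothetical cardiac-gated acquisition times (s) and image signal values.
gated_times = [0.0, 1.8, 4.1, 6.0, 8.2]
gated_signal = [100.0, 101.0, 103.0, 102.0, 100.0]

grid, signal_2s = resample_linear(gated_times, gated_signal)
print(grid)
print([round(v, 3) for v in signal_2s])
```

Grid points that coincide with an acquisition keep the measured value, and all others are weighted averages of the two neighboring images.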
RESPONSE DETECTION. Responses were detected using a general linear model (GLM) (Draper and Smith 1981; Fomby et al. 1984; Friston et al. 1995b) and a set of basis functions designed to detect the wide range of response waveshapes known to occur in auditory cortex (Harms and Melcher 2003). The basic idea behind the GLM is to model the signal versus time within each voxel as a weighted sum of basis functions, and then to identify "active" (i.e., "sound-sensitive") voxels based on the goodness of fit of this model. The basis set used here consisted of 5 components, designed to capture different aspects of response waveshape: Onset, Sustained, Ramp, Offset, and Undershoot (Fig. 1).
For each stimulus in a given imaging session, we created an "omnibus" activation map (using an F-statistic; Fomby et al. 1984) that tested the null hypothesis that none of the basis function amplitudes differed from zero. "Active" voxels were defined as those with values of P < 0.001 (not corrected for multiple comparisons).
DEFINING REGIONS OF INTEREST. For the single-slice experiments examining posterior auditory cortex, the response waveshape was quantified for 2 anatomically defined regions of interest (ROIs): Heschl's gyrus (HG) and the superior temporal gyrus (STG). When Heschl's gyrus was evident, it constituted the HG ROI. Otherwise, the HG ROI covered the medial third of the superior temporal plane. The STG ROI was defined as the region lateral to the HG ROI, extending superiorly to the edge of the parietal lobe, and inferiorly to the superior edge of the superior temporal sulcus. Regions from both hemispheres were combined to create the final (bilateral) HG and STG ROIs for each session. [The number of voxels in the resulting ROIs averaged 55 ± 10 (mean ± SD) in HG, and 72 ± 13 in STG.] Additional analyses (not presented) confirmed that response waveshapes on the left and right sides separately exhibited the same rate and STF dependencies seen when the 2 sides were analyzed together.
RESPONSE QUANTIFICATION. For each ROI, responses were quantified in terms of individual basis function amplitudes or combinations thereof. First, for a given stimulus and ROI, the amplitudes of a given basis function were averaged across all the active voxels in the ROI. These average amplitudes were then converted to a "percentage change" scale by dividing by the estimated signal baseline (i.e., the value of the constant term in the GLM, averaged across runs and the same active voxels) and multiplying by 100. We denote the resulting amplitudes of the onset, sustained, ramp, and offset components as On, Sust, Ramp, and Off, respectively. Mid, a measure of the response amplitude near the middle of the "sound on" period, was defined as the sum of Sust plus one half of Ramp. For a given stimulus and ROI, we required a minimum of 3 active voxels to include that stimulus/ROI combination in the RESULTS. In all, 5 responses for STG and 2 for HG were excluded because the 3-voxel criterion was not met, leaving 156 responses for STG and 159 for HG. [The number of active voxels in these responses averaged 15.4 ± 7.6 (mean ± SD) in HG, and 23.3 ± 11.1 in STG.]
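The quantification steps described above (averaging across active voxels, conversion to percentage change, the definition of Mid, and the 3-voxel criterion) can be sketched as follows. The voxel amplitudes and baseline are invented numbers, and the function and variable names are illustrative.

```python
MIN_ACTIVE_VOXELS = 3  # inclusion criterion stated in the text


def quantify(active_voxels, baseline):
    """Return On, Sust, Ramp, Off, and Mid in percent, or None if too few voxels.

    active_voxels: one dict of raw basis-function amplitudes per active voxel.
    baseline: estimated signal baseline (constant term of the GLM).
    """
    if len(active_voxels) < MIN_ACTIVE_VOXELS:
        return None
    result = {}
    for name in ("On", "Sust", "Ramp", "Off"):
        mean_amp = sum(v[name] for v in active_voxels) / len(active_voxels)
        result[name] = 100.0 * mean_amp / baseline   # percentage change
    # Mid: response amplitude near the middle of the "sound on" period.
    result["Mid"] = result["Sust"] + 0.5 * result["Ramp"]
    return result


# Hypothetical raw amplitudes for 3 active voxels in one ROI.
voxels = [
    {"On": 20.0, "Sust": 10.0, "Ramp": -4.0, "Off": 5.0},
    {"On": 30.0, "Sust": 10.0, "Ramp": -4.0, "Off": 10.0},
    {"On": 40.0, "Sust": 10.0, "Ramp": -4.0, "Off": 15.0},
]
response = quantify(voxels, baseline=1000.0)
too_few = quantify(voxels[:2], baseline=1000.0)
print(response, too_few)
```

In this example Sust is 1.0% and Ramp is -0.4%, so Mid is 0.8%; the two-voxel case is excluded.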
The response in each ROI was further quantified using a "waveshape index" (WI), a measure of overall response waveshape, designed to order responses along a continuum ranging from sustained to phasic (Harms and Melcher 2003). The formulation was such that the WI was independent of overall response amplitude and stayed within a finite range. Specifically, for a given ROI
[Equation 1: definition of the waveshape index (WI)]
| RESULTS |
|---|
The waveshape of fMRI responses depended strongly on the type of stimulus, as illustrated in Fig. 2. The response waveforms of Fig. 2 were obtained in single-slice imaging sessions and thus correspond to posterior auditory cortex (i.e., posterior HG and immediately lateral STG). Each waveform represents an average response (across multiple single-slice sessions) to one of 8 broadband stimuli of about 55 dB SL. In both HG and STG, the average response to 100/s clicks and 35/s noise bursts was highly phasic, whereas the response to 2/s noise bursts, speech, and music was primarily sustained. The phasic responses were characterized by a prominent signal decline (80 to 120%) after an initial signal peak (at about 6 s), and a clear peak after sound offset (at about 36 s). In contrast, the sustained responses showed far less signal decline (25 to 40%), and lacked a distinct peak after sound offset. The responses evoked by 35/s clicks, 10/s noise bursts, and continuous noise were "intermediate" in waveshape in that they displayed an "intermediate" degree of signal decline (50 to 75%), and only a small "off-peak" (e.g., 35/s clicks and continuous noise in STG) or only the suggestion of a small, "hidden" off-response in the form of a slightly prolonged elevation in signal after stimulus termination (e.g., 35/s clicks and continuous noise in HG, and 10/s noise bursts). Overall, the stimuli in Fig. 2 produced a range of waveshapes, from highly phasic to highly sustained.
The stimulus-dependent variations in waveshape from phasic to sustained were well captured by the waveshape index (WI), as can be seen from the plots of WI for individual imaging sessions in Fig. 2. In keeping with the trend in the average waveforms, the WIs for 100/s clicks and 35/s noise bursts cluster toward the phasic end of the WI range (i.e., toward one), whereas the WIs for 2/s noise bursts, speech, and music cluster near the sustained end (i.e., toward zero). The WIs for 35/s clicks, 10/s noise bursts, and continuous noise generally occupy an intermediate range. In general, trends in WI were also reflected in the individual elements that together define the WI. For instance, in the case of 35/s and 2/s noise burst trains, the transient components of the basis set, onset (On) and offset (Off), were almost always greater at 35/s than at 2/s (≥19 of 22 cases in both HG and STG), whereas the midpoint signal level (Mid) was less at 35/s than at 2/s for all but one instance in HG and STG. Thus like the WI, the trends for the individual components indicated more phasic responses at 35/s. In the following presentation, trends for the individual components will be described only when they provide insights that are not revealed by the WI.
Waveshape in posterior auditory cortex: dependency on repetition rate
Because the stimuli represented in Fig. 2 have quite different temporal characteristics, but are similar in spectrum and level, the data strongly suggest that stimulus temporal characteristics are an important determinant of cortical response waveshape. The importance of one temporal parameter, stimulus repetition rate, can be seen clearly from Fig. 2 in that higher-rate trains (i.e., 100/s clicks and 35/s noise bursts) typically elicited phasic responses, whereas low-rate trains (2/s noise bursts) elicited more sustained responses. Speech and music fit with this trend because they are dominated by low modulation rates and produce sustained waveshapes (Fig. 2, far right).
The dependency of waveshape on rate was also evident in individual sessions. In every session that used 2/s, 10/s, and 35/s noise burst trains, the WI in both HG (7 sessions, 5 subjects) and STG (6 sessions, 5 subjects) increased with increasing rate, indicating a shift in response waveshape from sustained to phasic. In sessions that used 2/s and 35/s (but not 10/s) noise bursts (16 sessions, 16 subjects), the WI was greater for 35/s in all but one instance in HG, and in every instance in STG (P < 0.001, signed-rank test). For the 2 sessions (2 subjects) that used both 35/s and 100/s clicks, the WI for 100/s clicks was greater than that for 35/s clicks in both HG and STG. Thus the intrasession data strongly indicate a shift toward more phasic responses with increasing rate.
In the comparisons just described, burst or click duration remained constant across rates, so sound-time fraction (STF) covaried with rate. Because this raises the possibility that waveshape may be controlled by STF, and not rate, we examined the effects of rate while holding STF constant (which required varying the duration of individual bursts; 5 sessions, 5 subjects). Figure 3 shows that even when STF was held at 50%, the average response for a 35/s noise burst train (14.3-ms bursts) was still more phasic than that for a 2/s train (250-ms bursts). Moreover, in every session (5/5 for HG, P = 0.06; 4/4 for STG, P = 0.13, signed-rank test), the WI for 35/s was greater than that for 2/s, indicating a trend toward more phasic responses at 35/s. However, the response difference with equal STF was less pronounced than the difference between trains of the same 2 rates for which STF also changed (average difference in WI between 35/s and 2/s trains was about 0.24 with 50% STFs, compared with about 0.44 for 35/s, 88% vs. 2/s, 5% trains from the same sessions). Thus these results provide further support for the view that response waveshape depends strongly on stimulus rate, but suggest that STF may also influence response waveshape.
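The P values quoted for these small samples can be related to the number of sessions with a short calculation. The sketch below computes an exact two-sided signed-rank P value by enumerating all sign assignments; it assumes no zero differences and no tied magnitudes, and the choice of an exact two-sided test is an assumption made here rather than a detail given in the text. The WI differences are invented and serve only to supply the signs.

```python
from itertools import product


def signed_rank_p(diffs):
    """Exact two-sided Wilcoxon signed-rank P value (no zeros or ties assumed)."""
    n = len(diffs)
    # Rank the absolute differences from 1 (smallest) to n (largest).
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0] * n
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    total = n * (n + 1) // 2
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_obs = min(w_plus, total - w_plus)
    # Count sign assignments at least as extreme as the observed one.
    count = 0
    for signs in product((0, 1), repeat=n):
        w = sum(r for r, s in zip(range(1, n + 1), signs) if s)
        if min(w, total - w) <= w_obs:
            count += 1
    return count / 2 ** n


# Hypothetical WI differences (35/s minus 2/s), all positive.
p_five = signed_rank_p([0.10, 0.30, 0.20, 0.50, 0.40])   # 5 of 5 sessions
p_four = signed_rank_p([0.10, 0.30, 0.20, 0.50])         # 4 of 4 sessions
print(p_five, p_four)
```

When every difference has the same sign, this P value is 2 / 2^n: 0.0625 for 5 sessions and 0.125 for 4 sessions, consistent with the values of 0.06 and 0.13 reported above. This also shows why a consistent effect in so few sessions cannot reach P < 0.05 with this test.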
The effect of STF on auditory responses was directly examined by varying STF while holding rate constant (7 sessions; 6 subjects; single-slice experiments) (Fig. 4). Note that increasing STF corresponds to increasing duration of individual bursts when rate is held constant. In HG, the WI was greater for a 35/s noise burst train with an 88% STF (25-ms bursts) than for a 50% STF (14.3-ms bursts) in all 6 sessions with these data (P = 0.03, signed-rank test), indicating a clear effect of STF on response waveshape in HG. However, the changes in WI were generally small, consistent with the small differences between the average response waveforms for the 50 and 88% STFs (Fig. 4). In STG, the WI was greater for the 88% STF in 4 of 5 cases (P = 0.2). In the 2 sessions (2 subjects) with data for a larger STF differential of 25% (7.1-ms bursts) and 88% (25-ms bursts), responses were more phasic (i.e., higher WI) at the higher STF in both HG and STG, and this difference in waveshape was more pronounced (Fig. 4). This suggests that larger STF differentials result in larger, more robust changes in response waveshape. Overall, the 35/s noise burst data indicate a trend toward more sustained responses with decreasing STF.
There was also evidence of an effect of STF on response waveshape for low-rate (2/s) noise burst trains, but the effect was small, and apparent in only one of the components making up the WI. In particular, in 5 sessions (5 subjects) with responses to 2/s noise burst trains with STFs of 5 and 50%, On was greater at the 50% STF in all cases, in both HG and STG (P = 0.06, signed-rank test), whereas Off and Mid showed no change with STF. The net effect on the average response waveform was a slightly more pronounced "on-peak" at the higher STF, leading to a slightly more phasic response [cf., the 2/s (50% STF) average waveforms in Fig. 3 to the 2/s (5% STF) average waveforms in Fig. 2].
Altogether, the experiments varying repetition rate and STF indicate that rate and STF both influence the waveshape of responses from auditory cortex.
Waveshape in posterior auditory cortex: insensitivity to sound level
Unlike changes in stimulus temporal characteristics, variations of stimulus level over a 30- to 40-dB range did not result in strong, systematic changes in response waveshape. This is illustrated in Fig. 5, which shows responses from posterior HG for one session (a single-slice experiment). The left panel shows that phasic responses were elicited by 35/s noise burst trains (88% STF) regardless of level (40, 55, and 70 dB SL), whereas the right panel illustrates the markedly different responses produced by 35/s and 2/s noise burst trains of comparable level (70 dB SL). Any change in waveshape with level was far less than the change in waveshape with rate.
The changes in waveshape index (ΔWI) produced by changes in stimulus level or temporal characteristics can be seen in Fig. 6 (top). The ΔWI for changes in level were clustered around zero in both HG and STG (HG: P = 0.15, STG: P = 0.23, signed-rank test). In contrast, for the same sessions, the ΔWI for temporal changes were always the same sign (P = 0.004 in both HG and STG). Furthermore, in all but one instance (in HG), the absolute value of ΔWI was greater for the temporal changes than for the level changes. Altogether, the WI was altered in a systematic manner by stimulus temporal changes, but was insensitive to level changes.
Examination of the individual waveshape components revealed a similar insensitivity to level for the onset and midpoint, but some sensitivity for the offset. Like the WI, On and Mid did not change in a systematic direction when the level was changed (in both HG and STG, On: P > 0.3, Mid: P > 0.9), but did for a change in sound temporal characteristics (in both HG and STG, On: P ≤ 0.01, Mid: P = 0.004; Fig. 6). In addition, the absolute value of the change in Mid (ΔMid) was almost always greater for sound temporal changes than for level changes. Unlike On and Mid, there was evidence for a consistent effect of level on Off in that the ΔOff attributed to a change in level were typically positive (HG: P = 0.01, STG: P = 0.15). However, the ΔOff arising from a stimulus temporal change were always positive, and on average larger than the changes attributed to level. Thus although level appears to have some effect on response offset, the influence of stimulus temporal properties was both greater (on average) and more consistent in the direction of the effect. Altogether, these analyses of ΔWI, ΔOn, ΔMid, and ΔOff indicate that response waveshape is affected in a more systematic manner, and to a greater degree, by changes in stimulus temporal characteristics than by changes in stimulus level.
Waveshape in posterior auditory cortex: insensitivity to sound bandwidth
In another set of experiments we compared the effects on response waveshape of stimulus bandwidth versus stimulus rate. The experiments (4 sessions, 4 subjects) used various combinations of broadband noise bursts, narrowband noise bursts, and tone bursts presented at either 2/s (5% STF) or 35/s (88% STF). These experiments all included broadband and narrowband stimuli of the same rate, as well as 2/s and 35/s stimuli of the same bandwidth, so that the effects of bandwidth (holding rate constant) and the effects of rate (holding bandwidth constant) could be compared within the same sessions. The ΔWI for changes in bandwidth (WI for the broadband minus the narrowband stimulus) were clustered around zero in both HG and STG (HG: P = 0.8, STG: P = 0.4, signed-rank test; Fig. 7, "Bandwidth Change" columns). In contrast, for the same sessions, the ΔWI for a change in rate (high minus low rate) were always positive (HG and STG: P = 0.02; Fig. 7, "Rate Change" columns), and in all but one instance (in STG) were greater than the ΔWI attributable to the bandwidth change. The lack of a consistent effect of bandwidth on waveshape was seen in a similar analysis of individual waveshape components (On, Mid, Off; not shown). Overall, the results indicate that stimulus bandwidth does not have a systematic effect on waveshape, in contrast to the highly systematic and robust effect of rate.
Waveshape in posterior auditory cortex: differences between HG and STG
Inspection of WI maps for the broad range of stimuli studied in the single-slice experiments revealed clear spatial variations in WI within posterior auditory cortex for certain stimuli. Sample maps showing some of the clearest spatial variations are shown in Fig. 8. In all of these cases, responses in STG tended to be more phasic than those in HG.
Waveshape differences between HG and STG were readily apparent in the overall database from the single-slice experiments, but only for stimuli that elicit moderately phasic to phasic responses (Fig. 9). For low-rate stimuli, the WI did not differ significantly between HG and STG (2/s bursts with STFs of 5 or 50%: P = 0.14, signed-rank test; music and speech: P = 0.13; Fig. 9, left). For these stimuli, On, Mid, and Off all tended to be greater in STG than in HG, indicating a regional difference in response amplitude, but not waveshape. In contrast, higher-rate stimuli or continuous noise did show significant differences in WI between HG and STG (100/s clicks and 35/s bursts with an 88% STF: P < 0.001; 10/s noise bursts, 35/s clicks, and 35/s noise bursts with a 25 or 50% STF: P < 0.001; continuous noise: P = 0.06). This difference arose because On and Off tended to be larger in STG than in HG, whereas Mid showed no difference between the 2 structures. Thus analyses of overall waveshape and individual response components indicated more phasic responses in STG compared with HG for higher-rate stimuli or continuous noise, but not for low-rate stimuli.
Waveshape throughout auditory cortex for music and 35/s noise bursts
The results presented so far indicate that response waveshape depends strongly on stimulus temporal characteristics. However, the data supporting this conclusion are for the posterior aspect of auditory cortex sampled by the single-slice experiments. To test whether stimulus temporal characteristics are important to response waveshape throughout auditory cortex, 3 multislice experiments were performed using a low-rate stimulus (music) and a high-rate stimulus (35/s noise bursts, 88% STF). Although music and 35/s noise bursts have acoustic differences beyond rate (e.g., spectrum as a function of time and fine time structure), the rate of the dominant amplitude modulation is one of the primary acoustic differences between these 2 stimuli.
Figure 10 displays a spatial map of WI in left auditory cortex for a session using multiple slices. The results are representative of data for the right hemisphere and the other 2 subjects. For both music and 35/s noise bursts, active voxels occurred throughout the superior temporal lobe. However, the WIs for music and 35/s noise bursts occupied distinct ranges. In particular, the majority of active voxels for music had a WI <0.2, indicating primarily sustained responses. In contrast, the majority of active voxels for the 35/s noise burst train had a WI >0.4, indicating "intermediate" or phasic responses. The voxels activated by the 2 stimuli were largely overlapping, indicating that the same regions of cortex can show either phasic or sustained responses depending on the stimulus. The bottom of Fig. 10 illustrates the difference in response waveshape between these 2 stimuli (for the slice used in the "single-slice" experiments, denoted "0 mm"). Overall, the dramatic difference in waveshape for music versus 35/s noise bursts occurs in widespread cortical areas, which indicates that stimulus temporal characteristics strongly influence waveshape throughout auditory cortex.
| DISCUSSION |
|---|
|
|
|---|
We also showed that the strong influence of sound temporal envelope characteristics on fMRI response waveshape holds in widespread regions of auditory cortex, not just the posterior aspect. Specifically, a low-rate stimulus (i.e., music) evoked sustained responses throughout the superior temporal lobe, whereas a high-rate stimulus (35/s noise burst train) evoked phasic responses in equally widespread areas. The areas of the superior temporal lobe that responded to these low- and high-rate sounds were largely overlapping, reinforcing our previous finding that a given cortical area can shift its response from sustained to phasic as sound characteristics are changed (Harms and Melcher 2002).
The sound-dependent shift in response waveshape is probably not attributable to hemodynamic factors, but instead reflects a shift in the time pattern of cortical neural activity (for detailed discussion see Harms and Melcher 2002). Briefly, neural activity and image signal are linked through a chain of metabolic and hemodynamic events such that increases and decreases in neural activity result in concordant changes in image signal strength (Devor et al. 2003; Heeger and Ress 2002; Logothetis 2003; Nemoto et al. 2004). Synaptic activity (excitatory or inhibitory) can be a dominant source of changes in local brain metabolism and oxygen consumption and may frequently be the primary source of the fMRI response (Auker et al. 1983; Caesar et al. 2003; Logothetis et al. 2001; Mathiesen et al. 1998; Nudo and Masterton 1986). However, strong correlations between neural discharge activity and fMRI response have also been reported (perhaps arising because the discharge activity is highly correlated with synaptic activity; Heeger et al. 2000; Rees et al. 2000; Smith et al. 2002). For the purposes of the present discussion, we leave open the possibility that either or both synaptic activity and discharges "contribute" to the fMRI responses we measured. Regardless of whether synaptic or discharge activity drives the fMRI response, the resulting hemodynamic changes occur over the course of seconds. This means that fMRI provides a temporally low-pass filtered view of neural activity, in contrast to microelectrode recordings, for example, which show the fine time pattern of spiking in individual neurons. Thus the shift in fMRI waveshape from sustained to phasic reflects a change in the gross (second-by-second) time patterns of neural activity. The fact that a given region (or even a single voxel) can exhibit both sustained and phasic fMRI responses (depending on the stimulus) is strong additional evidence that the waveshape changes arise through changes in underlying neural activity because the coupling between neural activity and hemodynamic response is presumably constant within a given region. Most likely, changes in neural adaptation and the strength of neural off-responses underlie the sustained-to-phasic shift in fMRI waveshape, as discussed previously (Harms and Melcher 2002). Because fMRI is an indicator of activity summed over many neurons, this shift in waveshape within a given area may occur because distinct, but spatially interspersed, neural populations (with differing response properties) are recruited in different stimulus regimes (Lu and Wang 2004; Lu et al. 2001). Alternatively, the same neurons may shift their mode of response as sound temporal envelope characteristics are varied (Creutzfeldt et al. 1980; Steinschneider et al. 1998).
The waveshape of cortical fMRI responses depends on an interplay between rate and STF
Our results show that cortical response waveshape depends on an interplay between rate and STF. For instance, phasic responses occurred for trains of repeated stimuli having both high rates and high STFs, whereas sustained responses occurred for trains in the opposite corner of the rate-STF space (i.e., low rates and low STFs). When the effects of rate and STF were in opposition, the resulting responses were "intermediate" between the 2 extremes of phasic and sustained.
Response waveshape versus amplitude
Previous studies have reported variations in fMRI activation with sound level or bandwidth (Brechmann et al. 2002; Hall et al. 2001; Hart et al. 2003; Jäncke et al. 1998; Mohr et al. 1999; Wessinger et al. 2001). However, these studies examined the amplitude and spatial extent of the fMRI response, rather than response waveshape, and thus do not contradict the insensitivity of waveshape to sound level or bandwidth seen in the present study. Most of the previous studies used analysis procedures that assume a sustained response (e.g., correlation with a boxcar waveform), which means that a change in response waveshape could present as a change in amplitude or extent (Harms and Melcher 2003). However, given the data of the present study, the previously reported level and bandwidth dependencies most likely reflect true changes in response amplitude and extent because waveshape presumably varied little with level or bandwidth.
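The point about boxcar-based analyses can be illustrated with a small numerical sketch. It is not the analysis used in any of the cited studies; the sampling grid, the sound-on interval, and the two idealized time courses (which ignore hemodynamic delay) are all hypothetical. It shows only that two responses with the same peak amplitude can correlate very differently with a boxcar reference when their waveshapes differ.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical time base: 30 samples, sound on during samples 5..19.
N_SAMPLES = 30
ON_START, ON_END = 5, 20

# Boxcar reference: 1 while the sound is on, 0 otherwise.
boxcar = [1.0 if ON_START <= t < ON_END else 0.0 for t in range(N_SAMPLES)]

# Idealized sustained response: follows the boxcar exactly.
sustained = list(boxcar)

# Idealized phasic response: peaks at sound onset and just after offset,
# with a low plateau in between. Peak amplitude matches the sustained case.
phasic = [0.0] * N_SAMPLES
for t in range(ON_START, ON_END):
    phasic[t] = 0.2
for t in (ON_START, ON_START + 1, ON_END, ON_END + 1):
    phasic[t] = 1.0

r_sustained = pearson(sustained, boxcar)
r_phasic = pearson(phasic, boxcar)
print(f"sustained vs boxcar: r = {r_sustained:.2f}")
print(f"phasic vs boxcar:    r = {r_phasic:.2f}")
```

Under these assumptions the phasic time course yields a much lower correlation with the boxcar than the sustained one, so a thresholded boxcar analysis could register a waveshape change as reduced amplitude or extent.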
Neural activity differences between cortical areas
HG and STG, sites of primary and nonprimary auditory areas, respectively, showed differences in fMRI waveshape that likely reflect differences in the underlying time pattern of neural activity. The differences were not apparent for low-rate stimuli, but were apparent for higher-rate stimuli and continuous noise, with STG showing more phasic responses than HG. This difference in fMRI waveshape indicates that the neural activity of STG, compared with that of HG, was concentrated more at sound onset and offset, and less during the sound. Similar regional differences in fMRI waveshape have also been described previously for a stimulus that falls in the range of our higher-rate stimuli (a 10/s train of near-tonal bursts; Seifritz et al. 2002). These studies indicate that fMRI waveshape may provide a physiological marker for distinguishing primary from nonprimary auditory cortical areas.
An alternative, but unlikely, explanation for the waveshape differences between HG and STG is that they reflect regional differences in tissue hemodynamics (Chen et al. 1998; Davis et al. 1998). Although compelling arguments have been made that hemodynamics do not account for stimulus-dependent waveshape changes within a region (above; also Harms and Melcher 2002), these arguments do not apply to comparisons of waveshape across regions. Nevertheless, a separate argument indicates that a hemodynamic explanation is unlikely. The argument follows from the fact that waveshape differences between HG and STG were seen for some sounds (higher-rate stimuli and continuous noise), but not for others (low-rate stimuli). If the waveshape differences between structures were attributed to hemodynamic factors, the hemodynamics themselves would have to be stimulus dependent (or equivalently, waveshape dependent because waveshape depends on stimulus). Even though a hemodynamic dependency on stimulus (or waveshape) cannot be entirely excluded, it has no precedent. We therefore conclude that the fMRI waveshape differences between HG and STG likely reflect differences in neural activity, rather than hemodynamics.
Sound perception, fMRI response waveshape, and cortical neural activity
The results suggest a possible relationship among sound temporal envelope, cortical response waveshape, and certain perceptual attributes of sound trains. Specifically, changes in sound temporal envelope characteristics produce changes in response waveshape that we hypothesize vary according to the degree of perceptual grouping of the individual, "elemental" stimuli of a train (e.g., the individual bursts). In this discussion, our assessments of perceptual grouping are based on informal listening (but agree with the pertinent psychophysical data; e.g., Miller and Taylor 1948). For trains with a low rate and low STF (e.g., trains of 2/s tone or noise bursts with 5% STF), which elicited sustained responses, the individual stimuli of a train are easily distinguished (e.g., they can be easily counted). For trains with a high rate and high STF (e.g., trains of 35/s tone or noise bursts with 88% STF), which elicited highly phasic responses, the individual stimuli of a train fuse, resulting in a continuous (but modulated) percept in which the most salient perceptual changes are at the beginning and end of the overall train. For intermediate parameters (e.g., 10/s noise bursts), which elicited intermediate waveshapes, the perceptual characteristics of a train are also "intermediate" in that the elemental composition of the train can be appreciated, although the individual elements are not so readily distinguished (e.g., they cannot be easily counted).
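The two example trains above can be related through simple arithmetic. The sketch below assumes that STF is the fraction of time during which sound is present in a periodic train of identical bursts, so that STF equals rate times burst duration; the function name is hypothetical, and the burst durations it returns are inferred from the quoted rate and STF values rather than stated in this section.

```python
def burst_duration_s(rate_per_s, stf):
    """Burst duration (s) implied by a repetition rate and a sound-time fraction.

    Assumes a periodic train of identical bursts, so STF = rate * duration.
    """
    return stf / rate_per_s

# Example trains quoted in the text: (rate in bursts/s, STF as a fraction).
examples = {
    "low rate, low STF": (2, 0.05),
    "high rate, high STF": (35, 0.88),
}

for label, (rate, stf) in examples.items():
    duration_ms = 1000.0 * burst_duration_s(rate, stf)
    print(f"{label}: {rate}/s at {stf:.0%} STF implies bursts of about {duration_ms:.1f} ms")
```

Under this assumption both example trains imply bursts of roughly 25 ms, which is consistent with the two cases differing in temporal envelope (rate and STF) rather than in the bursts themselves.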
The present study provides additional support for the previously suggested relationship between fMRI waveshape and the elemental versus fused nature of sound trains (Harms and Melcher 2002). Importantly, we found that, in contrast to sound temporal envelope, variations in level or bandwidth had little or no effect on cortical response waveshape. At the same time, these variations have little or no effect on the elemental versus fused nature of a train (Harms et al., personal observations). Thus the sound features that most strongly influence the grouping of successive stimuli in a train (temporal envelope characteristics rather than level or bandwidth) are also the features that most strongly influence the waveshape of cortical fMRI responses and thus the time patterns of the underlying neural activity.
A relationship between fMRI response waveshape and the elemental versus fused quality of a sound train fits with the hypothesis that population neural activity in auditory cortex signals the beginning and end of perceptually distinct acoustic "events." For trains perceived as a series of distinct events, each successive event may elicit a transient neural response, leading to a sustained fMRI waveshape (due to the "low-pass" nature of the hemodynamic system linking neural activity to fMRI response). In contrast, for trains in which the individual elements fuse into one, the entire train may be treated as a single event and thus produce neural responses primarily at sequence onset and offset, and thus a phasic fMRI waveshape. Perceptually, the transition between the 2 extremes of discerning many distinct events versus a single event is gradual (Miller and Taylor 1948), and presumably the corresponding transition in neural activity is also gradual. Thus for trains in the transition region, neural responses presumably reflect both individual events and the train as a whole to varying degrees, and thus produce "intermediate" fMRI waveshapes. Although this hypothesis relating neural activity and auditory events stems from fMRI data for sounds lasting tens of seconds, studies using other techniques provide evidence for heightened neural activity at the beginning and end of shorter-duration sounds (electrophysiology, magnetoencephalography; see DISCUSSION in Harms and Melcher 2002). Therefore the relationship between neural activity and auditory events hypothesized here may also apply on shorter timescales.
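The event hypothesis can be illustrated with a toy simulation in which idealized neural transients are low-pass filtered by a slow kernel. This is a sketch under stated assumptions, not the model or analysis of the study: the gamma-shaped kernel (peaking at 5 s), the 0.1-s time step, the 2/s event rate, and the `mid_to_peak` summary (which is not the WI used in the paper) are all hypothetical choices made for illustration.

```python
import math

STEPS_PER_S = 10                # hypothetical time step of 0.1 s
DT = 1.0 / STEPS_PER_S
N = 60 * STEPS_PER_S            # 60-s simulation
ONSET = 10 * STEPS_PER_S        # sound starts at 10 s
OFFSET = 40 * STEPS_PER_S       # sound ends at 40 s (30-s train)

# Assumed slow "hemodynamic" kernel: gamma shape t^5 * exp(-t) / 5!, 25 s long.
kernel = [((j * DT) ** 5) * math.exp(-j * DT) / 120.0
          for j in range(25 * STEPS_PER_S)]

def convolve(signal, kern):
    """Causal convolution, truncated to the length of the input signal."""
    out = [0.0] * len(signal)
    for i, s in enumerate(signal):
        if s == 0.0:
            continue
        for j, k in enumerate(kern):
            if i + j < len(out):
                out[i + j] += s * k
    return out

# Case 1: every burst is a distinct event, so a transient occurs at each
# burst onset (2 bursts per second for the duration of the train).
per_event = [0.0] * N
for i in range(ONSET, OFFSET, STEPS_PER_S // 2):
    per_event[i] = 1.0

# Case 2: the train fuses into one event, so transients occur only at the
# onset and offset of the whole train.
fused = [0.0] * N
fused[ONSET] = 1.0
fused[OFFSET] = 1.0

def mid_to_peak(response):
    """Response midway through the sound (25 s) relative to its overall peak."""
    return response[25 * STEPS_PER_S] / max(response)

resp_events = convolve(per_event, kernel)
resp_fused = convolve(fused, kernel)
print(f"per-event transients: mid/peak = {mid_to_peak(resp_events):.2f}")
print(f"onset/offset only:    mid/peak = {mid_to_peak(resp_fused):.2f}")
```

With these assumptions, the per-event transients blur into a plateau that stays near its peak throughout the sound (a sustained waveshape), whereas the onset and offset transients leave the filtered signal near baseline in the middle of the sound (a phasic waveshape).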
The response waveshapes for the nontrain stimuli of the present study (music, speech, continuous noise) may also fit with the view of events and neural activity just described. Music and speech are readily heard as a series of distinct perceptual entities (beats or words, respectively). These stimuli produce highly sustained fMRI responses, possibly because each beat or word elicits an elemental neural response. In contrast, continuous noise, which contains no separable elements, produced more phasic responses, indicating that the entirety of the noise segment is treated as a single event. Interestingly, the fMRI response waveshapes for continuous noise were often less phasic than those for high-rate trains (i.e., the onset and offset of the noise were less strongly delineated by neural activity). This may be because continuous noise lacks regularly occurring temporal modulations to bind it, and therefore forms a less salient perceptual event (McAdams 1989; Pollack 1951).
Previously, we suggested that a population neural representation of the beginning and end of distinct perceptual events is weak or absent in the midbrain, begins to emerge in the thalamus, and is robust in auditory cortex (Harms and Melcher 2002). The data for HG versus STG in the present study indicate that this progressive emergence across levels of the auditory system continues in the cortex, with the delineation of events being more accentuated in nonprimary, as compared with primary, auditory areas. Altogether, the present and previous results suggest a multistage neural processing scheme in which the auditory environment is segmented increasingly into distinct, and ultimately meaningful, events.
| FOOTNOTES |
|---|
1 The 3 multislice experiments were consistent with the observation in the single-slice experiments of waveshape differences between HG and STG for higher-rate stimuli because the WI for 35/s noise bursts was always greater in STG than in HG for the slice used in the single-slice experiments.
Address for reprint requests and other correspondence: M. P. Harms, Pfizer Inc., 700 Chesterfield Parkway West, Chesterfield, MO 63017 (mharms{at}alam.mit.edu)
| REFERENCES |
|---|
|
|
|---|
Brechmann A, Baumgart F, and Scheich H. Sound-level-dependent representation of frequency modulations in human auditory cortex: a low-noise fMRI study. J Neurophysiol 87: 423–433, 2002.
Caesar K, Gold L, and Lauritzen M. Context sensitivity of activity-dependent increases in cerebral blood flow. Proc Natl Acad Sci USA 100: 4239–4244, 2003.
Chen W, Zhu XH, Toshinori K, Andersen P, and Ugurbil K. Spatial and temporal differentiation of fMRI BOLD response in primary visual cortex of human brain during sustained visual simulation. Magn Reson Med 39: 520–527, 1998.
Creutzfeldt O, Hellweg FC, and Schreiner C. Thalamocortical transformation of responses to complex auditory stimuli. Exp Brain Res 39: 87–104, 1980.