J Neurophysiol 93: 210-222, 2005. First published August 11, 2004; doi:10.1152/jn.00712.2004
0022-3077/05 $8.00

Short-Term Sound Temporal Envelope Characteristics Determine Multisecond Time Patterns of Activity in Human Auditory Cortex as Shown by fMRI

Michael P. Harms1,2, John J. Guinan, Jr.1,2,3, Irina S. Sigalovsky1,2 and Jennifer R. Melcher1,2,3

1Eaton-Peabody Laboratory, Massachusetts Eye and Ear Infirmary, Boston; 2Harvard–Massachusetts Institute of Technology Division of Health Sciences and Technology, Speech and Hearing Bioscience and Technology Program, Cambridge; and 3Department of Otology and Laryngology, Harvard Medical School, Boston, Massachusetts

Submitted 12 July 2004; accepted in final form 30 July 2004


    ABSTRACT
 
Functional magnetic resonance imaging (fMRI) of human auditory cortex has demonstrated a striking range of temporal waveshapes in responses to sound. Prolonged (30 s) low-rate (2/s) noise burst trains elicit "sustained" responses, whereas high-rate (35/s) trains elicit "phasic" responses with peaks just after train onset and offset. As a step toward understanding the significance of these responses for auditory processing, the present fMRI study sought to resolve exactly which features of sound determine cortical response waveshape. The results indicate that sound temporal envelope characteristics, but not sound level or bandwidth, strongly influence response waveshapes, and thus the underlying time patterns of neural activity. The results show that sensitivity to sound temporal envelope holds in both primary and nonprimary cortical areas, but nonprimary areas show more pronounced phasic responses for some types of stimuli (higher-rate trains, continuous noise), indicating more prominent neural activity at sound onset and offset. It has been hypothesized that the neural activity underlying the onset and offset peaks reflects the beginning and end of auditory perceptual events. The present data support this idea because sound temporal envelope, the sound characteristic that most strongly influences whether fMRI responses are phasic, also strongly influences whether successive stimuli (e.g., the bursts of a train) are perceptually grouped into a single auditory event. Thus fMRI waveshape may provide a window onto neural activity patterns that reflect the segmentation of our auditory environment into distinct, meaningful events.


    INTRODUCTION
 
Recent studies in humans have demonstrated that the temporal waveshape of cortical functional magnetic resonance imaging (fMRI) activation can change dramatically with a change in sound characteristics. Prolonged (30-s) trains of repeated noise bursts with a low burst repetition rate (2/s) elicit "sustained" fMRI responses in auditory cortex (i.e., fMRI signal remains elevated throughout the train duration), whereas trains with a high rate (35/s) elicit "phasic" responses (i.e., fMRI signal displays prominent peaks just after train onset and offset and little elevation during the train; Harms and Melcher 2002). Somewhat similar response changes have also been reported for amplitude-modulated noise (Giraud et al. 2000). It is unlikely that these changes in response waveshape reflect the hemodynamic processes that link neural activity to fMRI response (Harms and Melcher 2002). Instead, the shift in waveshape from sustained to phasic likely indicates a change in the time pattern of the underlying neural activity, from distributed throughout a train to more concentrated at train onset and offset.

As a step toward understanding the significance of the neural activity patterns underlying the wide range of fMRI waveshapes, we sought to establish which physical sound features most strongly determine fMRI response waveshape in auditory cortex. In our previous study of waveshape, stimulus sound-time fraction (STF) and rate covaried, so either parameter could have determined waveshape (Harms and Melcher 2002). Both our previous study and that of Giraud et al. (2000) considered only broadband stimuli at one sound level, leaving open the possibility that either sound bandwidth or level has a substantial influence on waveshape.

We performed 2 complementary sets of experiments that used different strategies to handle the acoustic noise produced by the scanner gradient coils during fMRI (Ravicz et al. 2000) while maintaining the temporal resolution (about 2 s) needed to capture fMRI response waveshape. The main set examined multiple stimuli per imaging session and handled the noise through reduced spatial coverage of auditory cortex. These experiments imaged a single slice that sampled posterior auditory cortex including primary areas on Heschl's gyrus (HG) and nonprimary areas of the immediately lateral superior temporal gyrus (STG). In these "single-slice" experiments, we first examined response waveshape for a wide range of prolonged (30-s) sounds (e.g., speech, music, burst trains) to understand the general types of sounds that produce phasic versus sustained responses. We then examined fMRI waveshape while systematically varying fundamental sound features: temporal envelope characteristics (rate, STF), bandwidth, and level. Our findings indicate that sound temporal envelope, rather than level or bandwidth, is the primary determinant of fMRI response waveshape in posterior auditory cortex. The results also show that the sensitivity to sound temporal envelope holds in both primary and nonprimary areas of posterior auditory cortex, but nonprimary areas show more pronounced phasic responses for some stimuli (i.e., higher-rate trains and continuous noise), indicating more prominent neural activity at sound onset and offset.

The second set of experiments tested whether the strong dependency of fMRI waveshape on sound temporal envelope occurs in regions of auditory cortex beyond the posterior extent sampled in the single-slice experiments. This second set of experiments used a reduced number of stimuli in exchange for wider spatial coverage (achieved by imaging multiple slices). The acoustic noise was handled by acquiring all of the imaged slices in a brief cluster and leaving a long interval between successive clusters (Edmister et al. 1999; Hall et al. 1999). Temporal resolution was restored by varying the timing of the imaging relative to the stimuli (which limited the number of stimuli that could be studied per session). These "multislice" experiments showed that sound temporal envelope strongly influences fMRI waveshape anteriorly, as well as posteriorly, in auditory cortex.

Overall, the present study establishes sound temporal envelope as a crucial variable controlling the waveshape of fMRI responses in auditory cortex and, correspondingly, the underlying multisecond time patterns of neural activity.


    METHODS
 
Twenty-six subjects participated in 40 total imaging sessions (37 "single-slice"; 3 "multislice"). Subjects ranged in age from 21 to 38 yr (mean ≈ 26 yr). Seventeen of the subjects were male and 22 were right-handed. Subjects had no known audiological or neurological disorders. Most of the imaging sessions (25) were conducted expressly for the present study. However, some were part of previous investigations of repetition rate (5 sessions; Harms and Melcher 2002) or sound level (10 sessions; Sigalovsky et al. 2001). All studies were approved by the institutional committees on the use of human subjects at the Massachusetts Institute of Technology, Massachusetts Eye and Ear Infirmary, and Massachusetts General Hospital, and all subjects gave their written informed consent.

Acoustic stimulation paradigm

Stimuli were presented binaurally during 30-s "on" periods, alternated with 30-s "off" periods during which no auditory stimulus was presented. The stimulus in each "on" period was a train of noise or tone bursts, a train of clicks, continuous noise, orchestral music, or running speech. Four or 5 alternations between the "on" and "off" periods constituted a single scanning "run" (total duration 240 or 300 s). Three to 6 stimuli were studied in each single-slice experiment. Each stimulus in a session was presented an equal number of times (7 to 13), except for music, which was often presented just 4 times. Two stimuli, a high-rate noise burst train (32–40 presentations) and music (8 presentations), were studied in each multislice experiment. Fewer music presentations were used in both the single-slice and multislice experiments because music generally evokes robust responses, so fewer presentations were necessary.

Stimuli

TRAINS OF BROADBAND NOISE BURSTS. Bursts of white noise were presented in 30-s trains at repetition rates of 2/s, 10/s, and 35/s. Each burst had a rise and fall time of 2.5 ms and was usually 25 ms in duration (measured at half-maximum), yielding sound-time fractions (STFs) of 5, 25, and 88% for the 2/s, 10/s, and 35/s trains, respectively. Other rate/STF combinations included 2/s trains with an STF of 50% (250-ms bursts) and 35/s trains with STFs of 25 and 50% (7.1- and 14.3-ms bursts). The bursts were identical within a train (i.e., "frozen"), but differed across trains.
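The STF values quoted above follow from multiplying burst rate by burst duration. The short sketch below reproduces that arithmetic for the rate/duration combinations listed in the text; the function name is illustrative and not taken from the study.

```python
def sound_time_fraction(rate_per_s, burst_duration_ms):
    """Percentage of time during which sound is present in a burst train."""
    return rate_per_s * (burst_duration_ms / 1000.0) * 100.0

# (burst rate in bursts/s, burst duration in ms), as listed in the text
combos = [(2, 25), (10, 25), (35, 25), (2, 250), (35, 7.1), (35, 14.3)]

for rate, duration in combos:
    stf = sound_time_fraction(rate, duration)
    print(f"{rate}/s with {duration}-ms bursts: STF = {stf:.1f}%")
```

The 35/s, 25-ms case gives 87.5%, which the text reports rounded as 88%.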

TRAINS OF NARROWBAND BURSTS. Tone bursts and narrowband (third octave) noise bursts were presented in 30-s trains at either 2/s or 35/s (burst center frequency: 500 Hz or 4 kHz; rise and fall time: 2.5 ms; duration: 25 ms, yielding STFs of 5 and 88% for the 2/s and 35/s trains, respectively). The repeated narrowband noise bursts within a train were identical, but differed across trains (and runs).

CONTINUOUS NOISE. The continuous noise was white and uncorrelated across its entire 30-s duration. Thus there was no repetition in the temporal fine structure.

TRAINS OF CLICKS. Clicks were presented in 30-s trains at a rate of either 35/s or 100/s (click duration: about 100 µs).

RUNNING SPEECH. A 30-s speech stimulus was created by concatenating "conversational" sentences spoken by a professional male speaker (Harvard IEEE Corpus; IEEE 1969). The amplitude envelope of the speech was low-pass, with a power spectral density 10 dB down at 5 Hz relative to its peak at 1.3 Hz.

ORCHESTRAL MUSIC. The music stimulus was the first 30 s of the fourth movement in Beethoven Symphony No. 7. The maximum power in the music amplitude envelope occurred at 0.69 Hz, with harmonics at 1.2, 2.5, and 4.9 Hz that were all within 10 dB of the power at 0.69 Hz.

Stimulus level

Sound levels were always about 55 dB above threshold (SL), except in sessions that explicitly examined the effects of sound level. Sound levels were determined separately for each ear and stimulus (to within 5 dB) in the scanner room immediately before the imaging session. The resulting sound pressure levels (at the ear) ranged from about 60 to 90 dB SPL. During threshold determination (and functional imaging), there was an ongoing low-frequency background noise produced by the scanner coolant pump. However, the acoustic noise produced by the scanner gradient coils during imaging was not present.

In sessions that examined the effects of stimulus level, the level ranged from 30 to 75 dB SL (about 60 to 100 dB SPL). The stimuli were 1) 35/s (88% STF) noise burst trains (studied at 40, 55, and 70 dB SL; 2 sessions), 2) orchestral music (30, 50, and 60 dB SL; 5 sessions), and 3) continuous noise (35, 45, 55, 65, and 75 dB SL; 5 sessions; 3–4 levels were studied in any given session).

Task

Subjects were instructed to listen "attentively" to the stimuli. Subjects were monitored to ensure that they remained alert throughout an experiment (typically by a nonverbal signal from the subject at the end of each imaging run in response to a question from the experimenter).

In sessions from our previous investigation of sound level (using music and continuous noise), subjects performed an additional task. Specifically, at the beginning and end of each 30-s stimulus "on" period, subjects controlled a knob to turn on or off an array of lights (Melcher et al. 2000). Because of this additional task, data from these sound level experiments are compared only with each other (see Waveshape in posterior auditory cortex: insensitivity to sound level in RESULTS).

Sound delivery

Stimuli were produced by a D/A board (controlled by LabVIEW), amplified, and fed to a pair of piezoelectric transducers. The transducers were either 1) incorporated directly into sound attenuating earmuffs placed over the subject's ears (sound system I; custom built by GEC Marconi; used for the 10 sessions from our previous investigation of sound level), or 2) housed in a shielded box adjacent to the scanner and coupled to earmuffs by air-filled tubes (system II; all remaining sessions). The frequency response of both systems, measured at the subject's ears, was low-pass with a cutoff frequency of 10 kHz (system I) or 6 kHz (system II).

The earmuffs of the sound delivery systems attenuated the acoustic noise produced by the scanner coolant pump and gradient coils (Ravicz and Melcher 2001; Ravicz et al. 2000). At the ear, pump noise levels were about 65 dB SPL, and peak gradient noise levels were 70–95 dB SPL at about 1.0 kHz (1.5-T scanners) or about 1.4 kHz (3.0 T).

Imaging

Subjects were imaged using whole-body scanners and standard head coils while resting supine. To reduce head motion, a bite bar was custom molded to the subject's teeth and mounted to the head coil, or pillow and foam were packed snugly around the head. Each imaging session lasted about 2 h. Imaging sessions began with the acquisition of contiguous sagittal images of the whole head as a reference for functional slice selection. Also common to all sessions was the acquisition of T1-weighted anatomical image(s) (in-plane resolution = 1.6 x 1.6 mm, thickness = 7 mm) of the functionally imaged slice(s).

SINGLE-SLICE EXPERIMENTS. The single-slice experiments were conducted on several different scanners: a 1.5- or 3.0-T General Electric scanner retrofitted for high-speed, echo-planar imaging (by Advanced NMR Systems; 1.5 T: 10 sessions; 3.0 T: 6 sessions), a 1.5-T General Electric Signa Horizon scanner (10 sessions), a 1.5-T Siemens Sonata scanner (5 sessions), and a 3.0-T Siemens Allegra scanner (6 sessions). No obvious differences in response waveshape were observed between imaging systems (Harms 2002).

The near-coronal slice selected for functional imaging intersected the inferior colliculi and the posteromedial aspect of Heschl's gyri. When multiple transverse temporal gyri were present, the anterior one was intersected. Given this slice placement, primary auditory cortex was likely sampled in all subjects (Rademacher et al. 1993, 2001).

Functional images of the selected slice were acquired using a blood oxygenation level–dependent (BOLD) sequence (1.5 T: asymmetric spin echo, TE = 70 ms, τ offset = –25 ms, flip = 90°; 3.0 T: gradient echo, TE = 30 ms, except one session used 40 ms and another used 50 ms, flip = 60 or 90°). Slice thickness was 7 mm with an in-plane resolution of 3.1 × 3.1 mm. The beginning of each functional run included 4 discarded images to ensure that image signal level had approached a steady state.

Although the present paper focuses on responses from auditory cortex, our experiments imaging a single slice were also designed to examine the inferior colliculus. Therefore functional images were generally collected using cardiac gating, which increases the detectability of activation in the inferior colliculus (Guimaraes et al. 1998). Image acquisitions were synchronized to every other QRS complex in the subject's electrocardiogram, resulting in an average interimage interval (TR) of about 2 s. In the 2 sessions that did not use cardiac gating, the TR was 2 s.

MULTISLICE EXPERIMENTS. The multislice experiments were conducted on the 3.0-T General Electric scanner (gradient echo, TE = 30 ms, flip = 60°). The functional image volume consisted of 10 contiguous, near-coronal slices (in-plane resolution: 3.1 x 3.1 mm; slice thickness: 7 mm), one of which matched the slice plane used in the single-slice experiments. All slices of the functional volume were acquired in a brief interval (<1 s) once every 8 s (TR). The onset of the stimulus relative to the volume acquisitions was staggered by 2-s increments from run to run. Thus across the multiple runs for a given stimulus, the functional data in toto included samples acquired every 2 s relative to the stimulus. Waveshapes measured this way did not differ systematically from those obtained in the single-slice experiments [e.g., compare Fig. 10 waveshapes (multislice data) to those for the same stimuli in Fig. 2 (single-slice data)].
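The way staggering restores 2-s temporal resolution from an 8-s TR can be illustrated with a small sketch. It considers a single 60-s stimulus cycle (30 s "on" plus 30 s "off") and assumes four stimulus-onset offsets of 0, 2, 4, and 6 s across runs; the number of offsets is an inference from the 8-s TR and 2-s increment, not a figure stated in the text.

```python
TR_S = 8        # interval between volume acquisitions (from the text)
STAGGER_S = 2   # run-to-run shift of stimulus onset (from the text)
CYCLE_S = 60    # one 30-s "on" period plus one 30-s "off" period

def sample_times(offset_s, cycle_s=CYCLE_S, tr_s=TR_S):
    """Acquisition times, relative to stimulus onset, for one stagger offset."""
    return list(range(offset_s, cycle_s, tr_s))

# Assumed set of offsets: 0, 2, 4, 6 s
offsets = list(range(0, TR_S, STAGGER_S))

# Pooling the runs gives one sample every 2 s relative to the stimulus
pooled = sorted(t for off in offsets for t in sample_times(off))
print(pooled)
```

Each run alone samples the response only every 8 s; the pooled set covers every 2-s step of the cycle.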



 
FIG. 10. Spatial maps of WI for multiple near-coronal slices through left auditory cortex. Stimuli are music (left) and 35/s noise bursts (88% STF; right). Top: each panel shows a WI map (color; 3.1 x 3.1-mm resolution) superimposed on a T1-weighted anatomic image (grayscale; 1.6 x 1.6 mm; image has been interpolated). For each active voxel, the WI is displayed on a red (sustained) to yellow (phasic) scale. (Each color corresponds to one fifth of the total WI range.) Distances indicated at left are relative to the most posterior slice with a distinct Heschl's gyrus (which corresponds to the slice plane used in the single-slice experiments, denoted "0 mm"). Images are displayed in radiological convention, so the subject's left is displayed on the right in each panel. Bottom: responses to music and 35/s noise bursts (88% STF) for HG and STG, averaged across the active voxels of slice "0 mm." All data are from the same imaging session.

 


 
FIG. 2. Response waveshape varied from phasic to sustained depending on stimulus type. Responses for 8 stimuli are shown separately for HG and the superior temporal gyrus (STG). Top panels (for each structure): response waveforms averaged across sessions (solid curves). Dashed curves indicate ±1 SE. Bottom panels: waveshape index (WI) for individual sessions. WI is a measure of overall response waveshape, designed to order responses along a continuum ranging from sustained (low WI) to phasic (high WI). Some points have been displaced horizontally for clarity. For each structure and stimulus, response waveforms and WIs are based on the data from the same imaging sessions. Number of sessions is indicated at the bottom of the WI plots. Stimulus level: approximately 55 dB SL. Noise burst trains were broadband with sound-time fractions (STFs) of 5% (2/s), 25% (10/s), or 88% (35/s). All data are from single-slice experiments and correspond to posterior auditory cortex.

 
Image analysis

PREPROCESSING. Before response detection, the following preprocessing steps were performed. For the single-slice experiments using cardiac gating, the image signal was retrospectively corrected to account for the image-to-image variations in signal strength (i.e., T1 effects) that result from fluctuations in heart rate (Guimaraes et al. 1998). For all experiments, the images for each scanning run were then corrected for head movements that may have occurred (SPM95 software package; without spin history correction; Friston et al. 1995a, 1996). In the single-slice experiments these corrections for motion were necessarily limited to adjustments within the imaging plane. Finally, for experiments using cardiac gating, the time series for each imaging "run" and voxel was linearly interpolated to a consistent 2-s interval between images.
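The final step, resampling a cardiac-gated time series onto a regular 2-s grid, can be sketched as plain linear interpolation. The acquisition times and signal values below are invented for illustration, and starting the grid at the first acquisition is an assumption; the text does not say how the grid was anchored.

```python
def interp_to_grid(times, values, dt=2.0):
    """Linearly interpolate irregularly timed samples onto a grid of spacing dt.

    The grid starts at the first sample time (an assumption) and stops at
    the last sample time. Requires at least two samples.
    """
    grid, out = [], []
    j, k = 0, 0
    while times[0] + k * dt <= times[-1] + 1e-9:
        t = times[0] + k * dt
        # Advance to the pair of samples that brackets t
        while j < len(times) - 2 and times[j + 1] < t:
            j += 1
        t0, t1 = times[j], times[j + 1]
        w = (t - t0) / (t1 - t0)
        out.append(values[j] + w * (values[j + 1] - values[j]))
        grid.append(t)
        k += 1
    return grid, out

# Hypothetical gated acquisition times (s) with jitter around a 2-s interval
acq_times = [0.0, 1.9, 4.1, 6.0]
signal = [10.0, 11.9, 14.1, 16.0]
grid, resampled = interp_to_grid(acq_times, signal)
print(grid, resampled)
```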

RESPONSE DETECTION. Responses were detected using a general linear model (GLM) (Draper and Smith 1981; Fomby et al. 1984; Friston et al. 1995b) and a set of basis functions designed to detect the wide range of response waveshapes known to occur in auditory cortex (Harms and Melcher 2003). The basic idea behind the GLM is to model the signal versus time within each voxel as a weighted sum of basis functions, and then to identify "active" (i.e., "sound-sensitive") voxels based on the goodness of fit of this model. The basis set used here consisted of 5 components, designed to capture different aspects of response waveshape: Onset, Sustained, Ramp, Offset, and Undershoot (Fig. 1).



 
FIG. 1. Fit of basis functions to measured sustained (left) and phasic (right) responses. Responses (to 2/s and 35/s noise burst trains; dashed curves in bottom row) represent an average across sessions (specifically, the mean response waveforms for Heschl's gyrus [HG] from Fig. 2). Amplitudes of the basis functions were determined using the general linear model. Summation of the basis functions for each response type yields the solid curve in the bottom row. (Vertical scale of the individual basis functions is one half that of the summed response.) Note that the onset basis function models an initial transient response that is above the level of the sustained response. Undershoot function was included to help model signal decreases below baseline, which is not well illustrated by these particular waveforms. For this example, a constant term was not included in the linear model, leading to a small mismatch between the end of the measured and fitted responses.

 
The GLM was implemented separately for each imaging session, with the response to each stimulus within a session being represented by its own basis function coefficients. We assumed that response waveshape and magnitude were constant across all presentations of a given stimulus within a session. Furthermore, we included 3 additional functions as part of the overall basis set for each imaging run: a constant term for modeling the baseline signal level of a run, and both linear and quadratic functions for modeling any low-frequency signal drift that occurred over the duration of a run. These functions are unrelated to the stimulus-induced response and were therefore excluded in the creation of the activation maps (described below). Finally, as part of the GLM, an estimate of the noise covariance, based on voxels in auditory cortex, was used to prewhiten the data from the single-slice experiments, so as to bring the false-positive rate closer in line with its theoretically predicted value (Harms and Melcher 2003). No prewhitening was applied to the data from the multislice experiments because the residuals without prewhitening were already consistent with a hypothesis of "white" (uncorrelated) noise, presumably attributable to the long interval (8 s) between volume acquisitions.
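The idea of modeling a voxel time series as a weighted sum of stimulus-related and nuisance functions can be sketched with ordinary least squares. This is a simplified illustration, not the authors' implementation: a single boxcar stands in for the 5-component basis set (whose exact shapes are not given here), the off-then-on ordering of periods is assumed, the data are synthetic and noise-free, and prewhitening is omitted.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def fit_glm(X, y):
    """Least-squares coefficients from the normal equations (X^T X) beta = X^T y."""
    cols = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(cols)]
           for i in range(cols)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(cols)]
    return solve(XtX, Xty)

n_images = 120  # a 240-s run sampled every 2 s
u = [2.0 * i / (n_images - 1) - 1.0 for i in range(n_images)]  # time scaled to [-1, 1]

# Hypothetical stand-in for the stimulus basis set: 1 during assumed "on" periods
# (15 images "off", then 15 images "on", repeated)
box = [1.0 if (i // 15) % 2 == 1 else 0.0 for i in range(n_images)]

# Columns: stimulus regressor, constant (baseline), linear drift, quadratic drift
X = [[box[i], 1.0, u[i], u[i] ** 2] for i in range(n_images)]

true_beta = [10.0, 1000.0, 5.0, 3.0]  # synthetic amplitudes
y = [sum(b * x for b, x in zip(true_beta, row)) for row in X]

beta = fit_glm(X, y)
print([round(b, 6) for b in beta])
```

Only the first coefficient describes the stimulus-induced response in this sketch; the remaining three play the role of the baseline and drift terms that the text excludes from the activation maps.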

For each stimulus in a given imaging session, we created an "omnibus" activation map (using an F-statistic; Fomby et al. 1984) that tested against the null hypothesis that none of the estimated amplitudes of the basis functions was significantly different from zero. "Active" voxels were defined as those with values of P < 0.001 (not corrected for multiple comparisons).
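An omnibus test of this kind is commonly computed as an extra-sum-of-squares F-statistic that compares a full model (stimulus basis functions plus nuisance terms) with a reduced model (nuisance terms only). The sketch below shows that standard formula; the residual sums of squares and degrees of freedom are made-up numbers, and the exact degrees of freedom used in the study are not stated here.

```python
def omnibus_f(rss_reduced, rss_full, n_tested, n_obs, n_params_full):
    """Extra-sum-of-squares F-statistic.

    rss_reduced   : residual sum of squares without the tested regressors
    rss_full      : residual sum of squares with all regressors
    n_tested      : number of regressors being tested jointly
    n_obs         : number of time points
    n_params_full : number of regressors in the full model
    """
    df_resid = n_obs - n_params_full
    return ((rss_reduced - rss_full) / n_tested) / (rss_full / df_resid)

# Hypothetical example: 5 stimulus basis functions tested jointly,
# 3 nuisance terms, 120 time points
f_value = omnibus_f(rss_reduced=150.0, rss_full=100.0,
                    n_tested=5, n_obs=120, n_params_full=8)
print(f_value)
```

A voxel would then be called active when the P value associated with this F-statistic, at (5, 112) degrees of freedom in this example, falls below the chosen threshold.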

DEFINING REGIONS OF INTEREST. For the single-slice experiments examining posterior auditory cortex, the response waveshape was quantified for 2 anatomically defined regions of interest (ROIs): Heschl's gyrus (HG) and the superior temporal gyrus (STG). When Heschl's gyrus was evident, it constituted the HG ROI. Otherwise, the HG ROI covered the medial third of the superior temporal plane. The STG ROI was defined as the region lateral to the HG ROI, extending superiorly to the edge of the parietal lobe, and inferiorly to the superior edge of the superior temporal sulcus. Regions from both hemispheres were combined to create the final (bilateral) HG and STG ROIs for each session. [The number of voxels in the resulting ROIs averaged 55 ± 10 (mean ± SD) in HG, and 72 ± 13 in STG.] Additional analyses (not presented) confirmed that response waveshapes on the left and right sides separately exhibited the same rate and STF dependencies seen when the 2 sides were analyzed together.

RESPONSE QUANTIFICATION. For each ROI, responses were quantified in terms of individual basis function amplitudes or combinations thereof. First, for a given stimulus and ROI, the amplitudes of a given basis function were averaged across all the active voxels in the ROI. These average amplitudes were then converted to a "percentage change" scale by dividing by the estimated signal baseline (i.e., the value of the constant term in the GLM, averaged across runs and the same active voxels) and multiplying by 100. We denote the resulting amplitudes of the onset, sustained, ramp, and offset components as On, Sust, Ramp, and Off, respectively. Mid, a measure of the response amplitude near the middle of the "sound on" period, was defined as the sum of Sust plus one half of Ramp. For a given stimulus and ROI, we required a minimum of 3 active voxels to include that stimulus/ROI combination in the RESULTS. In all, 5 responses for STG and 2 for HG were excluded because the 3-voxel criterion was not met, leaving 156 responses for STG and 159 for HG. [The number of active voxels in these responses averaged 15.4 ± 7.6 (mean ± SD) in HG, and 23.3 ± 11.1 in STG.]
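The two conversions described above (scaling by the baseline to obtain percentage change, and forming Mid from Sust and Ramp) are simple enough to state directly. The amplitudes and baseline in this sketch are invented values, and the function names are illustrative.

```python
def to_percent(amplitude, baseline):
    """Express a basis function amplitude as a percentage of the signal baseline."""
    return 100.0 * amplitude / baseline

def mid_amplitude(sust, ramp):
    """Mid: response amplitude near the middle of the sound-on period."""
    return sust + 0.5 * ramp

# Hypothetical ROI-averaged amplitudes (arbitrary signal units) and baseline
baseline = 1000.0
sust = to_percent(12.0, baseline)   # 1.2 %
ramp = to_percent(-4.0, baseline)   # -0.4 %
mid = mid_amplitude(sust, ramp)     # 1.2 + 0.5 * (-0.4) = 1.0 %
print(sust, ramp, mid)
```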

The response in each ROI was further quantified using a "waveshape index" (WI), a measure of overall response waveshape, designed to order responses along a continuum ranging from sustained to phasic (Harms and Melcher 2003). The formulation was such that the WI was independent of overall response amplitude and stayed within a finite range. Specifically, for a given ROI

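$$\mathrm{WI} = \frac{1}{2} \cdot \frac{\mathrm{On} + \mathrm{Off}}{\max(\mathrm{On}, \mathrm{Off}) + \mathrm{Mid}} \tag{1}$$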
(Before their use in this equation only, On, Off, and Mid were rectified, i.e., negative values were converted to zero.) Using this definition, the WI approaches one when the 2 transient components (On, Off) are similar in magnitude and are large relative to the midpoint response. Values near zero reflect a response dominated by the midpoint response (i.e., by the sustained and/or ramp components). The WI was used to quantify average responses within ROIs (Figs. 2, 4, 6, 7, 9) and responses within individual voxels (Figs. 8 and 10).
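The behavior described for the WI can be checked numerically. The sketch below takes the WI to have the form 0.5 × (On + Off) / (max(On, Off) + Mid); that form is an assumption here, chosen because it reproduces the properties stated in the text, and it should be checked against Harms and Melcher (2003). Raising an error when all three rectified terms are zero is also a choice made for this sketch.

```python
def waveshape_index(on, off, mid):
    """Waveshape index under the assumed form 0.5*(On+Off)/(max(On,Off)+Mid).

    On, Off, and Mid are rectified first (negative values set to zero),
    as described in the text.
    """
    on, off, mid = (max(v, 0.0) for v in (on, off, mid))
    denom = max(on, off) + mid
    if denom == 0.0:
        raise ValueError("WI is undefined when On, Off, and Mid are all zero")
    return 0.5 * (on + off) / denom

print(waveshape_index(2.0, 2.0, 0.0))   # equal transients, no midpoint response
print(waveshape_index(0.0, 0.0, 3.0))   # midpoint response only
print(waveshape_index(1.0, 1.0, 1.0))   # mixed case
```

Scaling all three inputs by the same positive factor leaves the result unchanged, which matches the stated independence from overall response amplitude.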



 
FIG. 4. Effect of STF on response waveshape for 35/s noise burst trains. Left panels: WIs for individual sessions. Right panels: responses averaged across sessions. Note that response waveshape becomes more phasic with increasing STF. The data are from seven sessions that examined at least two STFs (50%, 88% in 5 sessions; 25%, 88% in 1 session; 25%, 50%, 88% in 1 session). There are no STG data for one session because there were too few active voxels in STG (see METHODS). Noise bursts were broadband. Level: approximately 55 dB SL. All data are from single-slice experiments.

 


 
FIG. 6. Four quantitative measures of waveshape (WI, On, Mid, Off) showed little or no sensitivity to stimulus level, but a strong sensitivity to stimulus temporal characteristics. Each point is derived from a within-session comparison. Unfilled symbols in the "Level Change" columns indicate the largest change across 3–4 levels (30- to 40-dB range) in a given measure (top: WI, 2nd row: On, 3rd row: Mid, bottom: Off) while holding stimulus temporal characteristics constant. (Diamonds: 35/s noise burst train; up triangles: continuous noise; down triangles: music). Filled symbols in the "Temporal Change" columns indicate the change in a given measure arising from a change in stimulus temporal characteristics while holding sensation level constant to within 5 dB. [Circles: 35/s vs. 2/s noise burst trains at either 55 dB SL (empty circles) or 70 dB SL (circles with "+"); squares: continuous noise vs. music (50–55 dB SL)]. "Temporal Change" values were obtained in a subset of the sessions that yielded the "Level Change" values. All data are from single-slice experiments.

 


 
FIG. 7. WI showed little or no sensitivity to stimulus bandwidth, but a strong sensitivity to stimulus rate. Each point is derived from a within-session comparison. Unfilled symbols in the "Bandwidth Change" columns indicate the difference in WI between a broadband and narrowband stimulus of the same rate. Narrowband stimuli were either tone bursts or filtered (third octave) noise bursts, with a center frequency (Fc) of 500 Hz or 4 kHz. (Empty diamonds: rate = 35/s and Fc of the narrowband stimulus was 500 Hz; diamonds with "+": rate = 35/s, Fc = 4 kHz; empty squares: rate = 2/s, Fc = 500 Hz; squares with "+": rate = 2/s, Fc = 4 kHz). Filled symbols in the "Rate Change" columns indicate the difference in WI between 35/s and 2/s stimuli of the same bandwidth. (Circles: broadband stimuli; up triangles: narrowband with Fc = 500 Hz; down triangles: narrowband with Fc = 4 kHz). "Rate Change" values were obtained in the same sessions that yielded the "Bandwidth Change" values. All data are from single-slice experiments.

 


 
FIG. 9. WI in HG vs. STG. Each point corresponds to a particular imaging session and stimulus. Note that points for low-rate stimuli (left) lie on either side of the diagonal line, whereas points for higher-rate stimuli or continuous noise (right panel) tend to lie above the diagonal, indicating a greater WI in STG compared with HG. Low-rate stimuli are bursts (broadband or narrowband) with a rate of 2/s (unfilled circles) and music or speech (unfilled triangles). Higher-rate stimuli are 100/s clicks or 35/s bursts (broadband or narrowband) with STF of 88% (filled diamonds), and 10/s noise bursts, 35/s clicks, or 35/s noise bursts with STF of 25 or 50% (filled stars). Continuous noise data correspond to filled squares. All data are from single-slice experiments.

 


 
FIG. 8. WI maps illustrating a waveshape difference between HG and STG in posterior auditory cortex (single-slice experiments). Each panel shows a color map of WI superimposed on a grayscale anatomic image for either left auditory cortex (top, middle), or right auditory cortex (bottom; flipped horizontally). Maps are from 3 different subjects.

 
CALCULATING RESPONSE WAVEFORMS. Response waveforms were computed by averaging across all presentations of a given stimulus in a given imaging session. First, after image preprocessing, the time series for each imaging run and voxel was corrected for linear or quadratic drifts in signal, and normalized such that the time-averaged signal had the same (arbitrary) value for all voxels and runs. For single-slice experiments, the data were temporally smoothed (using a 3-point, zero-phase filter with coefficients 0.25, 0.5, and 0.25), and averaged across response "blocks." A "block" was a 70-s window (35 images) that included 10 s before stimulus onset, the 30 s coinciding with the stimulus, and the 30-s "off" period after the stimulus. All response blocks for a given stimulus were averaged to give an average signal versus time waveform for each voxel. These signal versus time waveforms were further averaged across the active voxels in HG or STG. The resulting waveform was then converted to percentage change in signal relative to baseline to yield the final response waveform for a given stimulus, ROI (HG or STG), and session. The baseline was defined as the average signal from t = –6 to 0 s, with time t = 0 s corresponding to the onset of the stimulus. For multislice experiments, response waveforms were also computed by averaging data across 70-s response blocks. However, this averaging accounted for the staggered timing between stimulus and volume acquisition from run to run, such that the resulting average response was sampled every 2 s. The average response for individual voxels was temporally smoothed, further averaged across active voxels, and converted to percentage change in signal as for the single-slice experiments.
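The smoothing, block averaging, and baseline conversion for the single-slice data can be sketched as follows. The filter coefficients, the 35-image block, and the baseline window (t = –6 to 0 s) come from the text; leaving the two end points of each block unsmoothed is an assumption, since the edge handling is not described, and the block data are synthetic.

```python
def smooth3(x):
    """3-point zero-phase filter with coefficients 0.25, 0.5, 0.25.

    End points are copied unchanged (an assumption about edge handling).
    """
    y = list(x)
    for i in range(1, len(x) - 1):
        y[i] = 0.25 * x[i - 1] + 0.5 * x[i] + 0.25 * x[i + 1]
    return y

def average_blocks(blocks):
    """Point-by-point average across equally long response blocks."""
    n = len(blocks)
    return [sum(vals) / n for vals in zip(*blocks)]

def percent_change(waveform, times, t_start=-6.0, t_end=0.0):
    """Convert to percentage change relative to the mean signal in the baseline window."""
    base_vals = [v for v, t in zip(waveform, times) if t_start <= t <= t_end]
    baseline = sum(base_vals) / len(base_vals)
    return [100.0 * (v - baseline) / baseline for v in waveform]

# 70-s block: 35 images at 2-s spacing, starting 10 s before stimulus onset
times = [-10.0 + 2.0 * i for i in range(35)]

# Two synthetic blocks: signal 100 before onset, elevated after onset
block_a = [100.0] * 6 + [102.0] * 29
block_b = [100.0] * 6 + [104.0] * 29

smoothed = [smooth3(b) for b in (block_a, block_b)]
response = percent_change(average_blocks(smoothed), times)
print([round(v, 2) for v in response[:10]])
```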


    RESULTS
 
Waveshape in posterior auditory cortex: dependency on stimulus type

The waveshape of fMRI responses depended strongly on the type of stimulus, as illustrated in Fig. 2. The response waveforms of Fig. 2 were obtained in single-slice imaging sessions and thus correspond to posterior auditory cortex (i.e., posterior HG and immediately lateral STG). Each waveform represents an average response (across multiple single-slice sessions) to one of 8 broadband stimuli of about 55 dB SL. In both HG and STG, the average response to 100/s clicks and 35/s noise bursts was highly phasic, whereas the response to 2/s noise bursts, speech, and music was primarily sustained. The phasic responses were characterized by a prominent signal decline (80–120%) after an initial signal peak (at about 6 s), and a clear peak after sound offset (at about 36 s). In contrast, the sustained responses showed far less signal decline (25–40%), and lacked a distinct peak after sound offset. The responses evoked by 35/s clicks, 10/s noise bursts, and continuous noise were "intermediate" in waveshape in that they displayed an "intermediate" degree of signal decline (50–75%), and only a small "off-peak" (e.g., 35/s clicks and continuous noise in STG) or only the suggestion of a small, "hidden" off-response in the form of a slightly prolonged elevation in signal after stimulus termination (e.g., 35/s clicks and continuous noise in HG, and 10/s noise bursts). Overall, the stimuli in Fig. 2 produced a range of waveshapes, from highly phasic to highly sustained.

The stimulus-dependent variations in waveshape from phasic to sustained were well captured by the waveshape index (WI), as can be seen from the plots of WI for individual imaging sessions in Fig. 2. In keeping with the trend in the average waveforms, the WIs for 100/s clicks and 35/s noise bursts cluster toward the phasic end of the WI range (i.e., toward one), whereas the WIs for 2/s noise bursts, speech, and music cluster near the sustained end (i.e., toward zero). The WIs for 35/s clicks, 10/s noise bursts, and continuous noise generally occupy an intermediate range. In general, trends in WI were also reflected in the individual elements that together define the WI. For instance, in the case of 35/s and 2/s noise burst trains, the transient components of the basis set—onset (On) and offset (Off)—were almost always greater at 35/s than at 2/s (≥19 of 22 cases in both HG and STG), whereas the midpoint signal level (Mid) was less at 35/s than at 2/s for all but one instance in HG and STG. Thus like the WI, the trends for the individual components indicated more phasic responses at 35/s. In the following presentation, trends for the individual components will be described only when they provide insights that are not revealed by the WI.

Waveshape in posterior auditory cortex: dependency on repetition rate

Because the stimuli represented in Fig. 2 have quite different temporal characteristics, but are similar in spectrum and level, the data strongly suggest that stimulus temporal characteristics are an important determinant of cortical response waveshape. The importance of one temporal parameter, stimulus repetition rate, can be seen clearly from Fig. 2 in that higher-rate trains (i.e., 100/s clicks and 35/s noise bursts) typically elicited phasic responses, whereas low-rate trains (2/s noise bursts) elicited more sustained responses. Speech and music fit with this trend because they are dominated by low modulation rates and produce sustained waveshapes (Fig. 2, far right).

The dependency of waveshape on rate was also evident in individual sessions. In every session that used 2/s, 10/s, and 35/s noise burst trains, the WI in both HG (7 sessions, 5 subjects) and STG (6 sessions, 5 subjects) increased with increasing rate, indicating a shift in response waveshape from sustained to phasic. In sessions that used 2/s and 35/s (but not 10/s) noise bursts (16 sessions, 16 subjects), the WI was greater for 35/s in all but one instance in HG, and in every instance in STG (P < 0.001, signed-rank test). For the 2 sessions (2 subjects) that used both 35/s and 100/s clicks, the WI for 100/s clicks was greater than that for 35/s clicks in both HG and STG. Thus the intrasession data strongly indicate a shift toward more phasic responses with increasing rate.

In the comparisons just described, burst or click duration remained constant across rates, so sound-time fraction (STF) covaried with rate. Because this raises the possibility that waveshape may be controlled by STF, and not rate, we examined the effects of rate while holding STF constant (which required varying the duration of individual bursts; 5 sessions, 5 subjects). Figure 3 shows that even when STF was held at 50%, the average response for a 35/s noise burst train (14.3-ms bursts) was still more phasic than that for a 2/s train (250-ms bursts). Moreover, in every session (5/5 for HG, P = 0.06; 4/4 for STG, P = 0.13, signed-rank test), the WI for 35/s was greater than that for 2/s, indicating a trend toward more phasic responses at 35/s. However, the response difference with equal STF was less pronounced than the difference between trains of the same 2 rates for which STF also changed (average difference in WI between 35/s and 2/s trains was about 0.24 with 50% STFs, compared with about 0.44 for 35/s, 88% vs. 2/s, 5% trains from the same sessions). Thus these results provide further support for the view that response waveshape depends strongly on stimulus rate, but suggest that STF may also influence response waveshape.
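
The P values quoted for these small samples are consistent with an exact, two-sided signed-rank test in which every session changes in the same direction. The sketch below enumerates all sign assignments to make that arithmetic explicit. It assumes no zero differences and no tied magnitudes; the function name and the difference values are hypothetical and are not data from the study.

```python
from itertools import product

def signed_rank_p_two_sided(diffs):
    """Exact two-sided Wilcoxon signed-rank P value by full enumeration.

    Assumes no zero differences and no tied absolute values.
    """
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0] * len(diffs)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    total = sum(ranks)
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    observed = min(w_plus, total - w_plus)
    # Count sign assignments at least as extreme as the observed one.
    extreme = 0
    n_assignments = 0
    for signs in product((0, 1), repeat=len(diffs)):
        w = sum(r for r, s in zip(ranks, signs) if s)
        n_assignments += 1
        if min(w, total - w) <= observed:
            extreme += 1
    return extreme / n_assignments

# Hypothetical WI differences (35/s minus 2/s), all positive.
print(signed_rank_p_two_sided([0.31, 0.12, 0.25, 0.40, 0.18]))  # 0.0625
print(signed_rank_p_two_sided([0.31, 0.12, 0.25, 0.40]))        # 0.125
```

With n same-signed differences the two-sided P value is 2/2^n, which gives 0.0625 for 5 sessions and 0.125 for 4 sessions, matching the rounded values of 0.06 and 0.13 reported above.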



 
FIG. 3. Effect of rate (2/s vs. 35/s) on response waveshape for noise burst trains with equal STF (50%). Left panels: WIs for individual sessions. Right panels: responses averaged across sessions. Data are from 5 sessions that examined both rates in the same imaging session. There are no STG data for one session because there were too few active voxels in STG (see METHODS). Noise bursts were broadband. Level: approximately 55 dB SL. All data are from single-slice experiments.

 
Waveshape in posterior auditory cortex: dependency on sound-time fraction

The effect of STF on auditory responses was directly examined by varying STF while holding rate constant (7 sessions; 6 subjects; single-slice experiments) (Fig. 4). Note that increasing STF corresponds to increasing duration of individual bursts when rate is held constant. In HG, the WI was greater for a 35/s noise burst train with an 88% STF (25-ms bursts) than for a 50% STF (14.3-ms bursts) in all 6 sessions with these data (P = 0.03, signed-rank test), indicating a clear effect of STF on response waveshape in HG. However, the changes in WI were generally small, consistent with the small differences between the average response waveforms for the 50 and 88% STFs (Fig. 4). In STG, the WI was greater for the 88% STF in 4 of 5 cases (P = 0.2). In the 2 sessions (2 subjects) with data for a larger STF differential of 25% (7.1-ms bursts) and 88% (25-ms bursts), responses were more phasic (i.e., higher WI) at the higher STF in both HG and STG, and this difference in waveshape was more pronounced (Fig. 4). This suggests that larger STF differentials result in larger, more robust changes in response waveshape. Overall, the 35/s noise burst data indicate a trend toward more sustained responses with decreasing STF.
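
The burst durations quoted in this and the preceding section are consistent with STF being the product of repetition rate and burst duration. The short sketch below checks that arithmetic under this assumed definition; the function names are hypothetical.

```python
def sound_time_fraction(rate_per_s, burst_ms):
    """Fraction of time occupied by sound for a train of equal-duration bursts."""
    return rate_per_s * burst_ms / 1000.0

def burst_duration_ms(rate_per_s, stf):
    """Burst duration (ms) needed to reach a given STF at a given rate."""
    return 1000.0 * stf / rate_per_s

# Rate/duration pairs quoted in the text, with the STF each pair implies.
for rate, burst in [(35, 25.0), (35, 14.3), (35, 7.1), (2, 250.0)]:
    stf = sound_time_fraction(rate, burst)
    print(f"{rate}/s, {burst} ms bursts -> STF = {100 * stf:.1f}%")
```

Under the same assumption, a 2/s train with a 5% STF corresponds to 25-ms bursts, which agrees with the statement that burst duration was held constant when a 35/s, 88% STF train was compared with a 2/s, 5% STF train.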

There was also evidence of an effect of STF on response waveshape for low-rate (2/s) noise burst trains, but the effect was small, and apparent in only one of the components making up the WI. In particular, in 5 sessions (5 subjects) with responses to 2/s noise burst trains with STFs of 5 and 50%, On was greater at the 50% STF in all cases, in both HG and STG (P = 0.06, signed-rank test), whereas Off and Mid showed no change with STF. The net effect on the average response waveform was a slightly more pronounced "on-peak" at the higher STF, leading to a slightly more phasic response [cf., the 2/s (50% STF) average waveforms in Fig. 3 to the 2/s (5% STF) average waveforms in Fig. 2].

Altogether, the experiments varying repetition rate and STF indicate that rate and STF both influence the waveshape of responses from auditory cortex.

Waveshape in posterior auditory cortex: insensitivity to sound level

Unlike changes in stimulus temporal characteristics, variations of stimulus level over a 30- to 40-dB range did not result in strong, systematic changes in response waveshape. This is illustrated in Fig. 5, which shows responses from posterior HG for one session (a single-slice experiment). The left panel shows that phasic responses were elicited by 35/s noise burst trains (88% STF) regardless of level (40, 55, and 70 dB SL), whereas the right panel illustrates the markedly different responses produced by 35/s and 2/s noise burst trains of comparable level (70 dB SL). Any change in waveshape with level was far less than the change in waveshape with rate.



 
FIG. 5. Response waveforms in posterior HG illustrating little sensitivity to level, but strong sensitivity to rate. Left: responses to a 35/s noise burst train at 3 stimulus levels (40, 55, and 70 dB SL). Right: responses to 2/s and 35/s trains at a common level of approximately 70 dB SL. Noise bursts were broadband. STF = 5% (2/s) or 88% (35/s). All responses are from the same imaging session (a single-slice experiment).

 
Three sets of experiments examined the influence of stimulus level on response waveshape, and their results, as quantified below, support the impression from Fig. 5. Two sets of experiments examined the effect of sound level on waveshape and compared it with the effect of a change in stimulus temporal characteristics. One of these sets used 35/s noise bursts (88% STF) of various levels (40, 55, 70 dB SL) and 2/s noise bursts (5% STF; 55 dB SL) for comparison (2 subjects; 2 sessions, which includes the session represented in Fig. 5). The second set used continuous noise of various levels (35–75 dB SL) and music (50 dB SL) for comparison (5 sessions, 5 subjects). A third set of experiments examined the effect of stimulus level on waveshape using music, but did not include a comparison stimulus with different temporal characteristics (5 sessions, 5 subjects). (Four of the 5 subjects in the second set of experiments were also subjects in the third set of experiments.) Values for the WI and its components (On, Mid, Off) were obtained for each experiment and used to compare the effects of level and stimulus temporal characteristics on waveshape. For each session and measure (WI, On, Mid, Off), we determined the change in value across levels and, when data were available, we also determined the change across stimuli of differing temporal characteristics. The change across levels was obtained by subtracting the minimum value (of WI, On, Mid, or Off) across levels from the maximum value, and assigning a positive (negative) sign if the maximum value occurred at the higher (lower) level. The change across temporal characteristics was calculated as the difference in WI, On, Mid, or Off between the standard and comparison stimulus (i.e., continuous noise minus music, or 35/s minus 2/s noise bursts) at comparable sensation levels. The changes across level and temporal characteristics are plotted in the "Level Change" and "Temporal Change" columns, respectively, of Fig. 6.
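
The two change measures defined above can be sketched as follows. The sign convention for the level change is interpreted here as positive when the maximum value occurs at a higher level than the minimum value, which is an assumption about the wording in the text. The function names and the WI values are hypothetical.

```python
def change_across_levels(values_by_level):
    """Signed range of a measure (WI, On, Mid, or Off) across stimulus levels.

    values_by_level maps level in dB SL to the measured value.
    """
    level_of_max = max(values_by_level, key=values_by_level.get)
    level_of_min = min(values_by_level, key=values_by_level.get)
    magnitude = values_by_level[level_of_max] - values_by_level[level_of_min]
    # Positive if the maximum occurred at the higher level, negative otherwise.
    return magnitude if level_of_max > level_of_min else -magnitude

def change_across_temporal(standard_value, comparison_value):
    """Standard minus comparison, e.g. 35/s minus 2/s noise bursts."""
    return standard_value - comparison_value

# Hypothetical WI values for 35/s noise bursts at three levels.
wi_by_level = {40: 0.70, 55: 0.78, 70: 0.74}
print(round(change_across_levels(wi_by_level), 2))
# Hypothetical WI values for 35/s versus 2/s noise bursts at comparable levels.
print(round(change_across_temporal(0.78, 0.25), 2))
```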

The changes in waveshape index (ΔWI) produced by changes in stimulus level or temporal characteristics can be seen in Fig. 6 (top). The ΔWI for changes in level were clustered around zero in both HG and STG (HG: P = 0.15, STG: P = 0.23, signed-rank test). In contrast, for the same sessions, the ΔWI for temporal changes were always the same sign (P = 0.004 in both HG and STG). Furthermore, in all but one instance (in HG), the absolute value of ΔWI was greater for the temporal changes than for the level changes. Altogether, the WI was altered in a systematic manner by stimulus temporal changes, but was insensitive to level changes.

Examination of the individual waveshape components revealed a similar insensitivity to level for the onset and midpoint, but some sensitivity for the offset. Like the WI, On and Mid did not change in a systematic direction when the level was changed (in both HG and STG, On: P > 0.3, Mid: P > 0.9), but did for a change in sound temporal characteristics (in both HG and STG, On: P ≤ 0.01, Mid: P = 0.004; Fig. 6). In addition, the absolute value of the change in Mid (ΔMid) was almost always greater for sound temporal changes than for level changes. Unlike On and Mid, there was evidence for a consistent effect of level on Off in that the ΔOff attributed to a change in level were typically positive (HG: P = 0.01, STG: P = 0.15). However, the ΔOff arising from a stimulus temporal change were always positive, and on average larger than the changes attributed to level. Thus although level appears to have some effect on response offset, the influence of stimulus temporal properties was both greater (on average) and more consistent in the direction of the effect. Altogether, these analyses of ΔWI, ΔOn, ΔMid, and ΔOff indicate that response waveshape is affected in a more systematic manner, and to a greater degree, by changes in stimulus temporal characteristics than by changes in stimulus level.

Waveshape in posterior auditory cortex: insensitivity to sound bandwidth

In another set of experiments we compared the effects on response waveshape of stimulus bandwidth versus stimulus rate. The experiments (4 sessions, 4 subjects) used various combinations of broadband noise bursts, narrowband noise bursts, and tone bursts presented at either 2/s (5% STF) or 35/s (88% STF). These experiments all included broadband and narrowband stimuli of the same rate, as well as 2/s and 35/s stimuli of the same bandwidth, so that the effects of bandwidth (holding rate constant) and the effects of rate (holding bandwidth constant) could be compared within the same sessions. The ΔWI for changes in bandwidth (WI for the broadband minus the narrowband stimulus) were clustered around zero in both HG and STG (HG: P = 0.8, STG: P = 0.4, signed-rank test; Fig. 7, "Bandwidth Change" columns). In contrast, for the same sessions, the ΔWI for a change in rate (high minus low rate) were always positive (HG and STG: P = 0.02; Fig. 7, "Rate Change" columns), and in all but one instance (in STG) were greater than the ΔWI for the bandwidth change. The lack of a consistent effect of bandwidth on waveshape was seen in a similar analysis of individual waveshape components (On, Mid, Off; not shown). Overall, the results indicate that stimulus bandwidth does not have a systematic effect on waveshape, in contrast to the highly systematic and robust effect of rate.

Waveshape in posterior auditory cortex: differences between HG and STG

Inspection of WI maps for the broad range of stimuli studied in the single-slice experiments revealed clear spatial variations in WI within posterior auditory cortex for certain stimuli. Sample maps showing some of the clearest spatial variations are shown in Fig. 8. In all of these cases, responses in STG tended to be more phasic than those in HG.

Waveshape differences between HG and STG were readily apparent in the overall database from the single-slice experiments, but only for stimuli that elicit moderately phasic to phasic responses (Fig. 9). For low-rate stimuli, the WI did not differ significantly between HG and STG (2/s bursts with STFs of 5 or 50%: P = 0.14, signed-rank test; music and speech: P = 0.13; Fig. 9, left). For these stimuli, On, Mid, and Off all tended to be greater in STG than in HG, indicating a regional difference in response amplitude, but not waveshape. In contrast, higher-rate stimuli or continuous noise did show significant differences in WI between HG and STG (100/s clicks and 35/s bursts with an 88% STF: P < 0.001; 10/s noise bursts, 35/s clicks, and 35/s noise bursts with a 25 or 50% STF: P < 0.001; continuous noise: P = 0.06). This difference arose because On and Off tended to be larger in STG than in HG, whereas Mid showed no difference between the 2 structures. Thus analyses of overall waveshape and individual response components indicated more phasic responses in STG compared with HG for higher-rate stimuli or continuous noise, but not for low-rate stimuli.

Waveshape throughout auditory cortex for music and 35/s noise bursts

The results presented so far indicate that response waveshape depends strongly on stimulus temporal characteristics. However, the data supporting this conclusion are for the posterior aspect of auditory cortex sampled by the single-slice experiments. To test whether stimulus temporal characteristics are important to response waveshape throughout auditory cortex, 3 multislice experiments were performed using a low-rate stimulus (music) and a high-rate stimulus (35/s noise bursts, 88% STF). Although music and 35/s noise bursts have acoustic differences beyond rate (e.g., spectrum as a function of time and fine time structure), the rate of the dominant amplitude modulation is one of the primary acoustic differences between these 2 stimuli.

Figure 10 displays a spatial map of WI in left auditory cortex for a session using multiple slices. The results are representative of data for the right hemisphere and the other 2 subjects. For both music and 35/s noise bursts, active voxels occurred throughout the superior temporal lobe. However, the WIs for music and 35/s noise bursts occupied distinct ranges. In particular, the majority of active voxels for music had a WI <0.2, indicating primarily sustained responses. In contrast, the majority of active voxels for the 35/s noise burst train had a WI >0.4, indicating "intermediate" or phasic responses. The voxels activated by the 2 stimuli were largely overlapping, indicating that the same regions of cortex can show either phasic or sustained responses depending on the stimulus. The bottom of Fig. 10 illustrates the difference in response waveshape between these 2 stimuli (for the slice used in the "single-slice" experiments, denoted "0 mm").1 Overall, the dramatic difference in waveshape for music versus 35/s noise bursts occurs in widespread cortical areas, which indicates that stimulus temporal characteristics strongly influence waveshape throughout auditory cortex.


    DISCUSSION
 
Our findings show that the waveshape of fMRI responses in posterior auditory cortex is strongly dependent on sound temporal characteristics, but not sound level or bandwidth. Specifically, fMRI response waveshape varied from highly sustained to highly phasic with changes in sound repetition rate. Waveshape also varied systematically with STF. Together, rate and STF constitute 2 of the primary determinants of sound temporal envelope. [Additional aspects of the envelope that we did not independently manipulate include stimulus rise time, temporal regularity, and the shape (e.g., trapezoidal vs. sinusoidal) of the modulating waveform.] In contrast to sound envelope, response waveshape in posterior auditory cortex varied neither systematically nor substantially with large changes in bandwidth or changes in level. Because we specifically studied moderate to high sound levels, we cannot say whether the insensitivity of waveshape to level extends down to threshold. However, over a wide level range commonly experienced in everyday life, it is clear that sound temporal envelope was by far the dominant determinant of response waveshape.

We also showed that the strong influence of sound temporal envelope characteristics on fMRI response waveshape holds in widespread regions of auditory cortex, not just the posterior aspect. Specifically, a low-rate stimulus (i.e., music) evoked sustained responses throughout the superior temporal lobe, whereas a high-rate stimulus (35/s noise burst train) evoked phasic responses in equally widespread areas. The areas of the superior temporal lobe that responded to these low- and high-rate sounds were largely overlapping, reinforcing our previous finding that a given cortical area can shift its response from sustained to phasic as sound characteristics are changed (Harms and Melcher 2002).

The sound-dependent shift in response waveshape is probably not attributable to hemodynamic factors, but instead reflects a shift in the time pattern of cortical neural activity (for detailed discussion see Harms and Melcher 2002). Briefly, neural activity and image signal are linked through a chain of metabolic and hemodynamic events such that increases and decreases in neural activity result in concordant changes in image signal strength (Devor et al. 2003; Heeger and Ress 2002; Logothetis 2003; Nemoto et al. 2004). Synaptic activity (excitatory or inhibitory) can be a dominant source of changes in local brain metabolism and oxygen consumption and may frequently be the primary source of the fMRI response (Auker et al. 1983; Caesar et al. 2003; Logothetis et al. 2001; Mathiesen et al. 1998; Nudo and Masterton 1986). However, strong correlations between neural discharge activity and fMRI response have also been reported (perhaps arising because the discharge activity is highly correlated with synaptic activity; Heeger et al. 2000; Rees et al. 2000; Smith et al. 2002). For the purposes of the present discussion, we leave open the possibility that either or both synaptic activity and discharges "contribute" to the fMRI responses we measured. Regardless of whether synaptic or discharge activity drives the fMRI response, the resulting hemodynamic changes occur over the course of seconds. This means that fMRI provides a temporally low-pass filtered view of neural activity, in contrast to microelectrode recordings, for example, which show the fine time pattern of spiking in individual neurons. Thus the shift in fMRI waveshape from sustained to phasic reflects a change in the gross (second-by-second) time patterns of neural activity.

The fact that a given region (or even a single voxel) can exhibit both sustained and phasic fMRI responses (depending on the stimulus) is strong additional evidence that the waveshape changes arise through changes in underlying neural activity because the coupling between neural activity and hemodynamic response is presumably constant within a given region. Most likely, changes in neural adaptation and the strength of neural off-responses underlie the sustained-to-phasic shift in fMRI waveshape, as discussed previously (Harms and Melcher 2002). Because fMRI is an indicator of activity summed over many neurons, this shift in waveshape within a given area may occur because distinct, but spatially interspersed, neural populations (with differing response properties) are recruited in different stimulus regimes (Lu and Wang 2004; Lu et al. 2001). Alternatively, the same neurons may shift their mode of response as sound temporal envelope characteristics are varied (Creutzfeldt et al. 1980; Steinschneider et al. 1998).

The waveshape of cortical fMRI responses depends on an interplay between rate and STF

Our results show that cortical response waveshape depends on an interplay between rate and STF. For instance, phasic responses occurred for trains of repeated stimuli having both high rates and high STFs, whereas sustained responses occurred for trains in the opposite corner of the rate-STF space (i.e., low rates and low STFs). When the effects of rate and STF were in opposition, the resulting responses were "intermediate" between the 2 extremes of phasic and sustained.

Response waveshape versus amplitude

Previous studies have reported variations in fMRI activation with sound level or bandwidth (Brechmann et al. 2002; Hall et al. 2001; Hart et al. 2003; Jäncke et al. 1998; Mohr et al. 1999; Wessinger et al. 2001). However, these studies examined the amplitude and spatial extent of the fMRI response, rather than response waveshape, and thus do not contradict the insensitivity of waveshape to sound level or bandwidth seen in the present study. Most of the previous studies used analysis procedures that assume a sustained response (e.g., correlation with a boxcar waveform), which means that a change in response waveshape could present as a change in amplitude or extent (Harms and Melcher 2003). However, given the data of the present study, the previously reported level and bandwidth dependencies most likely reflect true changes in response amplitude and extent because waveshape presumably varied little with level or bandwidth.

Neural activity differences between cortical areas

HG and STG, sites of primary and nonprimary auditory areas, respectively, showed differences in fMRI waveshape that likely reflect differences in the underlying time pattern of neural activity. The differences were not apparent for low-rate stimuli, but were apparent for higher-rate stimuli and continuous noise, with STG showing more phasic responses than HG. This difference in fMRI waveshape indicates that the neural activity of STG, compared with that of HG, was concentrated more at sound onset and offset, and less during the sound. Similar regional differences in fMRI waveshape have also been described previously for a stimulus that falls in the range of our higher-rate stimuli (a 10/s train of near-tonal bursts; Seifritz et al. 2002). These studies indicate that fMRI waveshape may provide a physiological marker for distinguishing primary from nonprimary auditory cortical areas.

An alternative, but unlikely, explanation for the waveshape differences between HG and STG is that they reflect regional differences in tissue hemodynamics (Chen et al. 1998; Davis et al. 1998). Although compelling arguments have been made that hemodynamics do not account for stimulus-dependent waveshape changes within a region (above; also Harms and Melcher 2002), these arguments do not apply to comparisons of waveshape across regions. Nevertheless, a separate argument indicates that a hemodynamic explanation is unlikely. The argument follows from the fact that waveshape differences between HG and STG were seen for some sounds (higher-rate stimuli and continuous noise), but not for others (low-rate stimuli). If the waveshape differences between structures were attributed to hemodynamic factors, the hemodynamics themselves would have to be stimulus dependent (or equivalently, waveshape dependent because waveshape depends on stimulus). Even though a hemodynamic dependency on stimulus (or waveshape) cannot be entirely excluded, it has no precedent. We therefore conclude that the fMRI waveshape differences between HG and STG likely reflect differences in neural activity, rather than hemodynamics.

Sound perception, fMRI response waveshape, and cortical neural activity

The results suggest a possible relationship among sound temporal envelope, cortical response waveshape, and certain perceptual attributes of sound trains. Specifically, changes in sound temporal envelope characteristics produce changes in response waveshape that we hypothesize vary according to the degree of perceptual grouping of the individual, "elemental" stimuli of a train (e.g., the individual bursts). In this discussion, our assessments of perceptual grouping are based on informal listening (but agree with the pertinent psychophysical data; e.g., Miller and Taylor 1948). For trains with a low rate and low STF (e.g., trains of 2/s tone or noise bursts with 5% STF), which elicited sustained responses, the individual stimuli of a train are easily distinguished (e.g., they can be easily counted). For trains with a high rate and high STF (e.g., trains of 35/s tone or noise bursts with 88% STF), which elicited highly phasic responses, the individual stimuli of a train fuse, resulting in a continuous (but modulated) percept in which the most salient perceptual changes are at the beginning and end of the overall train. For intermediate parameters (e.g., 10/s noise bursts), which elicited intermediate waveshapes, the perceptual characteristics of a train are also "intermediate" in that the elemental composition of the train can be appreciated, although the individual elements are not so readily distinguished (e.g., they cannot be easily counted).

The present study provides additional support for the previously suggested relationship between fMRI waveshape and the elemental versus fused nature of sound trains (Harms and Melcher 2002). Importantly, we found that, in contrast to sound temporal envelope, variations in level or bandwidth had little or no effect on cortical response waveshape. At the same time, these variations have little or no effect on the elemental versus fused nature of a train (Harms et al., personal observations). Thus the sound features that most strongly influence the grouping of successive stimuli in a train (temporal envelope characteristics rather than level or bandwidth) are also the features that most strongly influence the waveshape of cortical fMRI responses and thus the time patterns of the underlying neural activity.

A relationship between fMRI response waveshape and the elemental versus fused quality of a sound train fits with the hypothesis that population neural activity in auditory cortex signals the beginning and end of perceptually distinct acoustic "events." For trains perceived as a series of distinct events, each successive event may elicit a transient neural response, leading to a sustained fMRI waveshape (due to the "low-pass" nature of the hemodynamic system linking neural activity to fMRI response). In contrast, for trains in which the individual elements fuse into one, the entire train may be treated as a single event and thus produce neural responses primarily at sequence onset and offset, and thus a phasic fMRI waveshape. Perceptually, the transition between the 2 extremes of discerning many distinct events versus a single event is gradual (Miller and Taylor 1948), and presumably the corresponding transition in neural activity is also gradual. Thus for trains in the transition region, neural responses presumably reflect both individual events and the train as a whole to varying degrees, and thus produce "intermediate" fMRI waveshapes. Although this hypothesis relating neural activity and auditory events stems from fMRI data for sounds lasting tens of seconds, studies using other techniques provide evidence for heightened neural activity at the beginning and end of shorter-duration sounds (electrophysiology, magnetoencephalography; see DISCUSSION in Harms and Melcher 2002). Therefore the relationship between neural activity and auditory events hypothesized here may also apply on shorter timescales.
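
The low-pass argument in the preceding paragraph can be illustrated with a toy simulation. The sketch below convolves two hypothetical neural time patterns with a gamma-shaped impulse response: one with a transient for every burst of a 2/s train, and one with activity only at train onset and offset. The impulse response, its parameters, and the neural patterns are illustrative assumptions and are not taken from the present study.

```python
import numpy as np

DT = 0.1                      # simulation step (s)
N = 700                       # 70-s window, as in the response blocks
ONSET, OFFSET = 100, 400      # sample indices of t = 0 s and t = 30 s

def gamma_impulse_response(n=300, shape=6.0, scale=1.0):
    """Gamma-shaped low-pass kernel peaking near 5 s (assumed parameters)."""
    tau = np.arange(n) * DT
    h = tau ** (shape - 1) * np.exp(-tau / scale)
    return h / h.sum()

def fmri_like(neural):
    """Causal convolution of a neural time pattern with the kernel."""
    return np.convolve(neural, gamma_impulse_response())[: neural.size]

# Pattern A: a brief transient for each burst of a 2/s train (every 0.5 s).
per_event = np.zeros(N)
per_event[ONSET:OFFSET:5] = 1.0

# Pattern B: activity only during the first second after onset and offset.
on_off = np.zeros(N)
on_off[ONSET : ONSET + 10] = 1.0
on_off[OFFSET : OFFSET + 10] = 1.0

sustained = fmri_like(per_event)
phasic = fmri_like(on_off)

mid = ONSET + 150             # sample at t = 15 s, the stimulus midpoint
ratio_sustained = sustained[mid] / sustained.max()
ratio_phasic = phasic[mid] / phasic.max()
print(round(ratio_sustained, 2), round(ratio_phasic, 2))
```

In this toy model the midpoint signal stays close to the peak for the per-event pattern and falls to a small fraction of the peak for the onset/offset pattern, which mirrors the sustained and phasic waveshapes described in the text.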

The response waveshapes for the nontrain stimuli of the present study (music, speech, continuous noise) may also fit with the view of events and neural activity just described. Music and speech are readily heard as a series of distinct perceptual entities (beats or words, respectively). These stimuli produce highly sustained fMRI responses, possibly because each beat or word elicits an elemental neural response. In contrast, continuous noise, which contains no separable elements, produced more phasic responses, indicating that the entirety of the noise segment is treated as a single event. Interestingly, the fMRI response waveshapes for continuous noise were often less phasic than those for high-rate trains (i.e., the onset and offset of the noise were less strongly delineated by neural activity). This may be because continuous noise lacks regularly occurring temporal modulations to bind it, and therefore forms a less salient perceptual event (McAdams 1989; Pollack 1951).

Previously, we suggested that a population neural representation of the beginning and end of distinct perceptual events is weak or absent in the midbrain, begins to emerge in the thalamus, and is robust in auditory cortex (Harms and Melcher 2002). The data for HG versus STG in the present study indicate that this progressive emergence across levels of the auditory system continues in the cortex, with the delineation of events being more accentuated in nonprimary than in primary auditory areas. Altogether, the present and previous results suggest a multistage neural processing scheme in which the auditory environment is increasingly segmented into distinct, and ultimately meaningful, events.


    GRANTS
 
This study was supported by National Institute on Deafness and Other Communication Disorders Grants PO1DC-00119 and T32DC-00038. M. P. Harms was partially funded by an Athinoula A. Martinos Scholarship. This work was also supported in part by the National Center for Research Resources (P41RR-14075) and the Mental Illness and Neuroscience Discovery (MIND) Institute.


    ACKNOWLEDGMENTS
 
The authors thank M. Tramo, A. Dale, P. Cariani, and A. Oxenham for helpful comments and suggestions, and B. Norris for considerable assistance in figure preparation.


    FOOTNOTES
 
The costs of publication of this article were defrayed in part by the payment of page charges. The article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

1 The three multislice experiments were consistent with the single-slice observation of waveshape differences between HG and STG for higher-rate stimuli: for the slice used in the single-slice experiments, the WI for 35/s noise bursts was always greater in STG than in HG.

Address for reprint requests and other correspondence: M. P. Harms, Pfizer Inc., 700 Chesterfield Parkway West, Chesterfield, MO 63017 (mharms{at}alam.mit.edu)


    REFERENCES
 
Auker CR, Meszler RM, and Carpenter DO. Apparent discrepancy between single-unit activity and [14C]deoxyglucose labeling in optic tectum of the rattlesnake. J Neurophysiol 49: 1504–1516, 1983.

Brechmann A, Baumgart F, and Scheich H. Sound-level-dependent representation of frequency modulations in human auditory cortex: a low-noise fMRI study. J Neurophysiol 87: 423–433, 2002.

Caesar K, Gold L, and Lauritzen M. Context sensitivity of activity-dependent increases in cerebral blood flow. Proc Natl Acad Sci USA 100: 4239–4244, 2003.

Chen W, Zhu XH, Toshinori K, Andersen P, and Ugurbil K. Spatial and temporal differentiation of fMRI BOLD response in primary visual cortex of human brain during sustained visual stimulation. Magn Reson Med 39: 520–527, 1998.

Creutzfeldt O, Hellweg FC, and Schreiner C. Thalamocortical transformation of responses to complex auditory stimuli. Exp Brain Res 39: 87–104, 1980.