The Journal of Neurophysiology Vol. 86 No. 6 December 2001, pp. 2761-2788
Copyright ©2001 by the American Physiological Society
1Department of Neuroscience, Albert Einstein College of Medicine, Bronx, New York 10461; and 2Department of Surgery, Division of Neurosurgery, University of Iowa College of Medicine, Iowa City, Iowa 52242
ABSTRACT
Fishman, Yonatan I., Igor O. Volkov, M. Daniel Noh, P. Charles Garell, Hans Bakken, Joseph C. Arezzo, Matthew A. Howard, and Mitchell Steinschneider. Consonance and Dissonance of Musical Chords: Neural Correlates in Auditory Cortex of Monkeys and Humans. J. Neurophysiol. 86: 2761-2788, 2001. Some musical chords sound pleasant, or consonant, while others sound unpleasant, or dissonant. Helmholtz's psychoacoustic theory of consonance and dissonance attributes the perception of dissonance to the sensation of "beats" and "roughness" caused by interactions in the auditory periphery between adjacent partials of complex tones comprising a musical chord. Conversely, consonance is characterized by the relative absence of beats and roughness. Physiological studies in monkeys suggest that roughness may be represented in primary auditory cortex (A1) by oscillatory neuronal ensemble responses phase-locked to the amplitude-modulated temporal envelope of complex sounds. However, it remains unknown whether phase-locked responses also underlie the representation of dissonance in auditory cortex. In the present study, responses evoked by musical chords with varying degrees of consonance and dissonance were recorded in A1 of awake macaques and evaluated using auditory-evoked potential (AEP), multiunit activity (MUA), and current-source density (CSD) techniques. In parallel studies, intracranial AEPs evoked by the same musical chords were recorded directly from the auditory cortex of two human subjects undergoing surgical evaluation for medically intractable epilepsy. Chords were composed of two simultaneous harmonic complex tones. The magnitude of oscillatory phase-locked activity in A1 of the monkey correlates with the perceived dissonance of the musical chords. 
Responses evoked by dissonant chords, such as minor and major seconds, display oscillations phase-locked to the predicted difference frequencies, whereas responses evoked by consonant chords, such as octaves and perfect fifths, display little or no phase-locked activity. AEPs recorded in Heschl's gyrus display strikingly similar oscillatory patterns to those observed in monkey A1, with dissonant chords eliciting greater phase-locked activity than consonant chords. In contrast to recordings in Heschl's gyrus, AEPs recorded in the planum temporale do not display significant phase-locked activity, suggesting functional differentiation of auditory cortical regions in humans. These findings support the relevance of synchronous phase-locked neural ensemble activity in A1 for the physiological representation of sensory dissonance in humans and highlight the merits of complementary monkey/human studies in the investigation of neural substrates underlying auditory perception.
INTRODUCTION
Despite the ubiquity and
importance of music in human culture, our understanding of the
physiological bases of music perception is still in its infancy. A
fundamental feature of music is harmony, which refers to
characteristics of simultaneous note combinations or "vertical"
musical structure (i.e., chords). It has been recognized since
antiquity that certain chords sound more pleasant than others (Pythagoras, ca. 600 BC, in Apel 1972). Chords composed
of tones related to each other by simple (small-integer) frequency
ratios, e.g., octave (2:1) and perfect fifth (3:2), are typically
judged to be harmonious, smooth, or consonant, whereas chords composed of tones related to each other by complex (large-integer) ratios, e.g.,
minor second (256:243) and major seventh (243:128), are considered
unpleasant, rough, or dissonant.
In considering consonance and dissonance, it is important to
distinguish between musical consonance/dissonance, i.e., of a given
sound evaluated within a musical context, and psychoacoustic, or
sensory consonance/dissonance, i.e., of a given sound evaluated in
isolation (see Plomp and Levelt 1965; Terhardt 1974b, 1977). Musical consonance/dissonance is culturally
determined, as evidenced by its variation across cultures and
historical periods (see Apel 1972; Burns and Ward 1982). In contrast, judgments of sensory consonance/dissonance
are culturally invariant and largely independent of musical training
(Butler and Daston 1968). Moreover, rodents, birds,
monkeys, and human infants discriminate isolated musical chords on the
basis of sensory consonance and dissonance similarly to expert human
listeners and experienced musicians (Fannin and Braud 1971; Hulse et al. 1995; Izumi 2000; Schellenberg and Trainor 1996; Zentner and Kagan 1996). These findings indicate that
sensory consonance/dissonance is likely shaped by relatively basic
auditory processing mechanisms that are not music specific and that can be studied in experimental animals.
Several psychoacoustic theories have been proposed to explain why
musical intervals characterized by simple frequency ratios sound more
consonant than intervals characterized by complex frequency ratios (see
Plomp and Levelt 1965 for review). The most prominent of
these theories, first promoted by Helmholtz (1954),
states that dissonance is related to the sensation of "beats" and
"roughness." These perceptual phenomena occur when two or more
simultaneous components of a complex sound are separated from one
another in frequency by less than the width of an auditory filter or
"critical bandwidth" (10-20% of center frequency) (Zwicker et al. 1957) and are hence unresolved by the auditory system.
Unresolved frequency components interact in the auditory periphery,
producing fluctuations in the amplitude of their composite waveform
envelope that are perceived as beats (fluctuations below 20 Hz) or
roughness (fluctuations from 20 to 250 Hz) (Kameoka and Kuriyagawa 1969a,b; Plomp and Levelt 1965; Plomp and Steeneken 1968; Terhardt 1968a,b, 1974a,b, 1978). The rate of these amplitude fluctuations equals
the difference in frequency between the components. The disappearance
of roughness for stimuli with amplitude fluctuation rates exceeding
~250 Hz is thought to be due to the low-pass characteristic of the
auditory nervous system (Plomp and Steeneken 1968; Terhardt 1974a, 1978).
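The equality between fluctuation rate and difference frequency follows from a standard trigonometric identity. As an illustrative sketch (the symbols f_1, f_2, and t are introduced here and do not appear in the original text), consider two equal-amplitude sinusoidal components with f_1 > f_2:

```latex
\sin(2\pi f_1 t) + \sin(2\pi f_2 t)
  = 2\cos\!\bigl(\pi (f_1 - f_2)\, t\bigr)\,
     \sin\!\bigl(\pi (f_1 + f_2)\, t\bigr)
```

The slowly varying cosine factor acts as the envelope of a carrier at the mean frequency (f_1 + f_2)/2. The magnitude of that envelope repeats every 1/(f_1 - f_2) s, so the amplitude fluctuates at f_1 - f_2 Hz. Components of unequal amplitude produce a shallower modulation at the same rate.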
The beats/roughness theory is impressive in its ability to predict the
perceived dissonance of musical intervals on the basis of a relatively
low-level psychoacoustic phenomenon. For intervals composed of harmonic
complex tones, as produced by most musical instruments, dissonance
depends on the ratio of the fundamental frequencies
(f0s) of the tones: dissonance is maximal when
the f0s of the complex tones form large-integer
ratios and minimal when they form small-integer ratios (Kameoka and Kuriyagawa 1969b; Plomp and Levelt 1965).
This pattern arises because chords composed of complex tones forming
large-integer f0 ratios have fewer harmonics in
common and more harmonics lying within the same critical band than
chords composed of complex tones forming small-integer
f0 ratios. Of these unresolved pairs of
harmonics, the number with difference frequencies below 250 Hz is
greater for intervals characterized by large-integer
f0 ratios than for intervals characterized by small-integer f0 ratios. The summation of
roughness contributed by each unresolved pair of frequencies separated
by >20 Hz and by <250 Hz determines the overall perceived dissonance
of musical intervals composed of complex tones (Kameoka and Kuriyagawa 1969b; Plomp and Levelt 1965; Terhardt 1974a, 1978). Consequently, musical intervals
with large-integer f0 ratios produce more
roughness and therefore more dissonance.
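The counting argument above can be sketched numerically. The following Python fragment is an illustrative approximation, not the procedure used by the cited authors. It assumes a critical bandwidth of 15% of the mean frequency of each pair (a value chosen here from within the 10-20% range quoted above) and ten equal-amplitude harmonics per tone, and it considers only pairs formed across the two tones. The function names are hypothetical.

```python
def harmonics(f0, n_harmonics=10):
    """Frequencies (Hz) of the fundamental and its higher harmonics."""
    return [f0 * k for k in range(1, n_harmonics + 1)]


def rough_pairs(base_f0, ratio, n_harmonics=10,
                cb_fraction=0.15, lo=20.0, hi=250.0):
    """List cross-tone harmonic pairs expected to contribute roughness.

    A pair qualifies when its difference frequency lies strictly between
    lo and hi (Hz) and is smaller than the assumed critical bandwidth,
    taken here as cb_fraction times the mean frequency of the pair.
    """
    tone_a = harmonics(float(base_f0), n_harmonics)
    tone_b = harmonics(float(base_f0) * float(ratio), n_harmonics)
    pairs = []
    for fa in tone_a:
        for fb in tone_b:
            diff = abs(fa - fb)
            center = (fa + fb) / 2.0
            if lo < diff < hi and diff < cb_fraction * center:
                pairs.append((fa, fb, diff))
    return pairs


# Interval ratios quoted in the Introduction.
octave = rough_pairs(256, 2 / 1)
perfect_fifth = rough_pairs(256, 3 / 2)
minor_second = rough_pairs(256, 256 / 243)
```

Under these assumptions, an octave on a 256-Hz base tone yields no qualifying pairs, whereas a minor second yields more qualifying pairs than a perfect fifth. The counts shift with the assumed critical-band fraction, so they should be read as qualitative only.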
The neurophysiological basis of sensory consonance/dissonance
perception is largely unknown. Bilateral lesions of auditory cortical
areas in humans and animals are associated with deficits in pitch
perception (Whitfield 1980; Zatorre 1988)
and a range of music perception impairments (e.g., Liegeois-Chauvel et al. 1998; Peretz et al. 1994), including aberrant consonance/dissonance perception
(Peretz et al. 2001; Tramo et al. 1990).
Several physiological studies have suggested that roughness may be
represented in primary auditory cortex (A1) by neuronal responses
phase-locked to the amplitude-modulated temporal envelope of complex
sounds (Bieser and Muller-Preuss 1996; Schulze and Langner 1997; Steinschneider et al. 1998).
This hypothesis is supported by the correlation found between the
magnitude of neuronal ensemble phase-locking to the AM frequency (=
difference frequency) of harmonic complex tones in A1 of the awake
monkey and the degree of roughness perceived by human listeners.
Specifically, phase-locking is maximal at stimulus modulation
frequencies at which roughness is maximal and dissipates at stimulus
modulation frequencies at which roughness disappears (Fishman et al. 2000a). Given the involvement of A1 in music perception
and assuming the validity of Helmholtz's beats/roughness theory of
sensory dissonance, it follows that if the hypothesized mechanism
underlying the physiological representation of roughness is correct,
then the perceived dissonance of musical chords should correlate with
the magnitude of A1 activity phase-locked to the difference
frequencies. The present study tests this hypothesis by examining
phase-locked neuronal ensemble activity evoked by musical chords with
varying degrees of consonance and dissonance in A1 of the awake macaque
monkey. Macaques share similarities in basic auditory cortical anatomy
and physiology with humans (Galaburda and Pandya 1983; Galaburda and Sanides 1980; Steinschneider et al. 1994, 1999) and are able to discriminate musical chords on the
basis of sensory consonance/dissonance (Izumi 2000),
making them appropriate animal models for investigating neural
representation of sensory consonance and dissonance in the central
auditory system.
Correlation between patterns of cortical activity in an animal model
and psychoacoustical features of consonance/dissonance perception
leaves in question, however, whether these neural response patterns are
applicable to the human brain. A stronger argument for the relevance of
these physiological responses could be made if physiological findings
similar to those obtained in the animal model are observed in human
neural responses. Therefore, in parallel to the studies in monkeys,
auditory-evoked potentials (AEPs) evoked by musical chords were also
recorded directly from the auditory cortex of two patients undergoing
surgical evaluation for medically intractable epilepsy. This
cross-species approach has already been used to advantage in the study
of auditory cortical representation of the voice onset time phonetic
feature (Steinschneider et al. 1999) and offers several
significant benefits. Clearly, it bolsters the relevance of the animal
results by testing the suitability of the macaque as a model in which
to examine neural correlates of higher perceptual processes.
Furthermore, if a similarity between human and animal physiological
response patterns can be demonstrated, the more refined sampling and
analysis inherent in animal physiological studies can help to
characterize the detailed mechanisms underlying the neural
representation of the perceptual process under study.
METHODS
Monkey surgery and electrophysiological recordings
Three adult male monkeys (Macaca fascicularis) were
studied using previously reported methods (Steinschneider et al. 1992, 1994, 1998). Animals were housed in our Association for
Assessment and Accreditation of Laboratory Animal Care-accredited
Animal Institute under daily supervision by veterinary staff. All
experiments were conducted in accordance with institutional and federal
guidelines governing the experimental use of primates. Briefly, using
aseptic surgical techniques under general anesthesia (pentobarbital,
initial and supplementary doses of 20 and 5 mg/kg iv, respectively),
holes were drilled in the exposed skull to accommodate epidural
matrices consisting of adjacent 18-gauge stainless steel tubes.
Matrices were stereotaxically positioned to target A1 and were oriented at an angle of 30° from normal to approximate the anterior-posterior tilt of the superior temporal plane. This orientation guided electrode penetrations roughly perpendicular to the cortical surface, thereby fulfilling one of the major technical requirements of one-dimensional current-source density (CSD) analysis (Vaughan and Arezzo
1988). Matrices and Plexiglas bars used for painless head
immobilization during the recording sessions were held in place by a
pedestal of dental acrylic fixed to the skull by inverted screws keyed into the bone. Animals were given peri- and postoperative analgesic, antibiotic, and anti-inflammatory medications. Recordings began 2 weeks
after surgery and were conducted in an electrically shielded, sound-attenuated chamber with the animals awake and comfortably restrained.
Intracortical recordings were obtained using linear-array multi-contact
electrodes containing 14 recording contacts, evenly spaced at 150-µm
intervals (Barna et al. 1981). Individual contacts were
constructed from 25-µm-diameter stainless steel wires, each with an
impedance of ~200 kΩ. An epidural stainless steel guide tube
positioned over the occipital cortex served as a reference electrode.
Field potentials were recorded using unity-gain headstage preamplifiers, and amplified 5,000 times by differential amplifiers with a frequency response down 3 dB at 3 Hz and 3 kHz. Signals were
digitized at a sampling rate between 2 and 4 kHz (depending on the
analysis time used) and averaged by computer (Neuroscan software and
hardware, Neurosoft) to yield AEPs. To derive multiunit activity (MUA),
signals were simultaneously high-pass filtered above 500 Hz, amplified
an additional eight times, and full-wave rectified prior to
digitization and averaging. MUA is a measure of the summed action
potential activity of neuronal aggregates within a sphere of about
50-100 µm in diameter surrounding each recording contact
(Brosch et al. 1997; Vaughan and Arezzo 1988). For some electrode penetrations, raw data were stored on
a 16-channel digital tape recorder (Model DT-1600, MicroData
Instrument; sample rate: 6 kHz) for off-line analyses. Due to
limitations of the acquisition computer, the sampling rates used were
below the Nyquist rate (twice the 3-kHz upper cutoff of
the amplifiers). However, empirical testing revealed negligible signal
distortion due to aliasing, as most of the spectral energy in the MUA
lies below 1 kHz. Raw data re-digitized at 6 kHz, using shorter analysis windows and fewer channels, yielded averaged waveforms nearly identical to those obtained from data sampled at the lower rate. Absence of aliasing was
also confirmed by low-pass filtering the MUA at 800 Hz (96 dB/octave
roll-off) following rectification and prior to digitization at 2 kHz,
using digital filters (RP2 modules, Tucker Davis Technologies) acquired
after the completion of this study. Differences between unfiltered and
low-pass filtered MUA signals were negligible (see Fig. 2). To further
confirm the validity of MUA measures, off-line multi-unit cluster
analyses of unrectified high-pass filtered data were performed for some
sites. Peristimulus time histograms (PSTHs) were constructed with a
binwidth of 1 ms. Triggers for spike acquisition were set at 2.5 times
the amplitude of the background "hash" of lower-amplitude,
high-frequency activity.
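The sampling-rate concern discussed above can be illustrated with a short calculation. The Python sketch below is added for illustration and is not part of the original analysis. It reports the apparent frequency of a pure sinusoid after sampling and checks whether a sampling rate reaches twice a given cutoff. The function names are hypothetical.

```python
def alias_frequency(f_signal, fs):
    """Apparent frequency (Hz) of a sinusoid at f_signal sampled at fs.

    Sampling folds any component above fs / 2 back into the band from
    0 to fs / 2, at its distance from the nearest multiple of fs.
    """
    return abs(f_signal - fs * round(f_signal / fs))


def meets_nyquist_rate(f_max, fs):
    """True when fs is at least twice the highest frequency f_max."""
    return fs >= 2 * f_max
```

For example, a 1.5-kHz component sampled at 2 kHz would appear at 500 Hz, whereas an 800-Hz component would be represented at its true frequency. This is consistent with the reasoning that little distortion is expected when most signal energy lies below 1 kHz.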
One-dimensional CSD analyses characterized the laminar pattern of net
current sources and sinks within A1 generating the AEPs. CSD was
calculated using a three-point algorithm that approximates the second
spatial derivative of voltage recorded at each recording contact
(Freeman and Nicholson 1975; Nicholson and Freeman 1975). Current sinks represent net inward transmembrane
current flow associated with local depolarizing excitatory postsynaptic
potentials or passive, circuit-completing current flow associated with
hyperpolarizing potentials at adjacent sites. Current sources represent
net outward transmembrane currents associated with active
hyperpolarization or passive current return associated with adjacent
depolarizing potentials. The corresponding MUA profile is used to help
distinguish these possibilities: current sinks coincident with
increases in MUA likely reflect depolarizing synaptic activity, whereas
current sources associated with concurrent reductions in MUA from
baseline levels likely reflect hyperpolarizing events rather than
passive current return for adjacent synaptic depolarization.
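A three-point estimate of the second spatial derivative can be sketched as follows. This is a minimal Python illustration rather than the authors' implementation. It assumes equally spaced contacts, a uniform conductivity of 1 (so the result is in arbitrary units), and the sign convention in which sinks are negative and sources are positive. The function name is hypothetical.

```python
def csd_three_point(voltages, spacing_um=150.0, conductivity=1.0):
    """Estimate one-dimensional CSD at the interior contacts.

    voltages: potentials at successive, equally spaced contacts for a
    single time point. The estimate at contact i is the negative second
    difference of the potential divided by the squared spacing, so a
    local extracellular negativity yields a negative value (a sink).
    The two outermost contacts have no estimate.
    """
    h2 = spacing_um ** 2
    return [
        -conductivity
        * (voltages[i - 1] - 2.0 * voltages[i] + voltages[i + 1]) / h2
        for i in range(1, len(voltages) - 1)
    ]
```

Applied to a 14-contact array, this yields 12 CSD estimates per time point. A potential that varies linearly with depth gives zero everywhere, which is why the method is insensitive to uniform volume-conducted gradients.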
Electrodes were manipulated with a microdrive and positioned using on-line examination of click-evoked potentials as a guide. Pure tone and chord stimuli were delivered when the electrode channels bracketed the inversion of early AEP components and the largest MUA, typically occurring during the first 50 ms within lamina IV (LIV) and lower lamina III (LLIII), was situated in the middle channels. Evoked responses to 75 presentations of the stimuli were averaged with an analysis window (including a 25-ms prestimulus baseline interval) of 300 ms for pure tones and 520 ms for musical chord stimuli.
Human electrophysiological recordings
Intracranial AEPs were recorded in one man (subject 1) and one woman (subject 2). Both subjects had medically intractable epilepsy, were right-handed, and required placement of multiple temporal lobe electrodes to determine the location of seizure onsets. Experimental procedures were approved by the University of Iowa Human Subjects Review Board and the National Institutes of Health. Informed consent was obtained from the subjects prior to their participation. Subjects underwent surgical implantation of intracranial electrodes (Radionics, Burlington, MA) to acquire diagnostic electroencephalographic (EEG) data required for planning subsequent surgical treatment. Subjects did not undergo any additional risk by participating in this study.
Subject 2 had depth electrodes (Howard et al. 1996a,b) implanted in the right Heschl's gyrus and planum
temporale. Data from this subject using different stimulus protocols
have been reported (Steinschneider et al. 1999). Bipolar
recordings at three locations were obtained from closely spaced
recording contacts (impedance, 200 kΩ; 2.5-4.2 mm inter-contact
distance) placed stereotaxically along the long axis of Heschl's gyrus.
Spectral sensitivity of two of these sites, site 1 (the most
posteromedial site) and site 3 (the most anterolateral site), was
assessed via independent analysis of multiple unit responses. Maximal
tone responses of units at sites 1 and 3 were 2,125 ± 252 and
736 ± 91 (SD) Hz, respectively, consistent with findings that
higher frequencies are represented at more posteromedial locations in
human A1 (Howard et al. 1996a; Steinschneider et al. 1999). Subject 1 had three depth electrodes
implanted in the right superior temporal gyrus: the first in Heschl's
gyrus, the second in the planum temporale, and the third in a more
posterior location within the planum temporale. Click-evoked responses
recorded at the location of the most posterior electrode were of low
amplitude, and, consequently, musical chord-evoked responses were not
recorded at this electrode. Responses at the Heschl's gyrus and planum
temporale electrodes were recorded from two higher-impedance (200 kΩ)
and one lower-impedance (30 kΩ) recording contacts (2.5-4.2 mm
inter-contact distance). Spectral sensitivities of sites in
subject 1 were not determined. The reference electrode was a
subdural electrode located on the ventral surface of the ipsilateral,
anterior temporal lobe.
Recording sessions took place in a quiet room in the Epilepsy Monitoring Unit of the University of Iowa Hospitals and Clinics with the subjects lying comfortably in their hospital beds. Subjects were awake and alert throughout the recordings. For both subjects, sweeps exhibiting high-amplitude epileptic spikes at any time point within the analysis window were rejected by the acquisition computer or discarded following visual inspection of the data.
AEPs were recorded at a gain of 5,000 using headstage amplification followed by differential amplification (BAK Electronics). Field potentials were filtered (band-pass, 2-500 Hz; roll-off, 6 dB/octave), digitized (1.0- or 2.050-kHz sampling rate), and averaged, with an analysis window of 500 ms (including a 25-ms prestimulus baseline interval) in the case of subject 2 and 1,000 ms (including a 325-ms prestimulus baseline interval) in the case of subject 1. Averages were generated from 50 to 75 stimulus presentations. Raw EEG and timing pulses were stored on a multi-channel tape recorder (Racal) for off-line analysis.
Stimuli
MONKEY RECORDINGS. Frequency response functions (FRFs), based on pure tone responses, were used to characterize the frequency tuning of the cortical sites. Pure tones ranging from 0.2 to 17.0 kHz were generated and delivered at a sampling rate of 100 kHz by a PC-based system using SigGen and SigPlay (Tucker Davis Technologies). Pure tones were 175 ms in duration with 10-ms linear ramps. Stimulus onset asynchrony (SOA) for pure tone presentation was 658 ms. All stimuli were monaurally delivered at 60 dB SPL via a dynamic headphone to the ear contralateral to the recorded hemisphere. Sounds were introduced to the ear through a 3-in-long, 60-ml plastic tube attached to the headphone. Sound intensity was measured with a Bruel and Kjaer sound level meter (type 2236) positioned at the opening of the plastic tube. The frequency response of the headphone was flattened (±3 dB) from 0.2 to 17.0 kHz by a graphic equalizer (Rane).
Musical chords were synthesized by summation of appropriate pure tone components (all in sine phase) using Turbosynth sound-synthesizing software on a Macintosh computer, edited using SoundDesigner software, and presented in pseudorandom order using ProTools (Digidesign) or SigGen and SigPlay (Tucker Davis Technologies) software and hardware. Each chord was composed of two simultaneous harmonic complex tones, each containing the f0 and the second through the tenth harmonic (all of equal amplitude). The f0 of one of the complex tones defined the base tone (root) of the two-tone chord, while that of the second complex tone defined the musical interval. Intervals were presented in three different octave ranges (forming three stimulus sets), such that the f0 of the base tone was 128, 256, or 512 Hz, corresponding to C one octave below middle C, middle C, and C one octave above middle C, respectively. Each stimulus set presented in a given electrode penetration was composed of eight different musical intervals with varying degrees of dissonance. Intervals were confined to one octave and were constructed according to the Pythagorean, or "pure fifth," system of tuning (interval ratios obtained from Apel 1972
HUMAN RECORDINGS.
In the case of subject 1, stimuli were delivered to the left
ear (contralateral to the recorded hemisphere) by an insert earphone (Etymotic Research). In the case of subject 2, stimuli were
delivered to the left ear by an external headphone (Koss, model K240DF) coupled to a 4-cm cushion. Stimuli were presented at a comfortable listening level (60-70 dB SPL). In the case of subject 1,
musical interval stimuli were identical to those used in the monkey
recordings and were presented in pseudorandom order with a SOA of 2,000 ms. Due to time constraints, only a subset of the chords presented in
the monkey studies was presented in the human studies. In the case of
subject 2, two-tone chords were generated using a keyboard synthesizer (Roland, model JV-35) in organ mode. Keyboard-generated sounds were edited and presented in the same manner as sounds created
by addition of frequency components, except that their total duration
was 375 ms. Spectral analyses of the organ sounds indicated the
presence of multiple harmonic components (see Fig. 18). In contrast to
the chords constructed from sine wave addition, keyboard-generated
chords were based on equal temperament tuning (the tuning system
conventionally used in modern Western music), thereby allowing
qualitative comparison between neural responses evoked by intervals
derived from different systems of tuning (Pythagorean vs. equal
temperament). Keyboard-generated chords were presented in pseudorandom
order with a SOA of 658 ms. Subjects were informally asked to relate
their impression of the musical chords following the acquisition of a
block of electrophysiological responses (e.g., "Did the chord sound
pleasant or unpleasant?"). Patients' subjective evaluations of the
chords were consistent with those reported in psychoacoustic studies on
consonance and dissonance (Butler and Daston 1968; Kameoka and Kuriyagawa 1969b; Malmberg 1918).
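The size of the difference between the two tuning systems can be illustrated with a short calculation. The Python sketch below is added for illustration only. The octave, perfect fifth, minor second, and major seventh ratios are those quoted in the Introduction; the remaining ratios are the conventional Pythagorean values and are assumed here, since the interval table is not reproduced in this text. The names are hypothetical.

```python
import math
from fractions import Fraction

# Interval name -> (Pythagorean frequency ratio, size in semitones).
PYTHAGOREAN = {
    "minor 2nd": (Fraction(256, 243), 1),
    "major 2nd": (Fraction(9, 8), 2),
    "perfect 4th": (Fraction(4, 3), 5),
    "augmented 4th": (Fraction(729, 512), 6),
    "perfect 5th": (Fraction(3, 2), 7),
    "minor 7th": (Fraction(16, 9), 10),
    "major 7th": (Fraction(243, 128), 11),
    "octave": (Fraction(2, 1), 12),
}


def cents(ratio):
    """Interval size in cents (1,200 cents per octave)."""
    return 1200.0 * math.log2(float(ratio))


def equal_tempered_ratio(semitones):
    """Frequency ratio of an interval in 12-tone equal temperament."""
    return 2.0 ** (semitones / 12.0)


def tuning_deviation_cents(name):
    """Pythagorean minus equal-tempered interval size, in cents."""
    ratio, semitones = PYTHAGOREAN[name]
    return cents(ratio) - 100.0 * semitones
```

By this calculation the Pythagorean minor second is about 10 cents narrower than its equal-tempered counterpart and the perfect fifth about 2 cents wider, while the octave is identical in both systems. Deviations of this size change the predicted difference frequencies only slightly, which is compatible with a qualitative comparison across tuning systems.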
Monkey histology
At the end of the recording period, monkeys were deeply
anesthetized with pentobarbital sodium and transcardially perfused with
10% buffered formalin. Tissue was sectioned (80 µm thickness) and
stained for acetylcholinesterase and Nissl substance to reconstruct the
electrode tracks and to identify A1 according to previously published
criteria (Hackett et al. 1998; Merzenich and Brugge 1973; Morel et al. 1993; Wallace et al. 1991a). Field R was demarcated from A1 by a reversal in
the best frequency gradient (Merzenich and Brugge 1973; Morel et al. 1993). The earliest sink/source configuration was used to locate LIV (Steinschneider et al. 1992). Other laminar locations were then determined by their
relationship to LIV and the measured widths of laminae within A1 for
each electrode penetration histologically identified.
Data analysis
MONKEY RECORDINGS.
The best frequency (BF) of the cortical site sampled in a given
electrode penetration was defined as the pure tone frequency eliciting
the largest peak amplitude MUA within LLIII during the first 50 ms
following stimulus onset. Determination of BF was generally based on
MUA averaged across two to three LLIII electrode contacts. Use of peak
amplitude initial MUA as a measure of BF yielded the expected
anterolateral to posteromedial topographic gradients of low- to
high-frequency representation in all animals (Merzenich and Brugge 1973; Morel et al. 1993; Recanzone et al. 2000).
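The BF definition given above can be expressed as a short procedure. The Python sketch below is illustrative and is not the authors' code. It assumes that each MUA trace begins at stimulus onset, that traces from the LLIII contacts are averaged point by point before the peak is taken, and that the data are supplied in a dictionary keyed by tone frequency. The function name and data layout are hypothetical.

```python
def best_frequency(responses, fs, window_ms=50.0):
    """Return the tone frequency with the largest early peak MUA.

    responses: dict mapping tone frequency (Hz) to a list of MUA traces,
    one trace per LLIII contact, each starting at stimulus onset.
    fs: sampling rate in Hz.
    window_ms: length of the post-onset window searched for the peak.
    """
    n_samples = int(round(window_ms * fs / 1000.0))

    def early_peak(traces):
        # Average across contacts, then take the peak within the window.
        averaged = [sum(vals) / len(vals) for vals in zip(*traces)]
        return max(averaged[:n_samples])

    return max(responses, key=lambda freq: early_peak(responses[freq]))
```

Averaging across contacts before taking the peak is one reading of the description; taking the peak per contact and then averaging would be an alternative and could give a different BF when peak latencies differ across contacts.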
HUMAN RECORDINGS. Spectral analysis (FFT) was also used to quantify phase-locked activity in AEPs recorded in human auditory cortex. For subject 1, the FFT was applied to the same window as that examined in the analysis of electrophysiological data obtained from monkey A1 (175-445 ms). This analysis window excluded ON and OFF response components. A shorter FFT analysis window (175-370 ms), consistent with shorter stimuli, was applied to the data from subject 2. Phase-locked activity was quantified using two complementary measures. The first measure was similar to that used to quantify monkey data: the peak of the amplitude spectrum from 10 to 150 Hz (no significant oscillatory activity was observed at frequencies >150 Hz). The second measure was the area under the amplitude spectrum from 10 to 150 Hz, which thereby includes spectral peaks corresponding to multiple difference frequencies or harmonics of a single difference frequency. Accordingly, an increase in oscillatory phase-locked activity at multiple frequencies, potentially relevant for the encoding of roughness and sensory dissonance, would be represented by an increase in the integral of the amplitude spectrum of the "steady-state" response within the 10- to 150-Hz frequency range. As in the analysis of monkey data, statistical significance of spectral peaks was assessed by comparing mean spectra of AEPs evoked by non-octave intervals with mean spectra of octave-evoked AEPs.
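The two spectral measures can be sketched as follows. This Python fragment is an illustration under stated assumptions rather than the analysis code used in the study. It applies a direct discrete Fourier transform to the samples within the analysis window, removes the mean first, uses no taper, and integrates with the trapezoidal rule over the 10- to 150-Hz band used for the area measure. The function names are hypothetical.

```python
import cmath
import math


def amplitude_spectrum(signal, fs, f_lo=10.0, f_hi=150.0):
    """Single-sided DFT amplitude spectrum restricted to f_lo..f_hi Hz."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]  # remove the DC offset
    freqs, amps = [], []
    for k in range(1, n // 2):  # positive frequencies below fs / 2
        f = k * fs / n
        if f_lo <= f <= f_hi:
            coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                        for i, x in enumerate(centered))
            freqs.append(f)
            amps.append(2.0 * abs(coeff) / n)
    return freqs, amps


def phase_locking_measures(signal, fs, f_lo=10.0, f_hi=150.0):
    """Return (peak frequency, peak amplitude, area under the spectrum)."""
    freqs, amps = amplitude_spectrum(signal, fs, f_lo, f_hi)
    peak = max(range(len(amps)), key=amps.__getitem__)
    area = sum(0.5 * (amps[i] + amps[i + 1]) * (freqs[i + 1] - freqs[i])
               for i in range(len(amps) - 1))
    return freqs[peak], amps[peak], area
```

The frequency resolution equals the reciprocal of the window duration, so a window of a few hundred milliseconds resolves frequencies only to within a few hertz. The area measure is less sensitive than the peak measure to the exact bin in which a difference frequency falls.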
RESULTS
Neural ensemble activity evoked by musical chords in monkey A1
Results are based on 32 perpendicularly oriented (error, <20%) electrode penetrations into A1. The 256- and 512-Hz base tone stimulus sets (each composed of eight intervals) were each presented in 17 penetrations. Due to the comparatively low number of sampled cortical sites with BFs <1,280 Hz (the highest frequency component of the base tone in the 128-Hz octave stimulus), chords with base tones of 128 Hz were presented in only four electrode penetrations. As this small sample size precluded meaningful interpretation of statistical measures, data based on 128-Hz base tone interval-evoked responses are not discussed.
Figure 2 shows representative CSD and MUA laminar response profiles evoked by the two most consonant and the two most dissonant musical chords presented: octave and perfect fifth, minor second and major second, respectively (base tone = 256 Hz; BF = 1,600 Hz). CSD and MUA waveforms in each quadrant of the figure represent neuronal activity recorded simultaneously at 150-µm interval depths within A1. The dashed boxes superimposed on the LLIII responses delineate the temporal window subjected to spectral analysis.
All of the musical chords elicit a stereotypical laminar pattern of activity characterized by short-latency current sinks (below-baseline deflections in the CSD) located in the thalamorecipient zone (lamina IV and LLIII, Initial sink), and slightly later supragranular sinks located in mid- and upper lamina III (ULIII; supragranular sink). These sinks are coincident with above-baseline bursts of MUA in laminae III and IV (MUA ON), indicating that they primarily represent current flow associated with depolarizing synaptic potentials. The LLIII and ULIII sinks are balanced by deeper and more superficial current sources (above-baseline deflections in the CSD, e.g., P28 source) that, together with the sinks, form current dipole configurations consistent with initial activation of pyramidal cells in LLIII and delayed polysynaptic activation of pyramidal cell elements in ULIII.
While initial components of responses evoked by the consonant and
dissonant stimuli are qualitatively very similar, later portions of the
responses differ considerably. Both of the dissonant intervals evoke
prominent oscillations in the MUA and CSD that are phase-locked to the
predicted difference frequencies (minor 2nd: 13.6 Hz; major 2nd: 32 Hz). Neuronal beating patterns are evident both in the thalamorecipient
zone and in ULIII, with maximal phase-locked activity typically
occurring in LLIII. This laminar distribution is consistent with that
observed in previous investigations of phase-locked neural ensemble
activity in macaque A1 (Fishman et al. 2000a; Steinschneider et al. 1998). In contrast, little or no
oscillatory activity is evoked by the consonant intervals. Waveforms of
LLIII MUA low-pass filtered at 800 Hz (96 dB/octave roll-off)
prior to digitization are superimposed on those of unfiltered data. MUA
waveforms generated under these two conditions are virtually identical,
confirming the absence of significant signal aliasing. This is further
demonstrated by the nearly flat difference waveforms, shown below the
superimposed waveforms.
Figure 3 shows representative
chord-evoked responses (base tone = 256 Hz) recorded at a site
with a BF of 1,000 Hz. LLIII MUA and CSD waveforms of chord-evoked
responses (from 175 to 445 ms poststimulus onset) and associated
amplitude spectra are depicted in Fig. 3, A and
B, respectively. The most dissonant intervals (minor and
major 2nd) evoke robust phase-locked oscillations in both the MUA and
CSD, which are manifested as prominent peaks in the associated
amplitude spectra at predicted difference frequencies (indicated by
arrows). In contrast, responses evoked by the most consonant intervals
(octave and perfect 5th) display little or no oscillatory activity and
are characterized by comparatively flat amplitude spectra. Intervals of
intermediate dissonance (e.g., major 7th, minor 7th, and augmented 4th)
also evoke oscillatory phase-locked responses. Even the perfect fourth
evokes oscillatory activity, consistent with the theoretical prediction
that perfect fourths become disproportionately more dissonant, compared
with octaves and perfect fifths, when the base tone of the chord is lower than ~300 Hz (see Fig. 12 in Plomp and Levelt 1965).
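The spectral analysis described above can be illustrated with a minimal sketch: an amplitude spectrum is computed for a short response segment, and the frequency of its largest peak is read off. The waveform here is synthetic, and the sampling rate (`FS_HZ`), the component amplitudes, and the 10-Hz lower bound used when searching for the peak are assumptions made for illustration, not values taken from the recordings.

```python
import cmath
import math


def amplitude_spectrum(samples, fs_hz):
    """One-sided amplitude spectrum of a real signal via a direct DFT.

    Amplitudes are scaled by 2/n, which recovers the amplitude of a
    sinusoid that falls on an interior frequency bin.
    """
    n = len(samples)
    mean = sum(samples) / n
    centered = [s - mean for s in samples]  # remove the DC offset first
    freqs, amps = [], []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(centered))
        freqs.append(k * fs_hz / n)
        amps.append(2 * abs(coeff) / n)
    return freqs, amps


def peak_frequency(freqs, amps, min_hz=10.0):
    """Frequency (Hz) of the largest spectral amplitude at or above min_hz."""
    return max((a, f) for f, a in zip(freqs, amps) if f >= min_hz)[1]


# Synthetic stand-in for a 270-ms segment (175 to 445 ms poststimulus onset).
FS_HZ = 1000.0    # assumed sampling rate
N_SAMPLES = 270   # 270 ms at the assumed rate
segment = [
    math.sin(2 * math.pi * 32.0 * i / FS_HZ)           # 32-Hz "beating" component
    + 0.2 * math.sin(2 * math.pi * 100.0 * i / FS_HZ)  # weaker unrelated component
    for i in range(N_SAMPLES)
]
freqs, amps = amplitude_spectrum(segment, FS_HZ)
peak_hz = peak_frequency(freqs, amps)
print(f"spectral peak near {peak_hz:.1f} Hz (bin width {FS_HZ / N_SAMPLES:.1f} Hz)")
```

With a 270-ms window the frequency resolution is about 3.7 Hz, so a peak can only be located to within roughly one bin of the predicted difference frequency.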
Figure 4 shows representative chord-evoked responses (base tone = 512 Hz) recorded at a site with a BF of 4,000 Hz. Similar to the pattern of responses evoked by chords with base tones of 256 Hz, the most dissonant 512-Hz base tone chords (minor and major 2nd) evoke phase-locked oscillations in both the MUA and CSD, which are represented as prominent peaks in the associated amplitude spectra at predicted difference frequencies (indicated by arrows). In contrast, responses evoked by the most consonant chords (octave and perfect 5th) are characterized by a virtual absence of rapid oscillations and by comparatively flat amplitude spectra. Intervals of intermediate dissonance also evoke oscillatory responses phase-locked to predicted difference frequencies.
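As a rough illustration of where low-frequency difference components can arise in a two-tone chord, the sketch below lists the differences between harmonics of the base tone and harmonics of the upper tone. The interval ratios are the standard Pythagorean values, assumed here rather than taken from this section; the number of harmonics per tone (`n_harmonics`) and the cutoff (`max_diff_hz`) are hypothetical parameters chosen for illustration.

```python
from fractions import Fraction

# Standard Pythagorean frequency ratios (an assumption for this sketch).
PYTHAGOREAN_RATIOS = {
    "minor 2nd": Fraction(256, 243),
    "major 2nd": Fraction(9, 8),
    "perfect 4th": Fraction(4, 3),
    "augmented 4th": Fraction(729, 512),
    "perfect 5th": Fraction(3, 2),
    "minor 7th": Fraction(16, 9),
    "major 7th": Fraction(243, 128),
    "octave": Fraction(2, 1),
}


def harmonic_difference_frequencies(base_hz, ratio, n_harmonics=10, max_diff_hz=150):
    """Distinct nonzero differences (Hz) between harmonics of the two tones.

    Only differences up to max_diff_hz are returned, sorted in ascending order.
    """
    lower = [base_hz * k for k in range(1, n_harmonics + 1)]
    upper = [base_hz * ratio * k for k in range(1, n_harmonics + 1)]
    diffs = {abs(u - l) for l in lower for u in upper}
    return sorted(float(d) for d in diffs if 0 < d <= max_diff_hz)


if __name__ == "__main__":
    for name, ratio in PYTHAGOREAN_RATIOS.items():
        lowest = harmonic_difference_frequencies(512, ratio)[:3]
        print(f"{name:>14}: {lowest}")
```

Under these assumptions, doubling the base tone doubles every difference frequency, and the octave produces no low-frequency differences at all, because every harmonic of the upper tone coincides with a harmonic of the base tone.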
Oscillatory phase-locked activity is visible not only in the averaged
responses but also in responses evoked by individual stimulus
presentations. Statistical significance of phase-locked activity in the
dissonant-chord-evoked responses relative to octave-evoked responses is
demonstrated in Figs. 5 and 6 for two representative A1 sites (the same as those
shown in Figs. 2 and 4, respectively). The figures show mean (±SE)
LLIII MUA and CSD waveforms and corresponding mean (±SE) amplitude
spectra of responses evoked by the two most dissonant chords (minor and
major 2nd) and by the two most consonant chords (octave and perfect
5th). Major peaks in the mean spectrum of responses evoked by the minor
second and the major second occur at predicted difference frequencies. Means at the peaks are significantly larger than means at
corresponding frequencies in the spectrum of octave-evoked responses
(one-tailed t-test; t and P values are
shown in the figures). In contrast, the mean spectrum of perfect
fifth-evoked responses (above 10 Hz) is not significantly different
from that of octave-evoked responses (P > 0.05). No
significant differences between mean spectra are observed at
frequencies >150 Hz. Peaks at 60 Hz are present in the mean spectra of
Fig. 5 because these mean spectra are computed from responses to individual
stimulus presentations, which include 60-Hz line noise that disappears only with
time-domain averaging.
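The distinction between averaging spectra and averaging waveforms can be sketched as follows. Each synthetic trial contains a stimulus-locked 32-Hz oscillation plus 60-Hz noise whose phase varies from trial to trial. The trial count, sampling rate, amplitudes, and the uniformly random noise phase are all assumptions for illustration only.

```python
import cmath
import math
import random


def amplitude_at(samples, freq_hz, fs_hz):
    """Amplitude of the component at freq_hz (single-frequency DFT projection)."""
    n = len(samples)
    coeff = sum(x * cmath.exp(-2j * math.pi * freq_hz * i / fs_hz)
                for i, x in enumerate(samples))
    return 2 * abs(coeff) / n


FS_HZ, N_SAMPLES, N_TRIALS = 1000.0, 500, 50  # assumed values
rng = random.Random(0)  # fixed seed so the sketch is reproducible

trials = []
for _ in range(N_TRIALS):
    noise_phase = rng.uniform(0.0, 2.0 * math.pi)  # line noise is not stimulus-locked
    trials.append([
        math.sin(2 * math.pi * 32 * i / FS_HZ)                        # phase-locked response
        + 0.5 * math.sin(2 * math.pi * 60 * i / FS_HZ + noise_phase)  # 60-Hz line noise
        for i in range(N_SAMPLES)
    ])

# Mean of single-trial spectra: the 60-Hz amplitude survives in every trial.
mean_of_spectra_60 = sum(amplitude_at(t, 60, FS_HZ) for t in trials) / N_TRIALS
mean_of_spectra_32 = sum(amplitude_at(t, 32, FS_HZ) for t in trials) / N_TRIALS

# Spectrum of the time-domain average: randomly phased noise largely cancels.
average_waveform = [sum(column) / N_TRIALS for column in zip(*trials)]
spectrum_of_mean_60 = amplitude_at(average_waveform, 60, FS_HZ)
spectrum_of_mean_32 = amplitude_at(average_waveform, 32, FS_HZ)

print(f"60 Hz: mean of spectra {mean_of_spectra_60:.3f}, spectrum of mean {spectrum_of_mean_60:.3f}")
print(f"32 Hz: mean of spectra {mean_of_spectra_32:.3f}, spectrum of mean {spectrum_of_mean_32:.3f}")
```

In this toy case the phase-locked 32-Hz component has the same amplitude under both procedures, whereas the 60-Hz component is preserved only when spectra, rather than waveforms, are averaged.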
Similar chord-evoked oscillatory response patterns are observed when responses are analyzed using more conventional neurophysiological techniques. Figure 7A shows PSTHs based on multiunit spike activity recorded in LLIII at three representative sites in A1. Data from two of these sites are represented in Figs. 2-6. As with the MUA results, PSTHs of the minor- and the major-second-evoked responses display periodic oscillations that are absent in the PSTHs of the octave- and perfect-fifth-evoked responses. Dissonant-chord-evoked oscillations are manifested as peaks in corresponding amplitude spectra at predicted difference frequencies or their harmonics (Fig. 7B). In contrast, spectra of consonant-chord-evoked responses are characterized by a general absence of significant peaks at frequencies >10 Hz.
Figures 2-7 illustrate a general pattern of musical chord-evoked
responses in A1: dissonant intervals evoke oscillatory phase-locked activity, whereas consonant intervals evoke comparatively little or no
phase-locked activity. However, the relative magnitude of phase-locked
activity differs across sites. According to the roughness theory of
sensory dissonance, the total dissonance of a musical chord reflects
the sum of roughness contributed by each pair of unresolved frequency
components (Kameoka and Kuriyagawa 1969b; Plomp and Levelt 1965; Terhardt 1974a, 1978).
Accordingly, the overall dissonance of a musical chord should be
represented by response patterns averaged across the tonotopic map in A1.
To quantify the average relative phase-locked activity across the cortical sites sampled, the peak of the amplitude spectrum of each chord-evoked response was first expressed as a percentage of the maximum peak amplitude of the eight chord-evoked response spectra obtained in each electrode penetration. Normalized peak spectrum amplitudes were subsequently averaged across penetrations. The resultant mean normalized amplitudes, plotted as a function of musical interval (ordered from left to right according to interval width) for each of the three response measures examined, are shown in Fig. 8. On average, the octave and perfect fifth evoke comparatively little phase-locked activity, whereas the minor and major second generally evoke the highest amplitude phase-locked responses. The perfect fourth, the third most consonant interval, also yields comparatively little phase-locked activity when presented within the octave above middle C (i.e., with a base tone of 512 Hz) but becomes physiologically more "dissonant" when presented within the octave of middle C. Differences among mean normalized amplitudes across interval conditions are statistically significant (repeated-measures ANOVA: all F > 8.5, P < 0.00001).
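The two-step procedure (normalize within each penetration, then average across penetrations) can be sketched as below. The peak amplitudes are invented numbers, and the reference amplitude is passed in as a function (`max` in this sketch, so no normalized value exceeds 100%); both are assumptions for illustration.

```python
from statistics import mean

# Eight intervals, ordered by interval width as in the text.
INTERVALS = ["minor 2nd", "major 2nd", "perfect 4th", "augmented 4th",
             "perfect 5th", "minor 7th", "major 7th", "octave"]


def normalize_to_reference(peaks, reference=max):
    """Express each spectral peak as a percentage of a within-penetration reference."""
    ref = reference(peaks)
    return [100.0 * p / ref for p in peaks]


def mean_normalized_amplitudes(penetrations, reference=max):
    """Average the normalized peaks across penetrations, interval by interval."""
    normalized = [normalize_to_reference(p, reference) for p in penetrations]
    return [mean(column) for column in zip(*normalized)]


# Hypothetical peak amplitudes (arbitrary units), one row per penetration.
example_penetrations = [
    [10.0, 8.0, 4.0, 5.0, 2.0, 6.0, 7.0, 1.0],
    [20.0, 20.0, 10.0, 10.0, 5.0, 10.0, 15.0, 5.0],
]
example_means = mean_normalized_amplitudes(example_penetrations)

if __name__ == "__main__":
    for name, value in zip(INTERVALS, example_means):
        print(f"{name:>14}: {value:5.1f}%")
```

Normalizing within each penetration before averaging keeps sites with large absolute response amplitudes from dominating the across-site mean.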
To examine the extent to which the magnitude of phase-locked neuronal ensemble activity in A1 correlates with the perceived dissonance of the musical intervals, spectra of the eight musical interval-evoked responses from each electrode penetration were ranked according to their peak amplitude (1 = lowest amplitude, 8 = highest amplitude). Physiological ranks from each penetration were then compared with perceptual ranks of the intervals (1 = least dissonant, 8 = most dissonant; see details in METHODS). For all three of the response measures examined, mean rank of spectral amplitude tends to increase with the perceived dissonance of the chords (Fig. 9). This relationship, quantified by Spearman rank-order correlation analysis based on raw data (n = 17 penetrations) and emphasized by the superimposed linear regression lines, is statistically significant (r values are indicated in the figure; P < 0.00001) for all response components and octave ranges examined. The strongest correlation between neural and perceptual measures is seen for LLIII CSD, while the weakest association is seen for ULIII CSD.
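A minimal sketch of this rank-based comparison follows: spectral peaks are ranked within each penetration, the ranks are pooled across penetrations, and a Spearman coefficient is computed against perceptual ranks. The peak amplitudes and the perceptual rank order used here are hypothetical placeholders (the actual perceptual ranks are given in METHODS), and tied values are assumed to receive their mean rank.

```python
import math


def average_ranks(values):
    """Rank values from 1 (smallest) upward; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of 1-based positions start+1 .. end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank-order correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical perceptual ranks (1 = least dissonant, 8 = most dissonant).
perceptual_ranks = [8, 7, 3, 5, 2, 4, 6, 1]
# Hypothetical spectral peak amplitudes, one row per penetration.
penetrations = [
    [9.0, 8.0, 3.0, 5.0, 2.0, 4.0, 7.0, 1.0],
    [6.0, 7.5, 2.5, 3.0, 1.0, 4.0, 5.0, 1.5],
]

physiological, perceptual = [], []
for peaks in penetrations:
    physiological.extend(average_ranks(peaks))  # 1 = lowest, 8 = highest amplitude
    perceptual.extend(perceptual_ranks)

rho = spearman(physiological, perceptual)
print(f"Spearman r = {rho:.3f} over {len(physiological)} rank pairs")
```

Ranking within each penetration before pooling removes differences in absolute response amplitude between sites, so the coefficient reflects only the ordering of intervals.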
AEPs evoked by musical chords in human auditory cortex
AEPs recorded directly from human auditory cortex display response patterns strikingly similar to those observed in monkey A1. Figure 10 shows musical chord-evoked AEPs recorded at a single site within Heschl's gyrus (subject 1; base tone = 256 Hz). AEPs evoked by the most dissonant chords (minor 2nd, major 2nd, and major 7th) display prominent oscillations phase-locked to the predicted difference frequencies, which are manifested as peaks (indicated by arrows) in the corresponding amplitude spectra (middle). In contrast, responses evoked by the octave and by the perfect 5th are characterized by an absence of rapid oscillations and by comparatively flat, low-amplitude spectra (above ~10 Hz).
Statistical significance of the spectral peaks was assessed by comparing the mean spectrum of AEPs evoked by the non-octave chords with that of AEPs evoked by octaves (shown superimposed in Fig. 10, right; error bars represent SE). Means at the peaks of the dissonant chord-evoked response spectra are significantly larger than means at corresponding frequencies in the mean spectrum of octave-evoked responses (one-tailed t-test; P values are indicated in the figure). Mean spectra of AEPs evoked by the perfect fifth and the augmented fourth are not significantly different from the mean spectrum of octave-evoked AEPs (P > 0.05). No significant differences between the mean spectrum of non-octave chord-evoked responses and that of octave-evoked responses were observed at frequencies >50 Hz. Similar phase-locked response patterns are displayed by AEPs averaged across the three recording sites in Heschl's gyrus of subject 1 (Fig. 11), indicating that oscillatory activity is synchronized (i.e., displays phase coherence) over a considerable distance across the cortical tissue. As quantified in Fig. 12, the magnitude of oscillatory phase-locked activity at each of the three sites and in the averaged data tends to increase with increasing dissonance of the chords. The only major deviation from this trend is the greater oscillatory activity evoked by the major seventh relative to that evoked by the major second.
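The comparison of spectral amplitudes at a peak frequency can be sketched as a two-sample t statistic. The text specifies only a one-tailed t-test, so the unequal-variance (Welch) form used here is an assumption, and the single-presentation amplitudes are invented. The sketch returns t and the degrees of freedom; converting these to a one-tailed P value requires the Student t distribution from a statistics library.

```python
import math
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Welch t statistic and degrees of freedom for the difference mean(a) - mean(b)."""
    va = variance(sample_a) / len(sample_a)  # squared standard error of mean(a)
    vb = variance(sample_b) / len(sample_b)  # squared standard error of mean(b)
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(sample_a) - 1) + vb ** 2 / (len(sample_b) - 1))
    return t, df


# Hypothetical spectral amplitudes at one peak frequency, one value per presentation.
dissonant_chord = [4.1, 3.8, 4.6, 4.0, 4.4, 3.9]
octave = [1.2, 1.5, 0.9, 1.1, 1.4, 1.0]

t_value, dof = welch_t(dissonant_chord, octave)
print(f"t = {t_value:.2f}, df = {dof:.1f} (one-tailed test of dissonant > octave)")
```

A positive t supports the one-tailed hypothesis that the dissonant-chord amplitude at that frequency exceeds the octave amplitude.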
Figure 13 depicts
AEPs evoked by chords with base tones of 128 Hz recorded at the same
site in Heschl's gyrus as that shown in Fig. 10. In this case, with
the exception of the octave, all intervals, including the perfect
fifth, evoke oscillatory responses. This pattern is consistent with the
observation that in octave ranges below middle C, all intervals, except
octaves with base tones >100 Hz, sound rougher and more dissonant than
their higher octave counterparts (Plomp and Levelt 1965). This may explain why, in lower octave ranges, intervals
smaller than octaves, including perfect fifths, tend to be avoided in
music composition (Plomp and Levelt 1965). Oscillatory
phase-locked responses are manifested as peaks in corresponding
amplitude spectra at predicted difference frequencies (indicated by
arrows). Many of these spectral peaks are statistically significant
relative to the mean spectrum of octave-evoked responses
(right-hand column; one-tailed t-test; P values are indicated in the figure; same conventions as in
Fig. 10). No significant differences between mean spectra are observed at frequencies >75 Hz (P > 0.05).
As in the case of responses evoked by intervals with base tones of 256 Hz, similar phase-locked response patterns are evident when AEPs evoked by intervals with base tones of 128 Hz are averaged across the three Heschl's gyrus recording sites (Fig. 14), indicating that oscillatory activity is synchronized over a considerable distance across the cortical tissue. Responses are quantified in Fig. 15, which again shows that, with the exception of the comparatively high values obtained for augmented fourth-evoked responses, the magnitude of oscillatory activity tends to increase with increasing dissonance of the chords. This trend is not apparent, however, for AEPs recorded at the most medial location in Heschl's gyrus, Site 3.
Chord-evoked AEPs recorded simultaneously at posterior electrode
contacts located in the planum temporale display little or no
oscillatory activity, even when elicited by the most dissonant chords.
This is illustrated in Fig. 16,
which shows representative AEPs recorded at a
single site in the planum temporale (base tone = 128 Hz).
Phase-locked activity is largely absent except for low-amplitude (but
statistically significant, relative to octave-evoked responses;
P < 0.005) oscillations at 64 Hz in the perfect
fifth-evoked response. Above 10 Hz, mean spectra of AEPs evoked by all
non-octave intervals are not significantly different from the mean
spectrum of octave-evoked AEPs (P > 0.05), except for
the mean spectrum of the perfect fifth-evoked response. AEPs evoked by
chords with base tones of 256 Hz (whose difference frequencies are
double those of the 128-Hz base tone chords) in the planum temporale are characterized by a similar absence of oscillatory activity, even in
the case of perfect fifth-evoked responses (data not shown). Correspondingly, mean spectra of all non-octave 256-Hz base tone chord-evoked responses are not significantly different from mean spectra of octave-evoked responses (P > 0.05; data not
shown). This markedly diminished sensitivity to temporal features of
the stimuli is remarkable given the comparatively large amplitude of
the AEPs (e.g., the P70 component indicated in the octave responses
recorded in the planum temporale is approximately twice the
amplitude of that recorded in Heschl's gyrus).
In both Heschl's gyrus and the planum temporale, the amplitude of the
P70 component tends to increase with increasing width of the musical
intervals (i.e., as interval ratio increases from the minor 2nd, the
smallest interval, to the octave, the largest interval), as illustrated
in Fig. 17A, left. Previous
physiological studies in monkey A1 have shown that the amplitude of
intracortical AEP components increases with increasing frequency
separation between the harmonics of a complex tone spectrally centered
at the BF, consistent with a manifestation of critical band masking phenomena (Fishman et al. 2000b). Because the number of
pairs of resolved harmonics in the musical chords increases with
interval width (Fig. 17A, right), we hypothesized
that the enhancement in P70 amplitude with increases in
interval width may reflect similar critical band masking effects. In
support of this hypothesis, P70 amplitudes in both Heschl's gyrus and
the planum temporale are correlated with the number of pairs of
spectrally resolved harmonics in the chords (Fig. 17B).
Linear regression lines superimposed on the scatter-plots emphasize
this relationship. Spearman correlation coefficients for AEPs recorded
at each of the three Heschl's gyrus and three planum temporale
electrode contacts are indicated in the scatter-plot insets.
Coefficients >0.83 are statistically significant (n = 6, P < 0.05). Amplitudes of chord-evoked responses in
A1 of the monkey are unrelated to interval width (data not shown). A
possible explanation for why such effects are not observed in monkey A1
in the present study is considered in the DISCUSSION.
The relative perceived consonance/dissonance of musical intervals tuned according to the Pythagorean or pure fifth tuning system does not differ substantially from that of intervals tuned according to the equal temperament system, which divides the octave into 12 equal semitones. Accordingly, patterns of oscillatory activity evoked by equal temperament intervals were similar to those evoked by Pythagorean intervals. Figure 18B shows AEPs evoked by chords played an octave below middle C on an electronic keyboard (tuned in equal temperament) at three sites within Heschl's gyrus of subject 2. AEPs evoked by the minor and major second display oscillations that are mani