Objective. We investigated the role of the fusiform cortex in music processing with the use of PET, focusing on the perception of sound richness. Method. Musically naïve subjects listened to familiar melodies with three kinds of accompaniments: (i) an accompaniment composed of only three basic chords (chord condition), (ii) a simple accompaniment typically used in traditional music textbooks in elementary school (simple condition), and (iii) an accompaniment with rich and flowery sounds composed by a professional composer (complex condition). Using a PET subtraction technique, we studied changes in regional cerebral blood flow (rCBF) in simple minus chord, complex minus simple, and complex minus chord conditions. Results. The simple minus chord, complex minus simple, and complex minus chord conditions consistently showed increases in rCBF at the posterior portion of the inferior temporal gyrus, including the LOC and fusiform gyrus. Conclusions. We may conclude that certain association cortices such as the LOC and the fusiform cortex may represent centers of multisensory integration, with foreground and background segregation occurring at the LOC level and the recognition of richness and floweriness of stimuli occurring in the fusiform cortex, both in terms of vision and audition.

1. Introduction

Historically, the style of music has developed from simple to complex. Such development is typically classified as a change from monophony, that is, music for a single voice or part [1], to polyphony, in which two or more strands sound simultaneously, or to homophony, in which there is a clear distinction between melody and accompanying harmony [1]. In music with a monophonic style, only the melody is produced and there is no accompaniment. In homophony, to which most nursery and folk songs of western music belong, music consists of a melody and its accompaniment. As music with homophonic or polyphonic styles has developed, harmonies have become more complex. For example, the music of Mozart or Haydn in the 18th century rarely utilized dissonant chords, while the 20th century music of Ravel or Debussy used several kinds of chords including dissonant ones. Listening to homophonic music differs from listening to monophonic music in two respects. First, with homophonic music, listeners discriminate the melody from its accompaniment. Even if the melody and the accompaniment are played by the same instrument (i.e., with the identical timbre), we can easily and instantaneously tell the melody and the accompaniment apart. The neural basis of this is still unknown, but we previously reported in a positron emission tomography (PET) activation study that the lateral occipital complex (LOC), which participates in foreground and background segregation in vision, plays an important role in the discrimination between melody and its accompaniment [2]. The melody and the accompaniment could be regarded, in auditory terms, as the foreground and background, respectively. We suggested that the same neural substrates carry out similar functions regardless of sensory modality. Second, the sounds of homophonic music could be richer than those of monophonic music.
The quality of sound is generally called “timbre.” The timbre is operationally defined as the attribute that distinguishes sounds of equal pitch, loudness, location, and duration [3]. The term “timbre” not only relates to the individual musical instrument, but also relates to expressing the characteristics of the sound of musical pieces. For example, it is generally considered that the timbre of impressionist music of Ravel or Debussy is richer and more flowery than the classical music of Mozart or Haydn. In the above-mentioned PET study, the melody with accompaniment also activated the fusiform cortex (in addition to the LOC) compared to the melody without the accompaniment [2]. We interpreted the activation of the fusiform cortex to reflect the rich sound from the accompaniment, but much still remains to be done to identify the role of that area in listening to music.

Over the past few decades, a considerable number of PET activation studies have been conducted on various aspects of music, sounds, and the brain [4–6], not only in healthy subjects [4, 6] but also in patients with tinnitus [5]. Based on our previous research, we performed another PET study that investigated brain region activity while subjects listened to melodies with various kinds of accompaniments. Musically naïve subjects listened to melodies of familiar nursery songs whose accompaniments varied in sound richness. Using a visual analogue scale (VAS), we also ascertained for each piece of music to what extent the subjects felt the sound was rich. Using a PET subtraction technique, we identified brain regions that were significantly activated by sound richness.

2. Subjects and Methods

2.1. Subjects

Ten right-handed male volunteers (mean age years; range 20–24) participated in the study. All were students at the Schools of Engineering or Mining, Akita University, and met criteria for Grison’s second level of musical culture [7]. None had received any formal or private musical education, and none had any signs or history of neurological, cardiovascular, or psychiatric disease. All subjects gave written informed consent after the purpose and procedure of the examination had been fully explained. The study was approved by the Ethics Committee of the Research Institute for Brain and Blood Vessels, Akita, Japan, and all experiments were conducted in accordance with the Declaration of Helsinki.

2.2. Task Procedures

The stimuli in this experiment were six melodies of well-known Japanese nursery songs. All subjects were very familiar with these melodies. For each melody, the following three kinds of accompaniment were composed: (i) an accompaniment composed by using only three basic chords (tonic, dominant, and subdominant chord), one of which was set on each bar (chord condition), (ii) a simple accompaniment that is typically used in the traditional music textbooks in Japanese elementary schools (simple condition), and (iii) an accompaniment with rich and flowery sounds composed by a professional composer [8] (complex condition). The (i) chord and (ii) simple condition accompaniments were composed by one of the authors (Masayuki Satoh). The accompaniment of the simple condition consisted of quarter notes of a chord over a whole note of the fundamental tone. The first beat of each chord in the bar was a rest, so only the fundamental tone was played on the first beat. All musical stimuli were played using the “FINALE” software [9]. The author Masayuki Satoh entered the scores of the musical pieces used in this experiment into “FINALE,” and the software played each piece with a piano timbre. Each performance was recorded on a compact disc. Melodies with the three types of accompaniments were presented in random order. Subjects were instructed to listen to each melody, and PET measurements were obtained while they listened (procedures described below). Subjects were required to make a sign with the index finger of the right hand when the melody of each song finished. All stimuli were presented binaurally via inset stereo earphones.

The instruction to the subjects was as follows: Close your eyes. You will listen to a melody of a familiar nursery song. If you feel that the melody has finished, please make a sign with the index finger of your right hand.

2.3. Positron Emission Tomography Measurements

The protocol used in this study has been previously described in detail [2, 10–12]. Briefly, PET data were acquired in 3D acquisition mode using Headtome V (Shimadzu, Kyoto, Japan). Scans were performed in a darkened room with subjects lying supine with eyes closed. Nine CBF measurements were obtained for each subject: three during the chord, three during the simple, and three during the complex condition. Employing the 15O-labeled water (H₂¹⁵O) intravenous bolus technique [13], emission data were collected for 90 seconds for each measurement following intravenous bolus injection of about 15 mL (40 mCi) of H₂¹⁵O. Each musical piece began 15 seconds before data acquisition and was followed by another musical piece, so that stimulation lasted about 120 seconds in total. Emission data were corrected for attenuation using 10 minutes of transmission data acquired with a 68Ge orbiting rod source prior to the activation scans. A wash-out period of approximately 10 minutes was allowed between successive scans. For anatomic reference, all subjects underwent axial T1-weighted imaging (T1WI) and T2-weighted imaging (T2WI) using a 1.5 T magnetic resonance system (Vision, Siemens, Germany). T1WI (TR/TE = 665/14 ms) and T2WI (TR/TE = 3600/96 ms) were obtained using a slice thickness of 5 mm with an interslice gap of 1 mm.

2.4. Data Analysis

PET data analysis was performed on an SGI Indy running IRIX 6.5 (Silicon Graphics, California), using an automated PET activation analysis package [14] composed of six main processing stages that has been previously described in detail [2, 10–12]. The six main stages consisted of intrasubject coregistration, intrasubject normalization, automatic detection of the AC-PC line, detection of multiple stretching points and surface landmarks on intrasubject averaged image sets, intersubject summation and statistical analyses, and superimposition of statistical results onto the stereotactic MRI. Deformation of individual brains to correspond with the standard atlas brain was achieved by spatially matching individual landmarks to the corresponding predefined standard surface landmarks and minimizing correlation coefficients of regional profile curves between the stretching centers. Activation foci were considered to be significantly activated if the corresponding p value was less than a predetermined threshold (Bonferroni correction for multiple comparisons). Anatomical identification of activation foci was achieved by referring the stereotactic coordinates of the peak activated pixels to the standard Talairach brain atlas [15].
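The subtraction and thresholding logic described above can be illustrated with a minimal sketch. It is not the analysis package used in the study [14]: the function names, the rCBF values, the significance level of 0.05, and the number of comparisons (1000) are all hypothetical, and comparing a paired statistic against a normal-distribution cutoff is a simplifying assumption.

```python
from statistics import NormalDist, mean, stdev


def bonferroni_cutoff(alpha: float, n_tests: int) -> float:
    """One-sided normal cutoff after dividing alpha across n_tests comparisons."""
    return NormalDist().inv_cdf(1.0 - alpha / n_tests)


def paired_statistic(task, control):
    """Mean within-subject difference divided by its standard error."""
    diffs = [t - c for t, c in zip(task, control)]
    return mean(diffs) / (stdev(diffs) / len(diffs) ** 0.5)


# Hypothetical normalized rCBF values at one voxel for five subjects.
simple = [52.1, 50.8, 53.4, 51.9, 52.7]
chord = [50.0, 49.9, 51.2, 50.1, 50.6]

statistic = paired_statistic(simple, chord)  # "simple minus chord" subtraction
cutoff = bonferroni_cutoff(alpha=0.05, n_tests=1000)  # hypothetical values
is_active = statistic > cutoff
print(f"statistic={statistic:.2f}, cutoff={cutoff:.2f}, active={is_active}")
```

Dividing the significance level by the number of comparisons raises the cutoff, so only voxels with large and consistent differences between conditions survive.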

2.5. Visual Analogue Scale (VAS) of Sound Richness

After the PET measurement, the degree of sound richness of each melody with the three types of accompaniments was investigated in each subject. In a quiet room, each subject listened to the stimuli and was required to mark the VAS (Figure 1) according to the degree of sound richness that he subjectively felt. Three colors (yellow, blue, and red) were used because the lyrics of some songs had a relationship with a specific color, for example, the sea related to blue and the sunset to red. The richer a subject felt the sound of the music to be, the further to the right he placed the mark. We measured the distance from the left end to the marked position (mm) and, using the Wilcoxon signed rank test, statistically compared the distances among the three kinds of accompaniments, namely, the chord, simple, and complex conditions.
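For readers unfamiliar with the Wilcoxon signed rank test, the paired comparison described above can be sketched as follows. This is an illustration only: the VAS distances are hypothetical, the function name is ours, and the text does not state which software or which variant (exact or approximate) of the test was used. The sketch computes an exact two-sided p value by enumerating all sign assignments, which is feasible for ten subjects.

```python
from itertools import product


def wilcoxon_signed_rank(x, y):
    """Exact two-sided Wilcoxon signed rank test for paired samples.

    Returns (w_plus, p_value). Zero differences are dropped and tied
    absolute differences share their average rank.
    """
    diffs = [b - a for a, b in zip(x, y) if b != a]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    centre = sum(ranks) / 2
    observed = abs(w_plus - centre)
    # Under the null hypothesis every sign assignment is equally likely.
    extreme = 0
    for signs in product((0, 1), repeat=len(ranks)):
        w = sum(r for r, s in zip(ranks, signs) if s)
        if abs(w - centre) >= observed:
            extreme += 1
    return w_plus, extreme / 2 ** len(ranks)


# Hypothetical VAS distances (mm) for ten subjects in two conditions.
chord_mm = [22, 30, 18, 25, 35, 28, 20, 31, 26, 24]
simple_mm = [40, 45, 33, 41, 52, 47, 35, 50, 43, 38]

w_plus, p_value = wilcoxon_signed_rank(chord_mm, simple_mm)
print(w_plus, p_value)
```

The same function would be applied to each pair of conditions (chord versus simple, simple versus complex, and chord versus complex).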

3. Results

Regarding the VAS of sound richness, the mean distance from the left end was significantly longer as the accompaniment became more complex (Figure 2): chord condition ; simple condition ; complex condition  mm (mean ± standard deviation (sd)). We can reasonably conclude that, as expected, the more complex the accompaniment became, the richer the subjects reported the sound.

The results of the subtractions, showing the regions significantly activated as the sound became more complex, are given in Tables 1, 2, and 3 and Figures 3, 4, and 5. The regions activated during the simple condition but not during the chord condition are listed in Table 1 together with stereotactic coordinates based on the brain atlas of Talairach and Tournoux [15]. These results show areas of relative blood flow changes that emphasize differences between the two conditions and minimize areas that are common to both conditions. Significant increases in relative cortical blood flow were found in the posterior portion of the left inferior temporal gyrus, bilateral fusiform gyri, the medial surface of the bilateral frontal lobes, the right superior parietal lobule, and the left orbital frontal cortex (Table 1, Figure 3). Compared to the chord condition, the complex condition produced significant activation at the posterior portion of the left inferior temporal gyrus, left fusiform gyrus, right medial surface of the occipital lobe, the lateral surface of the left occipital lobe, and the anterior portion of the left middle temporal gyrus (Table 2, Figure 4). Compared to the simple condition, the complex condition significantly activated the posterior portion of the left inferior temporal gyrus, the left fusiform gyrus, the left retrosplenial region, the anterior portion of the right middle temporal gyrus, the right cingulate gyrus, and the bilateral cerebellum (Table 3, Figure 5). The important point to note is that activation of the posterior portion of the inferior temporal gyrus and the fusiform gyrus was observed after every subtraction, that is, simple minus chord, complex minus chord, and complex minus simple. The opposite subtractions (chord minus simple, chord minus complex, and simple minus complex) revealed almost the same activation pattern in each case. Activation was observed at the bilateral orbital frontal cortex, the bilateral or left superior frontal gyrus, and the right superior temporal gyrus (Tables 4–6, Figures 6–8).
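The observation that two regions recur in every subtraction amounts to intersecting the three lists of activated regions. The sketch below transcribes the regions named in the text above; the label strings and the expansion of "bilateral" findings into left and right entries are our own encoding choices, not part of the original analysis.

```python
# Regions reported as significant in each subtraction, transcribed from the text.
# "Bilateral" findings are expanded into left and right entries (an encoding choice).
simple_minus_chord = {
    "left posterior inferior temporal gyrus",
    "left fusiform gyrus",
    "right fusiform gyrus",
    "left medial frontal lobe",
    "right medial frontal lobe",
    "right superior parietal lobule",
    "left orbital frontal cortex",
}
complex_minus_chord = {
    "left posterior inferior temporal gyrus",
    "left fusiform gyrus",
    "right medial occipital lobe",
    "left lateral occipital lobe",
    "left anterior middle temporal gyrus",
}
complex_minus_simple = {
    "left posterior inferior temporal gyrus",
    "left fusiform gyrus",
    "left retrosplenial region",
    "right anterior middle temporal gyrus",
    "right cingulate gyrus",
    "left cerebellum",
    "right cerebellum",
}

# Regions present in all three subtractions.
common_regions = simple_minus_chord & complex_minus_chord & complex_minus_simple
print(sorted(common_regions))
```

Only the left fusiform gyrus and the posterior portion of the left inferior temporal gyrus survive the intersection.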

4. Discussion

The findings of this experiment are summarized as follows: as an accompaniment became more complex, (i) the subjects felt that the sound of music was richer and (ii) the fusiform cortex and the posterior portion of the inferior temporal gyrus were activated. In the following paragraphs, we discuss the functional significance of these activated brain regions.

The fusiform cortex might participate in the perception of sound richness. The present study showed that, as the sound became richer, the activation of the fusiform cortex increased. This finding indicates that the degree of activation of the fusiform cortex differed depending on the degree of sound richness of the accompaniment, even though the melodies were identical. It is generally accepted that the fusiform cortex processes color recognition, based on the results of a case report [16] and a PET activation study [17]. The findings of the present study and previous reports suggest that color information in vision and sound richness in audition might be similarly registered in the brain. In other words, it is possible that similar information from different sensory modalities might be processed within the same brain region and that the visual association cortex might not be involved only in visual processing. Recent studies have revealed that some sensory modalities are related to each other. This phenomenon is called “cross-modal integration” and has been observed between taste and audition [18], taste and smell [19–22], taste and color [23], odor and color [24], taste and music [25], pitch and visual size [26, 27], brightness and frequency of vibrotactile stimuli [28], sound and color [29, 30], and vision and audition [31]. It has been reported that cross-modal associations are ubiquitously present in normal mental function [25, 32, 33]. Recent research suggests that cortical auditory processing is divided into separate processing streams [31, 34]. Posterior temporoparietal regions, labeled the “where” or “how” stream, may be specialized for processing sound motion and location [31]. Regions anterior and ventral to the primary auditory cortex, labeled the “what” stream, may be specialized for processing characteristic auditory features [31]. Neurons in the “what” stream respond directly to auditory and visual sensory stimuli and are important for forming the association between auditory and visual objects [31]. Therefore, we may conclude that cross-modal integration also occurs at the fusiform cortex between color and sound richness when listening to music.

In the present study, the posterior portion of the inferior temporal gyrus was also activated. This area is called the lateral occipital complex (LOC) and is known to participate in foreground and background segregation in vision [35]. It was suggested that the LOC also participates in the discrimination between melody and its accompaniment [2]. In our previous study, we considered that the LOC might play a similar role of foreground and background segregation in both vision and audition. This finding reinforced the hypothesis that some association cortices carry out a similar function beyond the differences in sensory modalities (Figure 9). After the perception of sounds at the auditory cortex level, the information might be sent to the LOC and fusiform cortex. The former and the latter might participate in the foreground and background segregation and the recognition of sound richness, respectively, both in vision and audition.

The opposite subtractions, namely, chord minus simple, chord minus complex, and simple minus complex, all produced activation of the bilateral orbital frontal cortex. The functional significance of this region in this experiment is unclear. However, this region is known as a structure within Yakovlev’s circuit, which participates in emotion and memory. Damage to this region often results in disinhibition, that is, impaired control over impulsive behavior based on instinct and emotion. It is possible that activation of the orbital frontal cortex was caused by the comfortable and pleasant feeling of listening to familiar nursery songs or by inhibiting the desire to sing along with these familiar melodies.

In summary, the fusiform cortex and the LOC might have a similar function in vision and audition. The fusiform cortex recognizes color and sound richness, and the LOC participates in foreground and background segregation. We may conclude that the association cortices might play a similar role across multiple sensory modalities. Further studies are needed to clarify the multimodal integration of association cortices.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Supplementary Materials

Examples of auditory stimuli of chord, simple, and complex condition of Japanese nursery song “Scene of Winter”.
