Abstract

Compounding is the dominant morphological type in modern Chinese words; however, its brain mechanisms remain unspecified. Here, we aim to address this issue by manipulating three common morphological structures in Chinese disyllabic words in an fMRI study: parallel, biased, and monomorphemic. Behavioral analyses show no significant difference in reaction times and error rates among these three conditions. No difference in neural activation was observed in direct contrasts among these conditions in univariate contrast analyses. A support vector machine categorization analysis reveals that the left inferior frontal gyrus (LIFG) is the only region in the frontotemporal network that can differentiate the parallel from the biased disyllabic words in neural activation patterns. This finding indicates that the LIFG is the core region responsible for morphological representation universally across different language modalities and morphological structures.

1. Introduction

Morphology is the branch of linguistics that studies the internal structure of words and the ways in which morphemes combine. Morphemes are the minimal meaning-bearing linguistic elements, which convey semantic and syntactic cues in written or spoken words. There are three major morphological structures across different languages: inflectional, derivational, and compound. Previous research has predominantly focused on inflectional and derivational morphology, which are prevalent in Indo-European languages. For isolating languages like Chinese, however, compounding is the predominant morphological structure, and it remains relatively neglected in the literature. To fill this gap, we aim to shed light on the brain mechanisms of compound morphological processing in Chinese written word recognition.

Inflectional morphology is composed of one stem and one or more inflectional affixes, such as regular past tense (e.g., “jumped = jump + ed”) and regular noun plurals (“dogs = dog + s”). Connectionist models claim that regular inflected forms are not decomposable and rather are processed as overlapping whole forms [1, 2]. Other researchers argue that stem morphemes (e.g., “jump” and “dog”) are processed differently from regular inflectional morphemes (“-ed” and “-s”), since morphophonological parsing of the complex word form is needed in order to access the phonological and semantic properties from the stem only [3]. Morphological decomposition is underpinned by a neural network connecting the left inferior frontal regions with left posterior superior and middle temporal regions via the arcuate fasciculus, as greater activation was observed in these regions in processing regularly inflected words than irregular words [4]. Better behavioral scores in processing regular past tense significantly correlate with higher grey matter density in the left frontotemporal cortex, particularly the left inferior frontal gyrus (LIFG) in brain-damaged patients [5]. Morphophonological parsing is early and automatic on all possible word forms including derivational complex morphemes (e.g., happiness = happy + ness, builder = build + er) and pseudoderived words (e.g., corner = corn + er; “corner” actually has no morphological affix of “er”) [6, 7].

Unlike Indo-European languages, Chinese has almost no inflectional or derivational morphology, and over 70% of all Chinese words are compounds with two or more constituent morphemes [8]. Chinese morphemes can stand alone as monomorphemic words. Syllables are the phonological forms of Chinese morphemes. There are more than 5000 morphemes and around 1300 syllables (with four tones taken into account) in Chinese, so every syllable corresponds to about four different morphemes on average [9]. The orthographic forms of Chinese morphemes, i.e., characters, can differentiate different homophonic morphemes. Word meaning is not a simple combination of meanings of constituent morphemes; rather, it is the result of interactions between them. Previous behavioural studies have demonstrated that morphological parsing in compound words is early and automatic, and the activated morphological information facilitates the recognition process of Chinese words [10–13]. However, the neural mechanisms of compound morphology remain underspecified since this has rarely been investigated in the research literature.

In the present study, we aim to shed light on this issue by manipulating three typical morphological structures in disyllabic Chinese words, i.e., parallel compounds, biased compounds, and monomorphemic words. In each parallel compound, the two constituent morphemes contribute equally to the meaning of the whole word, while the meaning of each biased compound mainly comes from the second morpheme, with the first morpheme as a modifier. We also include a group of monomorphemic words as a baseline condition. With comparisons among these three conditions, we aim to reveal the neural network engaged in processing or representing different morphological structures and also investigate whether compound morphological parsing is underpinned by the same left frontotemporal neural network that processes inflectional and derivational morphology, as shown in previous Indo-European language research. Given the automatic nature of morphological parsing, we expect to observe a weak or even null effect of morphological processing in our canonical neuroimaging analysis. To address this potential issue, we plan to adopt a machine learning approach, i.e., support vector machine (SVM) categorization analysis, to further explore the neural basis of morphological parsing, and we expect that neural activation patterns in the left frontotemporal language network, particularly the LIFG, will differentiate the morphological structures.

2. Materials and Methods

2.1. Participants

Twenty young healthy adults (20–36 years, mean age = 24; 10 males) took part in this study. All were right-handed (Edinburgh Handedness Inventory, Oldfield [14]) undergraduates or postgraduates in Tongji University and native speakers of Chinese. All participants’ vision was normal or corrected to normal. None of the participants had major medical conditions (e.g., heart disease, stroke), psychological or neurological disorders, or were taking medicine which might affect the brain function or neural activity [15]. All participants gave consent and were compensated for their time. This study was approved by the Ethics committee, Department of Medicine and Life Sciences, Tongji University.

2.2. Stimuli

To understand the neural mechanisms of morphological representation, we manipulated three types of morphological structures in common Chinese real words in three conditions: parallel bimorphemic (PB), biased bimorphemic (BB), and monomorphemic (MM). The meanings of the two constituent morphemes contribute equally to the whole meaning of each PB word; for example, “父母” (parents) is a combination of the first morpheme “父” (father) and the second morpheme “母” (mother). In contrast, the meaning of each BB word originates mainly from the second morpheme (i.e., the word head); for example, “红豆” (red bean) emphasizes the bean (豆), while red (红) is only a modifying feature. Each MM word also consisted of two characters but only a single morpheme; e.g., “坦克” (tank) cannot be divided grammatically into two morphemes “坦” and “克”. There were 88 words in each condition, with word frequency and number of strokes matched across conditions (Table 1). We also chose 132 meaningless nonwords as experimental fillers and 60 nonlinguistic symbols “####” as visual fixation controls.

All stimuli of each type were divided equally into four parts, matched on word frequency and number of strokes, and each part was allocated to one experimental run. As a consequence, there were four runs in this fMRI experiment, with each run composed of 22 PB words, 22 BB words, 22 MM words, 33 nonwords, and 15 nonlinguistic symbols. Each stimulus was displayed in the center of the screen for 1000 ms, followed by a short period of blank screen (see Figure 1 for an illustration of the experimental procedure). Participants were instructed to press the left button for each meaningful word and the right button for each meaningless nonword or symbol, and to respond as quickly and accurately as possible. They practiced for a short time to become familiar with the procedure before entering the scanner. Response time (RT) was measured from the onset of each stimulus to the button press. The trials were randomized in display order and jittered with inter-trial intervals (ITIs) varying from 2 to 6 s (M = 3.2 s) using the Optseq2 program [16]. Four display orders of the four experimental runs were created using Latin square randomization, and each participant was randomly allocated one display order. All stimuli were displayed using the software E-Prime (https://pstnet.com/products/e-prime/), and the total duration of each run was 6 minutes.
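
The run composition and counterbalancing described above can be sketched as follows. This is only a minimal illustration: it assumes a cyclic Latin square, which the text does not specify, and the names `RUN_COMPOSITION`, `latin_square_orders`, and `assign_order` are hypothetical.

```python
import random

# Per-run trial counts as stated in the text.
RUN_COMPOSITION = {"PB": 22, "BB": 22, "MM": 22, "nonword": 33, "symbol": 15}
N_RUNS = 4


def latin_square_orders(n):
    """Return n cyclic orders of runs 1..n; each run occurs once per position."""
    return [[(start + step) % n + 1 for step in range(n)] for start in range(n)]


def assign_order(orders, rng):
    """Pick one display order at random for a participant (assumed procedure)."""
    return rng.choice(orders)


trials_per_run = sum(RUN_COMPOSITION.values())
totals = {cond: count * N_RUNS for cond, count in RUN_COMPOSITION.items()}
orders = latin_square_orders(N_RUNS)

print(trials_per_run)  # trials in one run
print(totals)          # stimuli per type across the four runs
print(orders)          # the four candidate run orders
```

Summing the per-run counts over four runs recovers the stimulus totals given in Section 2.2 (88 words per condition, 132 nonwords, and 60 symbols).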

2.3. MRI Acquisition and Statistical Analysis

All participants were scanned in a 3.0 Tesla GE MR750 whole-body human scanner (General Electric, Milwaukee, Wisconsin, USA) with an eight-channel head coil at Tongji University. We used a gradient-echo EPI sequence to collect functional scans, each of which consisted of 40 contiguous oblique axial slices with no gap between adjacent slices, voxel size = 3 × 3 × 3 mm, field of view (FOV) = 19.2 × 19.2 cm, repetition time (TR) = 2 s, echo time (TE) = 23 ms, and flip angle = 77°. Slices in each scan were acquired in interleaved order, parallel to the AC-PC line. There were 248 brain volumes in each functional run, which lasted 8 minutes and 16 seconds. We also collected T1-weighted structural images using a 3D fSPGR pulse sequence for anatomical localization with 162 contiguous slices, voxel size = 1 × 1 × 1 mm, FOV = 25.6 cm2, TR = 7.64 s, TE = 2.94 ms, and flip angle = 12°.

We performed preprocessing and statistical analysis on the collected functional and structural images in SPM12 (Wellcome Institute of Cognitive Neurology, London, UK; http://www.fil.ion.ucl.ac.uk), running under MATLAB (Mathworks Inc., Natick, MA, USA). Three lead-in EPI scans were removed in each run, and the remaining images were realigned to the first image to correct for head motion, followed by slice timing correction. T1 structural images were coregistered to the mean of all functional images and then segmented into grey matter, white matter, and cerebrospinal fluid (CSF). All images were normalized to a standard Montreal Neurological Institute (MNI) template, using a cutoff of 25 mm for the discrete cosine transform functions. We performed further statistical analyses using the general linear model, with an 8 mm full width at half maximum (FWHM) Gaussian smoothing kernel.

In the fixed-effect analysis for each participant, all the experimental stimuli were modeled as six independent event types: PB, BB, MM, nonwords, visual fixation, and errors. The error event type comprised both trials with incorrect responses and trials with RTs over 3000 ms, and accounted for 4.7% of all trials. A canonical hemodynamic response function (HRF) was used to model each trial. The onset of each trial was entered into the model with duration = 0, in order to flexibly detect the peak activation for each trial. The data for each run were first analyzed and then averaged across the four runs for each participant, and the activation maps for each contrast (e.g., PB minus null events, BB minus null events, and MM minus null events) in each participant were entered into a random-effects analysis at the group level. Significant activations were reported at a voxel-level uncorrected threshold and a cluster-level threshold corrected for multiple comparisons. Coordinates of all peaks of significant clusters in this study are in MNI space. The anatomical locations of activations were identified using the Brodmann templates and the AAL Atlas [17] as implemented in MRIcron (http://www.MRicro.com/MRicron) and are described in Section 3.
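
Modeling each trial as a zero-duration event with a canonical HRF amounts to summing copies of the HRF shifted to each trial onset and sampling the result at every scan. The sketch below illustrates this under stated assumptions: it uses a generic double-gamma HRF with commonly cited shape parameters (6 and 16, undershoot ratio 1/6), which approximates but is not SPM12's own implementation, and the onsets are hypothetical values rather than data from this study.

```python
import math

TR = 2.0       # repetition time in seconds (from Section 2.3)
N_SCANS = 245  # 248 volumes minus the 3 lead-in scans


def double_gamma_hrf(t, peak_shape=6.0, under_shape=16.0, ratio=1.0 / 6.0):
    """Double-gamma HRF at time t (s); the parameter values are assumptions."""
    if t < 0:
        return 0.0
    peak = t ** (peak_shape - 1) * math.exp(-t) / math.gamma(peak_shape)
    under = t ** (under_shape - 1) * math.exp(-t) / math.gamma(under_shape)
    return peak - ratio * under


def event_regressor(onsets, n_scans=N_SCANS, tr=TR):
    """Predicted response at each scan: sum of HRFs shifted to each onset."""
    return [sum(double_gamma_hrf(scan * tr - onset) for onset in onsets)
            for scan in range(n_scans)]


# Hypothetical onsets (seconds) for one condition; not taken from the study.
example_onsets = [10.0, 42.5, 80.0]
regressor = event_regressor(example_onsets)
peak_scan = max(range(N_SCANS), key=lambda i: regressor[i])
print(peak_scan * TR)  # time (s) of the largest predicted response
```

One such regressor per event type (PB, BB, MM, nonwords, fixation, errors) would form the columns of the design matrix in the general linear model.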

2.4. Regions of Interest (ROIs)

To perform machine learning analysis, we defined four ROIs based on significant activation clusters in a major experimental contrast of real words minus null events. Neural activity within each ROI was extracted using Marsbar (region of interest toolbox for SPM) for each contrast of interest and each participant. Voxel activation values served as the input features to the Support Vector Machine classifier. In the present study, the dimension of the feature vector r was much larger than the number of training samples, N. Therefore, dimensionality reduction was necessary to project samples into a low-dimensional space, which also reduced the computational complexity of the classifier.

2.5. Support Vector Machine

The support vector machine (SVM) is a machine learning method developed on the basis of statistical learning theory. It has particular advantages in nonlinear, small-sample, and high-dimensional pattern recognition, and is therefore widely used in machine learning. Data that are not linearly separable in an N-dimensional space are more likely to become linearly separable when mapped into a space of higher dimension. Therefore, we can map linearly inseparable data into a new space in which they are linearly separable and make predictions there with a hard-margin or soft-margin SVM. In this way, the original problem of differentiating between neural activation patterns for the different stimulus classes becomes one in which the patterns for each class are linearly separable in the new space.
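
The idea of mapping data into a higher-dimensional space to obtain linear separability can be illustrated with a toy example. The sketch below assumes XOR-like points and a hand-picked feature map (x1, x2) → (x1, x2, x1·x2); neither the data nor the map comes from this study, and the function names are hypothetical.

```python
# Toy XOR-like data: no straight line in 2-D separates the two classes.
points = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
labels = [-1, 1, 1, -1]


def feature_map(x):
    """Hypothetical map from 2-D to 3-D: append the product x1 * x2."""
    x1, x2 = x
    return (x1, x2, x1 * x2)


def linear_decision(z, weights, bias):
    """Sign of w.z + b, the decision rule of a linear classifier."""
    score = sum(w * zi for w, zi in zip(weights, z)) + bias
    return 1 if score > 0 else -1


# In the mapped space, the plane z3 = 0 separates the two classes.
weights, bias = (0.0, 0.0, -1.0), 0.0
predictions = [linear_decision(feature_map(p), weights, bias) for p in points]
print(predictions == labels)
```

In practice an SVM does not require the map to be written out by hand; a kernel function plays this role implicitly.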

In the current work, we use the SVM for binary classification. The basic model is a linear classifier with the largest margin in the feature space; its learning strategy is to maximize this margin, which reduces to solving a convex quadratic programming problem. The goal of this analysis is to distinguish the activation patterns in the regions of interest of the 20 participants between two conditions, which is a binary classification problem for which the SVM is well suited.
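
For reference, the convex quadratic program behind a maximum-margin classifier is usually written in the standard soft-margin form below. The notation is textbook notation rather than the authors' own: x_i is the feature vector of sample i, y_i ∈ {−1, +1} its class label, w and b the weights and bias, ξ_i the slack variables, and C the regularization constant. The text does not report the value of C or the kernel that was used.

```latex
\min_{\mathbf{w},\, b,\, \boldsymbol{\xi}} \quad
  \frac{1}{2}\lVert \mathbf{w} \rVert^{2} + C \sum_{i=1}^{N} \xi_i
\quad \text{subject to} \quad
  y_i\left(\mathbf{w}^{\top}\mathbf{x}_i + b\right) \ge 1 - \xi_i,
  \quad \xi_i \ge 0, \quad i = 1, \dots, N .
```

The margin has width 2/‖w‖, so minimizing ‖w‖² maximizes it; the hard-margin case corresponds to requiring ξ_i = 0 for every sample.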

2.6. Classifier Performance

We evaluated the performance of the classifier using cross-validation. For each cross-validation run, 18 participants were chosen to train the classifier and the two remaining participants were used for testing. This procedure was repeated 190 times, with all possible combinations of two subjects considered in testing across the 190 cross-validation runs. The classifier accuracy was measured by the proportion of observations correctly classified.
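
The leave-two-participants-out scheme described above can be sketched as follows. This is a minimal illustration of the splitting and scoring logic only; the classifier itself is omitted, and the function names are hypothetical.

```python
from itertools import combinations

N_PARTICIPANTS = 20
participants = list(range(N_PARTICIPANTS))


def leave_two_out_splits(ids):
    """Yield (train_ids, test_ids) for every pair of held-out participants."""
    for test_pair in combinations(ids, 2):
        train_ids = [p for p in ids if p not in test_pair]
        yield train_ids, list(test_pair)


def accuracy(true_labels, predicted_labels):
    """Proportion of observations classified correctly."""
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


splits = list(leave_two_out_splits(participants))
print(len(splits))        # number of cross-validation runs
print(len(splits[0][0]))  # training participants in each run
```

Choosing 2 of 20 participants gives 20 × 19 / 2 = 190 distinct test pairs, which matches the number of cross-validation runs reported above.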

3. Results

3.1. Behavioral Results

Response times and accuracy on all trials were recorded and averaged for each experimental condition (PB words: mean RT = 758 ms, error rate = 3.7%; BB words: RT = 739 ms, error rate = 2.8%; MM words: RT = 754 ms, error rate = 3.5%). We performed an analysis of variance (ANOVA) on RTs of correct trials among these three conditions but found no significant difference (F = 0.12). No further analysis was performed on the error trials since the error rate in each condition was very low (all < 5%).

3.2. Imaging Results

The first step in the neuroimaging analysis was to test whether the task produced activations in the cortical regions typically engaged in written word recognition. We addressed this issue by comparing all words against the fixation baseline. As shown in Figure 2 and Table 2, Chinese word recognition produced greater activation than fixation primarily in the left inferior frontal gyrus (LIFG), bilateral lateral occipital cortex (LOC), and supplementary motor area (SMA). This is a typical neural network for written word processing, which has been widely observed in previous studies [18–20].

To explore the neural substrates of morphological representation, we performed a one-way ANOVA with three morphological conditions as input levels: PB words minus null events, BB words minus null events, and MM words minus null events. No significant difference was found among these three conditions.

3.3. SVM Results

The null effects of morphological processing in the above univariate analyses indicate that PB, BB, and MM words might activate the left frontotemporal network to the same amplitude level. To test whether the neural activation patterns are the same across these three conditions, we performed an SVM binary classification analysis, which is sensitive to differences in pattern information rather than activation magnitude (see Figure 3 for an illustration of the analysis steps). In a whole-brain analysis, classification accuracy for three contrasts (PB words–fixation vs. BB words–fixation; PB words–fixation vs. MM words–fixation; BB words–fixation vs. MM words–fixation) did not differ significantly from chance level (mean accuracy < 52%).

The whole-brain analysis includes all voxels in the brain, which might reduce the detection sensitivity of the SVM since some of the brain regions included might not be involved in morphological processing. To solve this problem, we chose the four significant clusters from the canonical contrast of words minus fixation as regions of interest (ROIs): LIFG (BA47), left and right LOC (BA18/19), and SMA (BA6), and performed the SVM analysis in each ROI (Table 2). In the LIFG ROI, the classification accuracy for PB versus BB words was 75.8%, which is above chance. However, the LIFG could not distinguish PB or BB words from MM words (both accuracies < 70%). None of the other three ROIs (SMA and left and right LOC) could distinguish between any of these three conditions (all accuracies < 70%) (Table 3).

4. Discussion

In this study, we manipulated three morphological structures in Chinese disyllabic words to explore the neural mechanisms of compounding morphology. We did not observe significant differences among these three conditions in canonical neuroimaging analyses but found that the LIFG can differentiate the parallel from the biased morphological structures in an SVM analysis. This finding is in line with previous studies in that morphological parsing or representation, whether inflectional, derivational, or compound, is supported by a left frontotemporal network [3, 5, 6]. The commonly activated LIFG across this study and many others indicates that this region might be the core location for morphological processing universally across different morphological structures and different language modalities. In contrast, other activated regions, such as the LOC and SMA, cannot differentiate different morphological structures. The LIFG has been widely reported at many different levels of Chinese language representation, such as phonological, semantic, syntactic, and morphological processes. A relevant study on Chinese word recognition found that the morpheme-word incongruency effect was weaker in the LIFG in Chinese dyslexia [21]. The LIFG might be engaged in detecting and encoding the morphological information of Chinese words and might also constitute and parse the mental structures of various constituent morphemes.

Compounding is a special morphological structure that combines two morphemes directly together without explicit changes in word form; therefore, decomposition of compound words cannot rely on word form (i.e., affixes) as in inflectional and derivational words but more likely depends on the meaning of each constituent morpheme. The relatively implicit morphological structure (without explicit form changes) might explain, in part, the null effect of contrasts between different morphological structures in canonical fMRI analyses. Another possible explanation is that we used a lexical decision task in this study rather than a more explicit morphological priming paradigm as used in the previous behavioral research. No difference was found between disyllabic compounds and monomorphemic words, which could be interpreted partly by the explicit boundaries of constituent Chinese characters. From the decomposition point of view, the two constituent characters in a monomorphemic word might be processed separately and then combined together as a single morpheme, in a process that is very similar to that found for disyllabic compounds.

In contrast to the decomposition hypothesis on compound word processing, another account supports the representation of compounds as whole-word units [12, 22], as there were no direct links between words sharing the same morphemes at the lexical level. According to the principle of economy in cognitive processing, the affix in inflectional and derivational words provides some regularity; for example, “-ness” is an index of nouns, so it might be more efficient to segment the word into stem + affix than to create another new noun. However, there is no such regularity in Chinese compounds, so segmentation of constituent morphemes is unlikely to be efficient or necessary. Our experimental findings could also be interpreted under this framework in that both disyllabic compounds and monomorphemic words are processed as whole-word units without early segmentation, so there was no behavioral or neural activation difference between these two conditions.

Data Availability

All fMRI and behavioral data, together with relevant analysis scripts and files, are available upon request from the corresponding author (e-mail: [email protected]).

Disclosure

This manuscript has not been published elsewhere nor is it currently under consideration for publication elsewhere.

Conflicts of Interest

The authors have no potential conflicts of interest regarding the publication of this study.

Acknowledgments

This study was funded by the Program to Professor of Special Appointment (Eastern Scholar) at Shanghai Institutions of Higher Learning (No. TP2018056), the Science and Technology Commission of Shanghai Municipality under Grant 18ZR1442700, and the China Electronics Technology Group Corporation (CETC). All authors have reviewed the contents of the manuscript, approved its contents, and validated the accuracy of the data. We thank Dr. Barry Devereux at Queen's University Belfast for his valuable comments on our manuscript.