Abstract

Visual stimuli are known to activate the auditory cortex of deaf people, presenting evidence of cross-modal plasticity. However, the mechanisms underlying such plasticity are poorly understood. In this functional MRI study, we presented two types of visual stimuli, language stimuli (words, sign language, and lip-reading) and a general stimulus (checkerboard), to investigate neural reorganization in the superior temporal cortex (STC) of deaf subjects and hearing controls. We found that all visual stimuli activated the STC only in the deaf subjects. The cross-modal activation induced by the checkerboard was mainly due to a sensory component conveyed via a feed-forward pathway from the thalamus and primary visual cortex and was positively correlated with the duration of deafness, indicating a consequence of pure sensory deprivation. In contrast, the STC activity evoked by language stimuli was functionally connected to both the visual cortex and frontotemporal areas and was highly correlated with the learning of sign language, suggesting a strong language component mediated by possible feedback modulation. While the sensory component was specific to the features of a visual stimulus (e.g., selective for the form of words, bodies, or faces) and the language (semantic) component appeared to recruit a common frontotemporal network, the two components converged on the STC and produced plasticity with distinct multivoxel activity patterns. In summary, the present study identified plausible neural pathways for auditory reorganization, related activation of the reorganized cortical areas to developmental factors, and provided unique evidence towards understanding the neural circuits involved in cross-modal plasticity.

1. Introduction

Cortical structures that are deprived of their normal sensory input may become responsive to the stimulation of adjacent receptors, a process that is generally known as cross-modal plasticity or cross-modal reorganization [1]. In human brain imaging studies, there is growing evidence showing that, in early bilaterally deaf adults, the superior temporal cortex (STC) may experience cross-modal recruitment of different visual inputs, such as visual motion [2–8], biological motion [9–11], sign language [11–19], and silent speech reading [15, 20–23]. Animal models have also confirmed the dystrophic change that occurs when the auditory cortex fails to develop typically due to the absence of auditory input [24–28].

Visual-related responses in the STC of deaf subjects could result from long-term auditory deprivation (e.g., missing auditory sensory input) but could also be caused by other dynamic cognitive functions (e.g., sign language learning) [1, 12, 16, 19, 29, 30]. In previous studies, STC activity was found to correlate positively with the duration of deafness or the age at cochlear implantation [2, 18, 31–35], suggesting that functional reorganization was likely to take place in the auditory cortex over a considerable period of time. However, a functional magnetic resonance imaging (fMRI) study showed that STC activation was highly correlated with speech-reading fluency but not with the duration of sensory deprivation [36], indicating that functional compensation for sensory deprivation did not require slow, progressive colonization of the STC by visual inputs but was instead rapidly modulated by preexisting latent connectivity from high-level language-related cortical areas. Thus, for the reorganization of the STC, both bottom-up signals (e.g., from the visual cortex) and top-down modulation (e.g., from associative frontotemporal areas) could potentially contribute to such cross-modal activity [30]. Meanwhile, a magnetoencephalography study showed that the left frontotemporal network, including the STG, was activated during lexicosemantic processing in congenitally deaf individuals but was not responsive to early sensory visual processing, suggesting predominantly top-down modulation from high-level language-related regions [37].

Although it is well established that the STC responds to various visual stimuli in deaf people, the neural mechanisms underlying this cross-modal plasticity are still not fully understood. Several questions remain to be answered. First, how do developmental factors (e.g., the duration of deafness or the learning of sign language) in deaf people constrain or promote the reorganized activity in the auditory cortex? Second, how do the bottom-up and top-down neural pathways contribute to cross-modal activation? Third, does the STC integrate inputs from different pathways, or does it keep them functionally segregated?

In the present study, using fMRI, we aimed to directly compare cross-modal activity and whole-brain functional connectivity while subjects viewed a general stimulus (checkerboard), representing the bottom-up input from the visual cortex, and language-related stimuli (words, sign language, and lip-reading), carrying both bottom-up signals from visual regions and top-down signals from associative cortical areas. Nineteen profoundly deaf (congenital) subjects, 15 residual hearing subjects with a hearing aid, and 15 hearing subjects were recruited to investigate how behavioral factors (e.g., the duration of hearing loss and the age at sign language learning) affected cross-modal activity. This study also aimed to investigate possible sources of cross-modal activation by applying dynamic causal modeling (DCM) [38] and representational similarity analysis (RSA) [39]. We hypothesized that the STC activity reorganized by the checkerboard was mainly induced through a feed-forward network, whereas the activity provoked by language-related stimuli arose from both feed-forward and feedback components but relied more on feedback regulation. Furthermore, we expected the STC activities responsive to the two pathways to be functionally segregated.

2. Materials and Methods

2.1. Participants

Thirty-four early-deaf subjects (14 males; mean age: 20.8 years) and 15 hearing controls (7 males; mean age: 20.3 years) participated in the study. The deaf participants were recruited from the Shanghai Youth Technical School for the Deaf (http://www.shlqj.net/longxiao), and their history of hearing loss, hearing aid use, and sign language use was documented through individual interviews (Table 1). All participants were healthy, had normal or corrected-to-normal vision, were not taking psychoactive medications, had no history of neurological or psychiatric illness, took classes at the high-school level, and had normal cognitive function. In the residual hearing group, most participants communicated by a combination of two or three strategies, including spoken language (13 out of 15), lip-reading (8 out of 15), and sign language (11 out of 15), whereas most of the profoundly deaf participants (15 out of 19) communicated only via sign language. The ethics committee at East China Normal University approved the experimental procedure. All participants gave written informed consent in accordance with the Declaration of Helsinki and were paid for their participation. The 15 hearing subjects were recruited from East China Normal University and had no experience of sign language or lip-reading. The groups were matched for age, gender, handedness, and education.

Suitable deaf participants were selected by means of hearing threshold pretests conducted within the 2 weeks preceding the fMRI experiment. To facilitate preliminary screening, deaf participants self-reported their level of hearing loss on the basis of their audiologists’ diagnoses. Hearing thresholds of all participants were then measured at the Institute of Speech and Hearing Science, Shanghai. Thresholds were assessed monaurally for both ears, either with or without a hearing aid, at 250, 500, 1000, 2000, 4000, and 8000 Hz, in steps of 5 dB. According to the International Hearing Impairment Classification Standard [40], we divided the 34 deaf participants into two groups in terms of their level of hearing loss: profoundly deaf (>90 dB, n = 19; mean thresholds: left ear 106.8 ± 2.5 dB, right ear 106.7 ± 2.4 dB) and residual hearing (<75 dB, n = 15; left ear 73.6 ± 5.5 dB, right ear 76.1 ± 4.4 dB) (Table 1).

2.2. Visual Stimuli

Four different visual materials were presented to participants: a checkerboard pattern serving as a general visual stimulus and three stimuli with language content (words, sign language, and lip-reading) (Figure 1; see also the details in the Supporting Information (available here)). All stimuli were pseudorandomly presented using a block design (Figure 1). Within each block, only one type of stimulus was presented. Each block lasted 20 s and was followed by a 20 s interblock interval. During the 20 s visual presentation, stimuli were played at a similar rate across conditions. During the 20 s interval, a red cross on a black background was presented at the center of the screen and participants were asked to maintain their gaze on the cross. Twenty blocks in total were presented per subject; that is, each type of stimulus was repeated five times. The blocks were divided into three sessions (6 or 7 blocks per session), separated by 5 min rest intervals.
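For concreteness, the sketch below illustrates one way such a pseudorandom block schedule could be generated. The condition names, the 7/7/6 session split, and the random seed are illustrative assumptions, not the exact script used in the study.

```python
import random

# Hypothetical sketch of the block schedule described above: four stimulus
# types, five repetitions each (20 blocks), 20 s on / 20 s off, split into
# three sessions. The 7/7/6 split and seed are illustrative assumptions.
CONDITIONS = ["checkerboard", "words", "sign_language", "lip_reading"]
BLOCK_S, REST_S, REPS = 20, 20, 5

def make_schedule(seed=0):
    rng = random.Random(seed)
    blocks = CONDITIONS * REPS            # 20 blocks in total
    rng.shuffle(blocks)                   # pseudorandom block order
    sessions = [blocks[0:7], blocks[7:14], blocks[14:20]]
    schedule = []
    for s_idx, session in enumerate(sessions, start=1):
        t = 0.0                           # time within session (s)
        for cond in session:
            schedule.append({"session": s_idx, "condition": cond,
                             "onset_s": t, "duration_s": BLOCK_S})
            t += BLOCK_S + REST_S         # 20 s block + 20 s rest
    return schedule

if __name__ == "__main__":
    for row in make_schedule():
        print(row)
```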

Checkerboard stimuli were presented at 1280 × 1024 pixels, with each image shown for 1 s. Word stimuli comprised 80 Chinese characters (monosyllabic) chosen from the List of Frequently Used Characters in Modern Chinese compiled by the State Language Commission of China. Each character was written in white on a black background and presented for 1 s in SimSun font at size 36. For the sign language stimuli, five sentences were signed by a female presenter using her hands and arms without facial expression. The video was presented at a resolution of 1024 × 768 pixels, and each 10 s sentence was repeated twice within the same 20 s block. The presenter’s face in the video was masked to avoid potential interference from lip-reading. For the lip-reading stimuli, consecutive frames of a female face pronouncing disyllabic Chinese words were presented at a moderate speed. The disyllabic words were chosen from the Lexicon of Common Words in Contemporary Chinese published by the Commercial Press. Both the sign language and lip-reading stimuli were displayed at a rate similar to that used for the word and checkerboard stimuli (~1 Hz). The post-scan questionnaire data showed that all participants were able to view the stimuli clearly and understood the content of each stimulus (Supporting Information).

2.3. Experiment Procedure

The fMRI experiment was approved by the Shanghai Key Laboratory of Magnetic Resonance at East China Normal University. Before scanning, the experimental paradigm and scanning procedures were explained to the deaf participants by a professional sign language interpreter. Participants were asked to stay focused on the stimuli and were told that they would be asked questions after the scan to ensure that attention had been paid to the stimuli. Visual stimuli were displayed via an LCD projector (Epson ELP-7200L, Tokyo, Japan) onto a half-transparent screen positioned approximately 285 cm from the participant’s eyes, which the participant viewed through a mirror. The participant’s right hand was placed on a button box connected to a computer so that he/she could press a button at any stage of the experiment to indicate the wish to withdraw, without having to give a reason.

After scanning, all participants completed a feedback questionnaire about the content of the experiment and their subjective experiences, to ensure that they had been paying attention during the experimental sessions. They also rated stimulus clarity on a 3-point scale (3 = all stimuli were very clear, 1 = stimuli were not clear) to confirm both the clarity of the visual stimuli and their full engagement in the experiment. Additionally, participants had to describe what they had just seen between trials, the frequency of the checkerboard flashing, and the meaning of the simple words, sign language, and lip-reading sentences used during the experiment. We did not intend to control the complexity of the language stimuli. The rating scores for the stimulus categories did not differ significantly from each other (one-way ANOVA, ).

2.4. Data Acquisition

The fMRI data were acquired on a 3-T TimTrio scanner (Siemens, Erlangen, Germany). During scanning, the participant’s head was immobilized using tight but comfortable foam padding. To avoid nonspecific activation, participants were asked not to make any response or read aloud during the scan. When presented with visual stimuli, participants were required to concentrate on the presentation but did not perform any mental task or physical operation. Ear defenders were used by all residual hearing and hearing participants throughout the procedure. Each participant underwent a T1-weighted structural MR scan (3-D FLASH) with 1 mm-thick slices, a repetition time (TR) of 1900 ms, an echo time (TE) of 3.42 ms, a flip angle of 9°, and a field of view (FOV) of 240 × 240 mm. Functional MRI was performed using an echo planar imaging (EPI) sequence with the following parameters: 32 axial slices acquired in an interleaved order; TR, 2000 ms; TE, 30 ms; voxel size, 3.75 × 3.75 × 3.75 mm; flip angle, 70°; and FOV, 240 × 240 mm. A total of 147 sessions (78,400 volumes) were collected from the 49 participants.

2.5. Preprocessing

The first two volumes of each run were discarded to allow for T1 equilibration effects. Data were analyzed using SPM8 (Wellcome Trust Centre for Neuroimaging, London, UK) running within Matlab 7.10 (MathWorks Inc., Natick, MA, USA). Image preprocessing followed the standard SPM8 procedures and included slice timing correction, realignment for the correction of motion artifacts, coregistration to the participant’s structural T1 image, normalization to the Montreal Neurological Institute (MNI) template, and smoothing with a Gaussian kernel (full width at half maximum). No participants were discarded from the analysis; head movement was less than 3.75 mm for all participants.

2.6. Cross-Modal Activation Analysis

A first-level analysis was performed for the block-design fMRI data using SPM8. In this step, a general linear model encompassing the design and contrasts at the individual-subject level was created. The model contained the information on conditions, onsets, and durations for all scans of each subject. The twelve predictors included [1–4] the onsets of the four conditions (checkerboard, words, sign language, and lip-reading) in the profoundly deaf group, [5–8] the onsets of the four conditions in the residual hearing group, and [9–12] the onsets of the four conditions in the hearing group. These twelve events were modeled as delta functions convolved with the canonical hemodynamic response function and its temporal and dispersion derivatives. Head motion parameters derived from realignment were also included in the model as covariates of no interest.
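As a rough, hedged illustration of this modeling step (not the SPM8 implementation itself), the sketch below builds block (boxcar) regressors for the four conditions rather than the delta functions and derivatives used in SPM8, convolves them with an SPM-like double-gamma HRF, appends motion covariates, and fits the model by ordinary least squares. Function names and HRF parameters are assumptions.

```python
import numpy as np
from scipy.stats import gamma

TR = 2.0  # s, from the acquisition parameters above

def canonical_hrf(tr, duration=32.0):
    # SPM-like double-gamma HRF: response peak ~6 s, undershoot ~16 s
    t = np.arange(0, duration, tr)
    return gamma.pdf(t, 6) - gamma.pdf(t, 16) / 6.0

def block_regressor(onsets_s, block_dur_s, n_scans, tr):
    # Boxcar for one condition, convolved with the canonical HRF
    boxcar = np.zeros(n_scans)
    for onset in onsets_s:
        boxcar[int(onset / tr):int((onset + block_dur_s) / tr)] = 1.0
    return np.convolve(boxcar, canonical_hrf(tr))[:n_scans]

def fit_glm(Y, onsets_by_condition, motion, tr=TR, block_dur_s=20.0):
    """Y: (n_scans, n_voxels) BOLD data; motion: (n_scans, 6) parameters."""
    n_scans = Y.shape[0]
    regs = [block_regressor(on, block_dur_s, n_scans, tr)
            for on in onsets_by_condition.values()]
    X = np.column_stack(regs + [motion, np.ones((n_scans, 1))])
    betas, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return betas  # rows: condition regressors, motion covariates, intercept
```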

The weighted sums of the parameter estimates from the individual analyses were represented as contrast images, which were used for the group analysis under a random-effects model. The contrast images obtained from the individual analyses represented each subject’s normalized condition-related increment of the MR signal during the visual stimulus presentations relative to the resting baseline period (stimuli > baseline). The second-level group analysis of the three participant groups (Group: profoundly deaf, residual hearing, and hearing) in the four experimental conditions (Condition: checkerboard, words, sign language, and lip-reading) was performed using SPM. Each contrast image from the relevant condition was first submitted to a one-sample t-test at the group level for the whole brain to examine the cross-modal activations in the auditory cortex in each group. Then, to identify differences between groups and conditions, a two-way ANOVA with the two main factors Group and Condition was conducted for the whole brain using a general linear model. To define the regions of interest (ROIs) for the following analyses, peak voxels were selected within the STC (Brodmann areas 41 and 42) in the right hemisphere of the whole-brain map showing a significant main effect of Group (peak at ) and within language-related brain regions: the left anterior temporal cortex (ATC, peak at ) and left inferior frontal gyrus (IFG, peak at ) in the map showing the Condition effect. A spherical ROI with a 10 mm radius was then generated and centered on each peak voxel. The mean percent signal change for each participant was extracted from the first-level analysis using the MarsBaR toolbox (http://marsbar.sourceforge.net).
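The spherical ROI extraction could be approximated outside MarsBaR as in the sketch below, which averages an image within a 10 mm sphere around a peak MNI coordinate. The file name and the coordinate in the example are placeholders, not values from the study.

```python
import numpy as np
import nibabel as nib

def sphere_mean(img_path, peak_mni, radius_mm=10.0):
    """Mean value of an image within a sphere centred on an MNI coordinate."""
    img = nib.load(img_path)                  # e.g. a first-level contrast image
    data = img.get_fdata()
    # MNI coordinates of every voxel centre, via the image affine
    ijk = np.array(np.meshgrid(*[np.arange(d) for d in data.shape],
                               indexing="ij")).reshape(3, -1)
    xyz = nib.affines.apply_affine(img.affine, ijk.T)
    dist = np.linalg.norm(xyz - np.asarray(peak_mni, dtype=float), axis=1)
    mask = (dist <= radius_mm).reshape(data.shape)
    return np.nanmean(data[mask])

# Example call with a placeholder path and placeholder peak coordinate:
# print(sphere_mean("sub01_con_checkerboard.nii", peak_mni=(60, -22, 8)))
```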

2.7. Correlation Analysis

In the residual hearing group, most participants communicated by a combination of two or three strategies, which made the analysis of their language learning experience complicated. In the profoundly deaf group, the language experience of four participants was not available. Therefore, only 15 profoundly deaf participants were included in the correlation analysis. For the same reason, only the profoundly deaf group was compared with the hearing group in the functional connectivity analysis and dynamic causal modeling (described below). To test the hypothesis that sign language experience would modulate cross-modal reorganization, we examined the activity in the right superior temporal cortex (STC; using the ROI defined in the STC showing the Group effect). Spearman’s rank tests were performed for the correlations between STC activity and the duration of deafness and between STC activity and the age of learning sign language.
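A minimal sketch of this correlation analysis, assuming one value per profoundly deaf participant, is shown below; the simulated inputs are placeholders only.

```python
import numpy as np
from scipy.stats import spearmanr

def stc_correlations(stc_signal, deafness_duration, sign_onset_age):
    """Spearman correlations of STC activity with the two developmental factors."""
    rho_dur, p_dur = spearmanr(stc_signal, deafness_duration)
    rho_sign, p_sign = spearmanr(stc_signal, sign_onset_age)
    return {"duration_of_deafness": (rho_dur, p_dur),
            "age_of_sign_learning": (rho_sign, p_sign)}

# Example with simulated values for 15 participants (illustrative only):
rng = np.random.default_rng(0)
print(stc_correlations(rng.normal(size=15),          # STC percent signal change
                       rng.uniform(10, 22, size=15),  # years of deafness
                       rng.uniform(3, 10, size=15)))  # age of sign learning
```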

2.8. Functional Connectivity Analysis

A functional connectivity analysis was performed to search for brain areas showing significant differences between the profoundly deaf and hearing groups, with the right STC as the seed region (the same ROI as in the above analyses). Functional connectivity analyses were performed using the CONN-fMRI Functional Connectivity toolbox for SPM [41]. EPI images that had been preprocessed as described, but had undergone no further statistical analysis, were used. Connectivity strength was calculated over the visual presentation period. Before the subject-level analysis, standard preprocessing and denoising procedures were applied to the EPI data using the default settings of the CONN toolbox, with the BOLD signals derived from white matter and cerebrospinal fluid masks, as well as the motion correction parameters from the realignment stage of spatial preprocessing, entered as covariates of no interest. The data were further band-pass filtered between 0.008 and 0.09 Hz. For each subject, bivariate regression coefficients were estimated to represent the total linear temporal association between the BOLD signal of the ROI and the rest of the brain. The subsequent analysis compared connectivity strengths with a two-sample t-test (FDR corrected) on the beta images from the group analysis to examine the differences between the profoundly deaf and hearing groups at the whole-brain level. To identify the task specificity of each stimulus condition, a further two-sample t-test (FDR corrected) on the beta images of the between-group differences was performed to compare the checkerboard condition with the three language conditions.
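Conceptually, the seed-to-voxel step reduces to nuisance regression, band-pass filtering, and association of the seed with every voxel, as in the hedged sketch below. The study itself used CONN's bivariate regression coefficients; here plain correlation is shown instead, and the variable names and filter order are assumptions.

```python
import numpy as np
from scipy.signal import butter, filtfilt

TR = 2.0
LOW, HIGH = 0.008, 0.09  # Hz, as in the CONN analysis

def bandpass(ts, tr=TR, low=LOW, high=HIGH):
    nyquist = 0.5 / tr
    b, a = butter(2, [low / nyquist, high / nyquist], btype="band")
    return filtfilt(b, a, ts, axis=0)

def seed_connectivity(voxel_ts, seed_ts, nuisance):
    """voxel_ts: (n_scans, n_voxels); seed_ts: (n_scans,); nuisance: (n_scans, k)."""
    # Regress nuisance signals (WM/CSF, motion) out of both seed and voxels
    X = np.column_stack([nuisance, np.ones(len(seed_ts))])
    beta_v, *_ = np.linalg.lstsq(X, voxel_ts, rcond=None)
    beta_s, *_ = np.linalg.lstsq(X, seed_ts, rcond=None)
    voxel_clean = bandpass(voxel_ts - X @ beta_v)
    seed_clean = bandpass(seed_ts - X @ beta_s)
    # Correlation of the seed with every voxel, then Fisher z-transform
    seed_z = (seed_clean - seed_clean.mean()) / seed_clean.std()
    vox_z = (voxel_clean - voxel_clean.mean(0)) / voxel_clean.std(0)
    r = vox_z.T @ seed_z / len(seed_z)
    return np.arctanh(np.clip(r, -0.999, 0.999))  # Fisher z connectivity map
```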

2.9. Dynamic Causal Modeling

Six different models of the language-related visual inputs in deaf participants were compared. These models mainly tested whether the STC activations induced by language stimuli received feedback modulation from the IFG and ATC and a feed-forward signal from the primary visual cortex (V1) (see Results). Each model comprised four regions: IFG, ATC, STC, and V1. The extrinsic input (visual stimulation) always entered the system via V1. The main differences among the models were in the connections among brain regions: (1) the presence or absence of feedback connections from IFG/ATC to STC, (2) feed-forward connections from V1 to both STC and ATC, and (3) feed-forward connections from V1 to STC only or to ATC only. The models were grouped into two classes of families. The first class tested whether models with or without feedback (IFG/ATC to STC) were more likely to explain the data; the family with feedback from IFG/ATC to STC included models [1], [3], and [5], and the family without feedback included models [2], [4], and [6]. The second class tested which pattern of connections from V1 best explained the data: V1 to STC (models [1] and [2]), V1 to both STC and ATC (models [3] and [4]), or V1 to ATC only (models [5] and [6]). A group analysis (FDR corrected) of the deaf participants (profoundly deaf and residual hearing groups) was conducted to identify the voxels most significantly activated across all three language-related stimuli in the left V1, STC, ATC, and IFG, and the peak intensity of each of the four regions was identified. The principal eigenvariate (time series) was extracted from a volume of interest (8 mm radius sphere) centered on the voxel nearest to each peak. Based on the estimated evidence for each model, random-effects Bayesian model selection in SPM8 then calculated the exceedance probability. When comparing model families, all models within a family were averaged using Bayesian model averaging and exceedance probabilities were calculated for each model family.
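To make the model space concrete, the sketch below encodes the six model structures as binary adjacency matrices and groups them into the two families. This is a structural illustration rather than an SPM DCM specification; in particular, the reciprocal ATC-IFG connections are an assumption not spelled out in the text.

```python
import numpy as np

REGIONS = ["V1", "STC", "ATC", "IFG"]
IDX = {r: i for i, r in enumerate(REGIONS)}

def build_model(feedforward_edges, feedback_to_stc):
    """A[target, source] = 1 for each directed connection; diagonal = self-connections."""
    A = np.eye(len(REGIONS))
    edges = list(feedforward_edges) + [("ATC", "IFG"), ("IFG", "ATC")]  # assumed
    if feedback_to_stc:
        edges += [("ATC", "STC"), ("IFG", "STC")]  # feedback from IFG/ATC to STC
    for src, dst in edges:
        A[IDX[dst], IDX[src]] = 1.0
    return A

FORWARD_VARIANTS = {                       # second model factor
    "V1->STC":         [("V1", "STC")],
    "V1->STC and ATC": [("V1", "STC"), ("V1", "ATC")],
    "V1->ATC only":    [("V1", "ATC")],
}

MODELS = {}
for i, (label, ff_edges) in enumerate(FORWARD_VARIANTS.items()):
    MODELS[2 * i + 1] = build_model(ff_edges, feedback_to_stc=True)   # models 1, 3, 5
    MODELS[2 * i + 2] = build_model(ff_edges, feedback_to_stc=False)  # models 2, 4, 6

FAMILIES = {"with_feedback": [1, 3, 5], "without_feedback": [2, 4, 6]}
DRIVING_INPUT = "V1"  # extrinsic visual input always enters via V1
```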

2.10. Representational Similarity Analysis (RSA)

The analysis of neural activity within ROIs was conducted with the RSA toolbox (http://www.mrc-cbu.cam.ac.uk/methods-and-resources/toolboxes/) [39]. Both the primary visual area and the STC were selected as ROIs, defined anatomically using the WFU PickAtlas [42]. We compared the condition-wise patterns among the fMRI t-maps for the four types of visual stimuli: checkerboard (nonlanguage), words, sign language, and lip-reading (language). For each subject, the representational dissimilarity matrices (RDMs) comprised the correlation distances (1 − correlation coefficient) between the images from the blocks of each condition in both the profoundly deaf and the residual hearing groups, yielding a 4 × 4 matrix. The four conditions were separated into two categories: nonlanguage (checkerboard) and language (words, sign language, and lip-reading). We then compared, for each subject, the correlation coefficients of the three pairs between the nonlanguage and the language conditions (category C-L: checkerboard versus words, checkerboard versus sign language, and checkerboard versus lip-reading) with those of the three pairs within the language conditions (category L-L: words versus sign language, words versus lip-reading, and sign language versus lip-reading). In each ROI, the similarities of the two categories were compared statistically (t-test, ) in both the profoundly deaf and the residual hearing groups. Because most visual stimuli did not induce plasticity in the auditory cortex of the hearing group, this group was not included in the RSA.
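A schematic version of this RDM construction and category comparison, assuming one t-map per condition within an ROI, could look like the following; the data in the example are random placeholders, and the study itself used the RSA toolbox.

```python
import numpy as np
from scipy.stats import ttest_rel

CONDITIONS = ["checkerboard", "words", "sign_language", "lip_reading"]
C_L_PAIRS = [(0, 1), (0, 2), (0, 3)]   # checkerboard vs. each language condition
L_L_PAIRS = [(1, 2), (1, 3), (2, 3)]   # pairs within the language conditions

def rdm(tmaps):
    """tmaps: (4, n_voxels) condition-wise t-values within the ROI."""
    return 1.0 - np.corrcoef(tmaps)     # 4 x 4 correlation-distance matrix

def compare_categories(tmaps_per_subject):
    """Mean C-L vs. L-L similarity (1 - distance) per subject, paired t-test."""
    c_l, l_l = [], []
    for tmaps in tmaps_per_subject:
        d = rdm(tmaps)
        c_l.append(np.mean([1 - d[i, j] for i, j in C_L_PAIRS]))
        l_l.append(np.mean([1 - d[i, j] for i, j in L_L_PAIRS]))
    return ttest_rel(c_l, l_l)

# Example with random placeholder data for 19 subjects and 500 ROI voxels:
rng = np.random.default_rng(1)
print(compare_categories([rng.normal(size=(4, 500)) for _ in range(19)]))
```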

3. Results

3.1. Brain Activations in Auditory Areas in Response to Visual Stimuli

We first examined cross-modal activation in the STC of both the deaf and the hearing groups at the group level for each condition (Table 2). In the deaf participants, the STC was significantly activated by all of the visual stimuli (, cluster-level corrected; Figure 2(a)). The visual stimuli with language content activated the STC bilaterally, whereas the checkerboard induced STC activation only in the right hemisphere (Figure 2(a)). The hearing subjects did not show such cross-modal activity, except in the lip-reading condition. We then conducted a two-way ANOVA to identify differences in brain activity between the profoundly deaf, residual hearing, and hearing groups and between the four visual conditions (Figure 2(b) and Table 3). The activations in the right STC showed a significant main effect of both Group (, corrected, Figure 2(b)) and Condition (, corrected, Figure 2(b)). Other brain areas, including the bilateral middle lateral occipital gyrus, bilateral anterior temporal cortex (ATC), and inferior frontal gyrus (IFG), also showed a main effect of Condition (Table 3).

We next examined the STC activation in the right hemisphere induced by all four visual stimuli (Figures 2(a) and 2(c) and Table 2). For the checkerboard stimulus, the right STC was significantly activated, and the post hoc region-of-interest (ROI, selected from the map showing the main effect of Group) analysis showed that the cross-modal activation was significantly higher in both the profoundly deaf (t-test, ) and the residual hearing groups (t-test, ) than in the hearing group (Figure 2(c), first row). For the visual word stimulus, the activation in the right STC differed significantly between the profoundly deaf and hearing groups (t-test, ) and between the residual hearing and hearing groups (t-test, ) (Figure 2(c), second row). For the sign language stimulus, the STC showed enhanced responses in both the profoundly deaf (t-test, ) and the residual hearing groups (t-test, ) compared with the hearing subjects (Figure 2(c), third row). For the lip-reading stimulus, cross-modal activations were found in the right STC in all subject groups, with no significant differences between the profoundly deaf and hearing groups or between the residual hearing and hearing groups (t-test, all ; Figure 2(c), last row).

3.2. Correlations between Developmental Parameters and STC Activations

We then investigated whether activations in auditory regions showed a stronger correlation with the duration of hearing loss or with the age at which participants began to learn sign language. Most of the residual hearing subjects had a reasonably similar duration of learning to read Chinese words and frequently used multiple language strategies (sign language, speech reading, and spoken language) in their communication, making it difficult to determine an accurate duration of language learning for this group. The correlation analysis therefore included only the profoundly deaf subjects and the developmental factors of duration of deafness and age of learning sign language (Table 1). We first confirmed that the two developmental parameters were not significantly correlated with each other (Spearman’s rank, , ).

For profoundly deaf individuals, we found that the right STC activation resulting from the checkerboard was positively correlated with the duration of deafness (Spearman’s rank, , ), but not with the age of sign language learning (Spearman’s rank, , ; Figure 3(a)). In contrast, the STC activation evoked by sign language stimuli was positively associated with the onset of sign language learning (, ), but not with the duration of deafness (, ; Figure 3(b)). Similar correlations were also found for all the visual stimuli that contained language content. That is, STC activity induced by all of the language stimuli was highly correlated with the onset of sign language learning (Spearman’s rank, , ), but not with the duration of deafness (Spearman’s rank, , ; Figure 3(c)). Further analyses showed that the activation in the left ATC and left IFG during the presentation of sign language was highly correlated with the onset of sign language learning (ATC: , ; IFG: , ; Figure S1). Interestingly, the activation in the same IFG region under the word condition also demonstrated a significant correlation with the onset of sign language learning (IFG: , ) (Figure S1). However, no areas showing significant correlation with the onset of sign language were found under the checkerboard condition.

3.3. Whole-Brain Functional Connectivity with Cross-Modal Activity in the STC

We next examined the neural sources of cross-modal plasticity in the auditory cortex. We placed a seed region in the reorganized right STC and examined the difference in whole-brain functional connectivity between the profoundly deaf and the hearing subjects (, FDR corrected) under the checkerboard condition. We identified significantly greater connection strengths to the STC in the occipital cortex (peak at , t-test, ) and right thalamus (peak at , t-test, ) of deaf subjects in comparison with hearing subjects (Figure 4(a)).

To explore the difference in functional connectivity between the language stimuli and the checkerboard, we further compared the connectivity contrast (profoundly deaf versus hearing) of each language stimulus with the checkerboard contrast at the whole-brain level (FDR corrected) (Figure 4 and Table 4). For the word stimuli, compared with the checkerboard, we found enhanced connection strengths not only in the left occipital cortex (left hemisphere, peak at , ) but also in the bilateral ATC (left hemisphere: peak at , ; right hemisphere: peak at , ) and right IFG (peak at , ; Figures 4(b) and 4(e)). The connected area in the left occipital cortex for the word condition was located precisely in the classical visual word form area (VWFA), which is specific to the processing of visual word information [43–45]. For the sign language stimuli, we identified significantly stronger connections in the bilateral middle temporal areas (right hemisphere: peak at , ; left hemisphere: peak at , ), the bilateral fusiform face area (FFA; right hemisphere: peak at , ; left hemisphere: peak at , ), right ATC (peak at , ), and bilateral IFG (right hemisphere: peak at , ; left hemisphere: peak at , ; Figures 4(c) and 4(f)). The connected bilateral visual areas were identified as being selective for visual processing of the human body (extrastriate body area, EBA) [46]. For the lip-reading condition, we found significantly greater connection strengths in the bilateral FFA (right hemisphere: peak at , ; left hemisphere: peak at , , Figure 4(d)), right ATC (peak at , ), and right IFG (peak at , ; Figure 4(g)). The FFA, which is well known to be involved in the processing of face information [47], was connected in both the sign language and the lip-reading conditions. In short, in comparison with the checkerboard stimulus, the STC activity induced by language stimuli received additional, common connections from the ATC (e.g., the temporal pole) and frontal (e.g., IFG) regions, whereas the sensory component arose mainly from visual areas (including the VWFA, EBA, and FFA) that appeared highly selective for stimulus features.

3.4. Dynamic Causal Modeling

Although we found that the visual cortical areas, ATC, and IFG showed functional connections with the STC under the language conditions, the causal direction between these brain regions remained unknown. Dynamic causal modeling (DCM) is a generic Bayesian framework for inferring interactions among hidden neural states from measurements of brain activity and has been used in early blind individuals [48, 49]. We therefore used DCM and Bayesian model selection to explore how language components reach the STC in deaf subjects by comparing six plausible models (Figure 5(a)). Random-effects Bayesian model selection showed that the cross-modal activity observed in the STC of deaf subjects was best explained by the feedback connection from IFG/ATC (Figure 5(b), left, with feedback; exceedance probability of 0.97) and the feed-forward connection from V1 (Figure 5(b), right, V1 to STC; exceedance probability of 0.43; and V1 to STC/ATC; exceedance probability of 0.30), with model 1 being the most likely (Figure 5(c); exceedance probability of 0.44). These results strongly suggest that the feedback component from the language circuit (ATC and IFG) and the feed-forward component from the sensory region were both involved in the induction of cross-modal plasticity in the STC under the language conditions.

3.5. Representational Similarity of Cross-Modal Activation in the STC

Finally, we explored whether cross-modal activities in the STC shared the same spatial activity pattern when receiving distinct contributions from occipital and temporal-frontal areas. We used a multivariate pattern analysis technique known as representational similarity analysis (RSA) [39] to examine how the spatial pattern of BOLD signals over voxels varied in response to different visual stimuli. For each subject, the representational dissimilarity matrices (RDMs) comprised the correlation distances (1 − correlation coefficient) between the images from the blocks of each condition in both the profoundly deaf and the residual hearing groups, yielding a 4 × 4 matrix. The four conditions were separated into two categories: nonlanguage (checkerboard) and language (words, sign language, and lip-reading). We then compared, for each subject, the correlation coefficients of the three pairs between the nonlanguage and the language conditions (category C-L) with those of the three pairs within the language conditions (category L-L). The correlation coefficients between the checkerboard and any of the language-related stimuli in the bilateral STC were significantly lower than those between any two language-related stimuli in both the profoundly deaf (Figure 6(a), left hemisphere, t-test, ; right hemisphere, t-test, ) and the residual hearing groups (Figure 6(a), left hemisphere, ; right hemisphere, ). As a control comparison, no significant differences in RSA were found in the primary visual cortex in either the profoundly deaf (Figure 6(b), left hemisphere, t-test, ; right hemisphere, t-test, ) or the residual hearing individuals (Figure 6(b), left hemisphere, t-test, ; right hemisphere, t-test, ).

4. Discussion

Relative to hearing subjects, both profoundly deaf and residual hearing subjects showed enhanced STC responses to the checkerboard, word, and sign language stimuli, confirming the existence of cross-modal plasticity after auditory deprivation [2, 14, 35, 50, 51]. While Lambertz et al. [51] reported that cortical reorganization of the auditory cortex was present only in profoundly deaf subjects and not in subjects with residual hearing, our results showed that such plasticity existed in both groups of hearing-impaired subjects. One possible interpretation is that intensive behavioral and perceptual training causes neuroplasticity in late-onset sensory deprivation [30]. Despite differences between pre- and postlingually deaf individuals, cross-modal activity is consistently found in postlingually deaf CI patients as well as in mildly to moderately hearing-impaired individuals [33, 52–54]. Hearing subjects also showed significant STC responses to lip-reading in the present study, which is compatible with previous observations indicating that silent speech reading activates lateral parts of the superior temporal plane in hearing adults [15, 20–23].

Although sensory deprivation triggers cortical reorganization, the origin of the anatomical and functional changes observed in the STC of deaf individuals is not only sensory (feed-forward) but also cognitive (feedback), for example through the use of sign language and speech reading [30]. The purely visual stimulus (checkerboard) provoked activations in the right STC that correlated only with the duration of deafness [52] and showed strong functional connectivity with the visual cortex and thalamus, implying a contribution of sensory components to the plasticity. In contrast, the cognitive stimuli with language content induced activations in both the right and the left STC that were associated only with the experience of sign language learning and showed enhanced functional connections not only with visual cortical areas but also with the ATC and IFG, suggesting a strong potential top-down modulation of plasticity by the linguistic components of these stimuli. The DCM analysis further confirmed the information flow triggered by the visual stimuli with language content, showing a strong feedback effect from IFG/ATC and a feed-forward effect from V1 to the STC.

In deaf humans, it has been found that auditory areas preserve a task-specific activation pattern independent of input modality (visual or auditory), suggesting task-specific reorganization during cortical plasticity [55]. Cardin et al. [18] showed that auditory deprivation and language experience cause activations of different areas of the auditory cortex when two groups of deaf subjects with different language experience watch sign language. In the auditory cortical areas of deaf animals, Lomber and colleagues showed that the neural basis for enhanced visual functions was localized to specific auditory cortical subregions: the improved localization of visual stimuli was eliminated by deactivating the posterior auditory cortex, while the enhanced sensitivity to visual motion was blocked by disabling the dorsal auditory cortex [25, 56, 57]. Land et al. [58] demonstrated that visually responsive and auditory-responsive neurons in the higher-order auditory cortex of deaf cats form two distinct populations that do not show bimodal interactions. However, we still know little about how other brain regions contribute to the task-specific activations in the auditory cortex. In the present study, although neural reorganization in deaf individuals permits both sensory and language inputs to reach the STC, the RSA result suggested that the functions of the two components were segregated within the reorganized auditory area, confirming the functional segregation hypothesis. Furthermore, our functional connectivity analysis suggested that the stimulus-specific activation in the STC probably develops via different neural pathways. Specifically, the sensory component of the stimuli was found to be highly stimulus-specific. During word presentation, the visual areas functionally connected with the STC were located exactly within the visual word form area (only in the left hemisphere, Figure 4(b)), a region demonstrated to be involved in the identification of words and letters from lower-level shape images prior to association with phonology or semantics [44, 45, 59]. During the sign language and lip-reading stimuli, the functionally connected visual areas were identified as the extrastriate body area and the fusiform face area, which are known to be involved in human body representation [46] and facial recognition [47, 60], respectively. In contrast, for the language component, the cross-modal plasticity shaped by sign language also generalized to the responses to other language stimuli (Figure 3(c) and Figure S1). Additionally, the STC activities induced by the word, sign language, and lip-reading stimuli were functionally connected with a similar neural network consisting of the temporal pole and inferior frontal regions (part of Broca’s area) (Figure 4), which have been shown to be involved in semantic processing in the language system [61]. These results suggest that the language components of different visual stimuli share a common language circuit for top-down visual-auditory reorganization.

The difference between the checkerboard and the language-related stimuli (Figure 6(a)) cannot easily be explained by alternative experimental accounts. For example, one might argue that the language stimuli have greater visual richness than the purely visual stimulus, which could in itself have induced higher similarity among the language stimuli. This seems unlikely, as no such difference in similarity was found in the primary visual cortex (Figure 6(b)).

Previous studies on prelingually deaf groups have proposed a link between poor speech outcomes and exposure to a visual language, suggesting that the incapacity to process auditory signals (poorer cochlear implant outcomes) is due to usurping of auditory cortex functionality by visual language [31, 32, 62–64]. To the contrary, however, other studies indicate that proficiency in speech reading is linked to better cochlear implant outcomes [31, 33, 65–68]. Thus, together with the animal studies discussed above, our human imaging results on functional segregation suggest that, although exposure to sign language may indeed partially take over the auditory cortex, the auditory regions could still preserve the ability to process auditory signals after cochlear implantation, facilitated through the recruitment of new populations of neurons or different spatial activity patterns.

In conclusion, both language and sensory components contribute to the cross-modal plasticity of the STC in deaf people, and these are associated with language experience and hearing loss duration, respectively. The feed-forward signal of sensory input is highly stimulus-specific, while the feedback signal from language input engages a common neural network. Finally, even though both pathways activate auditory areas in deaf people, they appear functionally segregated with respect to cross-modal plasticity. In summary, this study provides important and unique evidence for understanding the neural circuits involved in cross-modal plasticity in deaf people and may guide clinicians in the consideration of cochlear implants or hearing recovery.

Conflicts of Interest

The authors declare no financial conflicts of interest.

Authors’ Contributions

Mochun Que and Xinjian Jiang contributed equally to this work.

Acknowledgments

The authors are grateful to all participants. They thank the Institute of Speech and Hearing Science, the Shanghai Deaf Youth Technical School, and the Special Education Department of Shanghai Applied Technology Institute. They thank Dr. Yixuan Ku for the initial discussion on this study. They also thank Haidan Lu, Chunyan Lin, Hang Zhao, and Yan Xiu for the recruitment of participants; Di Zhao for the help with data collection; and Dr. Qing Cai, Lei Li, and Shuai Wang for the suggestions on data analysis. The research leading to these results received funding from the National Key Fundamental Research (973) Program of China (Grant 2013CB329501), the National Natural Science Foundation of China (31571084), and Shanghai Pujiang Plan (15PJ1402000) to Liping Wang.

Supplementary Materials

Supplemental information includes one figure, the supplemental visual materials, and the post-scan questionnaire (translated into English).