Abstract

Second language learning has been shown to impact and reshape the central nervous system, anatomically and functionally. Most of the studies on second language learning and neuroplasticity have been focused on cortical areas, whereas the subcortical neural encoding mechanism and its relationship with L2 learning have not been examined extensively. The purpose of this study was to utilize frequency-following response (FFR) to examine if and how learning a tonal language in adulthood changes the subcortical neural encoding in hearing adults. Three groups of subjects were recruited: native speakers of Mandarin Chinese (native speakers (NS)), learners of the language (L2 learners), and those with no experience (native speakers of foreign languages (NSFL)). It is hypothesized that differences would exist in FFRs obtained from the three language experience groups. Results revealed that FFRs obtained from L2 learners were found to be more robust than the NSFL group, yet not on a par with the NS group. Such results may suggest that in human adulthood, subcortical neural encoding ability may be trainable with the acquisition of a new language and that neuroplasticity at the brainstem level can indeed be influenced by L2 learning.

1. Introduction

There is ample literature on neuroplasticity as it relates to bilingualism and second language (L2) learning [15]. Learning two languages simultaneously is typically defined as simultaneous bilingualism. This often occurs at a younger age when spoken language learning occurs with relative ease (e.g., children growing up in bilingual households) [6, 7]. Bilingualism can also be achieved by learning L2 after forming his or her native language, which is typically defined as sequential bilingualism [7, 8]. Both simultaneous bilingualism and sequential bilingualism have been shown to impact the central and peripheral nervous system, indicating its adaptive and plastic nature. For example, in the simultaneous bilingual population, it has been found that at the cortical level, grey matter density, cortical thickness, and white matter integrity differ from those in the monolingual groups [7]. At the subcortical level, researchers have examined pitch coding capacity in children and adult bilingual subjects who learned an L2 at younger ages (before age 3) and found that subcortical neural activity was also experience-dependent: their brainstem responses were higher in amplitude than their monolingual counterparts [9, 10].

Sequential bilingualism and L2 learning and their relationship to neuroplasticity, especially for those learning a second language in their adulthood, have garnished attention in more recent years [10, 11]. With the help of rapidly advancing technology in neuroimaging and other techniques, researchers have been able to better understand the neuroplasticity of L2 learning in adulthood. For example, it has been found that adults who were immersed in Chinese learning courses demonstrated increased white matter density compared to their monolingual control counterparts [11]. At the subcortical level, however, little research exists regarding how the brainstem changes its encoding of information during or after the L2 learning experience, especially in adulthood.

One of the more recently utilized tools in subcortical neuroplasticity investigations is the scalp-recorded frequency-following response (FFR) [12]. FFR reflects synchronized neural activities at the level of the brainstem, typically elicited by speech stimulus [13]. As a noninvasive far-field potential response, it reflects the synchrony of neural activity generated by the lateral and inferior colliculus in the brainstem [14]. It is believed to reflect the phase-locking activity of multiple generators in the auditory brainstem [15].

FFR has been shown to be an effective tool in assessing the processing of complex sounds along the auditory pathway, especially at the subcortical brainstem level [1619]. It has been widely utilized in previous studies to evaluate the effects of auditory and music training [20, 21], speech perception in noise [22], and how aging affects the auditory functions [2326]. Furthermore, the influences of language background on subcortical pitch processing in different groups of people have been studied using FFR. For example, FFRs recorded from Mandarin Chinese (tonal language) speakers and English (nontonal language) speakers have shown that the pitch tracking accuracy and strength, which both reflect subcortical neural coding capacity, were stronger in speakers of Mandarin Chinese than those of English speakers, indicating that the phase-locking ability in the auditory brainstem could be strengthened by long-term language experience [2729].

Neural coding of the auditory brainstem could be shaped and reshaped by short-term auditory training and long-term musical training, indicating that both auditory training and musical training have effects on the speech processing of the auditory brainstem [19, 21, 31]. Speech-evoked FFRs in newborn infants from different language background families, Mandarin Chinese and American English, have been shown to be very similar to each other, suggesting that the aforementioned difference in adulthood may indeed be the result of long-term language exposure [31]. In other words, neuroplasticity as a product of long-term auditory influence is evident at the subcortical level. However, to date, there have been few studies aimed at how L2 learning in adulthood could change one’s subcortical neural encoding, reflected by measurements such as FFR.

The aim of the current study was to investigate the impact of L2 learning and exposure on an adult’s subcortical pitch coding ability, measured by speech-evoked FFR. It was hypothesized that adult L2 learners of a tonal language would have more robust speech-evoked FFRs than those who do not have any exposure to the specific language, yet weaker than those obtained from native speakers.

2. Material and Methods

2.1. Participants

Three groups of subjects were recruited in this study: native speakers (NS), second language learners (L2 learners), and native speakers of foreign languages (NSFL). For the NS group, 15 native speakers of Mandarin Chinese (seven males and eight females, years of age) were recruited from the local community in Beijing, China.

Another 15 participants who were learning Mandarin Chinese were recruited for the L2 learner group (five males and ten females, years of age). These participants were native speakers of Indonesian, which is not a tonal language. Participants in the L2 learner group had no prior exposure to Mandarin Chinese until they started learning ( years in experience) Mandarin Chinese in Beijing, China. All L2 learners had achieved at least a level five on the Hanyu Shuiping Kaoshi (HSK). HSK is a standardized test of Mandarin Chinese as a foreign language. It focuses on nonnative speakers’ ability to use Mandarin Chinese for communication in daily life, study, and work [32]. HSK, which has six levels with level six being the highest, is considered to be one of the most influential language examinations for Mandarin Chinese [32]. A level five HSK is frequently considered the benchmark of effective verbal communication skills in Mandarin Chinese for nonnative speakers.

Another 15 participants (ten males and five females, years of age) were recruited for the NSFL group. None of them had any experience in learning Mandarin Chinese. Of these 15 participants, eight were native speakers of Bengali, five were native speakers of Dutch, and two were native speakers of Hindi, none of which are tonal languages.

None of the 45 participants had any prior history of otological, audiological, neurological, or psychological issues. They also reported no prior training or involvement in music. This was determined by the participants completing a modified version of the Munich Music (MUMU) Questionnaire [33]. The modified MUMU Questionnaire included four questions on whether they had ever trained to play any musical instruments and whether they had received professional musical or vocal training.

All participants provided written consent for this study, which was approved by the Intuitional Review Board at the Beijing Institute of Otolaryngology and Beijing Tongren Hospital.

2.2. Audiometry

Pure tone audiometry, using the cornea two-channel audiometer and insert earphones in a sound treated room, with octave frequencies between 250 Hz and 8000 Hz, revealed hearing thresholds below 15 dB HL in all participants (Figure 1). Click-evoked auditory brainstem responses (ABR) were measured (Intelligent Hearing System, Miami, USA) in all participants with 100 μs rarefaction clicks at 80 dB SPL. Descriptive data of waves I, III, and V of click-evoked ABR is shown in Table 1. No statistically significant group differences were found between any two groups in latencies of wave I, III, or V (Table 2).

2.3. FFR Stimuli and Procedures

FFR stimuli used in this study were Chinese syllables /yi/ with rising tone and falling tone, marked as /yi2/ and /yi4/, respectively. They were recorded from a male native speaker of Mandarin Chinese, with an overall duration of 250 ms, including 5 ms each of rising time and falling time. The fundamental frequency (F0) trajectory of the /yi2/ stimulus changed from 120 Hz to 155 Hz and from 180 Hz to 130 Hz for /yi4/. The formant frequencies of the two stimuli were  Hz,  Hz,  Hz, and  Hz. Each stimulus was presented via an electromagnetic shielded ER-3 insert earphone at 70 dB SPL at a rate of 3.2 times per second to the right ear 2000 times.

Participants rested in a supine position with their eyes closed. A typical one-channel electroencephalogram (EEG) recording montage was utilized: gold-plated recording electrodes were placed at the high forehead for the noninverting, right mastoid for the inverting, and low forehead for the ground (all impedances were kept ≤3 kΩ). EEG was recorded using the SmartEP system by Intelligent Hearing Systems (Miami, FL, USA) with an internal recording sampling rate of 13,333 Hz (sampling interval of 75 μs). The online band-pass filter was set between 100 Hz and 3000 Hz. Any recording sweep that yielded background EEG amplitudes larger than 25 μV was rejected. Measurements started with click-evoked ABR, followed by FFR stimuli /yi2/ and /yi4/. Another click-evoked ABR was performed at the end of each session to ensure the quality of the EEG recordings.

2.4. Data Analysis

EEGs obtained from participants were analyzed to examine the magnitude of the FFR elicited from the stimulus in all three groups. A MATLAB (MathWorks, Natick, MA) pitch tracking script was used for the EEG analysis. An index called Pitch Strength was utilized to quantitatively represent the magnitude of the responses. To calculate the Pitch Strength of each FFR recording, an autocorrelation function was performed on the EEG data resulting in a series of autocorrelation coefficients, which were normalized between 0 and 1. Pitch Strength is defined as the difference between the maximum and minimum autocorrelation coefficients of the EEG signal [28]. It represents the periodicity preserved in the FFR recording, which was a physiological response to the pitch information of the stimulus. Therefore, Pitch Strength, ranging between 0 and 1, demonstrates the preciseness and robustness of the auditory brainstem’s ability in accurately following the frequency information in the stimulus, in the case of this study, tonal information in the Mandarin Chinese voice syllables.

To test the hypothesis, an analysis of variance (ANOVA) was performed to compare the effect of language experience (three groups: NS, L2 learners, and NSFL) and tone type conditions (two groups: /yi2/ rising tone and /yi4/ falling tone) on Pitch Strength. A Tukey HSD post hoc test was carried out on the language experience group differences. Statistical level of significance was set at .

3. Results

3.1. Temporal Waveforms and Spectrograms

Temporal waveforms of FFRs elicited by stimuli /yi2/ and /yi4/ were plotted, as shown in Figure 2. Upon visual inspection, it could be observed that the group averaged FFRs from the NS group had clear periodicity from both stimuli, whereas those from L2 learners and NSFL groups were less periodic and more “random.” Similarly, short-time spectrograms based on averaged FFRs in the three groups were also calculated, as shown in Figure 3. Clear, continuous, and robust spectral energy in the FFR obtained from the NS group could be seen, and such spectral energy follows the fundamental frequency (100-200 Hz) of the stimuli. In contrast, spectral energy in the NSFL’ FFRs was not as concentrated nor as robust as the NS for both /yi2/ and /yi4/ stimuli, whereas the L2 learners’ FFRs were in between.

3.2. Statistical Analysis

One-way ANOVA with repeated measures tested the effects of language experience and tone type on Pitch Strength. Results revealed a statistically significant difference in Pitch Strength among the three groups with different language experiences (, ). Post hoc comparisons using the Tukey HSD test indicated significantly stronger Pitch Strength in the NS group than the L2 learner group () and the NSFL group (). Pitch Strength obtained from the L2 learner group was also significantly stronger than that from the NSFL group () (Figure 4).

No statistically significant difference was found between the two tone types (, ), nor the interaction between language experience and tone type (, ) on Pitch Strength.

4. Discussion

The current study compared speech-elicited FFRs obtained from native Mandarin speakers (NS), second language learners who were not native speakers of any tonal languages (L2 Learners), and native speakers of foreign languages who did not have any experience in tonal languages (NSFL). Results showed significant group differences in Pitch Strength, an index representing subcortical pitch processing ability, in these three groups with different language experiences. It suggests that at the brainstem level, neural encoding capacity of certain language-specific acoustic features, in this case, pitch information manifested as tones in Mandarin Chinese, can be influenced by the amount of exposure to and the active processes of learning that specific language, even in adulthood. The difference in subcortical pitch encoding ability between native speakers and native speakers of foreign languages found in this study is consistent with some of the previous studies [27, 28] where native speakers of tonal languages showed more accurate and robust FFRs than those who did not speak any tonal languages.

In addition to confirming experience-dependent neural plasticity at the subcortical level, the current study adds that such subcortical plasticity exists not only in adults who have had life-long language exposure but also in those who are learning a second language in their adulthood. Pitch Strength obtained from L2 learners was shown to be lower than that from the native speakers, but higher than that from the native speakers of foreign languages. This suggests that after several years of learning a tonal language, the L2 learners’ auditory brainstem may have “adapted” to better process pitch information. To our knowledge, this is the first time that subcortical neural changes following L2 learning in adulthood is reported in literature.

The underlining mechanism in how auditory exposure or learning experience influences the brainstem’s pitch coding ability has been discussed in previous works. For example, a brief auditory training scheme was shown to have elevated pitch coding accuracy in children with learning disabilities, reflected by their enhanced FFR posttraining [20]. One possible explanation provided by the authors was that auditory training may have improved the neural synchronies in the auditory brainstem, which in turn improved the preciseness of neural processing of complex speech sounds. Similarly, more accurate and robust subcortical neural representations of F0 trajectories in lexical tones were obtained in the population of musicians compared with nonmusicians [30]. It was proposed that musician’s training has imposed higher demand for precise and efficient subcortical pitch extraction, which resulted in strengthened brainstem responses to the voice pitch.

In the current study, it is proposed that the extensive exposure to tonal information in Mandarin Chinese during L2 learners’ learning experience served as a means of auditory training that ultimately leads to the reshaping of their subcortical neural coding ability. It has been proposed that linguistically relevant auditory signals begin to be processed before they reach the cortex. One possible theory discussed in previous studies [2729] is that the phase-locking activities of neurons may be enhanced by both actively exciting pitch-relevant intervals and/or inhibiting irrelevant intervals. Interspike discharges of these neurons are possibly subject to corticofugal egocentric selection [34, 35]. Learning Mandarin Chinese as a second language, as experienced by the L2 learners in the current study, may have introduced similar corticofugal processes in enhancing neural sensitivity to specific linguistic and acoustic features such as tones. This may result in an enhanced neural phase-locking representation of those tones in Mandarin Chinese at the brainstem level and in turn facilitate better speech and language perception in the cortex.

Results from this study have also provided an interesting observation where FFRs obtained from L2 learners not only reflected a modification in their subcortical pitch coding ability after learning the language for years ( years); their FFRs also seemed to “fit” right between the native speakers of foreign languages who had no language exposure whatsoever and had the weakest FFR and the native speakers, who clearly had lifelong ( years) exposure and the most robust responses. It begs the question of after they have started learning a tonal language, at what point does the L2 learners’ auditory brainstem start to adapt and reshape to better encode pitch information and to better facilitate tonal information processing. This can only be answered by extending the current study into Mandarin Chinese learners who are less proficient than the current participants and, ideally, follow them through their language acquisition experience. The future direction could also potentially answer the question of if, and when, the subcortical pitch coding capacity of those L2 learners, who are learners of a different language in their adulthood, could ever achieve the same accuracy and robustness of those who are native speakers of that language.

One aspect that we are aiming to improve in the ongoing expansion of the current study is the relatively diverse language backgrounds of the L2 learners and NSFL. Although all L2 learners were native speakers of Indonesian, the NSFL group had more variable language backgrounds: Dutch, Hindi, Bengali, and Indonesian. None of these languages are tonal or utilize pitch in a lexical context, and this variability should not have significantly impacted the results of this study since FFR focuses mainly on pitch coding ability. Nonetheless, differences in acoustic and linguistic features in these languages may have played a subtle role and future studies will try to refine the recruiting process to minimize such impact.

5. Conclusions

Results of this study demonstrated that the experience of learning Mandarin Chinese as a second language can enhance the subcortical pitch coding capacity of native speakers of foreign languages, as reflected by the FFR. It is proposed that second language learning experience could potentially reshape the subcortical neural wiring to better facilitate the processing of certain acoustic and linguistic features in a specific language, even in adulthood.

Data Availability

The TXT data used to support the findings of this study were supplied by Shuo Wang under license and so cannot be made freely available. Requests for access to these data should be made to Shuo Wang, [email protected].

Conflicts of Interest

The authors declare no potential conflict of interest in this research.

Acknowledgments

This work was funded in part by grants from the National Natural Science Foundation of China (Nos. 81870715, 81200754), the Promotion Grant for High-Level Scientific and Technological Elites in Medical Science from the Beijing Municipal Health Bureau (Grant No. 2015-3-012), the 2017 Excellent Talents in Medical Science from the Dongcheng District of Beijing, the 2012 Beijing Nova Program (Z121107002512033), and Beijing Natural Science Foundation (Nos. 7122034, 7154190). The authors would like to thank Drs. Shu-En Lim, Jennifer Henderson-Sabes, and Jayaganesh Swaminathan for their assistance during the proofreading process.