Abstract

The goal of this study was to analyse perceptually and acoustically the voices of patients with Unilateral Vocal Fold Paralysis (UVFP) and compare them to the voices of normal subjects. These voices were analysed perceptually with the GRBAS scale and acoustically using the following parameters: mean fundamental frequency (F0), standard-deviation of F0, jitter (ppq5), shimmer (apq11), mean harmonics-to-noise ratio (HNR), mean first (F1) and second (F2) formant frequencies, and standard-deviation of F1 and F2 frequencies. Statistically significant differences were found in all of the perceptual parameters. Jitter, shimmer, HNR, standard-deviation of F0, and standard-deviation of the F2 frequency also differed significantly between groups, for both genders. In the male data, differences were also found in F1 and F2 frequency values and in the standard-deviation of the F1 frequency. This study documented the alterations resulting from UVFP and explored parameters for which limited information exists for this pathology.

1. Introduction

A neural dysfunction of the larynx leads to alterations in voice, respiration, and airway protection. Usually, Unilateral Vocal Fold Paralysis (UVFP) is related to a set of well-documented perceptive alterations such as weak voice, breathiness, roughness, diminished voice intensity, vocal effort, low voice efficiency, voice breaks, diplophonia, and air loss [1–5]. Furthermore, vocal strain is a critical component in various vocal pathologies including UVFP. A neuronal dysphonia, such as UVFP, can alter the vibrational patterns of the Vocal Folds (VF) which leads to compensatory adjustments to the glottic and supraglottic structures that increase the vocal effort and vocal strain perception [6, 7]. In addition to the perceptive alterations, UVFP also results in higher values of jitter and shimmer and lower values of the harmonics-to-noise ratio (HNR) [1–4, 8]. Furthermore, values of standard-deviation of fundamental frequency (F0) are reported as higher than normal because of the diminished control of the vibrational pattern of the VF, causing greater variability [9–11]. According to Schwarz et al. [6], there is a need to describe and understand the UVFP patient’s larynx configuration for a better and more individualised vocal intervention, preventing compensatory adjustments. Formant frequencies provide acoustic cues about the vocal tract configuration [12–14]. According to Lee et al. [15] the formant values are relevant for discriminating normal from pathologic voices and the configuration of the vocal tract is different during phonation in people with vocal pathologies. The same authors [15] found slightly lower values of the first formant (F1) frequency and higher values of the second formant (F2) frequency in cases of UVFP. This indicates that UVFP subjects tend to have a more elevated and advanced tongue position during phonation [13, 14]. A breathy voice (common in UVFP) is reported to be associated with the same configuration [16]. However, Titze [13] reports an approximation of the values of the frequency of F1 and F2 in cases of narrower vocal tract. These vocal tract modifications may result from the attempt to compensate for the vocal alteration by patients exhibiting UVFP [2]. According to Lee et al. [15] the standard-deviations of the frequency of F1 and F2 have higher values in cases of UVFP indicating a higher instability of the vocal tract configuration during phonation.

The aim of this study was to compare perceptually and acoustically the voices of subjects with UVFP and the voices of subjects representing normal quality. Measures related to the vocal tract configuration, namely, formant frequencies, were also analysed and correlated with alterations caused by vocal pathology.

2. Materials and Methods

This is a quantitative, descriptive, and cross-sectional study [17–19]. The recordings were made in Hospital de Santo António and Hospital de São João, both in Porto, Portugal, and at the Speech, Language, and Hearing Laboratory (SLHlab) at the University of Aveiro, Portugal. This took place as part of the data collection process of the first representative European Portuguese pathological voice database [20]. Part of this data was divided into two groups: a group having vocal pathology (UVFP) and a group without vocal pathology. A group of 17 patients, evaluated with videolaryngoscopy and diagnosed with UVFP, formed the pathologic group. The inclusion criteria for this group were having a diagnosis of UVFP, not having had speech and language therapy intervention, and being over 18 years old. The exclusion criteria were having other pathologies concomitant with UVFP and/or having undergone a surgical intervention to correct the vocal pathology. A group of 85 normal voice volunteers was included in the control group based on two distinct procedures: 43 subjects were evaluated with videolaryngoscopy and diagnosed as normal; 42 subjects were evaluated using a vocal anamnesis and summative evaluation (a similar procedure was used by Roark et al. [21]). The inclusion criteria for the control group were having normal voice quality and being over 18 years old. The exclusion criterion was having vocal or other pathologies that may interfere with normal voice production.

Each pathologic case was individually matched to five subjects of the control group in order to increase the power of statistical tests [17, 22]. The cases were matched according to gender and age. The first variable was gender because after puberty there is a set of different characteristics that differentiate male and female voices [23]. The second variable was age because with aging some functional and structural modifications occur at phonatory level [23, 24]. Taking into account the fact that there are notable voice changes if the subjects’ age difference is more than 10 years [25–29], the maximum allowed difference of age between the matched subjects was 5 years, in an attempt to reduce variability.
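The matching rule described above (same gender, age difference of at most 5 years, five controls per case) can be illustrated with a minimal sketch. This is not the procedure the authors used: the greedy closest-age strategy, the `Subject` type, and the function name `match_controls` are assumptions introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subject:
    # Hypothetical record; field names are illustrative only.
    subject_id: str
    gender: str  # "F" or "M"
    age: int


def match_controls(cases, controls, per_case=5, max_age_gap=5):
    """Greedily assign each case up to `per_case` unused controls of the
    same gender whose age differs by at most `max_age_gap` years,
    preferring the closest ages."""
    available = list(controls)
    matches = {}
    for case in cases:
        eligible = [
            c for c in available
            if c.gender == case.gender and abs(c.age - case.age) <= max_age_gap
        ]
        # Closest age first; the identifier breaks ties deterministically.
        eligible.sort(key=lambda c: (abs(c.age - case.age), c.subject_id))
        chosen = eligible[:per_case]
        for c in chosen:
            available.remove(c)  # each control is used at most once
        matches[case.subject_id] = chosen
    return matches
```

A greedy assignment is simple but does not guarantee that every case receives its full set of controls when the pool is small; an optimal assignment method could be substituted if that matters.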

Four (4) subjects with UVFP were male (23.5%) and 13 subjects were female (76.5%). The youngest patient was 30 years old and the oldest 72. The mean age for the pathologic group was 56.7 years with a standard-deviation of 12.7 years. Nine (9) patients had left UVFP (52.9%) and 8 right UVFP (47.1%). In the control group 20 subjects were male (23.5%) and 65 were female (76.5%). The mean age of the control group was 56.1 years and the standard-deviation was 12.7 years.

The voice recordings were made in a clinical setting using Praat 5.3.56 (32-bit edition) [30]. A Behringer ECM8000 microphone and a Presonus AudioBox USB (16 bits and 48000 Hz) were used for all of the recordings. The subjects were seated and the microphone was aligned to the mouth at a distance of 30 cm [31, 32]. Informed consent was signed and the vowel [a] was recorded. A segment of the vowel was then annotated according to criteria defined by Pinho et al. [3]: starting 200 ms after the onset of phonation and containing approximately 100 cycles. This segment was then manually analysed with Praat 5.3.56 (64-bit edition) with an autocorrelation method (used by default by the software) to estimate F0. There were some errors in the identification of the period, so the “octave cost” was raised to a higher value (as suggested in Praat’s manual). The values of the parameters used to run the autocorrelation method are presented in Table 1.

From the “voice report” Praat window the following values were extracted: mean F0; standard-deviation of F0; jitter (ppq5); shimmer (apq11); mean harmonics-to-noise ratio (HNR). The Burg [33] method (used by default by Praat) was used to track the formants. The “formant listing” for the same 100 cycles was obtained and the mean value and standard-deviation were calculated for the frequency of F1 and F2. The values were double-checked through the spectrogram of each segment.
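For readers unfamiliar with the perturbation measures taken from the voice report, the following minimal sketch shows how a five-point period perturbation quotient (ppq5) and an eleven-point amplitude perturbation quotient (apq11) can be computed from a list of cycle durations or peak amplitudes. It follows the commonly documented definition (mean absolute deviation of each value from the average of its local window, divided by the overall mean) and is an assumption-laden simplification: Praat additionally discards implausible cycles before computing these values, which is not reproduced here. All function names are hypothetical.

```python
def perturbation_quotient(values, window):
    """Mean absolute deviation of each value from the mean of the
    `window` values centred on it, divided by the overall mean.
    `window` must be odd and no longer than the input."""
    if window % 2 == 0 or len(values) < window:
        raise ValueError("window must be odd and not exceed len(values)")
    half = window // 2
    deviations = []
    for i in range(half, len(values) - half):
        local_mean = sum(values[i - half:i + half + 1]) / window
        deviations.append(abs(values[i] - local_mean))
    mean_deviation = sum(deviations) / len(deviations)
    overall_mean = sum(values) / len(values)
    return mean_deviation / overall_mean


def jitter_ppq5(periods):
    # periods: durations of consecutive glottal cycles (seconds)
    return perturbation_quotient(periods, 5)


def shimmer_apq11(amplitudes):
    # amplitudes: peak amplitudes of consecutive glottal cycles
    return perturbation_quotient(amplitudes, 11)
```

Both functions return a fraction; multiplying by 100 gives the percentage form in which these measures are usually reported.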

Each voice was also perceptually assessed using the GRBAS scale [34]. For the pathologic voices a group of five speech and language therapists with expertise in voice assessment made the perceptive evaluation. For the normal voices one speech and language therapist made the perceptive assessment. For these procedures the experts used the following headphones connected to the internal soundcard of a laptop computer: Sennheiser HD 380 Pro; Sennheiser HD201; Sony MDR-CD270; Sony MDRZX100B; Sony MDR-ZX110NA. All of the assessments were made blindly regarding the group (patients or normal subjects).

For the statistical analysis IBM SPSS Statistics version 20 was used. The interrater consistency was analysed using the Kendall Coefficient. The Mann-Whitney test was used to analyse the GRBAS scale parameters. The acoustic parameters that had normal distribution (HNR, F2♀, standard-deviation of F0♂, F1♂) were statistically analysed using the t-test and parameters that did not have normal distribution (Jitter (ppq5), Shimmer (apq11), F0♀, standard-deviation of F0♀, F1♀, standard-deviation of F1♀, standard-deviation of F2♀, F0♂, standard-deviation of F1♂, F2♂, and standard-deviation of F2♂) were analysed with the Mann-Whitney test. The normality was tested with the Shapiro-Wilk test. A level of significance of 0.05 was used for all statistical analyses.
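As an illustration of the nonparametric comparison, the sketch below computes the Mann-Whitney U statistics for two independent samples using average ranks for tied values. It is a teaching aid only: the analyses in this study were run in SPSS, the function names are hypothetical, and no p-value is computed here.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def mann_whitney_u(sample_a, sample_b):
    """Return (U_a, U_b) for two independent samples."""
    n_a, n_b = len(sample_a), len(sample_b)
    ranks = average_ranks(list(sample_a) + list(sample_b))
    rank_sum_a = sum(ranks[:n_a])
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    u_b = n_a * n_b - u_a
    return u_a, u_b
```

The smaller of the two U values is the one conventionally compared against critical values or converted to a p-value.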

All of the procedures had the acceptance of the Ethical Commission of the Hospital de Santo António and Hospital de São João. An authorisation from the National Commission for Data Protection was also obtained.

3. Results and Discussion

3.1. Interrater Consistency

The consistency between the five judges that assessed the pathologic voices was analysed using Kendall’s test. Table 2 shows that there is consistency in all of the parameters of the GRBAS scale between judges. The fact that the judges presented consistency between them indicates that they have a similar internal understanding of the used instrument [35]. This consistency is likely to be related to the fact that the GRBAS scale is widely used, understood, and recommended worldwide by clinicians [36]. Kendall’s W value, shown in Table 2, can vary between 0 (no general tendency of consistency between judges) and 1 (all judges responded equally) [37]. In Table 2 we can also see that the lowest value of W was found for the R (Rough) parameter. This may be due to the fact that this parameter is a supraclass of perceptive parameters that can lead to various interpretations between different judges [38]. The fact that none of the parameters had a very good consistency was expected because the perceptive assessment is a very complex procedure that includes various subjective elements that are not totally understood [36, 39]. Despite the results varying from reasonable to good, perceptive evaluation is still a central procedure in the vocal assessment [40].
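The coefficient of concordance discussed above can be illustrated with a minimal sketch that ranks each judge's scores across voices and applies the standard tie-corrected formula. The function names and the layout of the input (one list of scores per judge) are assumptions made for illustration; the values reported in Table 2 were obtained with SPSS.

```python
from collections import Counter


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def kendalls_w(ratings):
    """ratings[j][i] is the score given by judge j to voice i.
    Returns Kendall's W with the correction for tied scores."""
    m = len(ratings)     # number of judges
    n = len(ratings[0])  # number of voices
    rank_rows = [average_ranks(row) for row in ratings]
    totals = [sum(row[i] for row in rank_rows) for i in range(n)]
    mean_total = m * (n + 1) / 2
    s = sum((t - mean_total) ** 2 for t in totals)
    # Tie correction: sum of (t^3 - t) over every group of tied scores.
    tie_term = sum(t ** 3 - t for row in ratings for t in Counter(row).values())
    denominator = m ** 2 * (n ** 3 - n) - m * tie_term
    if denominator == 0:
        raise ValueError("W is undefined when every judge gives identical scores")
    return 12 * s / denominator
```

Because GRBAS scores take only four values (0 to 3), ties are frequent, which is why the tie-corrected form is used in this sketch.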

3.2. Comparison of GRBAS Scale Parameters between Normal and UVFP Voices

The results of the perceptive assessment of the voices of the normal and UVFP subjects were analysed using the Mann-Whitney test. Table 3 shows that all of the GRBAS parameters were statistically different between groups, being higher in the pathologic group as expected (see Figure 1). The control group had a mean score of zero, which was expected because the control group was intended to have a normal/nonaltered voice quality that would be associated with a 0 value (normal) for all parameters assessed in GRBAS. In the pathologic group we can see that the parameter with the highest values was G (Grade), which has been observed before by other authors [41, 42]. In this group of UVFP there were alterations in all of the GRBAS parameters, varying between a mild and moderate grade of perturbation. Grade (G), Rough (R), and Breathy (B) presented the highest mean scores, as previously observed by various authors [1–4, 43]. Another disturbance that is commonly found in subjects with UVFP is a weak voice [1, 4, 44], which is reflected in parameter A (Asthenic), also found in this sample. In addition to the previous parameters, according to Rosenthal et al. [7], it is usual to find vocal strain (parameter S) in these cases, which could also be observed in this study.

One of the major alterations caused by UVFP is the incomplete glottal closure that originates excess air during phonation that creates a breathy voice (parameter B is altered) [2, 4, 45]. This air leakage leads to a lower voice energy originating a weak voice (parameter A is altered) [2, 4, 45]. The irregularity of the VF cycles (parameter R reflects this) is due to the reduced mobility/immobility of the paralysed VF or to the fact that the unhealthy VF may present a passive vibration [4, 46]. In some cases, in an attempt to overcome the alterations caused by the UVFP, patients create compensations that can lead to strain in the supraglottic region, increasing the vocal effort and giving the voice a strained characteristic (parameter S) [4, 7]. Grade (G) is related with the other parameters and varies according to the severity of the overall voice perturbation [47].

3.3. Comparison of Acoustic Parameters between Normal and UVFP Voices

Although perceptive assessment is the most used technique for vocal assessment, it is a subjective process that leads to some variability issues [8]. In contrast, acoustic data allow objective and noninvasive measures of the behaviour of the VF [8, 15, 48–50]. Table 4 shows statistically different values of jitter (ppq5), shimmer (apq11), and HNR between the normal and pathologic voices. Jitter, which is related to the absolute difference between the durations of consecutive cycles [43], is higher in UVFP subjects (see Figure 2). These results were also obtained by other authors [2, 3, 8]. These higher values may be due to the asymmetry at the VF level, caused by the UVFP that leads to vibration irregularities in frequency altering the jitter values [2]. Similarly shimmer, which is related to the absolute difference between the amplitudes of consecutive cycles [43], is also higher in UVFP cases (see Figure 3). These results were also obtained by other authors [2, 3, 8]. The asymmetry caused by UVFP leads to vibration irregularities in amplitude altering shimmer values [2]. This parameter is also increased by a poor and inconsistent contact between VF, which is very common in UVFP [51]. Thus, these UVFP subjects present more cyclic irregularity at frequency and amplitude level compared to the normal voice subjects. It should be noted that we also had higher than normal values of shimmer in the normal sample. This may be due to the fact that the recordings were made in a clinical setting that is not entirely noise-free and this could have interfered with the data calculation of this parameter. Regarding the HNR, which is obtained from the ratio between the harmonic and noise components of the signal [43], the results indicate a lower value in the pathologic group (see Figure 4). These results were consistent with the literature [2, 8]. The alterations in periodicity caused by the UVFP originate a lower ratio between the two components, diminishing the HNR values in the pathologic cases [2]. These results indicate that patients with UVFP have higher relative noise amplitude during phonation (than the normal subjects) lowering the HNR value.
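The relation between periodicity and HNR can be made concrete with a minimal sketch. It assumes the usual autocorrelation-based definition, in which the normalised autocorrelation r at the lag of one period is taken as the proportion of periodic energy and HNR = 10 log10(r / (1 - r)) in dB. The function names are hypothetical, and the windowing and interpolation steps that Praat applies are deliberately omitted.

```python
import math


def normalized_autocorrelation(signal, lag):
    """Correlation between the signal and a copy of itself shifted by `lag`
    samples, normalised by the energy of the two overlapping segments."""
    n = len(signal) - lag
    numerator = sum(signal[i] * signal[i + lag] for i in range(n))
    energy_head = sum(x * x for x in signal[:n])
    energy_tail = sum(x * x for x in signal[lag:lag + n])
    return numerator / math.sqrt(energy_head * energy_tail)


def hnr_db(r):
    """Harmonics-to-noise ratio in dB from the autocorrelation peak r."""
    if not 0 < r < 1:
        raise ValueError("r must lie strictly between 0 and 1")
    return 10 * math.log10(r / (1 - r))
```

Under this definition, r = 0.5 (equal harmonic and noise energy) gives 0 dB, and r = 0.99 gives roughly 20 dB, so a less periodic signal yields a lower HNR, in line with the lower values found in the pathologic group.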

The parameters presented in Tables 5 and 6 were divided by gender because females and males have different inherent vocal tract and VF characteristics, especially in terms of size and mass [52]. For F0 (see Figures 5 and 6) we can see that there are no statistically significant differences between pathologic and normal voices in either gender. This was also previously described by Oguz et al. [8]. Fundamental frequency is directly related to and dependent on the length, tension, mass, and rigidity of the VF, and on their interaction with the subglottic pressure [53]. The fact that there are no differences between the two groups indicates that, in this sample, the modifications at VF level caused by UVFP are not sufficient to create real alterations in F0. Also according to Woo et al. [54] the majority of UVFP subjects present F0 values close to normal.

The standard-deviation of F0 (see Figures 7 and 8), which is related to the variations in vibration and muscular control of the VF, is higher in the pathologic group indicating important alterations in the described aspects [53]. Thus, subjects with UVFP present more F0 variability indicating a poorer muscular control and lower vibrational stability of VF. These results are supported by other authors [10, 11, 46, 53].

The vocal tract configuration interacts with VF oscillation; that is, vocal tract configuration constrains VF functioning during phonation [15, 55]. After the onset of UVFP patients usually develop some compensatory adjustments at glottic and supraglottic level altering voice and vocal tract configuration [6]. The description of vocal tract configurations in subjects with UVFP could guide treatments and help prevent negative compensations [6, 7].

Regarding F1 frequency, Table 5 shows that for females there are no statistically significant differences between pathologic and normal subjects (see Figures 9 and 10). A similar result was obtained by Lee et al. [15]. Formant frequency values shown in Table 6 reveal that, for males, differences between groups are statistically significant. Lower values of F1 frequencies in UVFP cases were expected (based on data reported previously [15]); however, Table 6 clearly shows that the F1 frequency values were higher in the pathologic group. Other authors, such as Hartl et al. [2] and D. H. Klatt and L. C. Klatt [56], have also reported higher F1 frequency values for voices with similar characteristics to UVFP patients. Since the frequency of F1 is inversely related to the vertical movement of the tongue, higher values of this formant (in UVFP subjects) indicate a lower tongue position during phonation for the pathologic subjects. This result is in line with what was found by Higashikawa et al. [57] for whispered voices.

The second formant (F2) frequency, which is related to the horizontal tongue movement, is higher in UVFP male subjects (see Figures 9 and 10). This result was also obtained in other studies [2, 15]. For females, although the value is very close to the significance level, there are no statistically significant differences between normal and UVFP subjects. However, we can see a slightly higher value of F2 frequency in the female pathologic group compared to normal females. Therefore results indicate that there could be a tendency to a more advanced tongue position during phonation in cases of UVFP. This is consistent with the results presented by Lotto et al. [16] who studied breathy voices (typical of UVFP).

As for the SD of the frequency of F1, there were significant differences between the two groups for male participants, with higher values in the patients (see Table 6). There were no significant differences between groups for females (see Table 5). As for the SD of the frequency of F2 there were statistically significant higher values in the UVFP group for both genders (see Figures 11 and 12). Therefore these parameters, especially the SD of the frequency of F2, may have an important role in discriminating normal and UVFP voices. Pathologic voices showed higher values of formant frequency SD. These results were also obtained by Lee et al. [15]. This indicates a greater instability of the vocal tract configuration in UVFP during phonation.

Overall results related to the vocal tract configuration (F1 and F2) show great potential to discriminate between normal and UVFP voices (especially for males) in spite of the localisation of the lesion being at the VF level. This is in agreement with the literature which clearly indicates that the behaviour of the VF is not entirely independent of the vocal tract [55, 58, 59]. Thus, these parameters can add useful information to the assessment procedure and may be used as a complement to the more traditional VF behavioural assessment.

It should be noted that the overall results obtained for females distance themselves from what was initially expected. These differences between genders may be due to a greater technical difficulty in analysing female voices [56, 60]. To a large extent, these difficulties are associated with the identification of formants, due to the fact that F0 is higher, and this increases the difficulty in F1 estimation [56].

4. Conclusions

In this study various ways of assessing the UVFP voice were combined. Since vocal therapy is one of the first noninvasive treatment options with potential to help the client to reacquire a functional voice, it is fundamental to know in detail the alterations created by the pathology at VF and vocal tract level to better guide the treatment. Perceptual differences between normal and UVFP voices were found. The perceptual parameters that better characterised this data of UVFP subjects were Rough (R) and Breathy (B), but altered values of Asthenic (A) and Strained (S) were also found. As far as acoustic parameters are concerned there were no differences in F0 values between normal and UVFP voices in this sample. Jitter (ppq5), shimmer (apq11), HNR, and SD of F0 had an important role in discriminating normal and UVFP voices. Measures related to the vocal tract configuration were also indicative of alterations at VF level; therefore the analysis of formant frequencies values and their SD may have an important role in a clinical setting contributing to a better knowledge of the alterations caused by the vocal pathology. Future work should continue to explore formants and their relation to vocal pathology.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

The authors would like to thank the Otorhinolaryngology Team from Hospital de Santo António and Hospital de São João. This work was partially funded by National Funds through FCT (Foundation for Science and Technology), in the context of the projects UID/CEC/00127/2013 and Incentivo/EEI/UI0127/2014. The Advanced Voice Function Assessment Databases (AVFAD) project is supported by the School of Health Sciences (ESSUA), University of Aveiro, Portugal.