Abstract

This study aimed to show that auscultation features based on the fractal dimension combined with the wavelet packet transform (WPT) are conducive to identifying syndrome patterns in Traditional Chinese Medicine (TCM). The WPT and the fractal dimension were employed to extract features from the auscultation signals of 137 patients with lung Qi-deficient pattern, 49 patients with lung Yin-deficient pattern, and 43 healthy subjects. With these features, a classification model was constructed based on a multiclass support vector machine (SVM). When all auscultation signals were used to train the SVM to decide the patterns of TCM syndromes, the overall recognition rate of the model was 79.49%; when male and female auscultation signals were trained separately, the overall recognition rate of the model reached 86.05%. The results showed that the methods proposed in this paper were effective for analyzing auscultation signals, and that the performance of the model can be greatly improved when gender is taken into account.

1. Introduction

As one of the four diagnostic methods of Traditional Chinese Medicine (TCM), auscultation identifies a syndrome or disease by listening to the speech of patients. The auscultation method was explicitly proposed in the Internal Classic of Huang Di [1]. However, it was not until after the Ming and Qing Dynasties that this method attracted attention from the medical field and both its theory and its clinical application underwent much development. Since then, auscultation has become a distinctive step-by-step diagnostic method. In TCM, auscultation mainly depends on the auditory senses of the physician to accurately identify asthenia, sthenia, and visceral lesions in the patient. Therefore, auscultation is considered a qualitative method that produces unconvincing results.

Objective studies on auscultation have made some progress with the recent developments in computer and signal processing technologies [2]. Frequency spectrum analysis of the voices of patients with cough was performed using a digital sonograph [3], but the results could not be analyzed quantitatively or applied to clinical diagnosis. A survey of objective auscultation in TCM was presented in [4], in which the digital technology related to auscultation was described and analyzed. The nonlinearity of the vowel /a:/ signals of healthy persons and patients with deficiency syndrome was investigated using delay vector variance [5], which was an effective attempt at objective auscultation research. Chiu et al. proposed four novel acoustic parameters (the average number of zero-crossings, the variations in local peaks and valleys, the variations in first and second formant frequencies, and the spectral energy ratio) to analyze and identify the characteristics among nondeficiency, Qi-deficiency, and Yin-deficiency subjects [6]. The energy values of wavelet packet coefficients were extracted for the auscultation signals of healthy people and patients with diseases of the five Zang-organs, and the results were analyzed and discussed [7, 8]. Wavelet packet transform (WPT) and Sample Entropy were combined to analyze the auscultation signals in TCM. Sample Entropy values for the WPT coefficients, which reflect the complexity of the signal at different time-frequency resolutions, were computed to quantify the signals from three kinds of samples: Qi-deficiency, Yin-deficiency, and healthy people. The statistical and classification results indicated that the method is useful and effective in auscultation research [9].

These studies provide a good basis for objective research on auscultation and its clinical use. However, this research is still in its initial stages: the experiments are usually carried out under limited conditions, the adopted auscultation signals are not typical or comprehensive enough, and the algorithms are evaluated on small sample databases, which makes their results unreliable. Therefore, the creation of a reliable and accurate algorithm remains a challenge in auscultation research.

Speech production is a complex process that is difficult to analyze through traditional methods because it is influenced by the glottis, the vocal tract, and radiation [10]. The nonlinear characteristics of speech, such as the changes in the voice of a speaker who pronounces the same phoneme twice, also hinder its analysis. Moreover, the predicted signal cannot accurately match the original speech regardless of the linear prediction order. Pronunciation includes both a nonlinear oscillation process and the speech itself. Changes in the shape of the tongue and vocal tract, particularly in fricative and plosive sounds, can produce vortices in the boundary layer of the vocal tract that subsequently develop into turbulent flow [11]. The time-domain waveform of speech is self-similar, and it also exhibits both periodicity and randomness. These findings show that chaos and fractal theories can be used in speech signal analysis [12].

Speech analysis must also be improved. The chaotic and fractal characteristics of speech suggest combining the nonlinear fractal dimension with the wavelet packet transform (WPT) to address the problems in speech signal analysis. This study employed the fractal dimension with WPT to analyze the auscultation signals of patients with chronic bronchitis. Patients with chronic bronchitis mostly present with the pattern of lung Qi-deficiency or the pattern of lung Yin-deficiency. Therefore, this study analyzed the auscultation signals of patients with lung Qi-deficient pattern and patients with lung Yin-deficient pattern, with healthy subjects as the control. Statistical analysis was performed to obtain the effective features, which were used in the multiclass support vector machine (SVM) classifiers. Classification models were constructed to automatically identify the auscultation samples. The classification results are discussed at the end of the study.

2. Materials and Methods

2.1. Collected Materials

Patients who met the diagnostic criteria of chronic bronchitis and provided informed consent were included in the present study. The diagnostic criteria were based on both western medicine and TCM. The western medicine criteria followed “the chronic bronchitis clinical diagnosis and curative effect judgment standard” [13]. The TCM criteria followed the “Guideline for Clinical Study on New Drugs of Traditional Chinese Medicine” [14] and the “Clinic Terminology of Traditional Chinese Medical Diagnosis and Treatment-Syndromes” [15], which is the national standard issued by the State Bureau of Technical Quality Supervision, as well as the standards in textbooks [16, 17]. Patients with other severe diseases, as well as those who could not express their feelings clearly or did not provide informed consent, were excluded from the present study.

A total of 229 subjects were recruited by the TCM Syndrome Lab at Shanghai University of Traditional Chinese Medicine. The 186 patients with chronic bronchitis from the affiliated hospital of Shanghai University of TCM were separated into two groups, namely, lung Qi-deficient subjects (Group Q) and lung Yin-deficient subjects (Group Y). Forty-three healthy subjects from the faculty of the Shanghai University of TCM formed the control group (Group H). The detailed information is listed in Table 1. Lung Qi-deficiency and lung Yin-deficiency are TCM-specific terms. The pattern of lung Qi-deficiency refers to the condition of declining function of the lung in governing Qi and defending the exterior. Its clinical manifestations are weak cough, panting, spitting of clear and thin phlegm, laziness in speaking, fatigue and a pale complexion, and other signs and symptoms. The pattern of lung Yin-deficiency refers to lung-yin failing to disperse and descend and an internal production of deficient-heat. Its clinical manifestations are dry cough or cough with scanty and sticky sputum or even with blood-streaked sputum, a dry mouth and throat, emaciation, a feverish sensation in the palms, night sweats, red cheeks and a hoarse voice, and other signs and symptoms.

The speech signals of the three groups were recorded using a microphone and were digitized using a 24-bit A/D acquisition card (Brand CME Xcorpio, with a frequency response range of 20 Hz to 20 kHz and a dynamic range of 100 dB) at a 16 kHz sampling rate with an antialiasing function. The speech signals were recorded at a fixed collecting distance and position. The vowel /a:/ is easy for both patients and healthy people to pronounce. In addition, the vocal organs are not in contact and there is no obstacle in the cavity when the vowel is produced [18]. Thus, each subject was asked to utter the vowel /a:/ as a sustained, stable phonation that lasted for about 1 second.

The research was conducted at Shanghai University of TCM and its affiliated hospital after the approval of the moral and ethical committee of Shanghai University of TCM and informed consent of all subjects had been obtained.

2.2. Methods
2.2.1. Wavelet Packet Decomposition Algorithm

Wavelet transform is a time-frequency analysis method that uses different scales to obtain suitable time-domain and frequency-domain resolutions in different parts of the signal. Multiresolution analysis further decomposes only the low frequency part and leaves the high frequency part unsubdivided [19, 20]. WPT provides a finer decomposition for signal analysis: it also decomposes the high frequency part, so that the frequency bands in both the low and high frequency parts are subdivided. WPT can also adaptively select the signal resolution in different frequency bands to improve the time-frequency resolution [21-23].

Commonly used wavelet functions include the Haar wavelet, Daubechies wavelet, Symlets wavelet, Coiflet wavelet, Morlet wavelet, and Mexican Hat wavelet. After comparing the analysis results of the db, coif, and sym wavelet functions, the db4 wavelet function, which has a high energy concentration, was ultimately chosen for the analysis of the speech signals of the three sample groups [24]. Figure 1 shows the two-layer wavelet packet decomposition (WPD) tree, and Figure 2 shows the decomposition of a speech signal at layer 3, in which the original signal is split into approximations (the low frequency components) and details (the high frequency components).
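
The tree structure described above can be sketched in a few lines. The following is a minimal illustration only, not the authors' Matlab program: it swaps in the Haar filter for brevity instead of the db4 wavelet chosen in the paper, and the function names `haar_split` and `wavelet_packet_decompose` are hypothetical.

```python
import math


def haar_split(signal):
    """One analysis step: return (approximation, detail), each half as long.

    Assumption: the Haar filter is used here only to keep the sketch short;
    the paper itself uses the db4 wavelet.
    """
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    root2 = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / root2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / root2 for i in range(0, len(signal), 2)]
    return approx, detail


def wavelet_packet_decompose(signal, depth):
    """Return a list of layers; layer j holds 2**j coefficient lists.

    Unlike the plain wavelet transform, both the approximation and the
    detail node are split again at every layer.
    Nodes are stored in natural (tree) order, which is not strictly
    increasing in frequency.
    """
    layers = [[list(signal)]]
    for _ in range(depth):
        next_layer = []
        for node in layers[-1]:
            approx, detail = haar_split(node)
            next_layer.extend([approx, detail])
        layers.append(next_layer)
    return layers


# Toy input: 16 samples of a sine wave, decomposed to layer 3.
signal = [math.sin(2 * math.pi * 2 * t / 16) for t in range(16)]
layers = wavelet_packet_decompose(signal, 3)
```

Because the Haar step is orthonormal, the total energy of the coefficients in any layer equals the energy of the input, which is a convenient sanity check on the decomposition.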

2.2.2. Fractal Dimension

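Although the fractal dimension has several definitions [25], the box dimension definition is used in this paper for calculation convenience. The time sequence $X$ of the speech signal is covered with a mesh grid, where $\varepsilon$ represents the side length of a grid cell and $N(\varepsilon)$ denotes the number of grid cells that contain points of the set $X$ [26]. The box dimension is defined by the following equation:

$$D_B = \lim_{\varepsilon \to 0} \frac{\log N(\varepsilon)}{\log (1/\varepsilon)}$$

The least square method (LSM) is used to approximate the line relating $\log N(\varepsilon)$ to $\log (1/\varepsilon)$; its slope is the box dimension. The approximation is carried out as follows.
(1) The original speech signal is normalized into a unit square area by applying a gain to the signal.
(2) The square area is divided into mesh grids of side length $\varepsilon$, and $N(\varepsilon)$ is counted. The value of $\varepsilon$ is then changed, and the corresponding $\log (1/\varepsilon_i)$ and $\log N(\varepsilon_i)$ are calculated.
(3) Let $x_i = \log (1/\varepsilon_i)$ and $y_i = \log N(\varepsilon_i)$, $i = 1, \ldots, n$; LSM is used to fit the line $y = D_B x + b$. The slope $D_B$ represents the box dimension, which is calculated as follows:

$$D_B = \frac{n \sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{n \sum_{i=1}^{n} x_i^2 - \left( \sum_{i=1}^{n} x_i \right)^2}$$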
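
The three steps above can be illustrated with a short sketch. It is one possible box-counting scheme under stated assumptions, not the authors' implementation: the grid sizes in `cells_per_side`, the rule of counting the cells between the minimum and maximum sample in each grid column, and the function names are all choices made here for illustration.

```python
import math
import random


def least_squares_slope(xs, ys):
    """Slope of the least-squares line through the points (xs[i], ys[i])."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    return (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)


def box_dimension(signal, cells_per_side=(4, 8, 16, 32, 64)):
    """Estimate the box dimension of a sampled signal.

    Assumes the signal has more samples than the finest grid has columns.
    """
    n = len(signal)
    low, high = min(signal), max(signal)
    span = (high - low) or 1.0
    # Step 1: scale time and amplitude into the unit square.
    times = [i / (n - 1) for i in range(n)]
    values = [(v - low) / span for v in signal]
    log_inv_eps, log_counts = [], []
    # Step 2: count occupied cells for several grid sizes (eps = 1 / m).
    for m in cells_per_side:
        count = 0
        for col in range(m):
            left, right = col / m, (col + 1) / m
            column = [v for t, v in zip(times, values) if left <= t <= right]
            if not column:
                continue
            row_low = min(int(min(column) * m), m - 1)
            row_high = min(int(max(column) * m), m - 1)
            count += row_high - row_low + 1
        log_inv_eps.append(math.log(m))
        log_counts.append(math.log(count))
    # Step 3: the slope of log N(eps) against log(1/eps) is the estimate.
    return least_squares_slope(log_inv_eps, log_counts)


# A straight ramp should be close to dimension 1; uniform noise is far rougher.
ramp = [float(i) for i in range(1025)]
rng = random.Random(0)
noise = [rng.random() for _ in range(1025)]
ramp_dim = box_dimension(ramp)
noise_dim = box_dimension(noise)
```

The ramp and noise signals bracket the range in which a waveform's estimate is expected to fall, which makes them useful reference cases when checking an implementation.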

2.2.3. Wavelet Packet Fractal Dimension

Fractal theory agrees with wavelet analysis in terms of self-similarity and of understanding things from coarse to fine scales [27]. This paper proposes a wavelet packet fractal approach that applies the WPD to the auscultation signals and compares the box dimension value and its changes across different frequency bands to reflect the irregularity, complexity, and nonstationarity of the signals. The box dimension values of the discrete signals range from 1 to 2; the more complex the signal is, the greater its box dimension value becomes.

2.2.4. Support Vector Machine

SVMs were first introduced by Vapnik (1998) to perform highly effective classification, regression, and pattern recognition processes [28]. SVM uses a hypothesis space of linear functions in a high-dimensional space and is trained with a learning algorithm from optimization theory that implements a learning bias from statistical learning theory. SVM uses a linear model to implement nonlinear class boundaries by nonlinearly mapping input vectors into a high-dimensional feature space using kernels [29, 30]. Previous studies suggest using the Radial Basis Function (RBF) kernel as a default kernel. The kernel parameters can be automatically chosen by optimizing a cross-validation-based model selection. One-against-one (1-versus-1) and one-against-rest (1-versus-r) are two popular multiclass SVM schemes. In a $k$-class problem, 1-versus-1 forms a training subset for every possible class pair and learns an SVM model from each subset. A total of $k(k-1)/2$ SVM classifiers are trained for all the combinations. The class for an unseen example is obtained by majority vote of all binary SVM classifiers. The classification result is affected if the same kernel parameters are used in all the SVM classifiers that apply the 1-versus-1 scheme. Therefore, the kernel parameters of each SVM classifier must be separately identified through cross-validation. A multiclass SVM classifier is used to identify the auscultation signals. Each SVM classifier chooses specific RBF kernel parameters through a grid search with a nested cross-validation [31].
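
The 1-versus-1 voting scheme can be sketched as follows. This is only an illustration of the pairing and majority vote: the binary classifiers are hypothetical nearest-centre stubs standing in for trained SVMs, and the one-dimensional `centres` values are invented for the example.

```python
from collections import Counter
from itertools import combinations


def one_vs_one_pairs(classes):
    """All unordered class pairs; there are k * (k - 1) / 2 of them."""
    return list(combinations(classes, 2))


def predict_one_vs_one(binary_classifiers, x):
    """Majority vote over the binary classifiers.

    Ties are broken in favour of the label that received its first vote earliest.
    """
    votes = Counter(clf(x) for clf in binary_classifiers.values())
    return votes.most_common(1)[0][0]


# Group labels as used in the paper: Q, Y, and H.
classes = ["Q", "Y", "H"]
pairs = one_vs_one_pairs(classes)

# Hypothetical 1-D class centres; a real system would train one SVM per pair,
# each with its own kernel parameters.
centres = {"Q": 1.4, "Y": 1.6, "H": 1.8}


def make_stub(a, b):
    """Stand-in binary classifier: pick the class with the nearer centre."""
    return lambda x: a if abs(x - centres[a]) <= abs(x - centres[b]) else b


classifiers = {pair: make_stub(*pair) for pair in pairs}
label_high = predict_one_vs_one(classifiers, 1.75)
label_low = predict_one_vs_one(classifiers, 1.45)
```

Keeping the classifiers in a dictionary keyed by class pair makes it straightforward to attach a separate feature subset and parameter set to each pair.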

3. Experimental Results

3.1. Feature Analysis

The auscultation signals were processed by a self-developed analytic program under the Matlab environment. The decomposition structure tree at layer 7 was used for the WPD of the auscultation signals in the computer experiment. $2^j$ frequency subbands were obtained at the $j$th layer after the WPD of the speech signals. The fractal dimensions of the speech signal in each subband were calculated with a sampling rate of 16,000 Hz and 16-bit precision. The frequency bands for these subbands are as follows: first layer (frequency interval = 4 kHz, subband index = 0, 1), second layer (frequency interval = 2 kHz, subband index = 0, 1, 2, 3), third layer (frequency interval = 1 kHz, subband index = 0 to 7), fourth layer (frequency interval = 0.5 kHz, subband index = 0 to 15), and fifth layer (frequency interval = 0.25 kHz, subband index = 0 to 31). After calculating the box dimensions of vowel /a:/ and observing the fractal dimension trajectory curve, the following findings were made.
(1) The box dimension value in each subband ranges from 1.3 to 1.8, indicating the existence of a regular fractal dimension space distribution for the auscultation signals. This result also indicates that the box dimension value of auscultation signals is completely different from that of noise.
(2) The auscultation signals of male subjects are different from those of female subjects.
(3) Different auscultation signals have different box dimension values, and the box dimension values between the first layer and the second layer of WPD significantly differ.
Figures 3 to 5 show different trends in the box dimension values between the first layer and the fifth layer subbands.
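
The subband widths listed above follow directly from the sampling rate: a signal sampled at 16,000 Hz covers 0 to 8 kHz, and layer $j$ splits that range into $2^j$ equal bands. A minimal sketch, with the hypothetical helper name `subband_layout`:

```python
def subband_layout(sampling_rate_hz, layer):
    """Return (low_hz, high_hz) for each of the 2**layer subbands.

    Bands are listed in increasing frequency order. A wavelet packet tree
    stores its nodes in natural order, so node indices may need reordering
    before they line up with this list.
    """
    nyquist_hz = sampling_rate_hz / 2
    n_bands = 2 ** layer
    width_hz = nyquist_hz / n_bands
    return [(n * width_hz, (n + 1) * width_hz) for n in range(n_bands)]


# Number of subbands and bandwidth for layers 1 to 5 at 16 kHz sampling.
summary = {
    layer: (len(bands), bands[0][1] - bands[0][0])
    for layer in range(1, 6)
    for bands in [subband_layout(16000, layer)]
}
```

For layers 1 to 5 this gives 2, 4, 8, 16, and 32 subbands, with widths of 4, 2, 1, 0.5, and 0.25 kHz.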

3.2. Statistical Analysis

The statistical analysis software SPSS 20 was used to analyze the differences between the samples. Age was included in the statistical model as a covariate to correct for its effect. The box dimension values of the WPT coefficients from the first to the fifth layers were analyzed to identify the significant differences between the groups of samples. Table 2 shows 43 frequency bands with significantly different box dimension values from the first to fifth layers in all subjects. Table 3 shows 7 frequency bands with significantly different box dimension values from the first to fifth layers in the male subjects. Table 4 shows 48 frequency bands with significantly different box dimension values from the first to fifth layers in the female subjects.

3.3. Classification

The multiclass SVM was applied as a classifier to discriminate Group Q, Group Y, and Group H by the extracted features of box dimension. The RBF function was chosen as the kernel function. The classification model, in which every binary classifier had its own feature subset and RBF kernel parameters, was constructed to identify the auscultation signals based on the libsvm software [32].

In the process of classification, threefold cross-validation was applied to the classification of auscultation signals. To compare recognition accuracy, the experiments were performed with different training data. The classification results using the multiclass SVM are shown in Table 5. When the auscultation signals of all subjects were used to train the SVM to decide the patterns of TCM syndromes, the overall accuracy for all subjects was 79.49%. When the auscultation signals of male and female subjects were trained separately, the overall accuracy of male subjects was 91.95% and that of female subjects was 72.07%. The overall accuracy across male and female subjects was 86.05%; this is a weighted accuracy in which each weight is the proportion of subjects of that gender among all samples.
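
The weighting described above amounts to a count-weighted mean of the per-group accuracies. The sketch below illustrates the arithmetic only; the per-gender subject counts are given in Table 1 and are not reproduced here, so the group names, accuracies, and counts in the example are hypothetical.

```python
def weighted_accuracy(accuracy_by_group, count_by_group):
    """Combine per-group accuracies, weighting each by its share of subjects."""
    total = sum(count_by_group.values())
    if total == 0:
        raise ValueError("at least one subject is required")
    weighted_sum = sum(
        accuracy_by_group[group] * count_by_group[group]
        for group in accuracy_by_group
    )
    return weighted_sum / total


# Hypothetical numbers: 70 subjects at 90% and 30 subjects at 80%.
example = weighted_accuracy(
    {"group_a": 90.0, "group_b": 80.0},
    {"group_a": 70, "group_b": 30},
)
```

The combined value always lies between the two per-group accuracies and is pulled toward the accuracy of the larger group.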

4. Discussions

In TCM, signs and symptoms are captured through four methods of diagnosis, namely, inspection, auscultation and olfaction, inquiry, and pulse taking. The combined use of the four methods is necessary for acquiring full and detailed clinical information as well as for disease diagnosis. Auscultation is an indispensable part of the four methods. Traditional auscultation depends on subjective hearing; however, accurate auscultation is difficult for TCM doctors who lack experience or have reduced hearing acuity. Therefore, objective research on auscultation is highly desirable, because it contributes to the quantitative, combined use of the four methods and helps avoid clinical misdiagnosis or missed diagnosis.

The box dimension can be used for analysis and classification in auscultation research. Independent from  , the auscultation signal shows a relatively steady fractal dimension value, and the dimension values among the subbands show significant differences. Figures 3(b)-3(d) and Table 2 show that Group H and Group Y share the same variations in their box dimension values. The box dimension values of Group H and Group Y for all subjects are the largest in the low frequency band and then become lower; finally, the box dimension values of all groups rise to a certain extent in the high frequency band. However, the box dimension value of Group Q for all subjects is higher, and then it fluctuates in the other frequency bands. As shown in Figures 4(b)-4(d) and Table 3, Group H shows a variation in box dimension values similar to that of Group Q rather than Group Y. The box dimension values of all groups for all subjects are higher in the low frequency band. In addition, there is a peak in the frequency band around 4 kHz. However, the change in the box dimension values of Group H and Group Y for female subjects in Figure 5 is similar to that in Figure 3. As shown in Tables 2 to 4, the female subjects have the most frequency bands with significantly different box dimension values, whereas the male subjects have the fewest. This indicates that the female subjects have better class separability in wavelet packet fractal dimensions than the male subjects and all subjects combined.

The internal information on the auscultation signals shows some irregularities, and the fractal dimension trajectory distinctively varies among the subjects. The differences in the trends are reflected in the diverse internal information of the three groups, and this information can be applied for clinical diagnosis in TCM.

As shown in Table 5, when the auscultation signals of all subjects were used to train the classification model to decide the patterns of TCM syndromes, the overall accuracy was 79.49%. When those of male subjects were used, the accuracy was 72.07%, whereas the accuracy reached 91.95% when those of female subjects were used. Therefore, the overall accuracy of male and female subjects reached 86.05%. The results showed that the methods proposed in this paper were effective for analyzing auscultation signals, and that the performance of the model can be greatly improved when gender is taken into account.

5. Conclusions

This paper selected the vowel /a:/ to be pronounced by each subject to decrease the interference, complexity, and uncertainty of the auscultation signal analysis. The auscultation signals were then processed using a combination of the box dimension and the wavelet packet transform in a self-developed analytic program under the Matlab environment. The box dimension values of the auscultation signals were analyzed and compared in each frequency subband. The differences in the trends reflect the diverse internal information of the three groups, and this information was applied to the further classification of auscultation signals. The male and female classification models, if established separately, are applicable and effective in the auscultation analysis of TCM. The number of clinical subjects must be extended in future studies to verify the proposed methods. Our future research aims to construct an automatic auscultation system to assist in clinical diagnosis.

Conflict of Interests

The authors declare that they have no financial and personal relationships with other people or organizations that can inappropriately influence their work; there is no professional or other personal interest of any nature or kind in any product, service, and/or company that could be construed as influencing the position presented in, or the review of, the paper.

Acknowledgment

This work was supported by the National Natural Science Foundation of China under Grant nos. 81302913, 81270050, 30901897, 81173199, and 30701072.