Abstract

Auscultation signals are nonstationary in nature. The wavelet packet transform (WPT) has become a very useful tool for analyzing nonstationary signals. Sample entropy (SampEn) has recently been proposed as a measure for quantifying the regularity and complexity of time series data. WPT and SampEn were combined in this paper to analyze auscultation signals in traditional Chinese medicine (TCM). SampEn values of the WPT coefficients were computed to quantify the signals from qi- and yin-deficient, as well as healthy, subjects. With this scheme, the complexity of the signal can be evaluated at different time-frequency resolutions. First, the voice signals were decomposed into approximation and detail WPT coefficients. Then, SampEn values for the approximation and detail coefficients were calculated. Finally, SampEn values with significant differences among the three kinds of samples were chosen as the feature parameters for the support vector machine to identify the three types of auscultation signals. The recognition accuracy rates were higher than 90%.

1. Introduction

TCM is considered a unique medical system because of its basic theories describing the physiology and pathology of the human body, disease etiology, diagnosis, and differentiation of symptom complexes. According to TCM theories, the zang-fu organs form the core of the human body as an organic entity in which tissues and sense organs are connected through a network of channels and collaterals (blood vessels). In TCM, the zang and fu organs are not simply anatomical structures; more importantly, they represent a generalization of the physiology and pathology of certain systems of the human body. The zang-fu comprise the five zang and six fu organs. The five zang are the heart, liver, spleen, lung, and kidney. The six fu are the gallbladder, stomach, large intestine, small intestine, bladder, and triple burner. When one falls ill, a dysfunction in the zang-fu organs may be reflected on the body’s surface through the channels and their collaterals. At the same time, diseases involving body surface tissues may also affect their related zang or fu organs. Furthermore, the affected zang or fu organs may influence each other through internal connections [1]. Auscultation, which belongs to the auscultation and olfaction methods of TCM diagnosis, is used to detect vocal changes reflecting the functional activities of the zang-fu organs and the abundance or decline of the qi, blood, and body fluid.

Auscultation was clearly illustrated as early as in the Internal Classic of Huang Di [2], which provided the theoretical basis for clinical diagnosis by listening to vocal changes. However, complete acoustic diagnostic methods were not formulated at that time. After the Ming and Qing Dynasties, auscultation gradually attracted the attention of the medical field, and both its theoretical content and clinical application developed considerably. Thus, a distinctive step-by-step diagnostic method was formed. With the development of computer and signal processing technology, researchers around the world have made substantial progress in the objective study of auscultation in recent years.

Mo performed a frequency spectral analysis of the voices of patients with cough using a digital sonograph [3]. Wang and Yan performed a number of studies on the nonlinearity of the vowel /a/ signals of healthy persons and patients with deficiency syndrome by applying delay vector variance [4, 5]. These studies were effective attempts at objective auscultation research. Chiu et al. proposed four novel acoustic parameters, namely, the average number of zero crossings, variations in local peaks and valleys, variations in first and second formant frequencies, and the spectral energy ratio, to analyze and identify the characteristics among non-, qi-, and yin-deficient subjects [6].

There are several other studies on auscultation around the world [7–11]. These methods have provided a good basis for objective auscultation in clinical diagnosis. However, auscultation signal analysis and recognition are still at an initial stage. The experiments were conducted on small sample databases, and the recognition results are not yet satisfactory. Further investigation based on these studies is therefore necessary.

Changes between normal and abnormal voice signals correspond to changes in the spatial distribution of the voice signal energy, so variations in energy imply corresponding changes in signal characteristics. In other words, the different frequency components of a signal can represent different physical properties of the measured signal [12, 13]. Compared with the traditional Fourier transform time-frequency analytical method, the wavelet transform (WT) can reveal more information on signals based on multiscale and multiresolution decomposition. Wavelet packets have recently been applied to analyse auscultation signals because they can partition both the low- and high-frequency bands, unlike the WT, which often fails to capture high-frequency information accurately [14–16].

Both approximate entropy (ApEn) and sample entropy (SampEn) can represent signal complexity and have been used in many biomedical fields. ApEn was proposed by Pincus and Goldberg [17] to compute quantitative information from experimental data. However, ApEn has some weaknesses: because it counts self-matches, its estimate is biased, and its results are inconsistent in some cases. SampEn, compared with ApEn, does not count self-matches and shows better relative consistency and less dependence on data length.

The Daubechies 4 (db4) wavelet is selected in this paper as the wavelet packet function to decompose the auscultation signals into five-level wavelet packet coefficients. SampEn is then extracted from these coefficients as a feature parameter to analyze the auscultation signals quantitatively. Furthermore, statistical analysis is conducted to obtain the effective feature parameters with significant differences for the recognition of the voice signals. Finally, these feature values are used as input vectors of the support vector machine (SVM) classifier for the automatic identification of qi- and yin-deficient, as well as healthy, subjects.

2. Materials and Methods

Feature parameters of auscultation signals were extracted using a combined WPT and SampEn scheme (Figure 1). Traditional signal processing methods, including the Fourier transform (FT), fast Fourier transform (FFT), and short-time Fourier transform (STFT), cannot reveal the nonlinear information contained in a nonstationary signal. With the combined scheme, the nonlinear information of the auscultation signal can be extracted under different time-frequency resolutions.

2.1. WPT

Wavelets are generally crafted to have specific properties that make them useful for signal processing. The WT is capable of time-frequency analysis and can separate a signal into different frequency bands. However, with increasing scale, the higher the spatial resolution of the wavelet functions, the lower their frequency resolution will be. This phenomenon is a drawback of the wavelet function. WPT was developed to adapt the underlying wavelet bases to the contents of a signal. The basic idea is to allow the subband decomposition to select adaptively the best basis for a particular signal. Because WPT further narrows the frequency windows that widen with increasing scale, it overcomes this shortcoming of the WT.

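Given a finite-energy signal $x(t)$ whose scaling space is assumed to be $U_0^0$, WPT can decompose $U_0^0$ into small subspaces $U_j^n$ in a dichotomous way (Figure 2).

$U_j^n$ denotes the $n$th subspace at the $j$th resolution level.

The dichotomous decomposition is realised by the following recursive scheme:

$$U_{j+1}^{n} = U_{j}^{2n} \oplus U_{j}^{2n+1}, \quad j \in \mathbb{Z},\ n \in \mathbb{Z}_{+},$$

where $j$ is the resolution level and $\oplus$ denotes orthogonal decomposition. $U_{j+1}^{n}$, $U_{j}^{2n}$, and $U_{j}^{2n+1}$ are three closed spaces corresponding to $u_n(t)$, $u_{2n}(t)$, and $u_{2n+1}(t)$, respectively. $u_n(t)$ satisfies the following equations:

$$u_{2n}(t) = \sqrt{2}\sum_{k} h(k)\, u_n(2t - k), \qquad u_{2n+1}(t) = \sqrt{2}\sum_{k} g(k)\, u_n(2t - k),$$

where $h(k)$ and $g(k)$ are the coefficients of the low- and the high-pass filters, respectively. The sequence of functions $\{u_n(t)\}_{n \in \mathbb{Z}_{+}}$ generated from a given function $u_0(t)$ is called the wavelet packet basis function.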

The voice signal is a transient, nonstationary, and random signal. Daubechies (db) wavelets have therefore been widely used because they are well suited to matching the transient components in voice signals. Another main issue in wavelet analysis is the number of vanishing moments, which is determined by trial and error. As the number of vanishing moments increases, more points in the high frequencies become negligible. Therefore, db wavelets with vanishing moments of 4, 6, 8, and 10 were used to decompose and reconstruct the voice signals in this study. After the effects of these wavelet functions on decomposition and reconstruction were compared, the db4 wavelet function was selected on the basis of its rate of decay and because fewer points are neglected.

The signal is decomposed into two sub-bands at the first level, namely, the low- and high-frequency sub-bands. At the next level, the low-frequency sub-band is further decomposed into lower- and higher-frequency parts, and the same is done for the high-frequency sub-band. This decomposition is repeated at each subsequent level. In this way, the frequency sub-bands can be partitioned to be consistent with the signal features.
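The dichotomous splitting described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two-tap Haar filter pair is swapped in for the db4 filters used in this paper to keep the code short, and the function names (`haar_split`, `wavelet_packet_level`, `energy`) are hypothetical. The sketch returns the sub-bands in frequency order, which requires swapping the children of every odd-indexed node.

```python
import math

def haar_split(x):
    """One analysis step: return the (low-pass, high-pass) halves of x."""
    s = math.sqrt(2.0)
    low = [(x[i] + x[i + 1]) / s for i in range(0, len(x) - 1, 2)]
    high = [(x[i] - x[i + 1]) / s for i in range(0, len(x) - 1, 2)]
    return low, high

def wavelet_packet_level(signal, level):
    """Return the 2**level coefficient sets of the full packet tree,
    ordered from the lowest to the highest frequency band."""
    if len(signal) % (2 ** level) != 0:
        raise ValueError("signal length must be a multiple of 2**level")
    nodes = [list(signal)]
    for _ in range(level):
        next_nodes = []
        for index, node in enumerate(nodes):
            low, high = haar_split(node)
            # After downsampling, odd-indexed bands (in frequency order) have
            # a mirrored spectrum, so their children are swapped.
            next_nodes.extend((low, high) if index % 2 == 0 else (high, low))
        nodes = next_nodes
    return nodes

def energy(values):
    """Sum of squared coefficients of one sub-band."""
    return sum(v * v for v in values)

# Illustrative input (an assumption): a 1 kHz tone sampled at 16 kHz.
fs = 16000
tone = [math.sin(2 * math.pi * 1000 * n / fs) for n in range(1024)]
bands = wavelet_packet_level(tone, 5)  # 32 sub-bands of 32 coefficients
strongest = max(range(len(bands)), key=lambda i: energy(bands[i]))
print("sub-bands:", len(bands), "strongest band index:", strongest)
```

Because the Haar filters are orthonormal, the total energy of the coefficients equals the energy of the input signal. A longer filter such as db4 follows the same tree structure but needs a boundary-extension rule, which is omitted here.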

2.2. SampEn

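SampEn examines a time series for similar epochs and assigns a nonnegative number to the sequence, with larger values corresponding to greater complexity or irregularity in the data [18]. In contrast to the ApEn algorithm, self-matches are not included in calculating the probability in the SampEn algorithm. The pattern length $m$ and the tolerance window $r$ are the two input parameters, which must be set before computation. For a time series $x(1), x(2), \ldots, x(N)$, $N$ is the length of the time series. SampEn($m$, $r$, $N$) is computed as follows [18].

(1) The vectors $X_m(i) = [x(i), x(i+1), \ldots, x(i+m-1)]$, for $1 \le i \le N-m+1$, are formed. These vectors represent $m$ consecutive values starting with the $i$th point.

(2) The distance between vectors $X_m(i)$ and $X_m(j)$, $d[X_m(i), X_m(j)]$, is defined as the maximum absolute difference between their components:

$$d[X_m(i), X_m(j)] = \max_{0 \le k \le m-1} \left| x(i+k) - x(j+k) \right|.$$

(3) For a given $X_m(i)$, the number of $j$ ($1 \le j \le N-m$, $j \ne i$), denoted as $B_i$, is counted such that the distance between $X_m(i)$ and $X_m(j)$ is less than or equal to $r$. Then, for $1 \le i \le N-m$,

$$B_i^m(r) = \frac{B_i}{N-m-1}.$$

(4) $B^m(r)$ is defined as

$$B^m(r) = \frac{1}{N-m}\sum_{i=1}^{N-m} B_i^m(r).$$

(5) The dimension is increased to $m+1$, and $A_i$ is calculated as the number of $j$ ($1 \le j \le N-m$, $j \ne i$) such that the distance between $X_{m+1}(i)$ and $X_{m+1}(j)$ is less than or equal to $r$. Then,

$$A_i^m(r) = \frac{A_i}{N-m-1}, \qquad A^m(r) = \frac{1}{N-m}\sum_{i=1}^{N-m} A_i^m(r).$$

Thus, $B^m(r)$ is the probability that two sequences will match for $m$ points, whereas $A^m(r)$ is the probability that two sequences will match for $m+1$ points. Finally, SampEn can be defined as

$$\mathrm{SampEn}(m, r) = \lim_{N \to \infty}\left\{-\ln\left[\frac{A^m(r)}{B^m(r)}\right]\right\}.$$

This value is estimated by the statistic

$$\mathrm{SampEn}(m, r, N) = -\ln\left[\frac{A^m(r)}{B^m(r)}\right].$$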
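The computation steps above can be sketched directly in Python. The sketch below is a minimal, unoptimised illustration and not the implementation used in this study. The function name `sample_entropy`, the test signals, and the choice of pattern length 2 with a tolerance of 0.2 times the standard deviation are assumptions made for the example. Because the normalising factors of the two match probabilities are identical, the estimate reduces to the negative logarithm of the ratio of the two raw match counts.

```python
import math
import random
import statistics

def sample_entropy(x, m, r):
    """Estimate SampEn(m, r, N) of the sequence x; self-matches are excluded."""
    n = len(x)
    if n <= m + 1:
        raise ValueError("time series is too short for this pattern length")

    def count_matches(length):
        # The same N - m starting points are used for both template lengths.
        count = 0
        for i in range(n - m):
            for j in range(i + 1, n - m):  # j > i, so no self-matches
                # Maximum absolute difference between corresponding components
                distance = max(abs(x[i + k] - x[j + k]) for k in range(length))
                if distance <= r:
                    count += 1
        return count

    b = count_matches(m)      # template pairs matching for m points
    a = count_matches(m + 1)  # template pairs matching for m + 1 points
    if a == 0 or b == 0:
        return float("inf")   # undefined when no matches are found
    return -math.log(a / b)

# Illustrative signals (assumptions): a regular sine and Gaussian noise.
random.seed(0)
sine = [math.sin(2 * math.pi * i / 25) for i in range(300)]
noise = [random.gauss(0.0, 1.0) for _ in range(300)]
for name, sig in (("sine", sine), ("noise", noise)):
    tolerance = 0.2 * statistics.pstdev(sig)
    print(name, sample_entropy(sig, 2, tolerance))
```

A regular signal such as the sine should give a value close to zero, whereas the noise should give a clearly larger value, in line with the interpretation of SampEn as a measure of irregularity.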

2.3. SVM

SVM is a useful machine learning technique that has been successfully applied to classification, which is a common task in machine learning. In most cases, the data to be classified are not linearly separable but can be separated nonlinearly, in which case a nonlinear support vector classifier can be used. The main idea is to transform the original data into a high-dimensional feature space. Thus, although the classifier is a hyperplane in the high-dimensional feature space, it may be nonlinear in the original input space [19].

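To construct a nonlinear support vector classifier, the inner product $x_i \cdot x_j$ is replaced by a kernel function $K(x_i, x_j)$. The following are some commonly used kernel functions:

polynomial (homogeneous): $K(x_i, x_j) = (x_i \cdot x_j)^d$;

polynomial (inhomogeneous): $K(x_i, x_j) = (x_i \cdot x_j + 1)^d$;

radial basis function: $K(x_i, x_j) = \exp(-\gamma \|x_i - x_j\|^2)$, for $\gamma > 0$;

Gaussian radial basis function: $K(x_i, x_j) = \exp\left(-\|x_i - x_j\|^2 / (2\sigma^2)\right)$;

hyperbolic tangent: $K(x_i, x_j) = \tanh(\kappa\, x_i \cdot x_j + c)$, for some $\kappa > 0$ and $c < 0$.

The goal of SVM is to produce a model that predicts the target values of data instances in the test set for which only the attributes are given. The following decision function is applied to determine which class a sample $x$ belongs to:

$$f(x) = \operatorname{sgn}\left(\sum_{i=1}^{n} \alpha_i^{*}\, y_i\, K(x_i, x) + b^{*}\right),$$

where $x_i$ are the training samples with class labels $y_i$. The parameters $\alpha_i^{*}$ and $b^{*}$ are the optimum solutions obtained in training.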
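As a concrete illustration of the kernel decision rule, the following minimal Python sketch evaluates a two-class decision function with a radial basis function kernel. The support vectors, labels, multipliers, bias, and kernel parameter below are invented toy values, not trained values from this study, and the names `rbf_kernel` and `decide` are hypothetical.

```python
import math

def rbf_kernel(u, v, gamma):
    """Radial basis function kernel exp(-gamma * ||u - v||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * squared_distance)

def decide(x, support_vectors, labels, alphas, bias, gamma):
    """Return +1 or -1 from the sign of the kernel expansion plus the bias."""
    score = bias
    for sv, y, alpha in zip(support_vectors, labels, alphas):
        score += alpha * y * rbf_kernel(sv, x, gamma)
    return 1 if score >= 0 else -1

# Toy model (assumed values): one support vector per class.
support_vectors = [[0.0, 0.0], [2.0, 2.0]]
labels = [1, -1]
alphas = [1.0, 1.0]
bias = 0.0
gamma = 0.5

print(decide([0.1, 0.1], support_vectors, labels, alphas, bias, gamma))
print(decide([1.9, 2.0], support_vectors, labels, alphas, bias, gamma))
```

The rule is binary, so a three-class problem such as the one in this paper has to be built from several binary classifiers. LibSVM does this with a one-against-one voting scheme.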

2.4. Clinical Data

Based on TCM theory and clinical practice, qi-deficient patients exhibit the following characteristics: low spirits, lack of qi and no desire to speak, discouragement, a small voice, dizziness and dazzled vision, palpitations, sweating, a weak and tender tongue, and a feeble pulse. By contrast, yin-deficient patients are characterised as follows: emaciation, feverish sensation over the five centres, hot flushes, night sweats, and dry stool, among others. The voice signals were collected from subjects of different ages and sexes. The detailed information is listed in Tables 1 and 2.

All these data were collected by our research partner, the TCM Syndrome Laboratory of the Shanghai University of Traditional Chinese Medicine, in its affiliated hospitals, including the Longhua Hospital and the Shuguang Hospital. The voice was recorded using a high-performance microphone (brand AKG, model HSD171) and a 16-bit A/D converter connected to a computer. The frequency response range of the microphone is 60 Hz to 17 kHz. Its sensitivity is 1 mV/Pa (−60 dBV) with an impedance of 600 ohms. The sampling frequency is 16 kHz. All the voice samples were collected by an acquisition system developed in Visual C++ 6.0. An endpoint detection algorithm was applied to remove the nonvoice portions at the beginning and end of each utterance.

The vowel /a/ was chosen as the utterance. Each subject produced a stable, sustained phonation of the English vowel /a/ lasting about one second. This vowel was chosen because both patients and healthy subjects can pronounce it easily. In addition, the vocal organs do not come into contact, and there is no obstacle in the cavity when this vowel is pronounced [20]. The airflow is unblocked, and a periodic waveform can be produced. For these reasons, the vowel /a/ has mainly been chosen as the utterance in recent studies. The time-domain plot and spectrum of the vowel /a/ are shown in Figure 3.

2.5. Processing of Voice Signal Using WPT

In the first stage of sample identification, the voice signals of the three kinds of samples were analyzed using WPT. A five-level wavelet packet decomposition was applied as the preprocessing step for all subjects. With a sampling frequency of 16 kHz, the maximum frequency of the original signal is 8 kHz, so the frequency interval of each band at the fifth level is 250 Hz.
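The 250 Hz figure follows from halving the analysed band once per decomposition level. As a short worked equation, with $f_s$ for the sampling frequency, $j$ for the decomposition level, and $\Delta f_j$ for the sub-band width (symbols introduced here only for illustration):

```latex
\Delta f_j = \frac{f_s / 2}{2^{j}}, \qquad
\Delta f_5 = \frac{16\,000\ \mathrm{Hz} / 2}{2^{5}}
           = \frac{8000\ \mathrm{Hz}}{32} = 250\ \mathrm{Hz}.
```

The same expression gives widths of 4 kHz, 2 kHz, 1 kHz, and 0.5 kHz for levels 1 to 4, and level $j$ contains $2^{j}$ sub-bands.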

2.6. The SampEn Computation

In the second stage, SampEn values of the approximation and detail coefficients at each level of the wavelet packet decomposition were computed for the voice signals of the healthy subjects, as well as the yin- and qi-deficient patients. In choosing the parameters $m$ and $r$, Pincus suggested $m = 1$ or $2$ and $r = 0.1$ to $0.25\,\mathrm{SD}$, where $\mathrm{SD}$ is the standard deviation of the original signal $x(i)$, $i = 1, \ldots, N$. One of the original signals was chosen and analysed using different $m$ and $r$ values to better illustrate the advantages of the choice. The results are shown in Figures 4 and 5. The difference in the SampEn values among the signals of the three kinds of samples was the largest for the selected $m$ (Figure 5), which indicates that the choice of the $m$ value is appropriate. The SampEn value also decreased as the parameter $r$ increased, although to a lesser degree. Therefore, $r$ was set to $0.2\,\mathrm{SD}$.

3. Results and Discussion

3.1. Results on SampEn Values for WPT Coefficients

Voice signals from qi- and yin-deficient, as well as healthy, subjects were decomposed into sub-bands using WPT. The sub-bands at each level were as follows: level 1 (the frequency interval is 4 kHz, 2 sub-bands), level 2 (the frequency interval is 2 kHz, 4 sub-bands), level 3 (the frequency interval is 1 kHz, 8 sub-bands), level 4 (the frequency interval is 0.5 kHz, 16 sub-bands), and level 5 (the frequency interval is 0.25 kHz, 32 sub-bands). SampEn values of the approximation and detail coefficients of the five-level WPT decomposition were computed using the parameters selected in Section 2.6.

The average SampEn values for the coefficients of levels 1–5 are illustrated in Figures 6(a)–6(e). The differences between healthy and qi- or yin-deficient samples are relatively high, except in 0–0.5 kHz and 7.5–8 kHz of the fourth level and 0.25–0.5 kHz, 7.5–7.75 kHz, and 7.75–8 kHz of the fifth level. However, the differences between the qi- and yin-deficient samples are relatively low apart from the following frequency ranges: 0 kHz to 8 kHz in the 1–5 levels.

Figures 6(a)–6(e) also show that the frequency bands become narrower as the wavelet packet level increases. At the same time, more of the feature information contained in the voice signal is represented. Slight changes that cannot be reflected at low scales are represented at high scales. Furthermore, the SampEn values for qi-deficient, yin-deficient, and healthy samples tend to increase with frequency. The SampEn values of qi-deficient samples are lower than those of yin-deficient samples in most of the frequency bands within 0–4 kHz in levels 1–5, while the SampEn values for qi- and yin-deficient samples are intertwined in 4–8 kHz.

3.2. Statistical Analysis

The statistical analysis software SPSS 20 was applied to analyse the differences among the samples. All SampEn values of the WPT coefficients from the first to the fifth level were analyzed to obtain the features with significant differences among the three groups of samples. Tables 3, 4, and 5 show that 47 frequency bands from levels 1 to 5 had SampEn values with significant differences.

3.3. Classification Analysis

LibSVM 2.93 was used to identify the auscultation signals. The feature parameters with significant differences (47 features in different bands) were chosen as the input vectors and arranged in the LibSVM data format. The SVM type is C-SVC, and the RBF function was chosen as the kernel function for nonlinear training and testing after numerous experiments. The optimum parameters $C$ and $\gamma$ were obtained as 0.25 and 0.0625 using cross-validation ($C$ is the penalty factor, and $\gamma$ is the parameter of the kernel function). Table 6 shows the classification results using SVM, in which a good result for classifying the samples (up to 96%) was obtained. This finding suggests that the method applied in this paper is effective.
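For readers unfamiliar with the LibSVM data format, each sample is written as one text line consisting of a class label followed by `index:value` pairs with indices starting at 1. The sketch below shows one way to produce such lines in Python. The function names, the numeric class coding (0 for healthy, 1 for qi-deficient, 2 for yin-deficient), and the feature values are assumptions made for illustration and are not taken from this study.

```python
def to_libsvm_line(label, features):
    """Format one sample as '<label> 1:<v1> 2:<v2> ...' (1-based indices)."""
    parts = [str(label)]
    parts.extend(f"{index}:{value:.6g}"
                 for index, value in enumerate(features, start=1))
    return " ".join(parts)

def to_libsvm_text(samples):
    """Join (label, features) pairs into the text of a LibSVM data file."""
    return "\n".join(to_libsvm_line(label, feats) for label, feats in samples)

# Hypothetical samples: each vector would hold the 47 selected SampEn values.
samples = [
    (0, [0.5 + 0.01 * k for k in range(47)]),  # assumed code for healthy
    (1, [0.3 + 0.01 * k for k in range(47)]),  # assumed code for qi-deficient
    (2, [0.4 + 0.01 * k for k in range(47)]),  # assumed code for yin-deficient
]
print(to_libsvm_text(samples).splitlines()[0][:60])
```

A file in this format can be passed to LibSVM's `svm-train` tool, where `-s 0` selects C-SVC, `-t 2` selects the RBF kernel, `-c` and `-g` set the penalty factor and kernel parameter, and `-v` requests n-fold cross-validation.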

3.4. Discussion

The quantitative analysis of the speech of healthy persons and deficient patients is one of the important tasks in the objectification and modernization of auscultation in TCM. The voices of healthy people are natural, gentle, clear, fluent, and understandable, while patients with deficiency syndrome speak feebly, in a low voice, and discontinuously. The SampEn values of healthy samples are higher than those of qi- or yin-deficient samples in most of the frequency bands. This may indicate that healthy persons have greater physiological adaptability than patients with deficiency syndrome. The variation trends of the SampEn values in the qi- and yin-deficient samples were similar, perhaps because both qi- and yin-deficient subjects belong to the deficiency syndrome, and the differences in voice signal characteristics between them are not remarkably significant. The classification result demonstrated that the SVM classifier was effective for the identification of the auscultation signals. Therefore, auscultation analysis based on WPT-SampEn-SVM was suitable for distinguishing among qi- and yin-deficient, as well as healthy, subjects.

4. Conclusions

In this paper, we proposed a new method for identifying the auscultation signals in TCM from three kinds of samples, namely, qi- and yin-deficient, as well as healthy, samples. Instead of solely using traditional time- or frequency-domain features, we combined the nonlinear dynamic parameter SampEn with the wavelet packet time-frequency analysis method to obtain our feature parameters. Wavelet packets are specifically used because of their capability to partition both low- and high-frequency signals. At the same time, SampEn, a statistical parameter used to measure the predictability of the current amplitude values of a physiological signal, is adopted in our research to analyze the signals from the three kinds of samples. Experimental results illustrated that WPT-SampEn-SVM-based analysis was suitable for distinguishing among qi- and yin-deficient, as well as healthy, subjects. Our future research will aim to improve the performance of identifying deficient patients by analyzing the SampEn variability of the signals reconstructed from the coefficients in different frequency bands of each level. In addition, the clinical sample size will be extended for the verification of our methods.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (Grants no. 30701072, 81173199, and 30901897) and the Shanghai 3rd Leading Academic Discipline Project (Grant no. S30302).