Computational and Mathematical Methods in Medicine

Volume 2016 (2016), Article ID 8301962, 9 pages

http://dx.doi.org/10.1155/2016/8301962

## Improved CEEMDAN and PSO-SVR Modeling for Near-Infrared Noninvasive Glucose Detection

School of Electrical Engineering and Automation, Harbin Institute of Technology, Harbin 150001, China

Received 14 April 2016; Revised 18 July 2016; Accepted 27 July 2016

Academic Editor: Thomas Desaive

Copyright © 2016 Xiaoli Li and Chengwei Li. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

Diabetes is a serious threat to human health, and research on noninvasive blood glucose detection has therefore become an important topic worldwide. Near-infrared transmission spectroscopy has important applications in noninvasive glucose detection. Extracting useful information and selecting appropriate modeling methods can improve the robustness and accuracy of models for predicting blood glucose concentrations. Therefore, an improved signal reconstruction and calibration modeling method is proposed in this study. On the basis of the improved complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN) and the correlative coefficient, the sensitive intrinsic mode functions are selected to reconstruct the spectroscopy signals, which are then used to develop the calibration model with the support vector regression (SVR) method. The radial basis function kernel is selected for SVR, and three parameters, namely, the insensitive loss coefficient $\varepsilon$, the penalty parameter $C$, and the width coefficient $\gamma$, must be identified beforehand for the corresponding model. Particle swarm optimization (PSO) is employed to select the three parameters simultaneously. Comparison experiments using PSO-SVR and partial least squares show that the proposed signal reconstitution method is feasible and can eliminate noise in spectroscopy signals. The prediction accuracy of the model built with the PSO-SVR method is also found to be better than that of the other methods for near-infrared noninvasive glucose detection.

#### 1. Introduction

Diabetes is a chronic disease that poses a serious threat to human health. According to the International Diabetes Federation (IDF), diabetes affected 387 million individuals around the world in 2014, and this figure is expected to increase to 592 million by 2035 [1]. Diabetes, cancer, and cardiovascular diseases have been the main causes of death since 2005 [2]. At present, diabetes is treated by measuring blood glucose concentrations and adjusting the dose of glucose-lowering drugs accordingly, thus controlling blood glucose levels to prevent and reduce the symptoms of diabetes and its complications [3]. The accurate detection of blood glucose concentrations is therefore important in diabetes prevention and treatment.

Diabetes monitoring is usually carried out in hospitals or through self-monitoring [4] and commonly involves invasive detection, which uses large amounts of biochemical reagents, entails long testing times, and causes inevitable pain and inconvenience to patients. By contrast, noninvasive blood glucose detection [5–7] offers a number of advantages, such as fast analysis speed, absence of trauma, low cost, and environmental friendliness. Noninvasive optical detection technology [8–10] is an important research topic in the area of noninvasive blood glucose detection. Since the 1970s, scientists have applied optics to determine the chemical composition of the human body. Noninvasive optical detection technologies include a variety of methods, such as near-infrared spectroscopy [11, 12], infrared spectroscopy [13], polarimetry [14], photoacoustics [15], Raman spectroscopy [16], and the light-scattering coefficient method [17].

Near-infrared light, the wavelength of which ranges from 780 nm to 2526 nm, is the electromagnetic radiation between visible light and mid-infrared light, and it can penetrate human skin and tissues. A good linear correlation exists between blood glucose concentrations and near-infrared spectrum absorption. In recent years, near-infrared spectroscopy measurement has been widely employed and has thus become a fast-developing analysis technology, particularly in medical applications [11, 18, 19]. Near-infrared spectroscopy combined with chemometrics is regarded as an effective method for the noninvasive detection of blood glucose concentrations [11, 20].

Empirical mode decomposition (EMD), an adaptive time-frequency data analysis method, is widely used for nonstationary and nonlinear systems [21]. However, mode mixing occurs in EMD; that is, different oscillations exist in the same intrinsic mode function (IMF), or similar oscillations exist in different IMFs. This problem is addressed with ensemble empirical mode decomposition (EEMD), which applies EMD to an ensemble of copies of the signal with added white Gaussian noise [22]. However, signals with added noise can produce varying numbers of IMFs, and the reconstructed signals contain residual noise after decomposition. In complementary ensemble empirical mode decomposition (CEEMD), pairs of positive and negative noises are added to the signal, which improves the efficiency of the original noise-assisted method and can completely eliminate the residual noise in the reconstructed signals [23]. Nevertheless, EEMD and CEEMD can produce spurious components, and the IMFs obtained via decomposition may fail to meet the definition of an IMF when the parameters are poorly chosen. These limitations are resolved with another noise-assisted algorithm, called CEEMDAN, which achieves an accurate reconstruction of the original signals and purer decomposed mode spectra [24]. CEEMDAN requires fewer than half of the iterations of EEMD. Moreover, CEEMDAN recovers the features of EMD that are lacking in EEMD. However, CEEMDAN still has some problems; for example, its modes contain some residual noise, and spurious modes appear in the early stages of decomposition [25]. To overcome these two issues, the improved CEEMDAN method is applied in this paper to obtain modes with less noise and more physical meaning.

EMD-based methods decompose a signal into a series of IMFs that comprise noisy modes and information modes. They are therefore powerful adaptive tools for extracting the sensitive IMFs with which to reconstruct the signal. The problem is how to distinguish relevant IMFs from irrelevant IMFs in an efficient way. Reference [26] uses an analogous approach based on the consecutive mean squared error (CMSE) criterion, in which the signal is reconstructed from the mode at which the criterion is minimal. In [27], the authors propose an intuitive mode selection method with a new criterion based on the Hausdorff distance (HD). Moreover, [28] introduces mutual information (MI) to select the sensitive IMFs that reflect the signal characteristics for signal reconstruction. In this paper, the correlative coefficient is used to select the relevant IMFs and thereby extract useful spectral information.

Chemometrics, which was proposed by Bhattacharjee in 1994 [29], employs multivariate statistical analysis, calibration methods, and computing technologies to calculate the content of each component of a sample from its near-infrared spectrum. Common linear chemometrics modeling methods include multiple linear regression, principal components regression, and partial least squares (PLS) regression. Examples of nonlinear modeling methods include artificial neural networks and support vector regression (SVR). Generally, modeling is the process of selecting parameters and methods. SVR, a regression method developed from the support vector machine, maps the original data to a high-dimensional feature space through a nonlinear mapping and establishes a regression model in this space. By means of a kernel function, the nonlinear regression problem in the original space is thus converted into a linear regression problem in the high-dimensional space, and SVR can obtain the global optimal solution in spectrum detection. Applying SVR to the quantitative analysis modeling of near-infrared spectra produces good results. The commonly used kernel functions include linear, polynomial, radial basis, and sigmoid kernel functions. Many studies and experiments demonstrate that the radial basis kernel function is the preferable option if prior knowledge is insufficient. In this paper, the particle swarm optimization (PSO)-SVR method is proposed to select the SVR parameters $\varepsilon$, $C$, and $\gamma$ simultaneously. The results show that PSO-SVR achieves satisfactory learning precision and generalization ability.

The paper is organized as follows. Section 2 provides a description of the spectrum reconstruction method based on improved CEEMDAN and the PSO-SVR model. The CEEMDAN algorithm, improved CEEMDAN algorithm, correlative coefficient, PSO, and SVR are also introduced. Section 3 presents the near-infrared spectrum experiments on glucose solutions and the results of different modeling methods. Section 4 presents the conclusion of the study.

#### 2. Methods

##### 2.1. CEEMDAN Algorithm

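The basis of CEEMDAN is EEMD. Thus, the decomposition theory of the EEMD method is described first [24].

(1) Set $x^{(i)} = x + w^{(i)}$, $i = 1, \ldots, I$, where $x$ is the signal and each $w^{(i)}$ is a different realization of white Gaussian noise.

(2) The modes $\mathrm{IMF}_k^{(i)}$ of each $x^{(i)}$ are obtained by EMD, where $k = 1, \ldots, K$ indexes the modes.

(3) The $k$th mode of $x$ is denoted by $\overline{\mathrm{IMF}}_k$ and is the average of the corresponding $\mathrm{IMF}_k^{(i)}$:

$$\overline{\mathrm{IMF}}_k = \frac{1}{I}\sum_{i=1}^{I} \mathrm{IMF}_k^{(i)}. \tag{1}$$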

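In EEMD, each $x^{(i)}$ is decomposed independently and produces its own residue $r_k^{(i)} = r_{k-1}^{(i)} - \mathrm{IMF}_k^{(i)}$. In CEEMDAN, by contrast, the decomposition modes are denoted by $\widetilde{\mathrm{IMF}}_k$, and a single first residue is computed as $r_1 = x - \widetilde{\mathrm{IMF}}_1$, where $\widetilde{\mathrm{IMF}}_1$ is obtained in the same way as in EEMD, that is, as the mean of the first modes over the ensemble. The residue $r_1$, with different realizations of a given noise added, is then decomposed by EMD, and averaging the resulting first modes gives $\widetilde{\mathrm{IMF}}_2$. The next residue is $r_2 = r_1 - \widetilde{\mathrm{IMF}}_2$. The remaining modes are obtained by continuing this process until the stop condition is met.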

The operator $E_k(\cdot)$ produces the $k$th mode of a given signal decomposed by EMD, and $w^{(i)}$ is white noise with zero mean and unit variance.

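If $x$ is the signal, $w^{(i)}$ ($i = 1, \ldots, I$) are the white noise realizations, and the coefficient $\beta_k$ sets the amplitude of the noise added at stage $k$, then the steps of CEEMDAN are as follows:

(1) The realizations $x + \beta_0 w^{(i)}$ are decomposed by EMD $I$ times to obtain the first mode: $\widetilde{\mathrm{IMF}}_1 = \frac{1}{I}\sum_{i=1}^{I} \mathrm{IMF}_1^{(i)} = \overline{\mathrm{IMF}}_1$.

(2) When $k = 1$, the first residue $r_1 = x - \widetilde{\mathrm{IMF}}_1$ is calculated.

(3) Each realization $r_1 + \beta_1 E_1(w^{(i)})$ is decomposed until its first EMD mode is obtained. The second mode is then calculated:

$$\widetilde{\mathrm{IMF}}_2 = \frac{1}{I}\sum_{i=1}^{I} E_1\bigl(r_1 + \beta_1 E_1(w^{(i)})\bigr). \tag{2}$$

(4) When $k = 2, \ldots, K$, the $k$th residue $r_k = r_{k-1} - \widetilde{\mathrm{IMF}}_k$ is calculated.

(5) Each realization $r_k + \beta_k E_k(w^{(i)})$ is decomposed until its first EMD mode is obtained. The $(k+1)$th mode is then defined:

$$\widetilde{\mathrm{IMF}}_{k+1} = \frac{1}{I}\sum_{i=1}^{I} E_1\bigl(r_k + \beta_k E_k(w^{(i)})\bigr). \tag{3}$$

(6) Steps (4) and (5) are repeated for the next $k$ until the obtained residue cannot be decomposed further, that is, until the residue has at most one extremum. The final residue satisfies $R = x - \sum_{k=1}^{K} \widetilde{\mathrm{IMF}}_k$, where $K$ is the total number of modes. Thus, the signal can be expressed as

$$x = \sum_{k=1}^{K} \widetilde{\mathrm{IMF}}_k + R. \tag{4}$$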
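As a quick check of the completeness property stated above, the residue recursion can be summed over all stages. The shorthand $r_0 = x$ and $r_K = R$ is introduced here only for compactness and is an assumption of this sketch, not notation taken from the cited algorithm.

```latex
\begin{aligned}
r_k &= r_{k-1} - \widetilde{\mathrm{IMF}}_k, \qquad k = 1, \ldots, K, \quad r_0 = x, \quad r_K = R, \\
\sum_{k=1}^{K} \widetilde{\mathrm{IMF}}_k
  &= \sum_{k=1}^{K} \left( r_{k-1} - r_k \right) = r_0 - r_K = x - R, \\
x &= \sum_{k=1}^{K} \widetilde{\mathrm{IMF}}_k + R .
\end{aligned}
```

Because the sum telescopes, the reconstruction is exact regardless of the noise realizations used at each stage.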

##### 2.2. Improved CEEMDAN Algorithm

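According to [25], the improved CEEMDAN algorithm is described on the basis of CEEMDAN as follows, where $M(\cdot)$ is the operator that produces the local mean of a signal, $\langle \cdot \rangle$ denotes averaging over the $I$ realizations, and $\beta_k$ sets the noise amplitude at stage $k$:

(1) For $i = 1, \ldots, I$, calculate by EMD the local means of the realizations $x^{(i)} = x + \beta_0 E_1(w^{(i)})$ to obtain the first residue, $r_1 = \langle M(x^{(i)}) \rangle$.

(2) When $k = 1$, calculate the first mode: $\widetilde{\mathrm{IMF}}_1 = x - r_1$.

(3) Estimate the second residue as the average of the local means of the realizations $r_1 + \beta_1 E_2(w^{(i)})$; the second mode is then defined as

$$\widetilde{\mathrm{IMF}}_2 = r_1 - r_2 = r_1 - \bigl\langle M\bigl(r_1 + \beta_1 E_2(w^{(i)})\bigr) \bigr\rangle. \tag{5}$$

(4) Calculate the $k$th residue and the $k$th mode ($k = 3, \ldots, K$):

$$r_k = \bigl\langle M\bigl(r_{k-1} + \beta_{k-1} E_k(w^{(i)})\bigr) \bigr\rangle, \tag{6}$$

$$\widetilde{\mathrm{IMF}}_k = r_{k-1} - r_k. \tag{7}$$

(5) Repeat step (4) for the next $k$.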
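The control flow of the steps above can be sketched in a few lines of Python. This is only an illustration under strong assumptions: the local-mean operator is replaced by a hypothetical moving average (`moving_average`) rather than true EMD sifting, the noise amplitude `beta` is held constant across stages, and all function names are invented for this sketch.

```python
import math
import random


def moving_average(x, half_width=2):
    """Hypothetical stand-in for the local-mean operator M(.); not EMD sifting."""
    n = len(x)
    out = []
    for t in range(n):
        lo, hi = max(0, t - half_width), min(n, t + half_width + 1)
        out.append(sum(x[lo:hi]) / (hi - lo))
    return out


def noise_mode(w, k, local_mean):
    """k-th mode of the noise under the stand-in operator: M^(k-1)(w) - M^k(w)."""
    for _ in range(k - 1):
        w = local_mean(w)
    m = local_mean(w)
    return [a - b for a, b in zip(w, m)]


def improved_ceemdan_sketch(x, n_modes, n_real=20, beta=0.2,
                            local_mean=moving_average, seed=0):
    """Return (modes, final_residue) following the residue-first scheme."""
    rng = random.Random(seed)
    noises = [[rng.gauss(0.0, 1.0) for _ in x] for _ in range(n_real)]
    modes = []
    prev = list(x)  # holds r_{k-1}, starting from r_0 = x
    for k in range(1, n_modes + 1):
        residue = [0.0] * len(x)
        for w in noises:
            e_k = noise_mode(w, k, local_mean)
            noisy = [p + beta * e for p, e in zip(prev, e_k)]
            # Average the local means over the realizations.
            for t, v in enumerate(local_mean(noisy)):
                residue[t] += v / n_real
        modes.append([p - r for p, r in zip(prev, residue)])  # r_{k-1} - r_k
        prev = residue
    return modes, prev


# Toy two-tone input, chosen only for illustration.
signal = [math.sin(0.3 * t) + 0.5 * math.sin(2.0 * t) for t in range(64)]
modes, final_residue = improved_ceemdan_sketch(signal, n_modes=3, n_real=5)
reconstructed = [sum(m[t] for m in modes) + final_residue[t]
                 for t in range(len(signal))]
```

Whatever operator is substituted for the local mean, the modes plus the final residue add back to the input, because each mode is defined as the difference of two successive residues.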

##### 2.3. Correlative Coefficient

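The correlative coefficient is widely applied in almost all areas of science and technology. It is a dimensionless index used in multivariate statistics to represent the statistical relationship between two groups of variables. Its raw value ranges from −1 to 1 and falls into three classes, namely, positive correlation, no correlation, and negative correlation. Generally, the negative correlation is combined with the positive correlation during computation by taking the absolute value, so that the correlative coefficient ranges from 0 to 1 and a high value indicates a strong correlation. For two groups of variables $x_i$ and $y_i$, $i = 1, \ldots, n$, the correlative coefficient is

$$\rho = \frac{|L_{xy}|}{\sqrt{L_{xx} L_{yy}}}, \tag{8}$$

where $L_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$, $L_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2$, $L_{yy} = \sum_{i=1}^{n} (y_i - \bar{y})^2$, $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$, and $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$.

Thus, the correlative coefficient can be expressed as

$$\rho = \frac{\left|\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})\right|}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \sum_{i=1}^{n} (y_i - \bar{y})^2}}. \tag{9}$$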
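A minimal sketch of this computation in plain Python follows. It assumes that negative correlation is folded into positive correlation by taking the absolute value, and it treats a constant sequence as uncorrelated; both choices, and the function name, are assumptions of this sketch.

```python
import math


def correlative_coefficient(x, y):
    """Absolute Pearson correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    l_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    l_xx = sum((a - mean_x) ** 2 for a in x)
    l_yy = sum((b - mean_y) ** 2 for b in y)
    if l_xx == 0.0 or l_yy == 0.0:
        return 0.0  # assumption: a constant sequence carries no correlation
    return abs(l_xy) / math.sqrt(l_xx * l_yy)
```

With this convention, a perfectly increasing and a perfectly decreasing linear relationship both give a value of 1, which is what the mode selection step needs.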

##### 2.4. Signal Reconstitution Method

Signal characteristics are not evident because of the overlapped hydrogen absorption peaks in the near-infrared spectrum. Moreover, models built on the original spectroscopy data perform poorly and have low accuracy. Therefore, removing useless components can produce satisfactory predictions and simplify the model. On the basis of the improved CEEMDAN and the correlative coefficient, the signal reconstitution method consists of the following steps.

(1) The original signal $x$ is decomposed into $\mathrm{IMF}_1, \ldots, \mathrm{IMF}_K$ by using the improved CEEMDAN algorithm, where $K$ is the number of IMFs.

(2) The correlative coefficient between each $\mathrm{IMF}_k$ and the original signal is calculated using formula (9). The sensitive IMFs are selected according to the correlative coefficient threshold [30], which is given in formula (10):

$$\rho_{\mathrm{TH}} = \frac{\max_k \rho_k}{10 \max_k \rho_k - 3}, \qquad k = 1, \ldots, K. \tag{10}$$

In the formula above, $\rho_k$ represents the correlative coefficient between $\mathrm{IMF}_k$ and the original signal, and the maximum correlative coefficient is denoted by $\max_k \rho_k$. If the correlative coefficient between $\mathrm{IMF}_k$ and the original signal is larger than $\rho_{\mathrm{TH}}$, then the IMF is retained as a sensitive mode. Otherwise, the IMF is removed as a false component.

(3) The sensitive IMFs are summed to reconstruct the signal for modeling.
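The selection and reconstruction steps can be illustrated with the short Python sketch below. The threshold rule used here, the maximum coefficient divided by ten times the maximum minus three, is an assumption of this sketch about the cited criterion, as are the function names, the `skip_first` option, and the toy components standing in for real IMFs.

```python
import math


def abs_corr(x, y):
    """Absolute Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    l_xy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    l_xx = sum((a - mx) ** 2 for a in x)
    l_yy = sum((b - my) ** 2 for b in y)
    return abs(l_xy) / math.sqrt(l_xx * l_yy)


def correlation_threshold(rhos):
    """Assumed threshold rule: max / (10 * max - 3); needs max > 0.3."""
    rho_max = max(rhos)
    if rho_max <= 0.3:
        raise ValueError("threshold rule is undefined for max coefficient <= 0.3")
    return rho_max / (10.0 * rho_max - 3.0)


def reconstruct_from_sensitive_imfs(imfs, signal, skip_first=0):
    """Keep IMFs whose coefficient exceeds the threshold and sum them."""
    rhos = [abs_corr(imf, signal) for imf in imfs]
    threshold = correlation_threshold(rhos)
    keep = [k for k, rho in enumerate(rhos) if k >= skip_first and rho > threshold]
    recon = [sum(imfs[k][t] for k in keep) for t in range(len(signal))]
    return keep, threshold, recon


# Toy stand-ins for IMFs: one weak high-frequency component and two tones.
n = 200
weak = [0.05 * (-1) ** t for t in range(n)]
tone_a = [math.sin(2 * math.pi * 5 * t / n) for t in range(n)]
tone_b = [math.sin(2 * math.pi * 2 * t / n) for t in range(n)]
toy_imfs = [weak, tone_a, tone_b]
toy_signal = [a + b + c for a, b, c in zip(weak, tone_a, tone_b)]
kept, th, recon = reconstruct_from_sensitive_imfs(toy_imfs, toy_signal)
```

In this toy case the two tones are retained and the weak alternating component is discarded. Indices in `kept` are zero-based, so `skip_first=3` would discard the first three modes regardless of their coefficients.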

##### 2.5. PSO-SVR Modeling Method

The PSO algorithm is a population-based parallel global search strategy. Its concept is relatively simple, it is easy to implement, and few of its parameters require adjustment. PSO exhibits a fast convergence speed and is capable of dealing with high-dimensional problems.

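The speed-position model is used in the PSO algorithm. In the $D$-dimensional solution space, the position of the $i$th particle in the group is $x_i = (x_{i1}, x_{i2}, \ldots, x_{iD})$, and its velocity is $v_i = (v_{i1}, v_{i2}, \ldots, v_{iD})$. The individual extreme value at the current time is $p_i = (p_{i1}, p_{i2}, \ldots, p_{iD})$, and the global extreme value is $p_g = (p_{g1}, p_{g2}, \ldots, p_{gD})$. In each iteration, the particles adjust their current position and velocity by tracking the individual extreme value, the global extreme value, and their state at the previous time. The iterative formula is as follows:

$$v_{id}^{t+1} = \omega v_{id}^{t} + c_1\,\mathrm{rand}()\,(p_{id} - x_{id}^{t}) + c_2\,\mathrm{rand}()\,(p_{gd} - x_{id}^{t}), \qquad x_{id}^{t+1} = x_{id}^{t} + v_{id}^{t+1}, \tag{11}$$

where $v_{id}^{t}$, $x_{id}^{t}$ and $v_{id}^{t+1}$, $x_{id}^{t+1}$ are the velocity and position at the current moment and the next moment, respectively; $\mathrm{rand}()$ is a random number within $[0, 1]$, drawn independently at each occurrence; and $c_1$ and $c_2$ are the learning factors, which are usually equal to two. $\omega$ is the weighting factor, which should decrease automatically as the algorithm iterates to accelerate convergence; it is generally defined as

$$\omega = \omega_{\max} - \frac{(\omega_{\max} - \omega_{\min})\,t}{t_{\max}}, \tag{12}$$

where $\omega_{\max}$ and $\omega_{\min}$ are the maximum and minimum weighting factors, respectively, $t$ is the current iteration number, and $t_{\max}$ is the total iteration number.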
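A single update of one particle, with a linearly decreasing weighting factor, might look as follows in Python. The default bounds of 0.9 and 0.4 for the weighting factor are common choices assumed here for illustration, and the function names are hypothetical.

```python
import random


def inertia_weight(t, t_max, w_max=0.9, w_min=0.4):
    """Weighting factor that decreases linearly from w_max to w_min."""
    return w_max - (w_max - w_min) * t / t_max


def update_particle(x, v, p_best, g_best, w, c1=2.0, c2=2.0, rng=random):
    """Return the new position and velocity of one particle."""
    new_v = [
        w * vd
        + c1 * rng.random() * (pd - xd)   # pull toward the individual best
        + c2 * rng.random() * (gd - xd)   # pull toward the global best
        for xd, vd, pd, gd in zip(x, v, p_best, g_best)
    ]
    new_x = [xd + nvd for xd, nvd in zip(x, new_v)]
    return new_x, new_v
```

When a particle already sits at both its individual best and the global best, the two attraction terms vanish and the particle simply coasts with its velocity scaled by the weighting factor.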

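For the sample data set $\{(x_i, y_i)\}_{i=1}^{n}$, the regression function obtained by SVR fitting is

$$f(x) = \sum_{i=1}^{n} (\alpha_i - \alpha_i^{*}) K(x_i, x) + b, \tag{13}$$

where $\alpha_i$ and $\alpha_i^{*}$ are Lagrangian multipliers and $b$ is the threshold. The radial basis function kernel is

$$K(x_i, x) = \exp\bigl(-\gamma \lVert x_i - x \rVert^2\bigr). \tag{14}$$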
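To make the form of the fitted regression function concrete, the sketch below evaluates a radial basis function expansion in Python for given support vectors. It assumes the kernel is written as the exponential of minus a width coefficient times the squared distance, and it takes the multiplier differences and the threshold as inputs; training them is outside the scope of this sketch, and all names are hypothetical.

```python
import math


def rbf_kernel(u, v, gamma):
    """Radial basis function kernel exp(-gamma * ||u - v||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * squared_distance)


def svr_predict(x, support_vectors, dual_coefs, b, gamma):
    """Evaluate sum_i coef_i * K(sv_i, x) + b, with coef_i = alpha_i - alpha_i*."""
    return sum(
        coef * rbf_kernel(sv, x, gamma)
        for sv, coef in zip(support_vectors, dual_coefs)
    ) + b
```

For example, with a single support vector at the origin, a coefficient of 2, and a threshold of 1, the prediction at the origin is 3 and decays toward 1 as the query point moves away.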

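For PSO-SVR, the position and velocity of each particle are determined by the three-dimensional parameter vector $(\varepsilon, C, \gamma)$. The mean square error (MSE), which directly reflects the regression performance of SVR, is used as the fitness function:

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2, \tag{15}$$

where $y_i$ is the reference value and $\hat{y}_i$ is the estimated value of a new sample.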

The steps for the optimal selection of the parameters $(\varepsilon, C, \gamma)$ in PSO-SVR are described as follows.

(1) The particle swarm is initialized. The group size is determined, the maximum and minimum weighting factors $\omega_{\max}$ and $\omega_{\min}$ are identified, and the maximum iteration number $t_{\max}$ is set.

(2) The individual extreme value $p_i$ of each particle is set as its current position. The fitness of each particle is calculated using the fitness function, namely, formulas (13) and (15). The individual extreme value of the particle with the best fitness is taken as the global extreme value $p_g$.

(3) The position and speed of each particle are updated by iterating formulas (11) and (12).

(4) The fitness of each particle is evaluated using formulas (13) and (15).

(5) If the fitness of a particle is better than the fitness of its individual extreme value $p_i$, then $p_i$ is updated. Otherwise, the original value is retained.

(6) If the updated $p_i$ of a particle is better than the global extreme value $p_g$, then $p_g$ is updated. Otherwise, the original value is retained.

(7) If the maximum iteration number is reached or if the solution no longer changes, the iteration is stopped. Otherwise, the process returns to step (3).
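The loop described in these steps can be sketched in Python as shown below. This is an illustration under stated assumptions: the fitness is a pluggable function (in practice it would train an SVR with the candidate parameters and return the MSE on validation samples), a hypothetical quadratic surrogate stands in for it here, positions are clipped to assumed search bounds, the global best is refreshed as soon as a particle improves, and only the maximum-iteration stopping rule is implemented.

```python
import random


def pso_select(fitness, bounds, n_particles=20, t_max=50,
               w_max=0.9, w_min=0.4, c1=2.0, c2=2.0, seed=0):
    """Minimise `fitness` over a box; returns (best, best_fitness, history)."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Step (1): random positions inside the bounds, zero velocities.
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    # Step (2): individual bests start at the initial positions.
    p_best = [p[:] for p in pos]
    p_fit = [fitness(p) for p in pos]
    g = min(range(n_particles), key=lambda i: p_fit[i])
    g_best, g_fit = p_best[g][:], p_fit[g]
    history = [g_fit]
    for t in range(t_max):
        w = w_max - (w_max - w_min) * t / t_max  # decreasing weighting factor
        for i in range(n_particles):
            # Step (3): velocity and position update, clipped to the bounds.
            for d, (lo, hi) in enumerate(bounds):
                vel[i][d] = (w * vel[i][d]
                             + c1 * rng.random() * (p_best[i][d] - pos[i][d])
                             + c2 * rng.random() * (g_best[d] - pos[i][d]))
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            # Steps (4)-(6): evaluate, then refresh individual and global bests.
            f = fitness(pos[i])
            if f < p_fit[i]:
                p_best[i], p_fit[i] = pos[i][:], f
                if f < g_fit:
                    g_best, g_fit = pos[i][:], f
        history.append(g_fit)
    return g_best, g_fit, history


def surrogate_mse(params):
    """Hypothetical stand-in for 'train SVR, return validation MSE'."""
    eps, c, gamma = params
    return (eps - 0.1) ** 2 + ((c - 10.0) / 100.0) ** 2 + (gamma - 0.5) ** 2


# Assumed search ranges for (epsilon, C, gamma); not taken from the paper.
search_bounds = [(0.001, 1.0), (0.1, 100.0), (0.001, 10.0)]
best, best_fit, history = pso_select(surrogate_mse, search_bounds)
```

Keeping the fitness as an argument separates the search from the regression model, so the same loop could wrap any SVR implementation that accepts the three parameters.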

#### 3. Experimental Results and Discussion

##### 3.1. Simulation Signal Reconstruction Experiments

Consider an original signal with a data length of 1024, as shown in Figure 1. White Gaussian noise is added to the original signal with the input signal-to-noise ratio (SNR) fixed at 5 dB. The noisy signal (Figure 2) is decomposed into eight modes. Figure 3 indicates that the sixth and seventh modes are the two components of the pure signal. In the proposed method, the correlative coefficients between each IMF and the noisy signal are calculated, and the threshold is obtained with formula (10). The reconstructed signal is the sum of the IMFs whose correlative coefficients are larger than the threshold. For the noisy signal, the correlative coefficients of IMF6 and IMF7 are larger than the threshold (0.16721), as shown in Table 1. The IMFs are arranged from high to low frequency, and the noise is often concentrated in the first IMFs. Therefore, in the proposed method, the first three modes are removed when reconstructing the signal regardless of whether their correlative coefficients are larger than the threshold. The reconstructed signal is presented in Figure 4.
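The construction of a noisy test signal at a fixed input SNR can be sketched in Python as follows. The two-tone signal used here is a hypothetical example, since the expression of the simulated signal is not reproduced in the text; the function names and the exact rescaling of the noise are also assumptions of this sketch.

```python
import math
import random


def add_white_noise(signal, snr_db, seed=0):
    """Add white Gaussian noise scaled so that the realised SNR equals snr_db."""
    rng = random.Random(seed)
    n = len(signal)
    signal_power = sum(s * s for s in signal) / n
    target_noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    raw = [rng.gauss(0.0, 1.0) for _ in range(n)]
    raw_power = sum(v * v for v in raw) / n
    scale = math.sqrt(target_noise_power / raw_power)
    noise = [scale * v for v in raw]
    noisy = [s + v for s, v in zip(signal, noise)]
    return noisy, noise


def snr_in_db(signal, noise):
    """Ratio of mean signal power to mean noise power, in decibels."""
    signal_power = sum(s * s for s in signal) / len(signal)
    noise_power = sum(v * v for v in noise) / len(noise)
    return 10.0 * math.log10(signal_power / noise_power)


# Hypothetical two-tone test signal of length 1024.
clean = [math.sin(2 * math.pi * 4 * t / 1024) + 0.5 * math.sin(2 * math.pi * 16 * t / 1024)
         for t in range(1024)]
noisy, noise = add_white_noise(clean, snr_db=5.0)
```

Rescaling the drawn noise to the target power makes the realised SNR match the requested value for this particular realization, which keeps repeated simulation runs comparable.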