Abstract

Diabetes is one of the four major diseases threatening human life, and accurate blood glucose detection has become an important part of managing the condition of diabetic patients. Blood glucose concentration shows an excellent linear correlation with near-infrared spectral absorption. Noise and information redundancy in near-infrared spectral noninvasive blood glucose measurement reduce the accuracy of the calibration model. To address this, a new feature extraction method based on permutation entropy is proposed. Taking the near-infrared spectral data of glucose solutions as the research object, the concepts of approximate entropy, sample entropy, fuzzy entropy, and permutation entropy are introduced. The spectra are then segmented, and characteristic wave bands with abundant glucose information are selected by permutation entropy, fractal dimension, and mutual information. Finally, support vector regression and partial least squares regression are used to establish mathematical models between the characteristic spectral data and glucose concentration, and the results are compared with those of conventional feature extraction methods. The results show that the proposed method can extract useful information from near-infrared spectra and effectively solves the problem of characteristic wave band extraction. It also improves the analytical accuracy of the spectra and the stability of the model.

1. Introduction

Diabetes is one of the major diseases threatening human health, and the number of people with diabetes is growing at an alarming rate. More than one hundred million people suffer from diabetes, and the number is expected to increase to 592 million by 2035 [1]. Proper diet and insulin injection can regulate blood glucose levels, but serious complications such as heart failure and blindness still arise in the later stages of diabetes [2]. The treatment of diabetes is therefore very important, and the detection of blood glucose concentration is its foundation. Noninvasive blood glucose detection technologies measure blood glucose concentration without damaging the skin. They include near-infrared spectroscopy, photoacoustic spectroscopy, polarimetry, fluorescence, and dielectric spectroscopy [3–5]. Compared with near-infrared spectroscopy, the other noninvasive methods are less suitable for real-time detection, because their signals are hard to detect and susceptible to interference from other components. At present, noninvasive blood glucose detection based on near-infrared spectra has become a research focus worldwide. Near-infrared (NIR) spectra arise from molecular vibrations and reflect chemical bond information, such as C-H, O-H, N-H, and S-H. NIR spectroscopy can therefore measure most kinds of compounds and their mixtures. Compared with traditional analytical techniques, NIR spectroscopy has been widely applied because it is highly efficient and causes no damage or pollution [6–8]. The near-infrared spectra contain information on the main structure and composition of glucose. Useful glucose information can be extracted from the spectral data, and the pretreated data are then used to establish a mathematical model that calculates the glucose concentration.
In biomedicine, NIR spectroscopy combined with chemometrics is considered one of the most effective approaches to noninvasive blood glucose concentration detection [3]. Common chemometric methods include multiple linear regression (MLR), principal component regression (PCR), partial least squares regression (PLSR), and support vector regression (SVR). MLR is limited by the noise in spectral data, and in PCR some principal components may be unrelated to the actual content. Therefore, PLSR and SVR are applied in this paper. Because near-infrared spectral samples cannot be pretreated in noninvasive measurement, several technical difficulties remain: a complex background, overlapping spectral peaks, and a low proportion of effective information. Extracting effective information from the original spectra is therefore critical for establishing an ideal mathematical model. Effectively extracting glucose characteristic information from nonlinear and nonstationary near-infrared spectral signals can improve both detection efficiency and detection precision.

In 1948, Shannon introduced entropy into information theory and proposed the concept of information entropy to measure the uncertainty of events [9]. The concept of entropy was subsequently generalized. In 1991, Pincus proposed approximate entropy (ApEn) [10]. ApEn requires only short data, resists noise well, and compensates for some shortcomings of nonlinear analysis. However, ApEn counts each template's match with itself, which biases its values and introduces errors into the computation. To improve the accuracy and efficiency of ApEn, Richman and Moorman proposed an improved version in 2000, called sample entropy (SampleEn) [11]. Like ApEn, SampleEn requires only short data and is robust against noise and interference, and it also remains consistent over a wide range of parameters. However, SampleEn is defined only if at least one template match is found; otherwise it is meaningless. Chen et al. therefore improved SampleEn and defined a new measure of sequence complexity, named fuzzy entropy [12]. Fuzzy entropy fuzzifies the similarity measure with an exponential function so that its value changes smoothly as the parameters change. Its definition remains meaningful for small parameter values, and it inherits the relative consistency and short-data processing characteristics of SampleEn. Bandt and Pompe proposed permutation entropy (PE), a method for detecting the randomness and dynamic mutation behavior of a time series [13–18]. PE is computed from the permutation patterns obtained by comparing neighboring values of the time series [19]. Normalized PE lies between 0 and 1. It is conceptually simple, fast to compute, robust against interference, and especially suitable for nonlinear data.

The key to noninvasive blood glucose detection with near-infrared spectra is extracting characteristic information from the spectral signal. The near-infrared spectral signals of glucose solutions are nonstationary and noisy, but PE is fairly robust against noise and interference. In this study, feature extraction of spectral information is investigated with glucose solutions as the research object, from the perspective of the overall information in the signal. This paper is organized as follows. Section 2 describes the principles of ApEn, SampleEn, fuzzy entropy, and PE. It then briefly introduces the methods used to extract characteristic near-infrared spectral bands of glucose solutions, namely fractal dimension and mutual information, as well as the modeling methods PLSR and SVR. Section 3 presents the application of the proposed method, in which PLSR and SVR calibration models are built with the extracted characteristic bands to verify the validity and superiority of the proposed method. Section 4 draws the conclusions.

2. Theory and Methods

2.1. Entropy
2.1.1. Approximate Entropy

In 1991, Pincus defined ApEn as the conditional probability that vectors which are similar in dimension $m$ remain similar when the dimension increases from $m$ to $m+1$. Its physical meaning is the probability that the time series generates a new pattern when the dimension changes. ApEn has the following advantages: (1) it requires only short data, (2) it is robust against noise and interference, and (3) it applies to deterministic signals, stochastic signals, and mixed signals composed of both. The steps of the ApEn algorithm are as follows [10]:
(1) Given a time series of length $N$, $\{u(i)\}$, $i = 1, 2, \dots, N$, reconstruct the $m$-dimensional vectors $X(i) = [u(i), u(i+1), \dots, u(i+m-1)]$, $i = 1, 2, \dots, N-m+1$.
(2) Compute the distance between any vector $X(i)$ and vector $X(j)$, defined as the maximum absolute difference between their corresponding elements: $d[X(i), X(j)] = \max_{k=0,\dots,m-1} |u(i+k) - u(j+k)|$.
(3) Specify the threshold $r$, which is typically between 0.2 and 0.3. For each vector $X(i)$, count the number of $j$ for which $d[X(i), X(j)] \le r \cdot SD$ ($SD$ is the standard deviation of the sequence). Divide this number by the total number of distances $N-m+1$, and denote the ratio $C_i^m(r)$.
(4) Take the logarithm of $C_i^m(r)$ and average it over all $i$: $\Phi^m(r) = \frac{1}{N-m+1}\sum_{i=1}^{N-m+1} \ln C_i^m(r)$.
(5) Increase $m$ by 1 and repeat steps (1) to (4) to obtain $C_i^{m+1}(r)$ and $\Phi^{m+1}(r)$.
(6) Obtain ApEn as $\mathrm{ApEn}(m, r) = \lim_{N\to\infty}\left[\Phi^m(r) - \Phi^{m+1}(r)\right]$.
(7) For a finite time series of length $N$, estimate ApEn by the statistic $\mathrm{ApEn}(m, r, N) = \Phi^m(r) - \Phi^{m+1}(r)$.

The parameters $N$, $m$, and $r$ in the above steps are the length of the time series, the length of the comparison window, and the similarity tolerance, respectively. The larger $m$ is, the more detail of the dynamic process can be reconstructed.
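The ApEn steps above can be sketched in Python as follows. This is a minimal, unoptimized O(N²) illustration, not the authors' implementation. The function name and the defaults $m = 2$ and $r = 0.2$ are assumptions, and the tolerance is applied as $r \cdot SD$, as in step (3). The count in step (3) includes the self-match $j = i$, which keeps every logarithm finite.

```python
import math


def approximate_entropy(u, m=2, r=0.2):
    """Estimate ApEn(m, r, N) = Phi^m(r) - Phi^{m+1}(r) for a finite series u.

    The similarity tolerance is r * SD, where SD is the standard deviation
    of u, as in step (3). Defaults are illustrative assumptions.
    """
    n = len(u)
    mean = sum(u) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in u) / n)
    tol = r * sd

    def phi(dim):
        # Step (1): the N - dim + 1 template vectors X(i) of length dim
        vectors = [u[i:i + dim] for i in range(n - dim + 1)]
        count = len(vectors)
        total = 0.0
        for xi in vectors:
            # Steps (2)-(3): Chebyshev distance; the self-match j = i is counted,
            # so C_i^m(r) > 0 and the logarithm is always defined
            matches = sum(
                1 for xj in vectors
                if max(abs(a - b) for a, b in zip(xi, xj)) <= tol
            )
            total += math.log(matches / count)  # Step (4): ln C_i^m(r)
        return total / count                    # Phi^dim(r)

    # Steps (5)-(7): repeat with m + 1 and take the difference
    return phi(m) - phi(m + 1)
```

A regular series, such as a period-2 sequence, gives an ApEn close to 0. An irregular series gives a clearly larger value.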

2.1.2. Sample Entropy

The physical meaning of SampleEn is the same as that of ApEn: the larger SampleEn is, the higher the complexity of the sequence and the greater the probability of generating a new pattern. The algorithm is implemented as follows [11]:
(1) Given the time series $\{x(i)\}$, $i = 1, 2, \dots, N$, form the $m$-dimensional vectors $X(i) = [x(i), x(i+1), \dots, x(i+m-1)]$, $i = 1, 2, \dots, N-m+1$, in order of the serial number.
(2) Define the distance between vector $X(i)$ and vector $X(j)$ as the largest difference between their corresponding elements, namely, $d[X(i), X(j)] = \max_{k=0,\dots,m-1} |x(i+k) - x(j+k)|$.
(3) Define the threshold $r$. For each $i$, count the number $B_i$ of $j \ne i$ for which $d[X(i), X(j)] \le r$. Divide $B_i$ by the total number of distances $N-m$, and denote the ratio $B_i^m(r) = B_i/(N-m)$. The average over all $i$ is
$$B^m(r) = \frac{1}{N-m+1}\sum_{i=1}^{N-m+1} B_i^m(r).$$
(4) Increase the dimension to $m+1$ and repeat the above steps to obtain $B^{m+1}(r)$.
(5) Theoretically, the SampleEn of this sequence is
$$\mathrm{SampleEn}(m, r) = \lim_{N\to\infty}\left\{-\ln\frac{B^{m+1}(r)}{B^m(r)}\right\}.$$
In practice, $N$ is finite, and SampleEn is estimated as
$$\mathrm{SampleEn}(m, r, N) = -\ln\frac{B^{m+1}(r)}{B^m(r)}.$$
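The SampleEn steps can be sketched in Python as below. This is an unoptimized illustration under stated assumptions, not the authors' code. The defaults $m = 2$ and $r = 0.2$ are illustrative, the threshold is applied as $r \cdot SD$ of the series, and the function name is hypothetical. Self-matches $j = i$ are excluded, so the estimate is undefined when no matches occur, and the sketch raises an error in that case.

```python
import math


def sample_entropy(x, m=2, r=0.2):
    """Estimate SampleEn(m, r, N) = -ln(B^{m+1}(r) / B^m(r)).

    Assumption: the threshold is applied as r * SD of x.
    """
    n = len(x)
    mean = sum(x) / n
    tol = r * math.sqrt(sum((v - mean) ** 2 for v in x) / n)

    def b(dim):
        # Step (1): template vectors X(i), i = 1..N - dim + 1
        vectors = [x[i:i + dim] for i in range(n - dim + 1)]
        count = len(vectors)
        total = 0.0
        for i, xi in enumerate(vectors):
            # Steps (2)-(3): count j != i within the tolerance (no self-match)
            matches = sum(
                1 for j, xj in enumerate(vectors)
                if j != i and max(abs(a - c) for a, c in zip(xi, xj)) <= tol
            )
            total += matches / (count - 1)  # B_i^dim(r): ratio to N - dim distances
        return total / count                # B^dim(r): average over all i

    bm, bm1 = b(m), b(m + 1)               # Step (4): repeat with m + 1
    if bm == 0 or bm1 == 0:
        raise ValueError("no template matches: SampleEn is undefined")
    return -math.log(bm1 / bm)             # Step (5), finite-N estimate
```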

2.1.3. Fuzzy Entropy

The definition of fuzzy entropy introduces the concept of a fuzzy set and uses the exponential function as the fuzzy function that measures the similarity of two vectors. The exponential function has two desirable properties: (1) its continuity ensures that the similarity value does not change abruptly, and (2) its shape ensures that the self-similarity of a vector is the maximum. Fuzzy entropy is defined as follows [12]:
(1) Let the sampling sequence with $N$ points be $\{u(i): 1 \le i \le N\}$.
(2) Form a set of $m$-dimensional vectors in order of the serial number:
$$X_i^m = \{u(i), u(i+1), \dots, u(i+m-1)\} - u_0(i), \quad i = 1, 2, \dots, N-m,$$
where $X_i^m$ represents the $m$ consecutive values of $u$ starting from the $i$th point, with their mean value $u_0(i) = \frac{1}{m}\sum_{j=0}^{m-1} u(i+j)$ removed.
(3) Define the distance between $X_i^m$ and $X_j^m$ as the largest difference between their corresponding elements:
$$d_{ij}^m = d[X_i^m, X_j^m] = \max_{k=0,\dots,m-1} \left|\left(u(i+k) - u_0(i)\right) - \left(u(j+k) - u_0(j)\right)\right|.$$
(4) Define the similarity between $X_i^m$ and $X_j^m$ with the fuzzy function $\mu(d_{ij}^m, n, r)$:
$$D_{ij}^m = \mu(d_{ij}^m, n, r) = \exp\left(-\frac{(d_{ij}^m)^n}{r}\right),$$
where the fuzzy function is exponential and $n$ and $r$ are the gradient and width of the exponential function boundary, respectively.
(5) Define the function
$$\phi^m(n, r) = \frac{1}{N-m}\sum_{i=1}^{N-m}\left(\frac{1}{N-m-1}\sum_{j=1, j\ne i}^{N-m} D_{ij}^m\right).$$
(6) Similarly, repeat steps (2) to (5) to reconstruct a set of $(m+1)$-dimensional vectors in order of the serial number, and define
$$\phi^{m+1}(n, r) = \frac{1}{N-m}\sum_{i=1}^{N-m}\left(\frac{1}{N-m-1}\sum_{j=1, j\ne i}^{N-m} D_{ij}^{m+1}\right).$$
(7) Define the fuzzy entropy as
$$\mathrm{FuzzyEn}(m, n, r) = \lim_{N\to\infty}\left[\ln\phi^m(n, r) - \ln\phi^{m+1}(n, r)\right].$$
When $N$ is finite, the value obtained by the above steps,
$$\mathrm{FuzzyEn}(m, n, r, N) = \ln\phi^m(n, r) - \ln\phi^{m+1}(n, r),$$
is the estimate of the fuzzy entropy of a sequence of length $N$.
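A direct Python sketch of steps (1) to (7) follows. It is illustrative only, and the following are assumptions:
- the width $r$ is scaled by the standard deviation of the series;
- the defaults $m = 2$, $n = 2$, and $r = 0.2$ are arbitrary;
- the function name is hypothetical.

Both $\phi^m$ and $\phi^{m+1}$ are averaged over the same $N - m$ templates, as in steps (5) and (6).

```python
import math


def fuzzy_entropy(u, m=2, n=2, r=0.2):
    """Estimate FuzzyEn(m, n, r, N) = ln phi^m(n, r) - ln phi^{m+1}(n, r).

    n is the gradient and r the width of the exponential boundary;
    assumption: r is scaled by the standard deviation of u.
    """
    length = len(u)
    mean = sum(u) / length
    width = r * math.sqrt(sum((v - mean) ** 2 for v in u) / length)
    if width == 0:
        raise ValueError("constant series: the width r * SD is zero")

    def phi(dim):
        # Step (2): N - m baseline-removed templates (same count for m and m + 1)
        templates = []
        for i in range(length - m):
            seg = u[i:i + dim]
            u0 = sum(seg) / dim                     # u0(i), the template mean
            templates.append([v - u0 for v in seg])
        count = len(templates)
        total = 0.0
        for i, xi in enumerate(templates):
            s = 0.0
            for j, xj in enumerate(templates):
                if j != i:
                    d = max(abs(a - b) for a, b in zip(xi, xj))  # step (3)
                    s += math.exp(-(d ** n) / width)             # step (4)
            total += s / (count - 1)
        return total / count                                     # steps (5)-(6)

    return math.log(phi(m)) - math.log(phi(m + 1))               # step (7)
```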

2.1.4. Permutation Entropy

According to [13], PE is defined as follows. Given a time series $\{x(i), i = 1, 2, \dots, N\}$, reconstruct it in phase space to obtain the matrix
$$\begin{bmatrix} x(1) & x(1+\tau) & \cdots & x(1+(m-1)\tau) \\ \vdots & \vdots & & \vdots \\ x(i) & x(i+\tau) & \cdots & x(i+(m-1)\tau) \\ \vdots & \vdots & & \vdots \\ x(K) & x(K+\tau) & \cdots & x(K+(m-1)\tau) \end{bmatrix},$$
where $m$ and $\tau$ are the embedding dimension and delay time, respectively, and $K = N-(m-1)\tau$. Each row of the matrix can be regarded as a reconstructed component, giving $K$ reconstructed components in total. The $i$th reconstructed component is rearranged in ascending order of its values:
$$x(i+(j_1-1)\tau) \le x(i+(j_2-1)\tau) \le \cdots \le x(i+(j_m-1)\tau),$$
where $j_1, j_2, \dots, j_m$ are the column indices of the individual elements of the reconstructed component.

If the reconstructed component contains equal values, they are ordered by their column indices. That is, when $x(i+(j_1-1)\tau) = x(i+(j_2-1)\tau)$ and $j_1 < j_2$, the element $x(i+(j_1-1)\tau)$ is placed before $x(i+(j_2-1)\tau)$.

Therefore, for an arbitrary time series $\{x(i)\}$, each row of the reconstructed matrix yields a symbol sequence
$$S(l) = (j_1, j_2, \dots, j_m),$$
where $l = 1, 2, \dots, k$ and $k \le m!$. The $m$-dimensional phase space admits $m!$ different symbol sequences, and each $S(l)$ is one of these permutations. Let $P_1, P_2, \dots, P_k$ be the probabilities of occurrence of the $k$ observed symbol sequences. The PE of the time series is then defined, in the form of Shannon entropy, as
$$H_p(m) = -\sum_{l=1}^{k} P_l \ln P_l.$$
When $P_l = 1/m!$, $H_p(m)$ reaches its maximum value $\ln(m!)$. For convenience, $H_p(m)$ is typically normalized by $\ln(m!)$, namely,
$$H_p = \frac{H_p(m)}{\ln(m!)}, \qquad 0 \le H_p \le 1.$$

The magnitude of $H_p$ represents the degree of randomness of the time series $\{x(i)\}$. The smaller $H_p$ is, the more regular the time series; the larger $H_p$ is, the more random the time series. Changes in $H_p$ reflect and amplify minute details of the time series.
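The PE definition above can be sketched in Python as follows. This is a minimal illustration, not the authors' code. The function name and the defaults $m = 3$ and $\tau = 1$ are assumptions, and the symbols use 0-based column indices. Python's `sorted` is stable, so tied values keep the order of their column indices, which matches the tie rule above.

```python
import math
from collections import Counter


def permutation_entropy(x, m=3, tau=1, normalize=True):
    """Permutation entropy H_p of a time series (Bandt-Pompe).

    m is the embedding dimension and tau the delay; with normalize=True the
    result is H_p(m) / ln(m!), which lies in [0, 1].
    """
    k = len(x) - (m - 1) * tau            # number of reconstructed components K
    if k < 1:
        raise ValueError("series too short for the chosen m and tau")
    counts = Counter()
    for i in range(k):
        row = [x[i + j * tau] for j in range(m)]   # i-th row of the matrix
        # 0-based column indices (j_1, ..., j_m) in ascending order of value;
        # sorted() is stable, so equal values keep their column order.
        symbol = tuple(sorted(range(m), key=lambda j: row[j]))
        counts[symbol] += 1
    h = sum(-(c / k) * math.log(c / k) for c in counts.values())  # H_p(m)
    return h / math.log(math.factorial(m)) if normalize else h
```

Multiplying a series by a positive constant leaves every ordinal pattern unchanged, so $H_p$ is invariant to such scaling. A strictly monotonic series produces a single pattern and therefore $H_p = 0$.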

2.1.5. Application to Simulation Signal

To compare ApEn, SampleEn, fuzzy entropy, and PE, a mixed signal is defined that combines a deterministic signal and a stochastic signal, with the stochastic component entering with a given probability. The four entropies of this signal are computed for two different probabilities. For convenience, they are denoted ApEn1, SampleEn1, FuzzyEn1, and PE1 for the first probability, and ApEn2, SampleEn2, FuzzyEn2, and PE2 for the second. Their dependence on the signal amplitude, the signal length, and the signal-to-noise ratio (SNR) is shown in Figures 1–3.

Figure 1 shows that, at a given probability, ApEn, SampleEn, and FuzzyEn change slightly as the signal amplitude gradually increases. PE, by contrast, remains constant at both probabilities, which illustrates its excellent stability and consistency.

Figure 2 shows that, at a given probability, ApEn, SampleEn, and FuzzyEn vary with the signal length and level off only beyond a certain length, which illustrates that their values depend on the data length. PE, by contrast, remains constant at both probabilities, which illustrates that it requires a shorter time series.

Figure 3 shows that, at a given probability, ApEn, SampleEn, and FuzzyEn follow the same trend as the SNR of the signal changes, which illustrates that their values are affected by the SNR. PE, by contrast, remains constant at both probabilities, which illustrates its robust noise resistance. Therefore, PE is used in this study to extract the feature information of the near-infrared spectral data of glucose solutions.

2.2. Fractal Dimension

Dimension, an important feature of geometric objects, characterizes how much space a shape occupies. The objects of Euclidean geometry are relatively regular shapes, so their dimensions are integers. Euclidean geometry, however, does not apply to irregular, complex shapes. Most natural shapes exhibit self-similar properties, so the spatial dimension of an object is not always an integer; it can also be fractional. For most geometric shapes the noninteger dimension is the real dimension, and integer dimensions are only special cases. The noninteger dimension is thus a more general concept that applies to all geometric shapes in nature. In 1919, Hausdorff, who studied the properties of singular sets, first proposed the concept of fractal dimension [20] and defined the Hausdorff measure and dimension theory. Since then, various other dimensions have been developed, such as the self-similarity dimension, box dimension, information dimension, correlation dimension, and Lyapunov dimension. The box dimension is one of the most commonly used fractal dimensions because it is easy to calculate and apply and requires few parameters. It is defined by covering the set with hypercubes of side length $\varepsilon$.

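Let $F$ be a nonempty bounded subset of $\mathbb{R}^n$, and let $N_\varepsilon(F)$ be the minimum number of hypercubes of side length $\varepsilon$ needed to cover $F$. Then
$$\dim_B F = \lim_{\varepsilon \to 0} \frac{\lg N_\varepsilon(F)}{-\lg \varepsilon}.$$

The above formula is the definition of the box dimension. The calculation steps are as follows:
(1) Let the discrete signal be $y(i)$, and let $F \subseteq \mathbb{R}^n$ be a closed set in the $n$-dimensional Euclidean space.
(2) Divide $\mathbb{R}^n$ into grids as fine as possible, and let $N_\varepsilon$ be the number of grid cells covering $F$.
The limit in the above formula cannot be evaluated directly from the definition, so an approximate method is used in practice. The $\varepsilon \times \varepsilon$ grid is taken as the reference and enlarged to a $k\varepsilon \times k\varepsilon$ grid, where $k \in \mathbb{Z}^+$. Then $N_{k\varepsilon}$ is the grid count of $F$ in discrete space, and
$$P(k\varepsilon) = \sum_{i=1}^{N/k} \left| \max\{y_{k(i-1)+1}, \dots, y_{k(i-1)+k+1}\} - \min\{y_{k(i-1)+1}, \dots, y_{k(i-1)+k+1}\} \right|,$$
where $N$ is the number of sampling points.

The grid count is
$$N_{k\varepsilon} = \frac{P(k\varepsilon)}{k\varepsilon} + 1,$$
where $N_{k\varepsilon} > 1$. In the plot of $\lg N_{k\varepsilon}$ versus $\lg k\varepsilon$, a scale-free region with good linearity is identified. If the beginning and end points of the scale-free region are $k_1$ and $k_2$, then
$$\lg N_{k\varepsilon} = a \lg k\varepsilon + b, \qquad k_1 \le k \le k_2.$$

Finally, the slope $\hat a$ of this line is determined by the least squares method. Because $\lg k\varepsilon = \lg k + \lg\varepsilon$ differs from $\lg k$ only by a constant, the fit can be made against $\lg k$ with the same slope:
$$\hat a = \frac{(k_2-k_1+1)\sum \lg k \, \lg N_{k\varepsilon} - \sum \lg k \sum \lg N_{k\varepsilon}}{(k_2-k_1+1)\sum (\lg k)^2 - \left(\sum \lg k\right)^2},$$
where all sums run over $k = k_1, \dots, k_2$.

The box dimension is then
$$D_B = -\hat a.$$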
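The grid-enlargement procedure above can be sketched in Python as follows. The text does not fix the following choices, so they are assumptions:
- the signal is min–max normalized to [0, 1] and the time axis spans [0, 1], so $\varepsilon = 1/N$;
- the scale-free region $[k_1, k_2]$ is passed in by the caller, and the defaults 1 and 16 are arbitrary;
- the function name is hypothetical.

```python
import math


def box_dimension(y, k1=1, k2=16):
    """Box dimension D_B of a sampled signal via grid enlargement.

    Assumptions: amplitude min-max normalized to [0, 1] and time axis on [0, 1],
    so the base grid size is eps = 1/N; [k1, k2] is the chosen scale-free region.
    """
    n = len(y)
    lo, hi = min(y), max(y)
    z = [(v - lo) / (hi - lo) for v in y] if hi > lo else [0.0] * n
    eps = 1.0 / n
    lg_k, lg_count = [], []
    for k in range(k1, k2 + 1):
        # P(k*eps): sum over columns i = 1..N/k of the range of k + 1 samples
        p = 0.0
        for i in range(n // k):
            window = z[k * i:k * i + k + 1]
            p += max(window) - min(window)
        count = p / (k * eps) + 1          # grid count N_{k eps}
        lg_k.append(math.log10(k))
        lg_count.append(math.log10(count))
    # Least-squares slope a of lg N_{k eps} against lg k over k1 <= k <= k2
    num = len(lg_k)
    sx, sy = sum(lg_k), sum(lg_count)
    sxy = sum(a * b for a, b in zip(lg_k, lg_count))
    sxx = sum(a * a for a in lg_k)
    a = (num * sxy - sx * sy) / (num * sxx - sx * sx)
    return -a                              # D_B = -a
```

A straight line yields a dimension close to 1, whereas a rough, noise-like signal yields a value well above 1.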

2.3. Mutual Information

Information denotes the state of motion of things and the way that state changes, as perceived and expressed by a cognizing subject. There are two kinds of information measures. One measures how much information a message or set of messages itself contains. The other measures how much information one message or message set provides about another. The former is described by self-information and information entropy, whereas the latter is described by mutual information (MI) and average mutual information. MI measures the degree of interdependence between two variables and represents the amount of information they share. Consider two random variables $X$ and $Y$ with marginal probability distributions $p(x)$ and $p(y)$ and joint probability distribution $p(x, y)$. The mutual information between them is defined as
$$I(X; Y) = \sum_{x}\sum_{y} p(x, y) \log \frac{p(x, y)}{p(x)\,p(y)}.$$

When the variables $X$ and $Y$ are completely unrelated, that is, independent, the mutual information reaches its minimum value of 0, meaning that the two variables share no information. Conversely, the stronger the interdependence between the two variables, the greater the mutual information and the more similar the information they contain.
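In practice, the probability distributions in the MI definition must be estimated from data. The paper does not state its estimator, so the following is an assumption: a common plug-in estimator that divides each variable into equal-width bins and substitutes the empirical frequencies into the definition. The natural logarithm is used, so MI is in nats. The default of 8 bins and the function name are illustrative, and histogram estimates are biased upward for small samples.

```python
import math
from collections import Counter


def mutual_information(x, y, bins=8):
    """Histogram (plug-in) estimate of I(X; Y) in nats."""
    def discretize(values):
        lo, hi = min(values), max(values)
        width = (hi - lo) / bins or 1.0     # avoid division by zero for constant data
        return [min(int((v - lo) / width), bins - 1) for v in values]

    bx, by = discretize(x), discretize(y)
    n = len(bx)
    p_xy = Counter(zip(bx, by))            # joint counts
    p_x, p_y = Counter(bx), Counter(by)    # marginal counts
    return sum(
        (c / n) * math.log((c / n) / ((p_x[a] / n) * (p_y[b] / n)))
        for (a, b), c in p_xy.items()
    )
```

A constant variable shares no information with anything, so its MI with any other variable is 0. The MI of a variable with itself equals the entropy of its binned distribution.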

2.4. Partial Least Squares Regression

In 1984, Wold and Albano first proposed PLSR, a multivariate statistical data analysis method for the regression modeling of multiple dependent variables on multiple independent variables [21].

Consider $q$ dependent variables $y_1, \dots, y_q$ and $p$ independent variables $x_1, \dots, x_p$. Studying the statistical relationship between them requires observing $n$ sample points, which form the data tables $X = [x_1, \dots, x_p]_{n \times p}$ and $Y = [y_1, \dots, y_q]_{n \times q}$ of the independent and dependent variables. PLSR proceeds as follows:
1. Extract the first components $t_1$ and $u_1$ from $X$ and $Y$, respectively. Here $t_1$ is a linear combination of $x_1, \dots, x_p$, and $u_1$ is a linear combination of $y_1, \dots, y_q$.
2. Regress $X$ on $t_1$ and $Y$ on $t_1$.
3. Check the accuracy of the model. If the regression equation reaches satisfactory accuracy, the algorithm terminates.
4. Otherwise, use the residual information of $X$ not explained by $t_1$ and the residual information of $Y$ not explained by $t_1$ to extract a second pair of components. Repeat this step until the accuracy is satisfactory.
5. Finally, suppose $r$ components $t_1, \dots, t_r$ have been extracted from $X$. Regressing $y_k$ on $t_1, \dots, t_r$ then yields, after substitution, the regression equation of $y_k$ in terms of the original variables $x_1, \dots, x_p$.
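For the single-response case relevant here, predicting glucose concentration ($q = 1$, so $Y$ is one column), the procedure above can be sketched with the NIPALS PLS1 algorithm in NumPy. This is an illustrative sketch, not the authors' implementation, and the function name `pls1_fit` is hypothetical. The number of components is fixed by the caller rather than chosen by the accuracy check described above; in practice it is usually chosen by cross-validation.

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """Fit single-response PLS regression (NIPALS).

    Returns (coef, intercept) such that y_hat = X @ coef + intercept,
    i.e. the regression equation in terms of the original variables.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean  # centered data; become residuals below
    W, P, q = [], [], []
    for _ in range(n_components):
        w = E.T @ f
        w /= np.linalg.norm(w)     # weight vector of component t_h
        t = E @ w                  # t_h: a linear combination of the x columns
        tt = t @ t
        p = E.T @ t / tt           # regression of the X residual on t_h
        c = f @ t / tt             # regression of the y residual on t_h
        E = E - np.outer(t, p)     # information in X not explained by t_h
        f = f - c * t              # information in y not explained by t_h
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    # Express the regression on t_1..t_r in the original variables: B = W (P^T W)^-1 q
    coef = W @ np.linalg.solve(P.T @ W, q)
    intercept = y_mean - x_mean @ coef
    return coef, intercept
```

When `n_components` equals the rank of X, the fit coincides with ordinary least squares. Fewer components give the regularized PLS solution.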

2.5. Support Vector Regression

The support vector machine (SVM) is a machine learning method proposed by Vapnik et al. on the basis of statistical learning theory. SVM is suited to small-sample learning and has strong generalization ability, which helps it avoid overfitting and local minima. By introducing the $\varepsilon$-insensitive loss function, Vapnik et al. extended SVM to regression estimation for nonlinear systems and established the SVR algorithm. SVR has been widely used in function estimation, nonlinear system modeling, and other fields [22].

For the sample set $\{(x_i, y_i)\}_{i=1}^{n}$ ($x_i$ is the input vector, $y_i$ is the corresponding target value, and $n$ is the number of samples), the SVR function is
$$f(x) = w^{\mathrm{T}} \varphi(x) + b,$$
where $\varphi(\cdot)$ is the nonlinear map that transforms the data into a high-dimensional feature space, and $w$ and $b$ are coefficients.
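To make the ε-insensitive loss and the SVR function concrete, the sketch below evaluates both directly. Training, which solves a quadratic program for the dual coefficients, is deliberately omitted. The paper does not state its kernel, so the RBF kernel $K(x_i, x) = \exp(-\gamma\|x_i - x\|^2)$ is an assumption. Under a kernel, $w = \sum_i (\alpha_i - \alpha_i^*)\varphi(x_i)$, which gives $f(x) = \sum_i (\alpha_i - \alpha_i^*) K(x_i, x) + b$. All function names are hypothetical.

```python
import math


def eps_insensitive_loss(residual, eps):
    """Vapnik's epsilon-insensitive loss: deviations within +/- eps cost nothing."""
    return max(0.0, abs(residual) - eps)


def rbf_kernel(a, b, gamma):
    """K(a, b) = exp(-gamma * ||a - b||^2), an assumed kernel choice."""
    return math.exp(-gamma * sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def svr_predict(x, support_vectors, dual_coefs, b, gamma=1.0):
    """f(x) = sum_i (alpha_i - alpha_i^*) K(x_i, x) + b."""
    return sum(beta * rbf_kernel(sv, x, gamma)
               for sv, beta in zip(support_vectors, dual_coefs)) + b


def linear_svr_objective(w, b, X, y, C, eps):
    """Primal objective 0.5 * ||w||^2 + C * sum_i L_eps(y_i - f(x_i)) with phi(x) = x."""
    reg = 0.5 * sum(wi * wi for wi in w)
    loss = sum(
        eps_insensitive_loss(yi - (sum(wj * xj for wj, xj in zip(w, xi)) + b), eps)
        for xi, yi in zip(X, y)
    )
    return reg + C * loss
```

In practice the coefficients come from a solver. For example, scikit-learn's `sklearn.svm.SVR` exposes them after fitting as `support_vectors_`, `dual_coef_`, and `intercept_`.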

2.6. Near-Infrared Spectra Data

In the near-infrared spectral noninvasive blood glucose measurement experiments, glucose solutions are used in place of blood for the time being. The glucose solutions have concentrations evenly distributed from 50 mg/dL to 1000 mg/dL and were all prepared uniformly under the same conditions. The prepared samples are placed in the detection system of the spectrometer. All experimental data are collected with an Antaris II FT-NIR Fourier transform spectrometer produced by Thermo Fisher Scientific (USA). The instrument has the following specifications:
- spectral range: 833–2630 nm;
- resolution: 4 cm−1 across the spectral range;
- wavenumber reproducibility: better than 0.05 cm−1;
- wavenumber accuracy: ±0.03 cm−1.

The data obey the Beer–Lambert law. To reduce statistical error, each concentration is measured five times. The collected near-infrared spectra of the glucose solutions are shown in Figure 4.
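For reference, the Beer–Lambert law mentioned above states that absorbance is linear in concentration. This linearity is one reason linear calibration models such as PLSR are a natural starting point. The symbols used here are standard notation introduced for this illustration, not taken from the original: $I_0$ and $I$ are the incident and transmitted intensities, $\varepsilon(\lambda)$ is the molar absorptivity, $l$ is the optical path length, and $c$ is the concentration.
$$A(\lambda) = \log_{10}\frac{I_0(\lambda)}{I(\lambda)} = \varepsilon(\lambda)\, l\, c.$$
For a mixture such as glucose in water, the absorbances of the components add:
$$A(\lambda) = l \sum_{s} \varepsilon_s(\lambda)\, c_s.$$
At a fixed path length, a change in glucose concentration therefore shifts the absorbance at each wavelength in proportion to $\varepsilon_{\text{glucose}}(\lambda)$.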

3. Results and Discussion

Optimal wave bands are selected according to two key principles: (1) the selected bands should carry a large amount of information, and (2) the correlation between bands should be small. Widely used extraction methods include comparing the information in each band, the information correlation between bands, the optimum index factor method, and the entropy and joint entropy of band data. In feature extraction, the fractal dimension (FD) can serve as a feature because almost all signals have fractal characteristics. The FD can distinguish two signals provided that they have different dimensions under the same measure. Two signals from the same state are similar and have similar fractal dimensions. The correlation coefficient reflects only the linear correlation between two variables and cannot measure nonlinear relationships between them. From the viewpoint of information theory, however, MI estimates the total amount of information shared between variables and is not limited to linear relationships, which gives it advantages over correlation analysis. PE analysis can effectively quantify the similarity between sequences and clearly distinguish similar sequences from different ones, and it can further be applied to biological sequence analysis. The PE of each time series is calculated with the PE algorithm, and the ratio of PE values is the basis for analyzing the similarity between sequences. Therefore, PE, FD, and MI are used in this study to extract the feature information of the near-infrared spectral data of glucose solutions.

Three parameters must be set in the PE calculation: the data length $N$, the embedding dimension $m$, and the time delay $\tau$. According to [23], $\tau$ has little effect on the PE value, and the PE values obtained with different $\tau$ differ very little, so a single fixed $\tau$ is used in this study. To examine how $m$ and $N$ relate to the PE value, the PE values of the given signal are calculated with data lengths of 128, 256, 512, 1024, and 2048 (Figure 5). Figure 5 shows that, for a given $m$, the PE values are almost the same across the different data lengths $N$. If $m$ is too small, the reconstructed sequence contains too few states, and the algorithm loses significance and effectiveness. If $m$ is too large, phase space reconstruction homogenizes the time series, which makes the computation time-consuming and prevents subtle changes in the sequence from being reflected. Therefore, a moderate $m$ is chosen in this study, and $N$ is set to the length of the collected spectral data.

Because PE is robust against noise, the characteristic wave bands are extracted directly from the collected near-infrared spectra of the glucose solutions. The full spectrum contains 1867 wavelength points, which are divided into intervals of 50, 100, 150, and 200 points. Figure 6 shows three quantities for each wavelength interval: the ratios of the PE values of the 50, 500, and 1000 mg/dL spectra to that of pure water, the corresponding ratios of FD values, and the corresponding MI values. As Figure 6 shows, the FD ratios and MI values of the wavelength intervals show no obvious differences. The PE ratios of some wavelength intervals are essentially the same across concentrations, whereas those of other intervals differ significantly. The latter intervals are therefore the characteristic wave bands that contain abundant glucose concentration information. However, when the spectra are divided into intervals of fewer than 50 points or more than 200 points, the PE ratios of the different concentrations show no clear distinguishing pattern. To improve the precision and accuracy of the feature wavelength extraction, the characteristic wavelength intervals are extracted with each of the four uniform partitions (50, 100, 150, and 200 points). Their overlapping intervals are taken as the final characteristic wavelength intervals (Table 1).
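The interval-partition and overlap procedure described above can be sketched as follows. The paper does not state a numerical selection rule, so the criterion used here is an assumption: an interval is flagged when the spread (max − min) of its PE ratios across concentrations exceeds a hypothetical `threshold`. All function names and defaults are also assumptions. The spectra are assumed to be equal-length lists sampled on the same wavelength grid. Intervals whose pure-water PE is zero are skipped because the ratio is undefined there.

```python
import math
from collections import Counter


def permutation_entropy(x, m=3, tau=1):
    """Normalized permutation entropy H_p of the sequence x."""
    k = len(x) - (m - 1) * tau
    counts = Counter(
        tuple(sorted(range(m), key=lambda j: x[i + j * tau])) for i in range(k)
    )
    h = sum(-(c / k) * math.log(c / k) for c in counts.values())
    return h / math.log(math.factorial(m))


def characteristic_points(spectra, water, width, threshold, m=3, tau=1):
    """Indices of points in width-point intervals flagged as characteristic.

    For each interval, the PE ratio (spectrum / pure water) is computed for
    every concentration; the interval is flagged when the ratios spread by
    more than `threshold` (a hypothetical criterion).
    """
    selected = set()
    n = len(water)
    for start in range(0, n, width):
        end = min(start + width, n)
        if end - start <= (m - 1) * tau:
            continue  # too few points left to form one ordinal pattern
        pe_water = permutation_entropy(water[start:end], m, tau)
        if pe_water == 0:
            continue  # ratio undefined for a perfectly regular reference interval
        ratios = [permutation_entropy(s[start:end], m, tau) / pe_water for s in spectra]
        if max(ratios) - min(ratios) > threshold:
            selected.update(range(start, end))
    return selected


def overlapping_bands(spectra, water, widths=(50, 100, 150, 200), threshold=0.1, m=3, tau=1):
    """Wavelength indices flagged under every uniform partition (their overlap)."""
    flagged = [characteristic_points(spectra, water, w, threshold, m, tau) for w in widths]
    return sorted(set.intersection(*flagged))
```

Applied to this paper's setting, `spectra` would hold the 50, 500, and 1000 mg/dL spectra and `water` the pure-water spectrum, each with 1867 points. The returned indices are the wavelength points flagged under all four partitions.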

To verify the effectiveness of the proposed method, PLSR and SVR calibration models are built with six sets of input data:
- the characteristic wavelength intervals selected by the FD method;
- those selected by the MI method;
- those selected by the proposed method;
- those selected by the successive projections algorithm (SPA) [24];
- those selected by the EMD-SPA method [25];
- the full spectral data.

The models are evaluated by the correlation coefficient (R) and the root mean square error of calibration (RMSEC). The PE method extracts 150 characteristic wavelength points, far fewer than the 1867 points of the full spectrum, and the fewer wavelength points are selected, the shorter the modeling time. The results of the SVR and PLSR calibration models are given in Tables 2 and 3. With the characteristic wavelength intervals extracted by the PE method, R reaches 0.9998/0.9897 and RMSEC reaches 0.0346%/0.0468% (SVR/PLSR). These results are better than those of the calibration models built with the intervals extracted by the FD, MI, SPA, and EMD-SPA methods and with the full spectral data. Overall, the SVR models are more reliable than the PLSR models.
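The two evaluation metrics can be computed as sketched below. The paper does not give its formulas, so these definitions are assumptions:
- R is the Pearson correlation between the reference and predicted concentrations;
- RMSEC divides by the number of calibration samples $n$, although some chemometrics texts use $n - k - 1$ for $k$ latent variables.

The percentage units in which the paper reports RMSEC are not reproduced. The function names are hypothetical.

```python
import math


def correlation_coefficient(y_true, y_pred):
    """Pearson correlation coefficient R between reference and predicted values."""
    n = len(y_true)
    mt, mp = sum(y_true) / n, sum(y_pred) / n
    cov = sum((a - mt) * (b - mp) for a, b in zip(y_true, y_pred))
    vt = sum((a - mt) ** 2 for a in y_true)
    vp = sum((b - mp) ** 2 for b in y_pred)
    return cov / math.sqrt(vt * vp)


def rmsec(y_true, y_pred):
    """Root mean square error of calibration, dividing by n (an assumption)."""
    n = len(y_true)
    return math.sqrt(sum((b - a) ** 2 for a, b in zip(y_true, y_pred)) / n)
```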

4. Conclusions

In this study, a feature band extraction method based on permutation entropy is proposed for near-infrared spectral noninvasive blood glucose detection. Because PE is robust against noise and interference and is especially suitable for nonlinear data, the spectral data do not need to be denoised. Taking the near-infrared spectra of glucose solutions as the object, the collected spectra are divided into intervals of different numbers of points, and the ratios of the PE values of corresponding spectral intervals are calculated. The overlapping spectral intervals, which contain abundant glucose concentration information, are extracted to reduce the range of data to be modeled. PLSR and SVR are then used to build calibration models with the characteristic spectra extracted by the proposed method, the FD, MI, SPA, and EMD-SPA methods, and the full spectral data. The correlation coefficients and RMSEC values of the calibration models show that the proposed feature extraction method effectively solves the redundancy problem of near-infrared spectral data. It also improves the robustness and predictive ability of the regression model.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was supported by the Program for Harbin City Science and Technology Innovation Talents of Special Fund Project (Grant no. 2014RFXXJ065) and the Fundamental Research Funds for the Central Universities (Grant no. HIT. IBRSEM. 201307).