Journal of Spectroscopy

Volume 2018, Article ID 2689750, 8 pages

https://doi.org/10.1155/2018/2689750

## Nonlinear Regression with High-Dimensional Space Mapping for Blood Component Spectral Quantitative Analysis

Correspondence should be addressed to Yan Zhou; yan.zhou@mail.xjtu.edu.cn

Received 11 July 2017; Accepted 25 October 2017; Published 10 January 2018

Academic Editor: Vincenza Crupi

Copyright © 2018 Xiaoyan Ma et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

Accurate and fast determination of blood component concentrations is essential for the efficient diagnosis of patients. This paper proposes a nonlinear regression method with high-dimensional space mapping for blood component spectral quantitative analysis. Kernels are introduced to map the input data into a high-dimensional space for nonlinear regression. As the best-known kernel, the Gaussian kernel is usually the one adopted by researchers; however, more kernels need to be studied, because each kernel describes its own high-dimensional feature space mapping, which in turn affects regression performance. In this paper, eight kernels are used to examine the influence of different space mappings on blood component spectral quantitative analysis. Each kernel and its corresponding parameters are assessed to build the optimal regression model. The proposed method is evaluated on real blood spectral data obtained from uric acid determination. The results verify that the proposed models predict more precisely than linear models. Support vector regression (SVR) provides better performance than partial least squares (PLS) when combined with kernels. Local kernels are recommended given the features of the blood spectral data, and SVR with the inverse multiquadric kernel has the best predictive performance for blood component spectral quantitative analysis.

#### 1. Introduction

The component concentrations in human blood can be indicators of certain diseases, so fast and accurate determination is essential to their early diagnosis. For instance, the serum uric acid (UA) level can be used as an indicator for the detection of diseases related to purine metabolism [1–3] and of leukemia and pneumonia [4–6]. Various analytical methods have been developed for the determination of UA, including electrochemical and chemiluminescence methods [7, 8], high-performance liquid chromatography [9, 10], and spectroscopic quantitative analysis [11, 12]. As spectroscopic quantitative analysis requires only a small sample with easy preparation, it has attracted increasing attention for blood analysis [13, 14].

In a spectroscopic quantitative analysis, when radiation hits a sample, part of the incident radiation is absorbed, and the shape of the absorption spectrum depends on the chemical composition and physical parameters of the sample. A spectrometer is used to collect a continuous absorption spectrum, and the concentration of the component can then be predicted by a regression algorithm [15–17]. Partial least squares (PLS) regression and support vector regression (SVR) models have both been applied [18–20]. PLS focuses on finding the wavelengths that have the closest relationship with the concentration to be regressed. SVR operates on the structural risk minimization (SRM) principle and Vapnik-Chervonenkis (VC) theory [21, 22]; using SRM instead of the traditional empirical risk minimization (ERM) gives the model strong generalization ability. However, the widely used linear regression models cannot always be relied on in practice because of restrictions on the spectral data [23, 24]. Spectral data collected by a low-precision system always exhibit nonlinearity. Moreover, a high concentration of a blood component may lie beyond the linear range of optical determination [25], so samples with higher concentrations must be diluted to meet the linearity requirement. The kernel method can be introduced to overcome these restrictions [26]. Using the kernel method, the input vectors are mapped into a higher-dimensional feature space, which turns nonlinear problems into linear or approximately linear ones. A kernel can be any function that satisfies Mercer's condition. Kernel-based regression methods have been reported for spectral quantitative determination: a comparison of SVR with a Gaussian kernel against four other nonlinear models is presented by Balabin and Lomakina [27], and the performance of SVR with a Gaussian kernel versus a PLS model for fruit quality evaluation is compared by Malegori et al. [28].
The evaluation of SVR with Gaussian, polynomial, sigmoid, and linear kernels, and of PLS, for biodiesel content determination is discussed by Alves and Poppi [29]. As the best-known kernel, the Gaussian kernel is adopted by most researchers (three more traditional kernels are discussed by Alves and Poppi). Different kernels describe different high-dimensional feature space mappings, which affect regression performance [30, 31], so more kernels need to be discussed to evaluate the high-dimensional space mapping. Moreover, PLS can be extended to nonlinear regression and compared with SVR using the same kernel.

Nonlinear regression with high-dimensional space mapping for blood component spectral quantitative analysis is discussed in this paper. Kernels are incorporated into PLS and SVR to realize nonlinear regression in the original input space; the kernel extension of PLS and SVR is completed by replacing the dot product calculation of elements with the kernel. Eight kernels are used to examine the influence of different space mappings on blood component spectral quantitative analysis, and each kernel and its corresponding parameters are assessed to build the optimal nonlinear regression model. A dataset obtained from spectral measurement of uric acid concentration is used to evaluate the effectiveness of the proposed method. The experimental results are analyzed, and the mean squared error of prediction (MSEP) is used to compare the predictive capability of the various models.
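MSEP itself is straightforward to compute; a minimal sketch (the function name and sample values below are illustrative, not from the paper):

```python
import numpy as np

def msep(y_true, y_pred):
    """Mean squared error of prediction over a validation set."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

# Illustrative values: squared errors 0, 0.25, 0.25 average to about 0.1667.
print(msep([1.0, 2.0, 3.0], [1.0, 2.5, 2.5]))
```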

This article is organized as follows. The methods are introduced in Section 2. The experimental process is explained in Section 3. The results are analyzed in Section 4. Finally, Section 5 concludes the paper.

#### 2. The Methods

##### 2.1. PLS

PLS is advantageous over ordinary multiple linear regression because it accounts for collinearities in the predictor variables. It assumes uncorrelated latent variables that are linear combinations of the original input data. PLS relies on a decomposition of the input variable matrix based on a covariance criterion: it finds factors (latent variables) that describe the input variables and are correlated with the output variables. For PLS, the concentration of the blood component $\mathbf{Y}$ is calculated by the following linear equation:

$$\mathbf{Y} = \mathbf{X}\mathbf{B} + \mathbf{B}_0,$$

where $\mathbf{X}$ is an input matrix of wavelength signals, $\mathbf{B}$ is a matrix of regression coefficients, and $\mathbf{B}_0$ is a bias vector. The matrix $\mathbf{B}$ has the form

$$\mathbf{B} = \mathbf{X}^{T}\mathbf{U}\,(\mathbf{T}^{T}\mathbf{X}\mathbf{X}^{T}\mathbf{U})^{-1}\mathbf{T}^{T}\mathbf{Y},$$

where the latent vectors $\mathbf{T}$ and $\mathbf{U}$ are linear combinations of the input and output variables, respectively.

##### 2.2. SVR

For SVR, a linear regression is performed between the matrix of wavelength signals $\mathbf{X}$ and the corresponding blood component concentration $\mathbf{Y}$:

$$\mathbf{Y} = \mathbf{W}^{T}\mathbf{X} + \mathbf{b},$$

where $\mathbf{W}$ is the matrix of weight coefficients and $\mathbf{b}$ is a bias vector. According to the Lagrange multiplier method and the Karush-Kuhn-Tucker (KKT) conditions,

$$\mathbf{W} = \sum_{i=1}^{n} (\alpha_i - \alpha_i^{*})\,\mathbf{x}_i,$$

where $\mathbf{x}_i$ is a variable (row) of the matrix $\mathbf{X}$, $\alpha_i$ and $\alpha_i^{*}$ are the corresponding Lagrange coefficients, and $n$ is the number of samples. The linear regression equation can then be written as

$$y(\mathbf{x}) = \sum_{i=1}^{n} (\alpha_i - \alpha_i^{*})\,\mathbf{x}_i^{T}\mathbf{x} + b.$$
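This dual form is what an off-the-shelf ε-SVR solver computes; a minimal linear-SVR sketch with scikit-learn (synthetic data and solver parameters are illustrative):

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(80, 5))
true_w = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = X @ true_w + 0.05 * rng.normal(size=80)

# epsilon-SVR with a linear kernel; the fitted dual coefficients are the
# (alpha_i - alpha_i*) of the support vectors.
svr = SVR(kernel="linear", C=10.0, epsilon=0.01)
svr.fit(X, y)
w = svr.coef_.ravel()  # weight vector W recovered from the support vectors
print(np.round(w, 1))
```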

##### 2.3. High-Dimensional Mapping

The regression ability of a linear model can be enhanced by mapping the input data into a high-dimensional space. Using the kernel method, the algorithm realizes a prediction in the high-dimensional feature space without an explicit mapping from the original space. A kernel is a function of two elements in the original space that equals the dot product of their images in the feature space, so a kernel extension of a linear algorithm is completed by replacing every dot product calculation of elements with a kernel evaluation. The kernel extensions of PLS and SVR are introduced next.
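The "dot product in feature space without explicit mapping" idea can be verified directly for a small case: for 2-D inputs, the homogeneous polynomial kernel $(\mathbf{x}^{T}\mathbf{y})^2$ equals the dot product of an explicit degree-2 feature map (a toy illustration, not one of the eight kernels evaluated in the paper):

```python
import numpy as np

def phi(x):
    """Explicit degree-2 feature map for 2-D input: (x1^2, sqrt(2) x1 x2, x2^2)."""
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

def poly2_kernel(x, y):
    """(x . y)^2 -- the same feature-space dot product, computed without
    ever forming the high-dimensional vectors."""
    return float(np.dot(x, y)) ** 2

x = np.array([1.0, 2.0])
y = np.array([3.0, -1.0])
print(np.dot(phi(x), phi(y)), poly2_kernel(x, y))  # both equal 1.0
```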

Kernel PLS is a nonlinear extension of PLS. A nonlinear mapping $\Phi$ transforms the original data into a feature space; when a linear PLS regression is constructed there, a nonlinear PLS is obtained for the original input data. The kernel Gram matrix can be calculated as

$$\mathbf{K} = \Phi\Phi^{T}, \qquad K_{ij} = k(\mathbf{x}_i, \mathbf{x}_j) = \Phi(\mathbf{x}_i)\cdot\Phi(\mathbf{x}_j).$$

The component concentration regression model comes out as

$$\hat{\mathbf{Y}}_t = \mathbf{K}_t \mathbf{U}\,(\mathbf{T}^{T}\mathbf{K}\mathbf{U})^{-1}\mathbf{T}^{T}\mathbf{Y},$$

where $\hat{\mathbf{Y}}_t$ and $\mathbf{Y}$ are the output variables of the validation set and calibration set, $\Phi_t$ is the feature space mapping of the validation variables, the latent vectors $\mathbf{T}$ and $\mathbf{U}$ are linear combinations of the input and output variables, and $\mathbf{K}_t = \Phi_t\Phi^{T}$ is the matrix composed of $k(\mathbf{x}_t, \mathbf{x}_i)$, where $\mathbf{x}_t$ and $\mathbf{x}_i$ are input variables of the validation set and calibration set. The nonlinear regression is determined once the kernel function is selected.
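A compact single-response kernel PLS along these lines can be sketched as follows. This is a simplified NIPALS-style implementation under assumptions not stated in the paper (one concentration output, no kernel centering); the function names and demo data are illustrative:

```python
import numpy as np

def gaussian_kernel(A, B, sigma=1.0):
    """Gram matrix K[i, j] = exp(-||a_i - b_j||^2 / (2 sigma^2))."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def kpls_fit_predict(K, y, K_t, n_comp):
    """Single-response kernel PLS: predict for the rows of K_t.

    K   -- (n, n) calibration Gram matrix
    y   -- (n,)   calibration concentrations
    K_t -- (m, n) validation-vs-calibration Gram matrix
    """
    n = K.shape[0]
    Y = y.reshape(-1, 1).astype(float)
    Kd, Yd = K.copy(), Y.copy()
    T = np.zeros((n, n_comp))
    U = np.zeros((n, n_comp))
    for a in range(n_comp):
        u = Yd / np.linalg.norm(Yd)       # single response: converges in one step
        t = Kd @ u
        t /= np.linalg.norm(t)
        T[:, a:a + 1], U[:, a:a + 1] = t, u
        P = np.eye(n) - t @ t.T           # deflate K and Y by the extracted score
        Kd = P @ Kd @ P
        Yd = Yd - t @ (t.T @ Yd)
    # Y_hat_t = K_t U (T^T K U)^{-1} T^T Y, using the original K and Y
    M = np.linalg.solve(T.T @ K @ U, T.T @ Y)
    return (K_t @ U @ M).ravel()

# Demo on a synthetic nonlinear curve (illustrative, not the UA data).
rng = np.random.default_rng(3)
X_cal = rng.uniform(0, 2 * np.pi, size=(40, 1))
y_cal = np.sin(X_cal).ravel()
K = gaussian_kernel(X_cal, X_cal, sigma=1.0)
y_fit = kpls_fit_predict(K, y_cal, K, n_comp=6)
print(np.mean((y_fit - y_cal) ** 2))  # small residual on the calibration set
```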

For kernel extended SVR, the concentration of the component is calculated by the regression function

$$y(\mathbf{x}) = \sum_{i=1}^{n} (\alpha_i - \alpha_i^{*})\,\Phi(\mathbf{x}_i)^{T}\Phi(\mathbf{x}) + b,$$

where $\Phi$ is a high-dimensional mapping introduced to complete the nonlinear regression, $\mathbf{x}$ is an input variable of wavelength signals, and $\alpha_i$ and $\alpha_i^{*}$ play the same role as in SVR. Define the kernel function

$$k(\mathbf{x}_i, \mathbf{x}_j) = \Phi(\mathbf{x}_i)\cdot\Phi(\mathbf{x}_j).$$

The component concentration regression model can then be expressed as

$$\hat{y}_t = \sum_{i=1}^{n} (\alpha_i - \alpha_i^{*})\,k(\mathbf{x}_i, \mathbf{x}_t) + b,$$

where $\hat{y}_t$ is the output variable of the validation set and $k(\mathbf{x}_i, \mathbf{x}_t)$ is evaluated between the input variables $\mathbf{x}_i$ of the calibration set and $\mathbf{x}_t$ of the validation set. This completes the kernel extended SVR.
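In practice, any kernel of this form can be plugged into a standard ε-SVR solver by precomputing the Gram matrices. A sketch with scikit-learn's `SVR(kernel="precomputed")` and an inverse multiquadric kernel (the synthetic data and parameter values are illustrative, not the paper's optimized settings):

```python
import numpy as np
from sklearn.svm import SVR

def inverse_multiquadric(A, B, c=1.0):
    """K[i, j] = 1 / sqrt(||a_i - b_j||^2 + c^2)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return 1.0 / np.sqrt(d2 + c ** 2)

rng = np.random.default_rng(2)
X_cal = rng.uniform(0, 2 * np.pi, size=(60, 1))   # calibration inputs
y_cal = np.sin(X_cal).ravel()                     # nonlinear target
X_val = rng.uniform(0, 2 * np.pi, size=(20, 1))   # validation inputs

# Fit on the calibration Gram matrix, predict from the validation-vs-calibration one.
svr = SVR(kernel="precomputed", C=100.0, epsilon=0.01)
svr.fit(inverse_multiquadric(X_cal, X_cal), y_cal)
y_hat = svr.predict(inverse_multiquadric(X_val, X_cal))
print(np.mean((y_hat - np.sin(X_val).ravel()) ** 2))  # small validation residual
```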

The kernel determines the character of the high-dimensional space mapping and affects the regression performance. To build the optimal nonlinear regression model, different kernels should be evaluated in combination with PLS and SVR. The kernels [32] used in the experiments are the following (with $r = \|\mathbf{x} - \mathbf{y}\|$):

1. Linear kernel: $k(\mathbf{x}, \mathbf{y}) = \mathbf{x}^{T}\mathbf{y}$. The linear kernel has no parameter; KPLS turns into PLS, and kernel SVR turns into linear SVR when the linear kernel is adopted.
2. Gaussian kernel: $k(\mathbf{x}, \mathbf{y}) = \exp(-r^{2}/(2\sigma^{2}))$. The kernel parameter is the width $\sigma$.
3. Polynomial kernel: $k(\mathbf{x}, \mathbf{y}) = (\mathbf{x}^{T}\mathbf{y} + 1)^{d}$. The kernel parameter is the degree $d$.
4. Inverse multiquadric kernel: $k(\mathbf{x}, \mathbf{y}) = 1/\sqrt{r^{2} + c^{2}}$. The kernel parameter is $c$.
5. Semi-local kernel: $k(\mathbf{x}, \mathbf{y}) = (\mathbf{x}^{T}\mathbf{y} + 1)^{d}\exp(-r^{2}/(2\sigma^{2}))$. The kernel parameter is the width $\sigma$.
6. Exponential kernel: $k(\mathbf{x}, \mathbf{y}) = \exp(-r/(2\sigma^{2}))$. The kernel parameter is the width $\sigma$.
7. Rational kernel: $k(\mathbf{x}, \mathbf{y}) = 1 - r^{2}/(r^{2} + c)$. The kernel parameter is $c$.
8. KMOD kernel: $k(\mathbf{x}, \mathbf{y}) = a\left[\exp\!\left(\gamma/(r^{2} + \sigma^{2})\right) - 1\right]$, with normalization $a = 1/(\exp(\gamma/\sigma^{2}) - 1)$. The kernel parameters are $\gamma$ and $\sigma$.
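These kernels can be written as plain functions of two input vectors; the sketch below uses the standard textbook forms (parameter defaults are illustrative, not the optimized values from the paper's experiments):

```python
import numpy as np

def r2(x, y):
    """Squared Euclidean distance ||x - y||^2."""
    return float(((x - y) ** 2).sum())

kernels = {
    "linear":      lambda x, y: float(x @ y),
    "gaussian":    lambda x, y, sigma=1.0: np.exp(-r2(x, y) / (2 * sigma**2)),
    "poly":        lambda x, y, d=2: (float(x @ y) + 1) ** d,
    "inv_mq":      lambda x, y, c=1.0: 1.0 / np.sqrt(r2(x, y) + c**2),
    "semilocal":   lambda x, y, d=2, sigma=1.0:
                       (float(x @ y) + 1) ** d * np.exp(-r2(x, y) / (2 * sigma**2)),
    "exponential": lambda x, y, sigma=1.0:
                       np.exp(-np.sqrt(r2(x, y)) / (2 * sigma**2)),
    "rational":    lambda x, y, c=1.0: 1.0 - r2(x, y) / (r2(x, y) + c),
    "kmod":        lambda x, y, gamma=1.0, sigma=1.0:
                       (np.exp(gamma / (r2(x, y) + sigma**2)) - 1)
                       / (np.exp(gamma / sigma**2) - 1),
}

# At x == y the distance-based (local) kernels attain their maximum value of 1
# under these defaults, which is what makes their response concentrate locally.
x = np.array([1.0, 0.0])
for name in ("gaussian", "inv_mq", "rational", "kmod", "exponential"):
    print(name, kernels[name](x, x))
```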

The prediction performance of the high-dimensional mappings induced by these kernels, together with the related parameter optimization, is discussed in the next section.

#### 3. Experimental

##### 3.1. Dataset

To evaluate the effectiveness of nonlinear regression with high-dimensional space mapping for blood component spectral quantitative analysis, the UA dataset is used in the experiment.

200 samples were obtained from the uric acid concentration spectral determination experiment. Each spectrum has 601 signals from 400 nm to 700 nm at a 0.5 nm interval, and UA concentrations from 105 to 1100 μmol/L are evaluated. A spectrum of the UA data is shown in Figure 1.
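As a quick consistency check of the stated grid (a sketch, not code from the paper): 601 points from 400 nm to 700 nm correspond exactly to a 0.5 nm spacing.

```python
import numpy as np

# 601 equally spaced wavelengths spanning 400-700 nm.
wavelengths = np.linspace(400.0, 700.0, 601)
print(len(wavelengths), wavelengths[1] - wavelengths[0])  # 601 0.5
```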