#### Abstract

Melamine is a nitrogen-rich substance and has been illegally used to increase the apparent protein content in food products such as milk. Therefore, it is imperative to develop sensitive and reliable analytical methods to determine melamine in human foods. Current analytical methods for melamine are mainly chromatography-based methods, which are time-consuming and expensive and require complex pretreatment and well-trained technicians. The present paper investigated the feasibility of using near-infrared (NIR) spectroscopy and chemometrics for identifying and quantifying melamine in liquor milk. A total of 75 samples were prepared. Uninformative variable elimination-partial least square (UVE-PLS) and partial least squares-discriminant analysis (PLS-DA) were used to construct quantitative and qualitative models, respectively. Based on the ratio of performance to standard deviate (RPD), UVE-PLS model with 3 components resulted in a better solution. The PLS-DA model achieved an accuracy of 100% and outperformed the optimal reference model of soft independent modeling of class analogy (SIMCA). Such a method can serve as a potential tool for rapid screening of melamine in milk products.

#### 1. Introduction

Melamine or 2,4,6-triamino-1,3,5-triazine is a nitrogen-rich compound and has been widely used in industry for producing melamine-formaldehyde resin [1]. Such a substance contains a substantial amount of nitrogen, 66.7% by mass, and therefore forms a driving force of the adulteration of milk products with melamine for obtaining high protein content readings. This is also because the conventional Kjeldahl or Dumas test is a method of analyzing total nitrogen content, without identifying its sources [2, 3]. It gives an appearance of protein content, not true protein content, and therefore cannot distinguish nitrogen of melamine from other proteins.

Although melamine is not inherently a carcinogen, it has low oral acute toxicity [4]. High-dose ingestion of melamine or chronic administration can induce urinary calculi and acute renal failure and even results in death, especially in babies and children [5–7]. Studies on toxicity caused by oral ingestion of melamine in humans are nonexistent. LD50, the lethal dose of a given compound resulting in death of 50% of tested animals, is 3.1 g/kg of body mass for melamine in rats [8]. At present, the US Food and Drug Administration (FDA) states limit of melamine in milk of no more than 2.5 ppm [9]. In 2008, a serious adulteration incident, that is, milk powder adulterated with melamine, was made public in China and many other countries [10]. The adulterated products resulted in kidney illness of various degrees affecting about 294,000 individuals, 6 of whom died [11]. This incidence has incurred global concern about melamine and food safety. Therefore, the development of fast and reliable analytical methods to determine melamine in human food is of utmost importance.

In the wake of these incidents, various methods have been developed in the area of melamine detection, including gas chromatography (GC) [12], high-performance liquid chromatography (HPLC) [13], mass spectrometry (MS) [14], matrix-assisted laser desorption/ionization time-of-flight mass spectrometry, enzyme linked immune sorbent assay (ELISA) [15], and capillary electrophoresis (CE) [16]. However, to eliminate the strong interference components from matrix, sophisticated instrument and complicated sample treatment are generally needed for the above techniques, which seriously hampered the speed in field analysis. Also, these methods maybe require well-trained technicians to operate the instrumentation. Currently, spectrophotometry combined with chemometrics seems to be a kind of attractive methods in analytical chemistry [17–19]. On one hand, spectroscopic instruments are widely available in ordinary analytical laboratory; they are also much faster than the usual techniques, present good accuracy and precision, and are nondestructive, and the procedure of sample preparation is relatively simple. On the other hand, chemometrics can help chemists to resolve the constituents of complex chemometrics by introducing of computer and statistical techniques [20, 21]. In particular, Raman, infrared, and near-infrared (NIR) spectroscopy have been successfully used in adulteration detection for various food products such as milk and honey. Several researches have mainly focused on the detection of melamine in milk powder by these techniques [22–24]. A good review related to melamine prediction in milk by vibration spectroscopy is available [25].

Among spectroscopic methods, NIR spectroscopy has obvious advantages because it is noninvasive, requires minimal sample preparation, and can yield a response in real time. NIR spectroscopy corresponds to the absorption of electromagnetic radiation in 780–2500 nm [26–28]. Quantitative and qualitative analysis with NIR spectroscopy depend greatly on the regression/classification model to relate subtle spectral variations to changes of certain component concentration in the sample matrix or to discriminate sample categories. The accuracy and robustness of a model directly decide its applicability in most situations. Even if many modeling methods have been developed, undoubtedly, the most commonly used multivariate technique is still partial least squares (PLS) [29]. Traditionally, PLS is considered being based on latent variables and without having to perform a previous variable selection. However, more and more evidences have indicated that variable selection is still very important and necessary for a parsimonious and robust model [30]. Comparing with many other methods of variable selection, uninformative variable elimination-PLS (UVE-PLS) is a method for variable selection based on an analysis of regression coefficients of PLS, which has been widely applied in spectral analysis and provided satisfactory results.

In this study, the feasibility of identifying and quantifying melamine in liquor milk by NIR spectroscopy in combination with chemometrics was investigated. Several algorithms, that is, PLS, UVE-PLS, partial least square-discriminant analysis (PLS-DA), and the soft independent modeling of class analogy (SIMCA), were used to construct quantitative and qualitative models and the optimal results were obtained.

#### 2. Materials and Methods

##### 2.1. Materials and Reagents

Melamine (1,3,5-triazine-2,4,6-triamine) (99%) was purchased from Aladdin Reagent Co., Ltd. and used without further purification. Nescafe milk powder from Shuangcheng Nestle Co., Ltd., was purchased from a local supermarket. It was confirmed to be melamine-free. Based on the data provided by the manufacturer, protein, fat, and carbohydrates contents in the milk powder were 23.2%, 28.2%, and 37%, respectively. The milk powder was used for the preparation of liquor milk, which was then adulterated by melamine in order to improve apparent protein concentration. A total of 45 samples of liquor milk were prepared. Each of the first 30 samples was obtained by dissolving 15 grams of milk powder in 100 mL deionized water. These samples have the same protein concentration of 3.48% and were used for simulating pure liquor milk unadulterated with melamine. Each of the other 15 samples was generated by first dissolving 11 grams of milk powder in deionized water (protein % = 2.55%) to obtain 100 mL liquor milk sample and then adding 0.01, 0.03, 0.05, 0.07, 0.09, 0.11, 0.13, 0.15, 0.17, 0.19, 0.21, 0.23, 0.25, 0.27, or 0.29 grams of melamine in it. That is, the range of melamine concentration was set to be from 0.01% to 0.29% (i.e., 0.1–2.9 g/L). Such an upper limit set is based on the consideration of the solubility of melamine in water (3.1 g/L at room temperature). In the market, the popular protein concentration of liquid milk is 3.0–3.5%; the adulteration of melamine is to improve the apparent protein content of liquid milk. Each 0.01 g melamine in 100 mL can only improve 0.04% of apparent protein content, below which it will lose driving force for counterfeiters. Apparent protein concentration of the adulterated samples varies from 2.59% to 3.76%. Each time, melamine was added to a milk sample independently so as to ensure appropriate variations in all samples. For constructing quantitative models of melamine, only later were 45 spectra used. Also, these samples with melamine concentration of 0.03%, 0.07%, 0.13%, 0.15%, 0.19%, 0.23%, and 0.27% constituted the test set. For constructing an identification/classification model, all samples/spectra were divided into the training/calibration set and the test set by alternative sampling.

##### 2.2. Instrument and Measurement

A spectrum was collected for each unadulterated sample while 3 spectra were collected for each of the adulterated milk samples. In order to measure the influence of environmental conditions, the adjacent measurement interval was set as 20 minutes. A total of 75 NIR spectra were acquired on Fourier transform NIR spectrometer (Thermo Fisher, USA) equipped with a special optical fiber probe. The collection process was controlled by the Result software of Thermo Fisher. The wavenumber range was set as 4000–10000 cm^{−1}. Each spectrum was the average result of 32 scans with 1557 points. Only 489 points in the region of 4000–5882 cm^{−1} were used for modeling. Figure 1 shows the NIR spectra of all experimental milk samples. All calculation and modeling were performed in MATLAB 7.0 for Windows using PLS toolbox.

##### 2.3. Partial Least Squares

Partial least squares (PLS) [31], as a classic multivariate calibration, is commonly used in spectroscopy, especially for NIR spectroscopy, to correlate spectral data () with related reference data (). It is based on latent variables and can therefore handle so-called collinear problem, but for PLS the decomposition of is guided by the variation in : the explained covariance between and is maximized, so that the variation in directly correlating with is extracted. PLS decomposes the matrix of zero-mean variables and the matrix of zero-mean variables into the formwhere is the matrix of spectral data; is the score matrix; is the variable loading matrix and is the residual matrix; is the response matrix; is the score matrix; is the loading matrix; and is the residual matrix. The and represent information after removing most noise. Based on the correlation between them, the linear regression model can be given bywhere is the regression vector to be determined during calibration. To obtain a good estimation of , it maybe needs to collect a training set that span the variation in well and in general are representative of the future samples. From the mathematical point of view, a vector means a calibration model even if a few versions of realizing PLS algorithm are available.

##### 2.4. Uninformative Variable Elimination-PLS (UVE-PLS)

Nowadays, in spectroscopy, many researchers make use of PLS because it is a full-spectrum based method. However, using full spectral region does not always yield optimal results because it may include variables containing more noise than relevant information to models. So, eliminating the variables which have high variance but small covariance with the dependent variables, so-called uninformative variables, may be helpful for improving the model. Based on this, center proposed an algorithm called uninformative variable elimination-partial least squares (UVE-PLS) algorithm [32]. Compared to other algorithms of variable selection, it is user independent and therefore does not cause any configuration problems. It consists of the following steps:(1)Using leave-many-out procedure on spectral matrix () for determining the optimal complexity of the model, with the lowest RMSEP as criterion.(2)Building up of a matrix () of artificial variables of the same size as , generated by the rand command. These variables are multiplied by a small constant (10^{−10}) so as to make them obviously smaller than the imprecision of the instrument; such an operation retains the variation of the variables but makes their influence on the final model negligible.(3)Combining of with to form an extended matrix with twice as many variables as . The resulting matrix can be denoted as (), the first columns being those of and the last ones being those of .(4)Building PLS models for according to leave-one-out procedure. The complexity, that is, the number of latent variables square discriminan, is set the same as for . This leads to PLS models, each with a coefficient vector of elements. All of them are collected in a matrix ().(5)Calculating the reliability factor () for each variable by dividing the mean value of the coefficient vector by its standard deviation.(6)Setting the cut-off limit as certain percentage () of the largest of artificial variables and constructing consist of the variables with high compared to the cut-off limit. This means that all artificial variables and all original variables assumed to contain nothing but noise are eliminated.(7)Building the final PLS model by leave-one-out cross-validation procedure.

Considering the computational cost, the number of columns of matrix was fixed at 200.

##### 2.5. PLS-DA

The PLS regression is initially developed for the prediction of continuous target variable. But it seems to be useful in the classification problem where one expects to predict the values of discrete attributes. The generic name is the partial least square-discriminant analysis, that is, PLS-DA, which is often used in the literature [33]. As a special form of PLS modeling, PLS-DA aims to find the variables and directions in multivariate space, which discriminates the known classes in a dataset. If there are only two classes to separate, the PLS model uses one dummy variable, which codes for class membership as follows: 0 for members of class A and 1 for members of class B. A discriminant model is developed by regression of the spectral data () against the assigned dummy variable (). The model based on experimental data is established so as to assign unknown samples to a previously defined class based on its pattern of measured features. The threshold is set in the model, and a sample is considered to be categorized correctly if the predicted value lies on the same side of the midpoint of the assigned value. A sample is identified as class A if its predicted value is below the threshold and as class B if its predicted value is above the threshold.

##### 2.6. SIMCA

The SIMCA is a popular classification method based on principal component analysis (PCA) [34]. In SIMCA, a separate PCA is performed on each class in the dataset, and a sufficient number of principal components (PCs) are retained to account for most of the variation within each class. The number of PCs retained for each class is usually determined by cross-validation and is maybe different for each class model. Classification in SIMCA is made by comparing the residual variance of a sample with the average residual variance of a class. This comparison provides a direct measure of the goodness of fit of a sample for a particular class model. SIMCA can classify an unknown sample to the class for which it has a high probability.

##### 2.7. Statistical Evaluation

The performance of both PLS and UVE-PLS was assessed using three measures, that is, the standard error of prediction (SEP), the coefficient of determination , and the ratio of performance to standard deviate (RPD). which can measure the proportion of the variation in the response that may be attributed to the model rather than to random error was also recorded. RPD measures the ratio of the standard deviation of reference values to the root mean squared error (RMSE) of prediction. Generally, the RPD values were classified into four levels of prediction accuracy: RPD < 1.5 indicates very bad model/predictions; RPD between 1.5 and 2.0 indicates poor model/predictions; RPD between 2.0 and 2.5 indicates good model/predictions; and RPD > 2.5 indicates very good/excellent model/predictions [35].

#### 3. Results and Discussions

##### 3.1. Spectral Analyses

Even if the spectra were collected in the range of 4000–10000 cm^{−1}, only the spectral region of 4000–5880 cm^{−1} was selected to construct models since other regions obviously contain little useful information. Figure 2(a) gives the mean 1st-derivate NIR spectra of adulterated milk, pure/unadulterated milk, and melamine samples in the region, where a vertical shift is used to prevent superposition; that is, the spectrum corresponding to melamine is moved down by 0.01 units. As can be seen in Figure 1, there exists spectral difference between adulterated and pure milk, but the difference is too small to find by the naked eye. Further, Figure 2(b) amplifies this difference by only showing a smaller region of 4230–4340 cm^{−1}, which indicates that the difference is still present and can be used for quantitative and qualitative purposes. As the melamine molecule consists of rich NH bonds, the key spectral difference of these samples in the range of 4230–4340 cm^{−1} mainly comes from the combination modes of NH vibrational absorption with other energy transfers. For establishing optimal quantitative/qualitative models, several pretreatment methods including 1st derivate and mean-centering and standard normal variate (SNV) were attempted. As a result, the 1st-derivate spectra were optimal for constructing the quantitative model while the original spectra, that is, without any pretreatment, were optimal for qualitative model.

**(a)**

**(b)**

##### 3.2. Quantitative Model

Based on the original spectra, both PLS and UVE-PLS methods were used to construct calibration models. To obtain a robust model, the calibration/training set must capture the possible variability. It is crucial that the calibration set covers the range of concentrations of the melamine to be predicted. So, as described above, eight groups of samples were used as the calibration set and seven groups of samples constitute the test set. By this way, the test set can be covered by the design space of the calibration set. In the application of PLS-related algorithms, it is generally known that the number of latent variables (Lvs) was decisive. A maximum of 15 Lvs were tested. The selection of the optimal Lvs was based on the minimum root mean squared error of cross-validations (RMSECV). Figure 3 shows the RMSECV plot as a function versus the number of latent variables of PLS and UVE-PLS models. As seen from Figure 3, RMSECV decreases first with initial Lvs and achieves the lowest point and then basically remains with an increase of Lvs. The optimal number of Lvs seems to be 5 and 3 for the PLS and UVE-PLS models, respectively, since the utilization of a more than this number does not improve the results. Compared to the PLS model, the UVE-PLS model used fewer original variables to produce fewer Lvs, meaning a more concise model. In fact, the Lvs can be interpreted as a measure of the complexity of the PLS model.

According to the UVE-PLS procedure described above, the stability index () of all variables was calculated. Figure 4 shows the plot of for original variables in the region of 4000–5882 cm^{−1} and 200 artificial variables. The dotted lines show the upper and lower boundaries determined with . Variables corresponding to stability within the boundaries will be eliminated, and the variables whose stability lies out of the dot lines were used for PLS modeling. Also, the variables selected by UVE method are concentrated on a broad region around 5300 cm^{−1} and three narrow regions in the range of 4000–4700 cm^{−1}. The number of retained variables was investigated since it maybe has an influence on the stability and accuracy of the PLS model. Generally, when the number of retained variables is too small, the robustness and accuracy of the PLS model may be affected due to the loss of informative variables. On the contrary, if the number of retained variables is too large, uninformative variables may be contained in the model and make its performance poor. In this work, different values were attempted. As a result, was selected and only 89 variables were retained after the UVE procedure.

Figure 5 shows the relationship between the actual melamine concentration and the predicted ones from the PLS model by scatter plot. Similarly, Figure 6 shows the scatter plot for the UVE-PLS model. In these plots, predicted concentration values are plotted against actual ones and the points will fall on the diagonal only if the model predicts the concentration perfectly. As shown in Figures 5 and 6, for the training set, there is no obvious difference between the PLS and UVE-PLS model since those points corresponding to the training set (expressed by red triangles) have a similar distribution around the diagonal. However, on the test set (expressed by blue stars), the UVE-PLS model outperforms the full-spectrum PLS model. It is also clear in Figures 5 and 6 that the same model can provide different predictions for the same sample due to the spectral diversity. Table 1 summarizes the performance comparison of the optimal quantitative PLS and UVE-PLS models based on three indices. It is clear that UVE-PLS can not only compress the spectral data in a very high ratio but also provide excellent model/predictions corresponding to RPD = 6.23.

##### 3.3. Qualitative Model

To construct a PLS-DA model for identifying/classifying the milk samples, all 75 spectra were divided into a training set with 38 spectra and a test set with 37 spectra. For classification/identification purposes, each sample or spectrum was assigned a class label (1 for unadulterated and 2 for adulterated milk with melamine). The adulterated concentration of melamine was 0.01–0.29% and 0.03%–0.27% for the training set and the test set, respectively. The number of Lvs was optimized in the range of 2–10. It was found that when the number of variables was greater than 4, all samples were classified correctly, as Figure 7 showed, implying the sensitivity of 100% and specificity of 100%. The circles denoted the predicted class labels while the points denoted actual labels in Figure 7. Therefore, the calibrated PLS-DA model can be considered to be reliable and stable, since its performance on future samples is expected to be comparable to those achieved on the training samples. To prove the ability of the PLS-DA, the soft independent modeling of class analogy (SIMCA) was also used and the optimal model contained 4 principal components. As can be seen in Figure 8, there existed four samples with low adulterated concentration misclassified as pure samples for both the training set and the test set. For the optimal SIMCA model, even if the specificity was 100%, the sensitivity was 82.6% and 81.8% for the training set and the test set, respectively. The classification model developed based on PLS-DA and NIR spectroscopy could be used as a potential method for melamine identification in milk products.

**(a) Training**

**(b) Test**

**(a) Training**

**(b) Test**

#### 4. Conclusions

A fast-screening approach for detecting melamine in liquor milk with NIR spectroscopy was developed. Both quantitative UVE-PLS model and qualitative PLS-DA model were constructed and showed satisfactory performance compared to the optimal reference models, that is, PLS and SIMCA. It can serve as a promising complementary scheme for conventional methods with less interference and lower investment and could be suitably applied to on-site quality control of milk products. Even so, to reduce the detection limit, further research is still needed.

#### Competing Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

#### Acknowledgments

The authors thank the National Natural Science Foundation of China (Grant no. 21375118), the Applied Basic Research Programs of Science and Technology Department of Sichuan Province of China (2013JY0101), Innovative Research and Teaching Team Program of Yibin University (Cx201104), Scientific Research Foundation of Sichuan Provincial Education Department of China (13ZB0300), and Opening Fund of Key Lab of Process Analysis and Control of Sichuan Universities of China (2015006) for supporting this work.