Abstract

The aim of this work is to address the practical shortage of fast, intelligent, and objective methods for distinguishing dairy products, and thereby to improve their quality control. To this end, a brand discrimination method for cheese products based on Raman spectroscopy and a probabilistic neural network algorithm was developed. The experimental results show that the spectra contain abundant molecular vibration information from carbohydrates, fats, proteins, and other components, and that collecting the Raman spectrum of a single sample takes only 100 s. Because the spectra of different samples are highly similar, they cannot be distinguished by eye. Characteristic peak intensities combined with statistical process control were used to study the fluctuation characteristics of the samples. The results show that the characteristic peak intensities of the samples fluctuate within the control limits; however, because the Raman spectra of different brands are so similar, this approach cannot identify the brands either. This paper therefore established an analytical workflow based on Raman spectroscopy comprising wavelet denoising, normalization, principal component analysis, and probabilistic neural network discrimination. With db1 wavelet denoising, [−1, 1] normalization, and 74 principal components (cumulative contribution rate of 100%), different brands of cheese products were discriminated within 1 s, with an average recognition accuracy of 96%. The method is simple to operate, fast, and accurate. It provides a technical reference for combating counterfeit products and has broad application prospects.

1. Introduction

In recent years, the quality and safety of dairy products have become a major concern for consumers, enterprises, and government regulators. The main risks are illegal additives, toxic and harmful substances, and shoddy products [1, 2]. The most widely studied and applied detection approach is component analysis, mainly represented by chromatography and chromatography-mass spectrometry [3]. For example, component analysis can provide accurate qualitative and quantitative information on melamine, dicyandiamide, aflatoxin, and other substances [4, 5]. However, this approach faces two challenges. First, it relies on routine separation and analysis of components, which generally requires sample pretreatment and is time-consuming and laborious. Second, it has recently been reported that some “counterfeit” products are in fact compliant products that pose no harm to human health but are illegally passed off as high-quality brands to obtain illegal profits [6]. Traditional component analysis may therefore fail to distinguish genuine from counterfeit samples.

In view of these problems, rapid detection methods such as colorimetry, test strips, and computer-aided discrimination have been widely used in analytical chemistry [7–9]. Colorimetry exploits a specific reaction between the marker (target) and a reagent: the marker alters the dispersion or aggregation state of nanomaterials (e.g., colloidal gold nanoparticles), and the resulting color change identifies the target molecules [10, 11]. Test strips use immunological or competitive reactions to trigger or prevent the aggregation of colloidal gold nanoparticles at the detection line of a paper strip; a red or colorless line indicates the presence or absence of the target molecules [12, 13]. Computer-aided discriminant analysis combines spectroscopic, chromatographic, mass spectrometric, or other data from the samples with chemometric algorithms [14–17]. Because spectral data can be acquired quickly and contain abundant molecular information, spectroscopy has become a key area in the research and development of rapid detection technology. The main techniques include infrared, ultraviolet, fluorescence, and Raman spectroscopy. Infrared spectroscopy is susceptible to interference from water; ultraviolet spectroscopy requires the analyte to contain unsaturated compounds and is mainly suited to liquid samples; and fluorescence spectroscopy requires the analyte to contain a luminescent structure. Compared with these methods, Raman spectroscopy has several advantages: water causes no obvious interference, samples can be measured directly, rich vibrational information is obtained, and portable instruments are available [18, 19]. It has therefore become a research focus in rapid detection. For example, Mendes et al. established a quantitative analysis of milk fat based on vibrational spectroscopy [20]. Teixeira et al. evaluated the detection of β-lactam antibiotics in milk at both the experimental and theoretical levels [21]. Nieuwoudt et al. reported the rapid quantitative determination of melamine, urea, ammonium sulfate, dicyandiamide, and sucrose in milk using Raman spectroscopy combined with partial least squares [22]. However, studies combining Raman spectroscopy with chemometrics remain relatively scarce in the dairy field, and existing reports have mainly focused on component prediction [23].

Herein, a cheese product classification approach based on Raman spectroscopy and a probabilistic neural network was established. Cheese products are rich in nutrients, are produced in relatively small quantities, and are expensive, so a rapid and intelligent discrimination method is urgently needed [24]. The new method has the following advantages. First, traditional component analysis struggles to distinguish highly similar cheese products of different brands, whereas Raman spectroscopy characterizes the molecular information of the samples, and the probabilistic neural network classification algorithm can then identify them effectively. Second, Raman signal acquisition is simple and fast and requires no sample pretreatment; the water in the cheese samples does not interfere with the measurement; data acquisition takes only 100 s per sample; and the probabilistic neural network computation takes less than 1 s. Finally, the Raman spectrometer is portable, which facilitates on-site detection. By exploiting the multimolecular information carried by Raman spectra, the proposed method can effectively fingerprint the samples and provide an intelligent, objective evaluation system.

2. Experimental Section

2.1. Samples and Instruments

Three brands of cheese products were purchased from Suguo Supermarket (Nanjing, China) and labeled brand XX, brand YY, and brand ZZ, respectively. Twenty-five samples of each brand were randomly collected. An appropriate amount of each cheese product was loaded into a 96-well plate. Raman spectra were recorded with a portable laser Raman spectrometer (ProttezRaman-d3; Enwave Optronics Inc., USA). The excitation wavelength was 785 nm, the laser power was about 450 mW, and the integration time was 100 s. Spectra were acquired over 250–2000 cm−1 with a resolution of 1 cm−1.

2.2. Data Analysis

Baseline correction of the collected Raman spectra was performed with SLSR Reader V8.3.9 (Enwave Optronics Inc., USA). Wavelet denoising, normalization, principal component analysis, and probabilistic neural network classification were performed in MATLAB (MathWorks, Natick, MA, USA). The “wden” function was used for wavelet denoising, the “mapminmax” function for normalization, the “princomp” function for principal component analysis, and the “newpnn” function to construct the probabilistic neural network. Statistical control charts were generated with Minitab (Minitab Inc., USA).

3. Results and Discussion

3.1. Raman Spectroscopic Characterization Analysis of Cheese Products

Figure 1 shows the Raman spectra of the cheese products. Based on existing literature [25–27], the main Raman peaks of cheese products can be assigned as follows (Table 1). The peak at 1760 cm−1 was mainly attributed to ester C=O stretching of fatty acid molecules. The peak at 1670 cm−1 was characteristic of the C=O stretching vibration (amide I) of proteins and the C=C stretching mode of unsaturated fatty acids. The weak band at 1620 cm−1 could be attributed to the ring vibration of the amino acid phenylalanine. The most prominent feature of the spectra was the CH2 deformation vibration of fats and carbohydrates at 1458 cm−1. The peak at 1313 cm−1 was a CH2 twisting vibration associated with lipids. The 800–1200 cm−1 region was highly characteristic of carbohydrates: the main peaks could be attributed to C-O stretching, C-C stretching, and C-O-H deformation vibrations (1143, 1095, and 1080 cm−1), to C-O-C and C-O-H deformation and C-O stretching vibrations (938 cm−1), and to C-C-H and C-O-C deformation vibrations (851 cm−1). The exception in this region was the peak at 1019 cm−1, the ring-breathing mode associated with phenylalanine. Vibrations in the 250–800 cm−1 region mainly included the C-C-O deformation vibration (636 cm−1), glucose (510 cm−1), and lactose (384 cm−1).

Figure 1 shows that the Raman spectra carry rich information on the components and molecular vibrations of the cheese products; it also shows that the spectra of different brands are highly similar and cannot be told apart by eye. Likewise, all the products appear as yellowish viscous solids, so the brand cannot be identified from their appearance either. Figures S1–S3 show ten randomly selected Raman spectra of brand XX, YY, and ZZ cheese products, respectively. The Raman intensities of products of the same brand fluctuate somewhat, but the overall spectra remain highly consistent. Samples of different brands are also highly similar, which suggests that statistical learning methods are needed for their discriminant analysis.

3.2. Statistical Analysis of Raman Spectral Peak Intensity of Cheese Products

In production management, statistical process control is often used to analyze fluctuations in product quality [28]. As shown above, the Raman spectrum is closely related to the molecular composition of the cheese products. Control charts can therefore be applied to the intensities of the bands associated with fats (1760 cm−1), carbohydrates (1458 cm−1), and proteins (1019 cm−1).
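To build these control charts, the intensity at each marker band has to be read from every spectrum. The following is a minimal Python sketch, assuming the baseline-corrected spectra are stored as a NumPy array on a 1 cm−1 grid from 250 to 2000 cm−1 (1751 points, consistent with the stated range and resolution) and that the intensity is simply taken at the nearest grid point rather than by peak fitting; the names `WAVENUMBERS`, `MARKER_BANDS`, and `band_intensities` are hypothetical:

```python
import numpy as np

# Assumed wavenumber axis: 250-2000 cm^-1 every 1 cm^-1 (1751 points).
WAVENUMBERS = np.arange(250, 2001)

# Marker bands used for the control charts (fat, carbohydrate, protein).
MARKER_BANDS = {"fat": 1760, "carbohydrate": 1458, "protein": 1019}


def band_intensities(spectra, wavenumbers=WAVENUMBERS, bands=MARKER_BANDS):
    """Return {band name: array of intensities, one per spectrum}.

    spectra: array of shape (n_samples, n_points), already baseline-corrected.
    The intensity is read at the grid point closest to each nominal band position.
    """
    spectra = np.atleast_2d(np.asarray(spectra, dtype=float))
    out = {}
    for name, nu in bands.items():
        idx = int(np.argmin(np.abs(wavenumbers - nu)))  # nearest sampled wavenumber
        out[name] = spectra[:, idx]
    return out
```

For one brand, the 25 spectra yield 25 values per band, which form the data series plotted in that brand's control chart.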

The statistical control charts were constructed using the individuals and moving range (I-MR) chart formulae [29, 30]. For the individual (X) control chart, the formula is as follows:

$$\mathrm{UCL}_X = \bar{x} + 2.66\,\overline{MR},\qquad \mathrm{CL}_X = \bar{x},\qquad \mathrm{LCL}_X = \bar{x} - 2.66\,\overline{MR}$$

For the moving range (MR) control chart, the formula is as follows:

$$\mathrm{UCL}_{MR} = 3.267\,\overline{MR},\qquad \mathrm{CL}_{MR} = \overline{MR},\qquad \mathrm{LCL}_{MR} = 0$$

In the formulae, $x_i$ and $\bar{x}$ represent the Raman intensity of the $i$-th sample and the average intensity of the samples, respectively; $MR_i = |x_{i+1} - x_i|$ is the moving range, where $i$ changes from 1 to 24 in steps of 1 in this work (25 samples per brand). UCL = upper control limit; LCL = lower control limit; CL = center line; $\overline{MR}$ = the average moving range.
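These limits can be computed directly from one brand's 25 band intensities. A minimal Python/NumPy sketch of the standard I-MR calculation (moving ranges of span 2, constants 2.66 = 3/1.128 and D4 = 3.267); the function names `imr_limits` and `out_of_control` are hypothetical, and this is not the Minitab implementation used in the paper:

```python
import numpy as np

E2 = 2.66   # 3 / d2 with d2 = 1.128 for moving ranges of two points
D4 = 3.267  # upper MR-chart factor for span 2 (D3 = 0, so LCL_MR = 0)


def imr_limits(x):
    """Return individual (X) and moving range (MR) chart limits as (LCL, CL, UCL)."""
    x = np.asarray(x, dtype=float)
    mr = np.abs(np.diff(x))            # MR_i = |x_{i+1} - x_i|, i = 1..n-1
    x_bar, mr_bar = x.mean(), mr.mean()
    return {
        "X": (x_bar - E2 * mr_bar, x_bar, x_bar + E2 * mr_bar),
        "MR": (0.0, mr_bar, D4 * mr_bar),
        "mr": mr,
    }


def out_of_control(values, limits):
    """Indices of values falling outside (LCL, UCL)."""
    lcl, _, ucl = limits
    return [i for i, v in enumerate(values) if v < lcl or v > ucl]
```

To mimic the cross-brand comparison, compute the limits from brand XX's intensities and pass brand YY or ZZ intensities to `out_of_control` with `limits["X"]`.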

As shown in Figure 2, the control limits were calculated from the Raman intensities of brand XX at 1760 cm−1. The fat-related peak intensities of the brand XX samples fluctuate within the control limits of 55.7–163.1, and their moving ranges lie within 0–65.97. These samples show good quality stability, with no point outside the control limits. By contrast, only a small number of brand YY and brand ZZ samples fall outside these limits, so monitoring the fat band alone cannot effectively discriminate between brands. Similarly, Figures S4 and S5 show the control charts based on the 1760 cm−1 (fat-related) intensities of brand YY and brand ZZ, respectively; Figures S6–S8 show those based on the 1458 cm−1 (carbohydrate-related) intensities of brands XX, YY, and ZZ; and Figures S9–S11 show those based on the 1019 cm−1 (protein-related) intensities of brands XX, YY, and ZZ. These results show that a control chart based on any single band can effectively describe the quality fluctuation within each brand, but because the samples are so similar, it cannot distinguish between brands.
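Reading 55.7–163.1 and 0–65.97 as the X-chart and MR-chart control limits of brand XX, the reported values are mutually consistent with the standard I-MR constants 2.66 and 3.267, as a quick back-calculation shows:

```latex
\overline{MR} = \frac{65.97}{3.267} \approx 20.19, \qquad
2.66\,\overline{MR} \approx 53.7 = \frac{163.1 - 55.7}{2}, \qquad
\bar{x} = \frac{163.1 + 55.7}{2} = 109.4
```

Under this reading, the brand XX samples have a mean fat-band intensity of about 109.4 and an average moving range of about 20.2.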

3.3. Discriminant Analysis of Cheese Products Based on Probabilistic Neural Network

For products that show both quality fluctuation and high inter-brand similarity, establishing an effective discriminant analysis process with a machine learning algorithm has become a research hotspot [31]. The probabilistic neural network (PNN), a pattern classification algorithm with the advantages of easy training and fast convergence, was employed to construct the discriminant analysis method in this paper [32, 33]. The network consists of an input layer, a model layer (also called the pattern layer), a sum layer, and an output layer. The input layer receives the Raman spectral data of the training samples and passes them to the network; its number of neurons equals the attribute dimension of the samples. The model layer describes the matching relationship between the feature vector passed from the input layer and each pattern in the training samples. When a vector $x$ is received, the input-output relationship of the $j$-th neuron of the $i$-th class in this layer is as follows:

$$\phi_{ij}(x) = \frac{1}{(2\pi)^{d/2}\sigma^{d}}\exp\left[-\frac{(x - x_{ij})^{\mathsf T}(x - x_{ij})}{2\sigma^{2}}\right]$$

where $i = 1, 2, \ldots, M$, $M$ is the total number of classes in the training samples, $d$ is the dimension of the sample space, $x_{ij}$ is the $j$-th center (training sample) of class $i$, $\sigma$ is the smoothing factor, and $\phi_{ij}$ is the output of the $j$-th neuron of class $i$ in the model layer.

In the sum layer, the outputs of the model-layer neurons belonging to the same class are averaged:

$$p_i(x) = \frac{1}{N_i}\sum_{j=1}^{N_i}\phi_{ij}(x)$$

where $p_i$ is the output for class $i$ and $N_i$ is the number of neurons of class $i$.

The output layer consists of competing neurons that receive the outputs of the sum layer and select the neuron with the largest posterior probability density. That neuron outputs the predicted category, and the outputs of all other neurons are 0:

$$y = \arg\max_{i}\, p_i(x)$$

where $y$ is the result of the output layer (the predicted class).
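The three layers above can be sketched compactly in Python/NumPy. This is a minimal Specht-style PNN written for illustration, not MATLAB's `newpnn` (which parameterizes its kernel by a spread value rather than by σ directly). It works in the log domain and drops the factor $(2\pi)^{d/2}\sigma^d$, which is shared by all classes and therefore does not change the arg max; this avoids numerical underflow when $d$ is large (e.g. 74 principal components). The class name `PNN` and its default σ are hypothetical:

```python
import numpy as np


class PNN:
    """Minimal probabilistic neural network (Specht-style Parzen classifier)."""

    def __init__(self, sigma=0.1):
        self.sigma = sigma  # smoothing factor (hypothetical default; tune per data set)

    def fit(self, X, y):
        # "Training" only stores the samples: each becomes a model-layer neuron.
        self.X = np.asarray(X, dtype=float)
        self.y = np.asarray(y)
        self.classes = np.unique(self.y)
        return self

    def class_log_densities(self, X):
        X = np.atleast_2d(np.asarray(X, dtype=float))
        # Model layer: squared distances to every stored sample, shape (n_query, n_train).
        sq = ((X[:, None, :] - self.X[None, :, :]) ** 2).sum(axis=2)
        log_phi = -sq / (2.0 * self.sigma ** 2)  # log Gaussian kernel, constant dropped
        cols = []
        for c in self.classes:
            a = log_phi[:, self.y == c]
            m = a.max(axis=1, keepdims=True)
            # Sum layer: log of the class average (1/N_c) * sum_j exp(a_j), computed stably.
            cols.append((m + np.log(np.exp(a - m).mean(axis=1, keepdims=True)))[:, 0])
        return np.stack(cols, axis=1)

    def predict(self, X):
        # Output layer: the class with the largest density wins.
        return self.classes[np.argmax(self.class_log_densities(X), axis=1)]
```

For example, `PNN(sigma=0.5).fit(X_train, y_train).predict(X_test)` returns one brand label per test spectrum. σ has to be tuned, since the value used in the paper is not reported.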

The “newpnn” function of MATLAB was used to build the probabilistic neural network. First, 80% of the samples were selected as the training set and the remaining 20% as the test set. When the raw spectra were used as input, the recognition accuracy was only 33.33%, which is no better than random guessing among three brands. A likely reason is that redundant information in the Raman spectral data of the cheese samples hampers the computation and discrimination of the model.

Wavelet denoising (“wden” function, db1 wavelet basis, 3 decomposition levels) was therefore applied to remove noise from the Raman spectra, and normalization (“mapminmax” function) was applied to scale the Raman intensities to the range [−1, 1], reducing the influence of differences in scale. For wavelet denoising, the Raman spectral data are expressed as a linear combination of wavelet functions, $f = \sum_{k} c_k \psi_k$, where $c_k$ is the coefficient (component) of the original Raman spectral data $f$ associated with wavelet function $\psi_k$. In the calculation, the original spectral data are transformed into wavelet coefficients, the smaller coefficients are shrunk by soft thresholding, and the denoised spectrum is reconstructed from the processed coefficients. The normalization formula is as follows:

$$y = \frac{(y_{\max} - y_{\min})(x - x_{\min})}{x_{\max} - x_{\min}} + y_{\min}$$

which maps the denoised Raman spectral data $x$ to $y$ in the range [−1, 1], with $y_{\min} = -1$ and $y_{\max} = 1$.
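A minimal Python/NumPy sketch of the two pretreatment steps. Because db1 is the Haar wavelet, a three-level Haar transform can be written by hand. The threshold rule (the universal threshold σ̂√(2 ln n), with σ̂ estimated from the finest detail coefficients as their median absolute value divided by 0.6745) and the edge padding to a length divisible by 2³ are assumptions, since the paper does not state which `wden` options were used. Scaling is applied to one spectrum at a time, which is also an assumption; the function names are hypothetical:

```python
import numpy as np

SQRT2 = np.sqrt(2.0)


def haar_dwt(x):
    """One level of the orthonormal Haar (db1) transform."""
    return (x[0::2] + x[1::2]) / SQRT2, (x[0::2] - x[1::2]) / SQRT2


def haar_idwt(a, d):
    """Inverse of haar_dwt."""
    x = np.empty(2 * a.size)
    x[0::2] = (a + d) / SQRT2
    x[1::2] = (a - d) / SQRT2
    return x


def soft(c, t):
    """Soft thresholding: shrink coefficients toward zero by t."""
    return np.sign(c) * np.maximum(np.abs(c) - t, 0.0)


def haar_denoise(signal, level=3):
    x = np.asarray(signal, dtype=float)
    n = x.size
    # Assumption: pad with the edge value so the length is divisible by 2**level.
    padded = np.pad(x, (0, (-n) % (2 ** level)), mode="edge")
    a, details = padded, []
    for _ in range(level):
        a, d = haar_dwt(a)
        details.append(d)                   # details[0] is the finest level
    # Assumed rule: universal threshold with a MAD noise estimate.
    sigma = np.median(np.abs(details[0])) / 0.6745
    t = sigma * np.sqrt(2.0 * np.log(padded.size))
    for d in reversed(details):             # reconstruct from coarsest to finest
        a = haar_idwt(a, soft(d, t))
    return a[:n]


def minmax_scale(x, ymin=-1.0, ymax=1.0):
    """Map x linearly onto [ymin, ymax], as in the normalization formula."""
    x = np.asarray(x, dtype=float)
    return (ymax - ymin) * (x - x.min()) / (x.max() - x.min()) + ymin
```

A full pretreatment of one spectrum would then be `minmax_scale(haar_denoise(spectrum))`.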

As shown in Figure S12, after this pretreatment the Raman spectra of the cheese products become smooth and the relative differences between peaks increase. Principal component analysis (PCA) was then applied to extract features and reduce dimensionality [34]. PCA transforms the original spectral data into a small number of new feature variables, each a linear combination of the original variables; these new variables retain as much of the information in the original data as possible and are mutually uncorrelated. The PCA feature extraction steps for the cheese spectra are as follows: (1) compute the covariance matrix $C$ of the Raman spectral data matrix $X$; (2) compute the eigenvalues $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p$ of $C$ and their corresponding eigenvectors; (3) select the eigenvectors of the first $k$ eigenvalues whose cumulative contribution rate reaches a chosen threshold to form the projection matrix $W$, where the contribution rate of the $i$-th eigenvalue is $\lambda_i / \sum_{j=1}^{p}\lambda_j$ and the cumulative contribution rate of the first $k$ eigenvalues is $\sum_{i=1}^{k}\lambda_i / \sum_{j=1}^{p}\lambda_j$; (4) project the data into the space spanned by these eigenvectors, $Y = XW$, where $Y$ contains the newly extracted features. The results show that the first principal component explains 30.20% of the information in the original 1751-dimensional data and the second explains 11.51% (Figure S13). Only 74 principal components are needed to explain 100% of the original information. This is consistent with the sample size: the mean-centered matrix of the 75 spectra has rank at most 74, so no more than 74 components can carry nonzero variance. Figure 3 shows the three-dimensional PCA scatter plot of the cheese products: samples of the same brand tend to cluster, while samples of different brands are partly separated and partly overlapping.
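The four PCA steps can be sketched in Python/NumPy through an eigen-decomposition of the covariance matrix. The paper used MATLAB's `princomp`; the function `pca_covariance` below is a hypothetical stand-in written for illustration:

```python
import numpy as np


def pca_covariance(X, threshold=1.0):
    """PCA following steps (1)-(4).

    X: array of shape (n_samples, n_features).
    Returns (scores, projection matrix W, cumulative contribution rates, k).
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                    # centre each variable
    C = np.cov(Xc, rowvar=False)               # (1) covariance matrix
    vals, vecs = np.linalg.eigh(C)             # (2) eigenvalues/eigenvectors (ascending)
    order = np.argsort(vals)[::-1]             #     reorder to descending
    vals = np.clip(vals[order], 0.0, None)     #     drop tiny negative round-off
    vecs = vecs[:, order]
    cum = np.cumsum(vals) / vals.sum()         # cumulative contribution rate
    # (3) smallest k whose cumulative rate reaches the threshold (small float tolerance)
    k = int(np.searchsorted(cum, threshold - 1e-9) + 1)
    W = vecs[:, :k]                            #     projection matrix
    return Xc @ W, W, cum, k                   # (4) projected features
```

Run on any data set with n samples and more than n variables, this returns k = n − 1 at a 100% threshold, which mirrors the 74 components found for the 75 spectra. For the real 1751-variable spectra, the covariance matrix is 1751 × 1751, which is still small enough for a direct eigen-decomposition.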

Next, 80% of the samples were randomly selected as the training set, and the 74 principal components obtained after wavelet denoising, normalization, and PCA were used as input to rebuild the probabilistic neural network model, with the remaining 20% of the samples as the test set. As shown in Figure 4, the recognition accuracy reached up to 100%, and the average recognition accuracy was 96%. The discriminant analysis took less than 1 s. These results show that the Raman spectral acquisition and processing workflow established in this work enables effective and rapid discrimination of highly similar cheese products and provides technical support for their quality control.
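If the 75 spectra are split 80/20, each test set contains 15 spectra, so a single run can only score in steps of 1/15 (for example 14/15 ≈ 93.3% or 15/15 = 100%); the reported 96% is therefore an average over several runs. The number of runs and whether the splits were stratified by brand are not reported, so the Python sketch below assumes stratified splits (20 training and 5 test spectra per brand) and a pluggable classifier; the function name `repeated_split_accuracy` and its callback interface are hypothetical:

```python
import random


def repeated_split_accuracy(X, labels, classify, n_runs=10, train_frac=0.8, seed=0):
    """Mean and maximum test accuracy over repeated stratified random splits.

    classify(train_X, train_y, test_X) -> list of predicted labels
    (hypothetical interface; any classifier, e.g. a PNN, can be wrapped this way).
    """
    rng = random.Random(seed)
    by_class = {}
    for i, lab in enumerate(labels):
        by_class.setdefault(lab, []).append(i)

    accuracies = []
    for _ in range(n_runs):
        train, test = [], []
        for members in by_class.values():
            idx = members[:]                   # copy so shuffling leaves by_class intact
            rng.shuffle(idx)
            k = round(train_frac * len(idx))   # e.g. 20 of 25 spectra per brand
            train += idx[:k]
            test += idx[k:]
        pred = classify([X[i] for i in train], [labels[i] for i in train],
                        [X[i] for i in test])
        correct = sum(p == labels[i] for p, i in zip(pred, test))
        accuracies.append(correct / len(test))
    return sum(accuracies) / n_runs, max(accuracies), accuracies
```

The first two return values correspond to the "average" and "up to" accuracies quoted above; the third lists the per-run accuracies.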

4. Conclusions

In this paper, a novel method based on Raman spectroscopy and a probabilistic neural network was developed to address the practical lack of fast, intelligent discrimination techniques for dairy product quality control. The method has several advantages: Raman signals of cheese products can be collected directly; the operation is convenient and fast; Raman spectra contain abundant information on sample components and molecular vibrations; and the method achieves brand discrimination, which traditional component analysis and statistical process control cannot. The total time for spectral acquisition, algorithmic discrimination, and result output is only a few minutes, and the average recognition accuracy is 96%. The method may serve as a reference for, and find application in, discriminant analysis of food systems in which samples are highly similar.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The author declares no conflicts of interest with respect to the research, authorship, and/or publication of this article.

Acknowledgments

This research was financially supported by the Open Project Program of the State Key Laboratory of Dairy Biotechnology (No. SKLDB2018-007) and the National Natural Science Foundation of China (No. 61602217).

Supplementary Materials

The supplementary materials show the Raman spectra of several cheese products (Figures S1–S3), the quality fluctuation of the cheese products based on different Raman peak intensities (Figures S4–S11), the Raman spectra of these products after wavelet denoising and normalization pretreatment (Figure S12), and the principal component analysis results (Figure S13).