Abstract

Long-term storage can substantially degrade the taste and quality of dried shiitake mushroom (Lentinula edodes). This paper aimed to develop a rapid method for discriminating regular and aged shiitake by near infrared (NIR) spectroscopic analysis and chemometrics. Regular (197 objects) and aged (133 objects) samples of shiitake were collected from six main producing areas in two successive years (2013 and 2014). NIR reflectance spectra (4000–12000 cm−1) were measured on finely ground powders. Different data preprocessing methods, including smoothing, taking second-order derivatives (D2), and standard normal variate (SNV), were investigated to reduce unwanted spectral variations. Partial least squares discriminant analysis (PLSDA) and least squares support vector machine (LS-SVM) were used to develop classification models. The results indicate that SNV and D2 can largely enhance the classification accuracy. The best sensitivity, specificity, and accuracy of classification were 0.967, 0.953, and 0.961 obtained by SNV-LS-SVM and 0.933, 0.930, and 0.932 obtained by SNV-PLSDA, respectively. Moreover, the low model complexity and the high accuracy in predicting objects produced in different years demonstrate that the classification models had good generalization performance.

1. Introduction

Shiitake (Lentinula edodes) is an edible medicinal mushroom native to China and originally cultivated in East Asia [1, 2]. Presently, shiitake is the second most cultivated edible mushroom in the world and accounts for about a quarter of worldwide production [3, 4]. Shiitake is also the fastest-growing species among all cultivated mushrooms [1]. In the United States, shiitake mushroom production exceeded 9 million pounds per year in 2009 [5]. It has recently attracted much attention as a physiologically functional food as well as a source for the development of novel drugs [6]. Its importance can be attributed to its nutritional value and health effects, as well as its special taste [7]. As a popular nonstaple food, shiitake is rich in dietary fiber, minerals, and vitamins and low in fat [8]. Several important components have been separated from its basidiocarp, mycelium, and culture medium [9, 10], including biologically active polysaccharides, ergothioneine, phenolics, amino acids, dietary fiber, ergosterol, vitamins B1, B2, and C, and minerals. In experimental studies, shiitake mushroom has demonstrated many important therapeutic properties, including antitumoral, hypocholesterolemic, antifungal, and antimicrobial activities, along with a high antioxidant potential [11–13].

Fresh shiitake mushroom is highly perishable and tends to lose quality quickly after harvest. It has a short shelf life because of its high respiration rate, tendency to turn brown and lose water, and lack of physical protection against microbial attack [14]. Therefore, in China, only a small fraction of the total shiitake yield is consumed fresh and the bulk is dried to obtain a much longer shelf life. The quality of dried shiitake can be influenced by many factors, including species, origin, cultivation, and processing. Moreover, long-term storage can cause aging and quality degradation of shiitake [15]; for example, during storage, dried shiitake tends to absorb moisture from the air, which may cause the loss and reaction of soluble components and even mustiness. Although producers and sellers usually restore the appearance by some physical and/or chemical processing, the taste and quality of aged shiitake can no longer be recovered. Aged shiitake should be sorted out and sold as lower-grade products. However, for economic reasons, most producers discard only the seriously aged and mildewed shiitake and prefer to mix and sell regular and aged shiitake together to get a good price. Therefore, it is necessary to develop a rapid and effective method to distinguish aged from regular shiitake.

As a promising alternative to traditional analytical methods, near infrared (NIR) spectrometry, when combined with chemometrics, has demonstrated great potential for rapid analysis of food products. NIR has some advantages over traditional chemical analysis and sensory methods, including less sample preparation, reduced analysis time and cost, simultaneous multicomponent analysis, and potential use for online analysis. This has made NIR widely used in quality control of various foods and agricultural products [16, 17].

The objective of this work was to develop a rapid and accurate method to discriminate regular and aged shiitake by NIR spectrometry combined with chemometrics. To reflect the diversity of objects encountered in practice, shiitake objects were collected from six main producing areas in two successive years. To ensure the generalization performance of the classification models, different data preprocessing methods and classification models were studied and compared to obtain the models requiring the least complexity and preprocessing.

2. Materials and Methods

2.1. Collection of Shiitake Samples

The shiitake objects were collected from six main producing areas in China in two successive years (2013 and 2014). The regular objects were made from fresh shiitake mushrooms harvested in the current year. All the aged objects had been stored for over one year by the time of NIR analysis. All the regular and aged shiitake objects were provided and labeled by the producers. The seriously dampened and mildewed shiitake objects were manually sorted out, because such objects can be readily identified and it is unnecessary to include them in the classification models. As a result, regular (197 objects) and aged (133 objects) shiitake objects were obtained; the detailed information is listed in Table 1.

2.2. Sample Preparation and NIR Spectroscopy Analysis

Before grinding, all the shiitake samples were fully dried in the sun. Each intact object (full basidiocarp with pileus and stipe) was ground by a disintegrator and the particles were then passed through a 200-mesh screen. Compacted sample powders were analyzed in a quartz sample cup using a Bruker-TENSOR37 FTIR spectrometer (Bruker Optics, Ettlingen, Germany) in the reflectance mode. The spectra were measured using a PbS detector with an internal gold background as the reference. The working range of the spectrometer was 4000−12000 cm−1. Each sample was measured in triplicate, being stirred and compacted before each scan. The spectrum was taken as the average of the three scans. The instrumental resolution was 4 cm−1 with a scanning interval of 1.929 cm−1, so each raw spectrum had 4148 spectral points. The temperature was kept at around 25°C and the humidity was kept at a stable level during analysis.

2.3. Preprocessing and Data Splitting

All the data preprocessing and statistical analysis were performed in Matlab 7.0.1 (Mathworks, Sherborn, MA). Considering the uncertainty in the NIR analysis and the measured spectra, different preprocessing methods were studied to select the most suitable one. Smoothing can reduce white noise in the spectra and improve the signal-to-noise ratio (SNR). The polynomial fitting algorithm [18] was used for smoothing for its simplicity and effectiveness. Taking derivatives can improve spectral resolution and remove baseline shifts, so first-order (D1) and second-order (D2) derivatives were also applied. To avoid the decrease in SNR caused by direct differencing, D1 and D2 spectra were also computed by polynomial fitting algorithms. Moreover, standard normal variate (SNV) [19] was used to reduce the effects of differences in powder particle size. The Kennard and Stone (K&S) algorithm [20] was then applied to split the measured objects into a representative training set and a test set. The K&S algorithm selects the farthest points, based on the Euclidean distance, for inclusion in the training set so as to cover the widest range of the experimental space.
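As an illustration of the SNV step, the following is a minimal sketch in Python (the authors worked in Matlab and their code is not shown, so the function name `snv` and the toy spectra are assumptions). Each spectrum is centered by its own mean and scaled by its own standard deviation, which removes additive offsets and multiplicative scaling that differ from object to object.

```python
from statistics import mean, stdev

def snv(spectrum):
    """Standard normal variate: center and scale one spectrum
    by its own mean and sample standard deviation."""
    m = mean(spectrum)
    s = stdev(spectrum)  # n - 1 denominator
    return [(v - m) / s for v in spectrum]

# Two hypothetical spectra with the same shape but a different
# offset and scale (as particle-size scatter might produce)
a = [0.10, 0.20, 0.40, 0.20, 0.10]
b = [2 * v + 0.5 for v in a]

# After SNV the two spectra coincide up to rounding error
print(max(abs(p - q) for p, q in zip(snv(a), snv(b))))
```

SNV is applied row by row, so it needs no reference spectrum and can be applied to new objects without refitting anything.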

2.4. Classification Models

Partial least squares discriminant analysis (PLSDA) is a classification technique using partial least squares (PLS) regression. The usefulness and effectiveness of PLSDA as a classification technique have been demonstrated by the relationship between PLS, canonical correlation analysis (CCA), and linear discriminant analysis (LDA) [21]. For the two-class problem in this work, with the spectral matrix having objects arranged in rows, a dummy response vector was generated containing +1 and −1 to denote regular and aged samples, respectively. A PLS regression was built to relate the spectral matrix to the dummy response vector. For classification, an object with a predicted response value above 0 was assigned to the regular group and one with a value below 0 to the aged group.
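The dummy-response scheme can be sketched as follows. This is a plain-Python PLS1 fit by the NIPALS algorithm followed by thresholding at 0; it is one common way to implement PLSDA and is not claimed to be the authors' Matlab routine. The function names and the tiny two-variable data set are hypothetical.

```python
def pls1_fit(X, y, n_comp):
    """Fit PLS1 by NIPALS on mean-centered data."""
    n, m = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(m)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(m)] for row in X]  # X residuals
    f = [v - y_mean for v in y]                                # y residuals
    comps = []
    for _ in range(n_comp):
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(m)]
        norm = sum(v * v for v in w) ** 0.5
        if norm < 1e-12:  # no covariance left to model
            break
        w = [v / norm for v in w]                                   # weights
        t = [sum(E[i][j] * w[j] for j in range(m)) for i in range(n)]  # scores
        tt = sum(v * v for v in t)
        p = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(m)]
        q = sum(f[i] * t[i] for i in range(n)) / tt
        E = [[E[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        comps.append((w, p, q))
    return x_mean, y_mean, comps

def plsda_predict(model, x):
    """Return the predicted response and the class assigned by its sign."""
    x_mean, y_mean, comps = model
    e = [v - mu for v, mu in zip(x, x_mean)]
    y_hat = y_mean
    for w, p, q in comps:
        t = sum(a * b for a, b in zip(e, w))
        y_hat += q * t
        e = [a - t * b for a, b in zip(e, p)]
    return y_hat, ("regular" if y_hat > 0 else "aged")

# Hypothetical data: +1 = regular, -1 = aged
X = [[1.0, 0.1], [1.2, 0.0], [0.9, 0.2], [-1.0, 0.1], [-1.1, 0.0], [-0.8, 0.2]]
y = [1, 1, 1, -1, -1, -1]
model = pls1_fit(X, y, n_comp=1)
print(plsda_predict(model, [1.0, 0.1])[1])
```

Predicting by sequential deflation avoids forming the regression vector explicitly, which keeps the sketch free of matrix inversion.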

Support vector machine (SVM) [22] is a modeling technique based on the regularization of regression coefficients. By balancing the capacity and generalization performance of a learned model, SVM has proved to be an effective tool for solving regression and classification problems. Least squares SVM (LS-SVM) [23] is a simplified version of SVM. Unlike the traditional SVM algorithms, which are based on quadratic programming, LS-SVM obtains the solution by solving a set of linear equations. Moreover, nonlinear relationships can be learned using a nonlinear radial basis function (RBF) kernel.
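To make the "set of linear equations" concrete, the sketch below trains an LS-SVM on ±1 labels by solving the system [[0, 1ᵀ], [1, K + I/γ]] [b; α] = [0; y], which for ±1 labels is equivalent to the usual LS-SVM classifier formulation. It is a plain-Python illustration, not the authors' code; the kernel scaling exp(−‖x − z‖²/(2σ²)) is an assumption (some toolboxes use σ² without the factor 2), and all function names and the XOR-style data are hypothetical.

```python
import math

def rbf(x, z, sigma):
    """Gaussian RBF kernel (assumed scaling: 2 * sigma**2)."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-d2 / (2 * sigma ** 2))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            fct = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= fct * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def lssvm_fit(X, y, gamma, sigma):
    """Return bias b and multipliers alpha from one linear solve."""
    n = len(X)
    A = [[0.0] + [1.0] * n]
    for i in range(n):
        A.append([1.0] + [rbf(X[i], X[j], sigma) + (1.0 / gamma if i == j else 0.0)
                          for j in range(n)])
    sol = solve(A, [0.0] + [float(v) for v in y])
    return sol[0], sol[1:]

def lssvm_predict(X, b, alpha, sigma, x):
    """Classify x by the sign of the kernel expansion."""
    score = b + sum(a * rbf(xi, x, sigma) for a, xi in zip(alpha, X))
    return 1 if score > 0 else -1

# Hypothetical XOR-style data that no linear model can separate
X = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, 0.0]]
y = [1, 1, -1, -1]
b, alpha = lssvm_fit(X, y, gamma=10.0, sigma=0.5)
print([lssvm_predict(X, b, alpha, 0.5, x) for x in X])
```

The two parameters passed to `lssvm_fit` correspond to the regularization parameter γ and kernel width σ; in practice they would be tuned by cross validation rather than fixed by hand as here.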

In this work, linear PLSDA and nonlinear LS-SVM were used to classify the regular and aged shiitake. For LS-SVM, the commonly used Gaussian RBF was adopted for nonlinear transformation. The number of PLSDA components and the parameters of LS-SVM were estimated using Monte Carlo cross validation (MCCV) [24].

2.5. Evaluation of Classification Performance

Sensitivity and specificity [25] were used to compare the classification models. In this work, the regular shiitake objects were denoted as “positives” and the aged ones as “negatives”; model sensitivity (Sen) and specificity (Spe) were defined as

$$\mathrm{Sen} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}}, \qquad \mathrm{Spe} = \frac{\mathrm{TN}}{\mathrm{TN} + \mathrm{FP}},$$

where true positives (TP), false negatives (FN), true negatives (TN), and false positives (FP) can be obtained from the numbers of misclassified objects. As an overall criterion to evaluate classification models, total accuracy (Acc) of classification was also used:

$$\mathrm{Acc} = \frac{\mathrm{TP} + \mathrm{TN}}{\mathrm{TP} + \mathrm{TN} + \mathrm{FP} + \mathrm{FN}}.$$
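These criteria follow the standard confusion-matrix definitions and can be computed as in the short Python sketch below. The counts used in the example are not quoted from the paper: they are back-calculated on the assumption of a test set with 60 regular and 43 aged objects and the rates reported for SNV-LS-SVM, so they should be read as an inferred consistency check only.

```python
def classification_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, and total accuracy from confusion counts
    (regular = positive, aged = negative)."""
    sen = tp / (tp + fn)
    spe = tn / (tn + fp)
    acc = (tp + tn) / (tp + fn + tn + fp)
    return sen, spe, acc

# Inferred counts (assumption): 58 of 60 regular and 41 of 43 aged correct
sen, spe, acc = classification_metrics(tp=58, fn=2, tn=41, fp=2)
print(round(sen, 3), round(spe, 3), round(acc, 3))  # 0.967 0.953 0.961
```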

3. Results and Discussions

For the raw data, the spectral range of 8000−12000 cm−1 was significantly influenced by baseline shifts and scattering, so this spectral range was not used for further data analysis. Some of the raw NIR spectra (4000−8000 cm−1) of regular and aged shiitake are shown in Figure 1. As seen in Figure 1, the regular and aged samples have very similar peak patterns. Generally, due to the low resolution and serious peak overlapping, accurate assignment of most peaks was very difficult. Therefore, chemometric methods were required to extract the useful information from the spectral data for classification.

Figure 2 shows the PCA plot of the raw NIR spectra of regular and aged shiitake. The first two principal components (PCs) accounted for 92.7% of the total data variance. It can be seen that the projections of the objects in both classes onto the 2-PC subspace are widely dispersed, which can be attributed to the fact that both regular and aged shiitake included samples collected from different producing areas and years. The PCA plot also indicates that the difference in NIR data of the two classes is significant and can be used for classification. However, because the classes overlap, data preprocessing and classification models are required to accurately discriminate the regular and aged shiitake objects.

The preprocessed spectra of regular and aged shiitake samples are shown in Figure 3. Comparing the raw spectra with the smoothed spectra shows that, although smoothing can slightly improve the SNR, it might lose some useful high-frequency information in the raw data. Compared with the first-order derivative spectra, the D2 spectra can remove most of the baselines and enhance some detailed information and peak resolution. SNV spectra can remove some spectral variations while enhancing others. The effects of data preprocessing should be evaluated by the classification results.
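Smoothing and derivative computation by local polynomial fitting can be sketched with fixed convolution weights of the Savitzky-Golay type. The example below is in Python and uses a 5-point window with a quadratic polynomial; the window width and polynomial order are assumptions, since they are not stated in the text. A quadratic test signal is used because the filters should reproduce it, and its derivatives, exactly.

```python
# Standard 5-point quadratic Savitzky-Golay weights (assumed window/order)
SMOOTH = [-3 / 35, 12 / 35, 17 / 35, 12 / 35, -3 / 35]
D1 = [-2 / 10, -1 / 10, 0.0, 1 / 10, 2 / 10]
D2 = [2 / 7, -1 / 7, -2 / 7, -1 / 7, 2 / 7]

def sg_filter(y, coeffs, step=1.0, order=0):
    """Apply the weights to every interior point; the two points at each
    edge are dropped. `order` is the derivative order, `step` the spacing."""
    half = len(coeffs) // 2
    out = []
    for i in range(half, len(y) - half):
        window = y[i - half:i + half + 1]
        out.append(sum(c * v for c, v in zip(coeffs, window)) / step ** order)
    return out

# Hypothetical signal y = x^2 sampled at x = 0..6 with unit spacing
y = [float(x * x) for x in range(7)]
print(sg_filter(y, SMOOTH))       # smoothed values at x = 2, 3, 4
print(sg_filter(y, D1, order=1))  # first derivative, close to 2x
print(sg_filter(y, D2, order=2))  # second derivative, close to 2
```

For real spectra, `step` would be the scanning interval in cm−1, and a wider window would trade more noise suppression for more loss of fine spectral detail.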

The K&S algorithm was performed on the regular and aged samples separately. Each group was split into training and prediction objects, which were then combined to generate the final training and test sets. The final training set had 227 objects (137 regular and 90 aged) and the test set had 103 objects (60 regular and 43 aged).
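The selection rule of the K&S algorithm can be sketched as follows in Python: start from the two mutually farthest objects, then repeatedly add the object whose nearest already-selected neighbour is farthest away. The function name and the one-dimensional toy data are hypothetical; for the split described here it would be run once per class and the selected indices combined.

```python
import math

def kennard_stone(X, n_train):
    """Return indices of n_train rows of X chosen by the
    Kennard-Stone rule (requires 2 <= n_train <= len(X))."""
    n = len(X)
    dist = [[math.dist(X[i], X[j]) for j in range(n)] for i in range(n)]
    # Seed with the two mutually farthest objects
    i0, j0 = max(((i, j) for i in range(n) for j in range(i + 1, n)),
                 key=lambda ij: dist[ij[0]][ij[1]])
    selected = [i0, j0]
    remaining = [k for k in range(n) if k not in selected]
    while len(selected) < n_train:
        # Add the candidate whose nearest selected neighbour is farthest
        best = max(remaining, key=lambda k: min(dist[k][s] for s in selected))
        selected.append(best)
        remaining.remove(best)
    return selected

# Hypothetical one-variable objects
X = [[0.0], [1.0], [2.0], [3.0], [10.0]]
train_idx = kennard_stone(X, n_train=3)
test_idx = [k for k in range(len(X)) if k not in train_idx]
print(sorted(train_idx), test_idx)
```

Because the rule is deterministic, the same data always give the same split, and the objects left for testing tend to lie inside the region spanned by the training set.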

LS-SVM and PLSDA models were developed based on raw and preprocessed NIR data. For LS-SVM, the values of two parameters, γ and σ, need to be tuned. The magnitude of the kernel width parameter, σ, is directly related to the nonlinear nature of the regression. The regularization parameter, γ, controls the tradeoff between the risk of the model and the learning error. Therefore, γ and σ were optimized simultaneously by MCCV. The misclassification rate of MCCV (MRMCCV) was used to screen different combinations of γ and σ, and the combination with the lowest MRMCCV value was selected. For MCCV, the number of random data splits was 100, and each time 20% of the training objects were left out for prediction. The number of PLSDA latent variables was also estimated by MCCV.
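The MCCV procedure described above (100 random splits, 20% of the training objects left out each time) can be sketched in Python as a function that returns the mean misclassification rate for any classifier. The callback signature `fit_predict(X_train, y_train, X_val)`, the nearest-mean stand-in classifier, and the toy data are all hypothetical; in practice the function would be called once per candidate parameter setting and the setting with the lowest rate kept.

```python
import random

def mccv_error(fit_predict, X, y, n_splits=100, leave_out=0.2, seed=0):
    """Mean misclassification rate over random train/validation splits."""
    rng = random.Random(seed)
    n = len(X)
    n_val = max(1, round(leave_out * n))
    rates = []
    for _ in range(n_splits):
        idx = list(range(n))
        rng.shuffle(idx)
        val, trn = idx[:n_val], idx[n_val:]
        pred = fit_predict([X[i] for i in trn], [y[i] for i in trn],
                           [X[i] for i in val])
        errors = sum(p != y[i] for p, i in zip(pred, val))
        rates.append(errors / n_val)
    return sum(rates) / n_splits

def nearest_mean(X_trn, y_trn, X_val):
    """Toy stand-in classifier: assign each object to the class whose
    mean of the first variable is closer."""
    means = {}
    for label in set(y_trn):
        vals = [x[0] for x, t in zip(X_trn, y_trn) if t == label]
        means[label] = sum(vals) / len(vals)
    return [min(means, key=lambda c: abs(x[0] - means[c])) for x in X_val]

# Hypothetical, well-separated two-class data
X = [[0.0], [0.1], [0.2], [0.3], [0.4], [5.0], [5.1], [5.2], [5.3], [5.4]]
y = [1] * 5 + [-1] * 5
print(mccv_error(nearest_mean, X, y))
```

Fixing the seed makes the 100 splits identical across candidate settings, so differences in the rate reflect the settings rather than the random partitioning.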

The prediction results and optimized parameters of PLSDA and LS-SVM with different data preprocessing are presented in Tables 2 and 3. Data preprocessing by taking SNV and D2 spectra clearly improved the classification accuracy, whereas data smoothing gave almost the same classification results as the raw data. Comparison of the results of the different preprocessing methods indicates that reducing baseline shifts and the influence of scattering effects was more important than improving the SNR of the spectra. For the best models, the sensitivity, specificity, and accuracy were 0.967, 0.953, and 0.961 by SNV-LS-SVM and 0.933, 0.930, and 0.932 by SNV-PLSDA, respectively. The SNV-LS-SVM model obtained slightly better results than the SNV-PLSDA model, but the difference was insignificant. The prediction results by SNV-LS-SVM and SNV-PLSDA are shown in Figure 4. Compared with LS-SVM with the RBF nonlinear transformation, PLSDA is a linear model and its risk of overfitting is lower. Moreover, the model complexity of the PLSDA models was low (3 to 4 components), so PLSDA can be expected to generalize better when the model is applied to different types of shiitake objects. For such applications, more objects should be included in the training set.

4. Conclusion

The feasibility of using NIR spectroscopy for rapid discrimination of regular and aged shiitake mushroom was investigated. Both D2 and SNV improved the classification performance significantly compared with the raw data, indicating that it is important to control the unwanted variations caused by baselines and scatter effects. With SNV transformation, both the linear PLSDA and the nonlinear LS-SVM achieved accurate discrimination of aged shiitake. The PLSDA model had low complexity and can be generalized to practical applications by including different types of shiitake for training.

Conflict of Interests

Lu Xu declares no conflict of interests; Xian-Shu Fu declares no conflict of interests; Chen-Bo Cai declares no conflict of interests; Yuan-Bin She declares no conflict of interests.

Authors’ Contribution

Lu Xu and Xian-Shu Fu equally contributed to this work.

Acknowledgments

Yuan-Bin She is financially supported by the General Projects of the National Natural Science Foundation of China (Grant nos. 21476270, 21276006). Lu Xu is grateful for the financial help from the Open Research Program (no. GCTKF2014007) of State Key Laboratory Breeding Base of Green Chemistry Synthesis Technology (Zhejiang University of Technology) and the Research Fund for the Doctoral Program of Tongren University (no. trxyDH1501).