Abstract

The feasibility of using visible/near infrared (Vis/NIR) transmission spectroscopy in the 513–850 nm region, coupled with partial least squares-linear discriminant analysis (PLS-LDA) and other chemometric methods, to classify potatoes with blackheart was investigated. The discrimination performance of three morphological correction methods (weight correction, height correction, and volume correction) was compared. Height corrected transmittance performed best, with a correct classification rate of 97.11% for both the calibration and validation sets. Out of 1800 wavelengths, six (711, 817, 741, 839, 678, and 698 nm) were selected as the optimum wavelengths for discriminating blackheart tubers based on principal component analysis (PCA). With only these six wavelengths, the overall classification rate of the PLS-LDA method decreased from 97.11% to 96.82% in the calibration set and from 97.11% to 96.53% in the validation set, which was acceptable. These conclusions may help transfer Vis/NIR transmission technology from the laboratory to industrial application in nondestructive, real-time, or portable measurement of potato quality.

1. Introduction

Potato (Solanum tuberosum L.) plays an important role in the food market, both for its large worldwide consumption and for its richness in health-related food components such as vitamin C, protein, calcium, potassium, and dietary fiber [1]. In 2013, China produced 88.99 million metric tons of potatoes, which accounted for 24.17% of world production, and the potato ranks fifth among Chinese food crops [2]. However, during storage or transport, potatoes are prone to internal defects such as blackheart. Blackheart is an important physiological disorder of potato, which is associated with high carbon dioxide conditions or rapid changes in environmental temperature [3]. It can cause considerable economic losses because the symptoms are internal and cannot be seen without cutting the tubers in half. The conventional techniques for identifying blackheart involve cutting the tubers of a random sample set and inspecting them visually. This destructive sampling method is impractical for sorting every tuber. Therefore, it is necessary to develop a reliable, rapid, and nondestructive method to detect and segregate blackheart potatoes, which might also be used for real-time inspection.

Recently, visible/near infrared (Vis/NIR) or near infrared (NIR) spectroscopy, a fast and nondestructive method, has been extensively used in the agricultural products industry [4–6]. The technique can be implemented in different configurations such as reflection, transmission, and interactance modes for different purposes. Many studies using Vis/NIR spectroscopy in transmittance mode have been reported for internal quality prediction of agricultural products, covering attributes such as firmness [7, 8], acidity [6, 9], soluble solid content (SSC) [10–12], and maturity [13, 14]. Additionally, it shows great potential for inspecting internal defects such as brown heart in pears and apples [15–18], disorders in mangosteen [19] and kiwifruit [20], internal infestation in cherries [21, 22], and internal defects in radishes [23]. In terms of potato quality detection, Vis/NIR has been successfully used to determine tuber dry matter content [24–26], crude protein content [1, 25], and sugar content [27]. However, there are few studies so far on the classification of blackheart potatoes using visible and near infrared spectroscopy. In addition, because of one inherent drawback of Vis/NIR, namely, the serious collinearity in spectral responses, it may be desirable to select a reduced number of wavelengths. Such selection reduces spectral collinearity, greatly reduces computation time, and makes the model more interpretable [28].

Therefore, the objective of this study was to assess the feasibility of using Vis/NIR spectroscopy in transmission mode for detecting blackheart in potatoes. The specific objectives were to (1) assemble a Vis/NIR spectroscopy system in transmittance mode for potato tubers; (2) identify the effective wavelengths for blackheart potato detection; and (3) develop algorithms to identify blackheart potatoes based on the transmission spectrum.

2. Materials and Methods

2.1. Materials

Potato tubers (Kexing Six var.) were purchased from a local market in Wuhan, Hubei province, China. All potatoes were first cleaned. Defective samples, such as those with bruises, rots, holes, or greening, were eliminated. Lack of ventilation is one of the important contributing factors in producing blackheart [3]. Accordingly, samples were enclosed in zip-lock bags (one potato in each bag). Carbon dioxide released by respiration of the potato increases the carbon dioxide content in the bag, which raises the chance of blackheart. All potatoes were kept in cold storage at a temperature of 4°C and 85% relative humidity for four months, which provided adequate time for the development of blackheart. Potatoes with decay were removed, and 519 tubers were finally used for the experiments. All tubers were removed from cold storage 24 h before testing to allow them to reach room temperature (~22°C). Every sample was labeled, and the morphological properties of each tuber, including weight, maximum length (Max.length), maximum width (Max.width), maximum height (Max.height), and volume, were measured and recorded before spectral acquisition.

2.2. Spectral Measurements

Transmission spectra of each tuber were collected with an apparatus (Figure 1) which consisted of a fiber spectrometer, an optic fiber, a collimating lens, a light source, a sample holder, a flexible sponge shield, and a computer. The spectrometer was connected to a 400 μm diameter optic fiber, and the optic fiber was connected to a collimating lens. The light source contained seven 35 W reflector halogen lamps (Philips Inc., China) powered by a DC regulated supply, with six of them arranged in an arc and the seventh centered toward the sample. The spectrometer, with a range from 200 to 900 nm, was a USB4000 (Ocean Optics Inc., United States) equipped with a 3648-element linear silicon CCD array detector. Each tuber was manually placed at the center of the sample holder, steady and with its maximum height axis vertical, and oriented toward the collimating lens so that light passed through the tuber. The flexible sponge shield was attached between the tuber and the optic fiber and acted as both a light seal and a flexible support to accommodate different sample geometries and sizes.

Spectrometer parameter settings and the collection and storage of transmission spectra were handled by the SpectraSuite software (Ocean Optics Inc., United States). The spectra were obtained across the range 200–900 nm at intervals of ~0.18 nm. Each transmittance spectrum was recorded as the average of 9 scans with an integration time of 100 ms. Four replicates of each tuber were taken and their mean value was used in subsequent analysis. A reference transmission spectrum (a 2 mm thick white paper plate) and a dark spectrum were measured and stored prior to spectra measurement. The transmittance was calculated by the following equation:

$$T = \frac{I_s - I_d}{I_r - I_d}$$

where $T$ is the transmittance and $I_s$, $I_r$, and $I_d$ are the light intensities at each wavelength for the sample, reference, and dark spectra, respectively.
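As an illustration only, the transmittance calculation and replicate averaging can be sketched in Python with NumPy, assuming the usual dark-corrected ratio (sample minus dark, divided by reference minus dark). The function names, the `eps` guard, and the choice to return NaN where the reference equals the dark signal are assumptions made for this sketch, not part of the SpectraSuite workflow.

```python
import numpy as np

def transmittance(sample, reference, dark, eps=1e-12):
    """Dark-corrected relative transmittance at each wavelength."""
    sample = np.asarray(sample, dtype=float)
    reference = np.asarray(reference, dtype=float)
    dark = np.asarray(dark, dtype=float)
    denom = reference - dark
    # Assumption: channels where reference and dark coincide are marked NaN
    denom = np.where(np.abs(denom) < eps, np.nan, denom)
    return (sample - dark) / denom

def mean_of_replicates(replicates):
    """Average replicate spectra (rows) into one spectrum per tuber."""
    return np.mean(np.asarray(replicates, dtype=float), axis=0)
```

In this sketch the four replicate spectra of a tuber would be stacked as rows and passed to `mean_of_replicates` before any further analysis.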

2.3. Blackheart Assessment

Following spectra measurements, all potatoes were sliced perpendicular to the maximum height axis to determine the extent of blackheart. The cut surface of each tuber was photographed with a digital camera (Canon IXUS 105, Canon Zhuhai, Inc., China). The areas of blackheart and of the whole surface were determined by image analysis using ENVI 4.7 software (ITT Visual Information Solutions, Boulder, CO, United States). The area of affected flesh, expressed as a percentage of the area of the whole segment, was subsequently calculated. The level of blackheart was assessed from the area of affected flesh using four grades ((1) normal; (2) slight: 0–20% of area affected; (3) moderate: 20–50% of area affected; (4) severe: more than 50% of area affected). Figure 2 shows four samples with different blackheart grades.
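The grading rule can be expressed as a short function. This is a sketch under stated assumptions: the function name and pixel-count inputs are hypothetical, and because the text does not say which grade the boundary values belong to, the sketch assigns exactly 20% to "slight" and exactly 50% to "moderate" (the latter follows from "more than 50%" for severe).

```python
def blackheart_grade(affected_area_px, total_area_px):
    """Return grade 1-4 from the affected and total cut-surface areas."""
    if total_area_px <= 0:
        raise ValueError("total area must be positive")
    pct = 100.0 * affected_area_px / total_area_px
    if pct == 0:
        return 1  # normal: no affected flesh
    if pct <= 20:
        return 2  # slight (assumption: 20% itself counts as slight)
    if pct <= 50:
        return 3  # moderate (50% itself is not "more than 50%")
    return 4      # severe
```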

2.4. Spectra Data Pretreatment

Transmitted light intensity is sensitive to the pathlength of the sample. According to Beer's law, the logarithm of light transmittance in agricultural products such as potato decreases linearly as the pathlength increases. The size variation of potatoes therefore causes changes in the transmittance measurements of individual potatoes that may not correspond to the quality attributes being measured, and it is necessary to correct the transmittance for the pathlength effect. In this study, morphological data of potatoes such as weight, volume, and height were used for transmittance correction to minimize the effect of size variability among tubers. The morphological-corrected transmittance was calculated by multiplying the relative transmittance by the ratio of the sample's morphological parameter to the mean morphological parameter of all samples [29]:

$$T_c = T \times \frac{M}{\bar{M}}$$

where $T_c$ is the morphological parameter corrected transmittance, $T$ is the raw transmittance, $M$ is the morphological parameter such as weight, height, or volume, and $\bar{M}$ is the mean morphological parameter of all samples.
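The correction described above scales each spectrum by its tuber's morphological parameter relative to the mean. A minimal NumPy sketch follows; the function name and the optional `m_mean` argument are assumptions added for illustration (the text uses the mean over all samples).

```python
import numpy as np

def morphological_correction(T, m, m_mean=None):
    """Scale each spectrum (row of T) by m_i / mean(m)."""
    T = np.asarray(T, dtype=float)   # shape (n_samples, n_wavelengths)
    m = np.asarray(m, dtype=float)   # shape (n_samples,), e.g. Max.height
    if T.shape[0] != m.shape[0]:
        raise ValueError("one morphological value is needed per spectrum")
    if m_mean is None:
        m_mean = m.mean()            # mean over all supplied samples
    return T * (m / m_mean)[:, None]
```

Passing `m_mean` explicitly would allow a mean computed on one set of tubers to be reused for new tubers; whether the original analysis did this is not stated.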

2.5. Wavelength Selection

The Vis/NIR transmittance in this study has thousands of wavelengths, and using all of them for classification is not an appropriate strategy because it can produce the so-called "curse of dimensionality" [30]. Wavelength selection not only improves the stability of the model, which is otherwise degraded by collinearity in multivariate spectra, and reduces the computation time for model development through a decreased number of wavelengths, but also helps in interpreting the relationship between the model and sample quality attributes [28, 31]. Thus, selecting the most significant wavelengths for blackheart potato detection is one of the most important steps in chemometric methods based on pattern recognition.

Principal component analysis (PCA) was applied to select the most important wavelengths for discrimination of blackheart potatoes. PCA is a well-known multivariate analysis technique for wavelength selection in the field of spectroscopy. PCA finds a smaller number of independent components that substitute for the original wavelengths through an orthogonal transformation [32]. In PCA, the spectral data matrix is decomposed into a score matrix and a loading matrix. The scores, which are weighted sums of the original variables that retain most of the useful information, can be used to make score plots that present objects in a new space, whereas the loadings can be used to make loading plots to identify and investigate the wavelengths responsible for the specific features that appear in the corresponding scores. Commonly, the wavelengths corresponding to peaks and valleys of the loadings of particular principal components are good candidates for effective wavelengths that correspond to quality attributes [32, 33].
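The idea of reading candidate wavelengths off the peaks and valleys of PCA loadings can be sketched with NumPy. This is not the PLS Toolbox procedure used in the study; it assumes mean-centered spectra, PCA via singular value decomposition, and a simple sign-change rule for locating extrema. Both function names are hypothetical.

```python
import numpy as np

def pca_loadings(X, n_components=2):
    """Return (loadings, explained variance ratios) of mean-centered X."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                    # assumption: mean centering
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    return Vt[:n_components], explained[:n_components]

def local_extrema(loading):
    """Indices of interior peaks and valleys of a 1-D loading vector."""
    d = np.diff(np.asarray(loading, dtype=float))
    # A sign change of the first difference marks a peak or a valley
    return np.where(d[:-1] * d[1:] < 0)[0] + 1
```

On real spectra sampled every ~0.18 nm, noise would produce many spurious extrema, so some smoothing or visual inspection of the loading plot would likely be needed before choosing wavelengths.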

2.6. Discriminant Analysis for Classification and Prediction

Partial least squares-linear discriminant analysis (PLS-LDA) is a supervised method used for classification purposes. It is the combination of PLS for dimension reduction and LDA for classification. PLS maximizes the covariance between the response variable (class membership) and the input variables (wavelengths), whereas LDA finds a linear combination of the new projections that minimizes the ratio of the within-class variance to between-class variance [34]. In the present study, PLS-LDA and Monte Carlo cross-validation (MCCV) [35] were used for establishing calibration models for classification of blackheart potatoes. The optimum number of latent variables (LVs) used in PLS-LDA was determined by the lowest value of predicted mean fitting error, which is defined as the ratio of the incorrectly classified samples to the total samples when MCCV is used. The obtained model can be used to predict the class of the unknown potatoes on the basis of their transmittance.
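The criteria named above can be written compactly. The notation is an assumption introduced here for illustration: $X$ is the mean-centered spectral matrix, $y$ the vector of class labels, $w_1$ the first PLS weight vector, $a$ the LDA direction in the space of PLS scores, and $S_W$ and $S_B$ the within-class and between-class scatter matrices of those scores. The MCCV error is written under the assumption that misclassifications are pooled over $K$ random splits, with $e_k(A)$ the number of misclassified left-out samples in split $k$ for a model with $A$ latent variables and $n_k$ the number of left-out samples.

```latex
\begin{aligned}
w_1 &= \arg\max_{\lVert w \rVert = 1} \operatorname{cov}(Xw,\, y) \\
a^{*} &= \arg\min_{a \,:\, a^{\top} S_B a > 0} \frac{a^{\top} S_W\, a}{a^{\top} S_B\, a} \\
E(A) &= \frac{\sum_{k=1}^{K} e_k(A)}{\sum_{k=1}^{K} n_k},
\qquad A_{\text{opt}} = \arg\min_{A} E(A)
\end{aligned}
```

Minimizing the within-class to between-class ratio in the second line is equivalent to maximizing the more familiar Fisher ratio $a^{\top} S_B a / a^{\top} S_W a$.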

Because any amount of blackheart would result in rejection of the potato, all of the tubers with blackheart were combined into a single class, while the normal tubers were classified as another class. No effort was made to distinguish between the extent of blackheart. Two classes of the tubers were assigned numeric value of 1 (normal potatoes) and −1 (blackheart potatoes), respectively.

All the tubers were divided into two data sets: two-thirds of the samples were used as a calibration set, and the remaining one-third was used as a validation set. The normal samples for calibration and validation were chosen using the Kennard-Stone algorithm [36], whereas the blackheart samples for calibration and validation were chosen using the concentration gradient method. All the blackheart tubers were sorted in descending order according to their area of affected flesh, every third sample was removed to form the validation set, and the remaining blackheart samples were used for calibration. In this way, both sets covered the whole range of blackheart severity appropriately and consistently.
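The two selection schemes can be sketched as follows. Several details are assumptions, since the text does not specify them: Kennard-Stone is run on Euclidean distances between spectra, the every-third selection starts at the most affected tuber, and the function names are hypothetical.

```python
import numpy as np

def kennard_stone(X, n_select):
    """Return indices of n_select rows of X chosen by Kennard-Stone."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    if not 2 <= n_select <= n:
        raise ValueError("n_select must be between 2 and the sample count")
    sq = (X**2).sum(axis=1)
    # Pairwise Euclidean distances without building an n x n x p array
    D = np.sqrt(np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0))
    i, j = np.unravel_index(np.argmax(D), D.shape)  # two most distant samples
    selected = [int(i), int(j)]
    remaining = [k for k in range(n) if k not in selected]
    while len(selected) < n_select:
        # Distance from each remaining sample to its nearest selected sample
        min_d = D[np.ix_(remaining, selected)].min(axis=1)
        pick = remaining[int(np.argmax(min_d))]
        selected.append(pick)
        remaining.remove(pick)
    return selected

def every_third_split(affected_area):
    """Sort by affected area (descending); every third sample is validation."""
    order = np.argsort(-np.asarray(affected_area, dtype=float), kind="stable")
    is_val = np.arange(order.size) % 3 == 0  # assumption: start at the top
    return order[~is_val], order[is_val]     # (calibration, validation)
```

With 188 blackheart tubers, this every-third rule gives 125 calibration and 63 validation samples, which agrees with the set sizes reported in Section 3.3.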

All computations were performed in MATLAB 7.0 (Mathworks, United States) under Windows 7. The principal components analysis (PCA) method, implemented in MATLAB with a PLS extension package (PLS Toolbox v.6.5, Eigenvector Research, United States), was used to model the data. The PLS-LDA algorithm codes were provided by Research Center of Modernization of Chinese Medicines, Central South University, China (PLS-LDA MATLAB codes can be downloaded from http://code.google.com/p/cars2009).

3. Results and Discussion

3.1. Morphological Properties and Blackheart Assessment Results

Table 1 shows the descriptive statistics (mean, standard deviation (SD), and range) of the weight, volume, Max.length, Max.width, and Max.height of the potatoes. The blackheart assessment identified 331 tubers with sound tissue (grade 1), while the remaining 188 samples had black or browning flesh. Figure 3 shows the distribution of blackheart tubers by area affected. Of these, 74, 76, and 38 were assessed as grade 2, grade 3, and grade 4, respectively. Table 1 and Figure 3 indicate a large morphological variation among the potatoes and a wide range of blackheart severity, both of which are favorable for building robust calibration models.

3.2. Overview of Spectra

Although the transmittance was acquired in the range of 200–900 nm, only spectral data between 513 and 850 nm, with 1800 variables, were taken into account for the analysis. Transmittances at both ends of the detection range were removed because of their low signal-to-noise ratio. The average transmittance for each grade is shown in Figure 4. Generally, blackheart of the tissue within a potato affected the spectral content of the light transmitted through the potato, and the transmittance intensity declined with the severity of blackheart over the full wavelength region. The transmittance of individual normal potatoes was dominated by three absorption bands, which appear as dips in the transmittance. There was a strong chlorophyll absorbance peak around 670 nm and two absorbance peaks normally attributed to OH functional groups near 740 and 840 nm [15]. Spectral curves of slightly affected potatoes were similar to those of normal tubers. Moreover, significant variations in spectra among samples with different morphological properties were also found (figure not shown). Normal and slightly affected spectra differ only slightly, in the magnitude of transmittance, which makes the discrimination of potatoes with blackheart difficult. The differences between the moderately affected spectra and the normal spectra were larger than those between the slightly affected and the normal spectra, not only in amplitude but also in the shape of the curves. It can be seen from Figure 4 that severely affected tubers differ to a large extent from the other samples in their transmittance values across the whole spectrum. Severely affected tubers were found to have much stronger absorbance over the whole wavelength range, relative to the samples less affected by blackheart.
These changes were probably associated with black or browning flesh and the presence of drier (i.e., lower OH), flour-textured cortical tissue in the moderately and severely affected potatoes. Drier tissue would normally be expected to increase the apparent absorbance, because the increased number of air-tissue interfaces scatters more light and the transmitted intensity is thus reduced [15]. However, these changes in transmittance differed from the transmittance changes observed by Clark et al. (2003), Upchurch et al. (1997), and Fu et al. (2007). In their research, badly affected fruits were found to have much stronger absorbance in the red area of the spectrum (650–750 nm), whereas absorbance above 840 nm was weaker [15, 18, 37]. A possible reason is that the fruits in their research have both core and flesh, while the potatoes in this study have only flesh, and the two have different chemical compositions.

3.3. Full Wavelength Classification Models

Based on the analysis above, it should be possible to classify potatoes with and without blackheart. Full wavelength spectra were used to develop PLS-LDA models, and three types of morphological correction methods were explored for their influence on distinguishing blackheart and normal potatoes.

A total of 346 samples (125 tubers with blackheart and 221 tubers without blackheart) were selected as the calibration set and the remaining 173 samples (63 tubers with blackheart and 110 tubers without blackheart) were used as the validation set. Classification results obtained from the full wavelength spectrum are listed in Table 2. Based on MCCV for the calibration set, the PLS-LDA model calibrated on raw transmittance (no preprocessing) needed 13 LVs and had relatively low overall correct classification rates of 94.80% (328/346) and 94.22% (163/173) for the calibration and validation sets, respectively. In calibration, 18 samples were misclassified, with 12 normal samples classified as defective potatoes and 6 defective samples classified as normal potatoes. In validation, six normal potatoes were predicted as defective and four defective potatoes were predicted incorrectly as normal. The PLS-LDA model with volume correction preprocessing needed one extra LV (14 instead of 13) and had fewer misclassifications for both the calibration (12/346) and validation (7/173) sets. In calibration, 10 normal samples were classified as defective potatoes and two defective samples were classified as normal potatoes. In validation, five normal potatoes and two defective potatoes were predicted incorrectly. The PLS-LDA model using weight correction preprocessing had the same classification accuracy as the model using volume correction preprocessing for the calibration set; however, its performance was slightly better for the validation set, where only four normal potatoes were predicted incorrectly as defective. The PLS-LDA model calibrated on height corrected transmittance, using 13 LVs, was slightly more parsimonious and yielded the best performance. In calibration, only ten out of 221 normal samples were misclassified as defective potatoes and two out of 125 defective samples were classified as normal potatoes.
In validation, only 3 out of 110 normal potatoes were predicted as defective and 2 out of 63 defective potatoes were predicted incorrectly as normal. The overall model accuracy increased from 94.80% to 97.11% in the calibration set and from 94.22% to 97.11% in the validation set.
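The overall and per-class rates follow directly from the misclassification counts. The helper below is a hypothetical illustration that reproduces the validation figures of the height corrected model from the counts quoted above (3 of 110 normal and 2 of 63 defective potatoes misclassified).

```python
def classification_rates(n_normal, n_defective,
                         normal_as_defective, defective_as_normal):
    """Correct classification rates (%) from misclassification counts."""
    total = n_normal + n_defective
    errors = normal_as_defective + defective_as_normal
    return {
        "overall": 100.0 * (total - errors) / total,
        "normal": 100.0 * (n_normal - normal_as_defective) / n_normal,
        "defective": 100.0 * (n_defective - defective_as_normal) / n_defective,
    }

# Validation set, height corrected transmittance: 168 of 173 correct
rates = classification_rates(110, 63, 3, 2)
print({k: round(v, 2) for k, v in rates.items()})
# -> {'overall': 97.11, 'normal': 97.27, 'defective': 96.83}
```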

In terms of classification accuracy, all preprocessing methods (i.e., weight correction, height correction, and volume correction) improved the performance. Among them, height correction was the most effective. This might be because light transmittance decreases as the pathlength increases, and height correction largely reduces the effect of light path variation on the transmittance measurements [29].

3.4. Effective Wavelengths Selection

The selection of the most relevant wavelengths has a wide variety of functions for discrimination purposes. The selection can avoid the curse of dimensionality, save the time for model development, and make the model more understandable [28, 30].

PCA was applied to the raw transmittance and to the transmittance after each spectral pretreatment (weight corrected, height corrected, and volume corrected transmittance) of the calibration set to select important wavelengths for establishing simplified models to classify blackheart potatoes. Samples having similar spectral characteristics tend to be projected to the same location in the principal component space. The first two PCs covered more than 98% of the total variance of the samples in the calibration set for all four types of transmittance, so these two PCs can be used as alternatives to the 1800 variables for classification of blackheart potatoes. The results of PCA are usually interpreted by visualizing the PC scores. Figure 5 shows the score plots of PC1 × PC2 for the four types of transmittance. In all cases, the clusters of blackheart and normal potatoes were largely separated, which indicates that discrimination between the two groups is feasible.

The loadings resulting from PCA were taken as an indication of the effective wavelengths responsible for the specific features that appeared in the corresponding scores and that contributed to the classification of blackheart potatoes. The loadings of PC1 and PC2 of the raw transmittance over the entire spectral range were used for wavelength selection because of the better clustering obtained with these PCs, as shown in Figure 6 (the loading curves of weight corrected, height corrected, and volume corrected transmittance are similar to those of the raw transmittance, so the figures are not shown). The wavelengths corresponding to peaks and valleys of these principal components were selected as optimum wavelengths. Six optimum wavelengths (711, 817, 741, 839, 678, and 698 nm) were thus selected as the effective wavelengths, which can later be used for blackheart potato detection instead of the whole spectral range. PC1 accounted for 94.31% of the total spectral variation. The corresponding loading vector (loading 1) had a shape similar to the mean transmittance of normal and slightly affected potatoes (Figure 4). Five wavelengths (711, 817, 741, 839, and 678 nm) were selected from this component. PC2 accounted for 3.86% of the total spectral variation, and only one wavelength (698 nm) was selected from loading 2.

Because chemical bonds absorb light energy at specific wavelengths, some information can be inferred from the transmittance. Among the optimum wavelengths (711, 817, 741, 839, 678, and 698 nm) selected from the PCA loadings, absorption at 741 nm is due to water absorption bands related to O-H stretching third overtones, and absorption at 839 nm is due to O-H stretching third combination overtones, which is attributed to sugar content [38]. The 817 nm wavelength, a maximum of the transmittance in the near infrared range, is probably associated with the chemical composition of potatoes. The 711 nm wavelength is another maximum of the transmittance and, together with 698 nm, lies in the visible region; both might represent the colour characteristics of the flesh and skin of the potatoes. The 678 nm wavelength corresponds to chlorophyll pigments; in the absence of further investigation, it was retained only because it improved the model.

3.5. Simplified Models

Despite the encouraging results obtained using full wavelengths model, it is advantageous to use only a few variables for accurate, simplified, and robust classifications. In Section 3.4, six wavelengths were selected by PCA loading analysis. The selected six wavelengths were used as classifying parameters in PLS-LDA.
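Reducing full spectra to the six selected wavelengths amounts to picking the spectrometer channels nearest to each target wavelength. The sketch below illustrates this with NumPy; the function names are hypothetical, and the evenly spaced 1800-point grid in the usage example is an assumption standing in for the instrument's actual wavelength axis.

```python
import numpy as np

SELECTED_NM = (711, 817, 741, 839, 678, 698)  # wavelengths from Section 3.4

def nearest_channel_indices(wavelengths, targets=SELECTED_NM):
    """Index of the channel closest to each target wavelength."""
    wavelengths = np.asarray(wavelengths, dtype=float)
    targets = np.asarray(targets, dtype=float)
    return np.abs(wavelengths[:, None] - targets[None, :]).argmin(axis=0)

def reduce_spectra(X, wavelengths, targets=SELECTED_NM):
    """Keep only the columns of X nearest to the target wavelengths."""
    idx = nearest_channel_indices(wavelengths, targets)
    return np.asarray(X, dtype=float)[:, idx]

# Usage with an assumed evenly spaced grid over 513-850 nm
grid = np.linspace(513, 850, 1800)
X_full = np.ones((4, grid.size))           # placeholder spectra
X_six = reduce_spectra(X_full, grid)       # shape (4, 6)
```

The reduced matrix would then replace the full spectra as input to the classifier.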

Table 3 gives the classification statistics for potato samples using PLS-LDA models based on the six wavelengths for the raw spectra and for the spectra with different morphological correction methods. The simplified PLS-LDA model calibrated on raw transmittance needed 6 LVs and had relatively low overall correct classification rates of 93.06% (322/346) and 92.49% (160/173) for the calibration and validation sets, respectively. In calibration, 24 samples were misclassified, with 21 normal samples classified as defective potatoes and three defective samples classified as normal potatoes. In validation, nine normal potatoes were predicted as defective and four defective potatoes were predicted incorrectly as normal. The simplified PLS-LDA model with weight correction preprocessing needed one fewer LV (5 instead of 6) and had fewer misclassifications for both the calibration (17/346) and validation (7/173) sets. In calibration, 15 normal samples were classified as defective potatoes and two defective samples were classified as normal potatoes. In validation, five normal potatoes and two defective potatoes were predicted incorrectly. The simplified PLS-LDA model using volume correction preprocessing had the same classification accuracy as the model using weight correction preprocessing for the validation set. However, its performance was slightly better for the calibration set, where 12 out of 221 normal potatoes were predicted incorrectly as defective. The simplified PLS-LDA model with height correction preprocessing produced the best performance. In calibration, 11 samples were misclassified, with ten normal samples classified as defective potatoes and only one defective sample classified as a normal potato. In validation, five normal potatoes were predicted as defective and two defective potatoes were predicted incorrectly as normal.
This multivariate model achieved overall accuracies of 96.82% and 96.53% for the calibration and validation sets, respectively.

Tables 2 and 3 show that, compared with using the whole spectral region, the overall identification rate with the selected wavelengths decreases from 97.11% to 96.82% in the calibration set and from 97.11% to 96.53% in the validation set, which is acceptable. Moreover, the simplified model used only six wavelengths with 6 LVs whereas the full wavelength model used 1800 wavelengths with 14 LVs; the former was more parsimonious. This is very important in practical applications for classifying blackheart potato tubers. It enables the use of simpler devices with only a few sensors to achieve fast and accurate classification, instead of complex, slow, and expensive devices that record hundreds of wavelengths. With the selected wavelengths, speed is gained with only a small loss of accuracy.

4. Conclusions

A transmission spectrum system in the visible/near infrared (Vis/NIR) region of 513–850 nm was developed to classify blackheart potatoes. Principal component analysis (PCA) was used to select sensitive wavelengths and partial least squares-linear discriminant analysis (PLS-LDA) was used to develop classification models. Height corrected transmittance gave the best performance, and the overall classification rate of the PLS-LDA method decreased from 97.11% to 96.82% in the calibration set and from 97.11% to 96.53% in the validation set when six optimal wavelengths (711, 817, 741, 839, 678, and 698 nm) were used instead of the corresponding full wavelength transmittance. These conclusions might be helpful for designing real-time and portable systems for the classification of blackheart potatoes. However, many more potato tubers of different sizes, shapes, and cultivars should be studied to properly ascertain the classification capability of this method.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This work is funded by the National Natural Science Foundation of China (no. 61275156), the Natural Science Foundation of Zhejiang Province, China (no. LY13C20014, LQ13F050006, Y3110450), Zhejiang Provincial Key Laboratory of Forestry Intelligent Monitoring and Information Technology Research (no. 2013ZHNL03), and the Natural Science Foundation of Zhejiang A&F University (no. 2012FR085, no. 2008FR053).