Abstract

The vitality of corn seeds is a significant indicator for assessing crop quality and yield. In recent years, numerous information technologies have been adopted to analyze seed vitality and to support efficient equipment. However, these technologies still have shortcomings that reduce the accuracy of seed vitality identification in many practical applications. In this paper, a synthesized classification method for seed vitality is proposed based on multisensor hyperspectral imaging. First, hyperspectral images in the range of 370–1042 nm were collected for waxy corn seeds that had been subjected to artificial aging for four periods (0, 3, 6, and 9 d). Next, preprocessing techniques including standard normal variate (SNV), multiplicative scatter correction (MSC), Savitzky-Golay (S-G) smoothing, and first- and second-order derivatives were employed to suppress noise interference in the raw spectra. In addition, principal component analysis (PCA), the 2nd derivative, and the successive projection algorithm (SPA) were adopted to select feature wavelengths. Support vector machine (SVM) classification models were then established based on the full spectra and on the feature wavelengths. The results showed that, with feature wavelengths selected by SPA, the SVM model preprocessed by MSC had the best performance: its training and testing accuracies were 100% and 97.9167%, respectively, with an RMSE of 0.018 and an $R^2$ of 0.875. These results demonstrate that the pattern recognition algorithm can classify accelerated-aging seeds with high accuracy. The method provides a new machine learning (ML) approach to the nondestructive detection of crops.

1. Introduction

Seed vitality is one of the most important seed parameters and is directly related to germination performance and seedling emergence [1]. A suitable method for seed vitality detection can help farmers and seed companies reduce losses and engage productively in agricultural production. The traditional method relies mainly on a manual germination test to distinguish seed vitality, which is time-consuming, inefficient, and inaccurate. Therefore, there is an urgent demand for a rapid and highly accurate method for seed vitality detection.

Apart from the manual germination test, methods for seed vitality classification mainly include chemical/biological methods and hyperspectral imaging methods. McDonough et al. studied the vitality changes of corn, sorghum, and sorghum flour at different aging grades by examining the biological features of the seeds and the chemical composition obtained by gel chromatography [2]. Cheyed measured the activities of amylase, phospholipase, protease, and phytase to explore the viability of wheat seeds after different storage periods [3]. Hydrogen peroxide (H2O2), ascorbic acid, and catalase activity were determined to distinguish the stigma vitality of rice [4]. RNA sequencing and DNA affinity purification sequencing were performed to probe the molecular mechanism of rice seed germination [5]. Wei et al. investigated the proteins and ultrastructure of the cotyledon and embryo to classify the quality of different soybean seeds [6]. Although these methods are objective and accurate, they share disadvantages such as a heavy workload, low efficiency, and high professional requirements, so they are not suited to efficient seed vitality detection.

Hyperspectral imaging is an emerging technique that integrates spectroscopy and imaging into one system [7] and can reflect the internal information of seeds [8]. In recent years, it has been widely used in seed detection with excellent results. Yang et al. built a model based on hyperspectral reflectance to predict whether sugar beet seeds can germinate [9]. Şentaş et al. investigated the yield of soybean (Glycine max) seeds based on hyperspectral reflectance, with favorable robustness [10]. Zhang et al. constructed models with spectral data to determine the germination power of seeds quickly, nondestructively, and accurately [11]. Dumont et al. evaluated the seed quality of Norway spruce; using sparse logistic regression for feature selection, they divided the seeds into three categories with a classification accuracy of 99% [12]. Hyperspectral technology thus provides a new way for rapid and nondestructive seed detection. However, each sample in a hyperspectral image has high-dimensional features containing rich spatial and spectral information, which dramatically increases the difficulty of feature selection and mining [13]. With the advancement of artificial intelligence (AI), making agriculture intelligent is regarded as the major challenge in realizing the economic potential and production efficiency of precision agriculture [14].

As an AI method, ML can effectively address feature selection and mining in hyperspectral data. Since the 1990s, ML has received extensive attention as a new learning paradigm. Its purpose is to discover the rules underlying a set of known samples so that the machine acquires the ability to generalize to unknown samples. The support vector machine (SVM) was developed on the basis of statistical learning theory. It is a learning method suited to small samples and avoids the traditional process from “induction” to “deduction.” SVM simplifies common classification problems and offers several advantages over existing methods. Scholars maintain that SVM will strongly promote the development of ML theories and technologies [15].

The SVM classifier originated from the generalized portrait algorithm in pattern recognition, proposed by the Soviet scholars Vladimir N. Vapnik and Alexander Y. Lerner in 1963 [16]. As theoretical research progressed, SVM was gradually formalized and became part of statistical learning theory. After decades of development, SVM has been extensively applied to classification and regression tasks, including portrait recognition, text classification, handwritten character recognition, and bioinformatics.

In recent years, SVM has also been widely used in seed detection with excellent results. Baek et al. developed an SVM model to detect rice seeds with lesions, showing that diseased rice seeds can be screened with ML algorithms and spectral imaging technology [17]. Pattern recognition and data mining methods have become hotspots in chemometrics; SVM has been employed to classify different corn seeds based on spectral data with high accuracy, demonstrating the effectiveness of the method [18]. Although considerable effort has gone into precision agriculture research, there is still no mature hyperspectral detection method for Jingke 2000, an economically important waxy corn variety. Hence, the main purpose of this study is to explore the use of SVM for rapid detection of the vitality of waxy corn seeds at different aging degrees. The main research contents are as follows.
(1) Establish the relationship between seed vitality level and artificial aging time through standard germination tests, providing an experimental basis for model construction.
(2) Obtain hyperspectral data of waxy corn seeds after different accelerated aging periods and apply five preprocessing methods: S-G smoothing, MSC, SNV, and the 1st and 2nd derivatives.
(3) Adopt PCA, the 2nd derivative, and SPA to reduce dimensionality and select feature wavelengths for the subsequent classification models.
(4) Construct and compare SVM classification models under different preprocessing and feature selection methods, in order to select the optimal model for identifying waxy corn seeds.

ML methods can be applied to hyperspectral imaging data to detect seed vitality. Puneet et al. compared data visualization methods such as principal component analysis (PCA) and multidimensional scaling (MDS), and divided six kinds of tea into three processing degrees according to near-infrared spectral information [19]. Baek et al. tested several spectral preprocessing methods, such as the continuous wavelet transform, together with feature selection methods, and improved the prediction accuracy of partial least squares regression (PLSR) [20]. Wang et al. applied orthogonal signal correction (OSC) for noise reduction and SPA to select the optimal wavelengths, obtaining a PLSR model that predicts seed hardness from hyperspectral images [21]. Insuck et al. used variable importance in projection (VIP) to remove redundant information and reduce computation time, and classified two kinds of soybeans with partial least squares discriminant analysis (PLS-DA) based on spectra, which proved feasible and effective [22]. These studies continue to optimize the PLS model: OSC reduces the number of model factors and improves accuracy, and VIP removes redundant information and improves efficiency. Nevertheless, the accuracy of the model is not significantly improved. Moreover, PLS is better suited to linear models and cannot accurately capture the nonlinear relationship between seed vitality and spectra. In contrast, SVM, which seeks the optimal hyperplane dividing the feature space, is more suitable for analyzing the relationship between seed vitality and spectra.
Only a few support vectors determine the classification result, because they capture the key samples and allow most redundant samples to be disregarded. The SVM method is therefore easy to use and robust. For these reasons, SVM was adopted in this study to classify waxy corn seeds with different vitality levels based on hyperspectral imaging, and several other parameters were also analyzed.

3. Materials and Methods

The detailed flowchart for detecting the seed vitality of waxy corn with SVM is shown in Figure 1. The goal is to identify the SVM model, combined with suitable preprocessing methods, that detects seed vitality most rapidly and accurately.

3.1. Modeling Methods
3.1.1. Preprocessing Methods

After selecting the region of interest (ROI) of all corn seeds, the spectral data were preprocessed with five methods: MSC, SNV, S-G smoothing, 1st derivative, and 2nd derivative.

(1) Multiplicative Scatter Correction (MSC). MSC estimates how each measured spectrum deviates from an ideal spectrum, usually taken as the mean spectrum of all samples [23]. It uses linear regression to correct baseline shifts and drift, thereby correcting the effects of scattering.

The basic steps are as follows.
(1) Calculate the average spectrum:
$$\bar{A} = \frac{1}{n} \sum_{i=1}^{n} A_i.$$
(2) Perform a linear regression of each spectrum on the average spectrum:
$$A_i = m_i \bar{A} + b_i,$$
where $A$ is the $n \times p$ matrix of all spectral data ($n$ samples, $p$ wavelengths), $A_i$ is the spectral reflectance vector of the $i$-th sample, and $m_i$ and $b_i$ are the slope and intercept, respectively, between $A_i$ and the average spectrum $\bar{A}$.
(3) Correct each spectrum using the slope $m_i$ and intercept $b_i$:
$$A_{i,\mathrm{MSC}} = \frac{A_i - b_i}{m_i},$$
where $A_{i,\mathrm{MSC}}$ is the spectrum after multiplicative scatter correction. Applying this correction to all spectra reduces the effects of scattering.
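The three MSC steps can be sketched in plain Python as follows. This is an illustrative sketch, not the authors' MATLAB code; the function name `msc` and the list-of-lists data layout are assumptions. Each spectrum is regressed on the average spectrum by ordinary least squares, and the fitted intercept and slope are then removed.

```python
def msc(spectra):
    """Multiplicative scatter correction (illustrative sketch).

    spectra: list of n spectra, each a list of p reflectance values.
    Returns the corrected spectra (A_i - b_i) / m_i.
    """
    n, p = len(spectra), len(spectra[0])
    # Step 1: average spectrum over all samples (the reference spectrum)
    ref = [sum(s[j] for s in spectra) / n for j in range(p)]
    ref_mean = sum(ref) / p
    ref_var = sum((r - ref_mean) ** 2 for r in ref)
    corrected = []
    for s in spectra:
        s_mean = sum(s) / p
        # Step 2: least-squares fit of s against ref, s ~ m * ref + b
        cov = sum((r - ref_mean) * (x - s_mean) for r, x in zip(ref, s))
        m = cov / ref_var
        b = s_mean - m * ref_mean
        # Step 3: remove the additive (b) and multiplicative (m) effects
        corrected.append([(x - b) / m for x in s])
    return corrected
```

If every spectrum is an affine transform of the same underlying curve, MSC maps all of them onto the average spectrum, which is exactly the scatter-removal behavior described above.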

(2) Standard Normal Variate (SNV). SNV applies a standard normal transformation to each spectral curve individually. It weakens interfering effects such as scattering and changes in optical path length [23] and is well suited to spectral data with large differences between samples. Assuming that the values of each spectrum across wavelengths approximately follow a normal distribution, SNV transforms each spectrum as
$$x_{k,\mathrm{SNV}} = \frac{x_k - \bar{x}}{\sigma}, \qquad \bar{x} = \frac{1}{p} \sum_{k=1}^{p} x_k, \qquad \sigma = \sqrt{\frac{1}{p-1} \sum_{k=1}^{p} \left( x_k - \bar{x} \right)^2},$$
where $x_k$ is the original spectral value at the $k$-th of $p$ wavelengths, $\bar{x}$ is the average value of the spectrum, and $\sigma$ is its standard deviation.
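A minimal Python sketch of the SNV transformation for a single spectrum is shown below. The function name `snv` is hypothetical, and the sample standard deviation (denominator $p-1$) is an assumption, since some implementations divide by $p$ instead.

```python
import math


def snv(spectrum):
    """Standard normal variate of one spectrum (illustrative sketch)."""
    p = len(spectrum)
    mean = sum(spectrum) / p
    # sample standard deviation (p - 1 in the denominator, an assumption)
    std = math.sqrt(sum((x - mean) ** 2 for x in spectrum) / (p - 1))
    return [(x - mean) / std for x in spectrum]
```

Each corrected spectrum has zero mean and unit standard deviation, so a spectrum that differs from another only by a gain and an offset is mapped to the same result.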

(3) Savitzky-Golay Smoothing. S-G smoothing is a filtering method based on local polynomial least-squares fitting that eliminates high-frequency random errors. Its main advantage is that it preserves the shape and width of the signal while filtering out noise [24]. The principle is illustrated in Figure 2.

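In Figure 2, the solid dots represent a column of data. Consider a window of $2m+1$ consecutive points $x_{-m}, \dots, x_0, \dots, x_m$ centered on $x_0$. These points are fitted with a polynomial of degree $n$:
$$f(i) = \sum_{k=0}^{n} a_k i^k, \qquad i = -m, \dots, m.$$

The coefficients $a_k$ minimize the residual of the least-squares fit:
$$E = \sum_{i=-m}^{m} \left( f(i) - x_i \right)^2.$$

Only the constant term of the fitted polynomial needs to be obtained, because the smoothed value at the window center is $f(0) = a_0$. It can be computed as a convolution:
$$\hat{x}_0 = a_0 = \sum_{i=-m}^{m} h_i x_i,$$
where the weights $h_i$ depend only on $m$ and $n$. The window is then shifted by one point and the procedure is repeated.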
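The convolution form can be illustrated with the five-point window used in Section 4.2. The sketch below assumes a quadratic fit ($m = 2$, $n = 2$), for which the standard S-G weights are $h = (-3, 12, 17, 12, -3)/35$; the polynomial degree used in the paper is not stated, so this choice and the function name `savgol5` are assumptions.

```python
# Convolution weights h_i for i = -2..2 (5-point window, quadratic fit)
SG5_WEIGHTS = (-3, 12, 17, 12, -3)
SG5_NORM = 35  # sum of the weights


def savgol5(signal):
    """5-point quadratic Savitzky-Golay smoothing (illustrative sketch)."""
    out = list(signal)  # the two points at each end are left unsmoothed
    for k in range(2, len(signal) - 2):
        window = signal[k - 2:k + 3]
        # smoothed value = constant term a_0 of the local quadratic fit
        out[k] = sum(h * x for h, x in zip(SG5_WEIGHTS, window)) / SG5_NORM
    return out
```

Because the weights come from a quadratic least-squares fit, any quadratic signal passes through unchanged, while point-to-point oscillations are strongly damped. For other window lengths and polynomial orders, SciPy's `scipy.signal.savgol_filter` computes the weights automatically.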

(4) Derivative Analysis. The 1st and 2nd derivatives remove baseline offsets and drift and suppress background interference. They also help resolve overlapping features, giving clearer spectral profile changes than the original spectra. The basic principle is to calculate the derivative of the spectrum with respect to wavelength [25].

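With $x_i$ the reflectance at the $i$-th wavelength and $\Delta\lambda$ the wavelength interval, the 1st-derivative formula (in central-difference form) is as follows:
$$x'_i = \frac{x_{i+1} - x_{i-1}}{2\Delta\lambda}.$$

The 2nd-derivative formula is as follows:
$$x''_i = \frac{x_{i+1} - 2x_i + x_{i-1}}{(\Delta\lambda)^2}.$$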
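The two finite-difference derivatives can be sketched in Python as follows. This is an illustrative sketch under the central-difference assumption; the function names and the 2 nm wavelength step in the check below are hypothetical.

```python
def first_derivative(x, step):
    """Central-difference 1st derivative; the first and last points are dropped."""
    return [(x[i + 1] - x[i - 1]) / (2 * step) for i in range(1, len(x) - 1)]


def second_derivative(x, step):
    """Central-difference 2nd derivative; the first and last points are dropped."""
    return [(x[i + 1] - 2 * x[i] + x[i - 1]) / step ** 2 for i in range(1, len(x) - 1)]
```

A linear baseline has a constant 1st derivative and a zero 2nd derivative, which is why the 2nd derivative removes baseline drift. Because differencing also amplifies noise, derivatives are commonly computed on smoothed spectra.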

3.1.2. Principal Component Analysis (PCA)

Principal component analysis (PCA) exploits the correlation among multiple variables. It projects the original features onto the most informative directions to achieve dimensionality reduction, so that the first few principal components (PCs) contain most of the useful information [26].

Assume the data set contains $n$ samples $x_1, \dots, x_n$, each with $p$ features, and is to be reduced to $k$ dimensions ($k < p$). PCA proceeds as follows:
(1) Center all features by removing the mean: $x_i \leftarrow x_i - \frac{1}{n} \sum_{j=1}^{n} x_j$.
(2) Calculate the covariance matrix $C = \frac{1}{n-1} X^{\top} X$, where $X$ is the $n \times p$ matrix of centered samples.
(3) Calculate the eigenvalues of $C$ and the corresponding eigenvectors.
(4) Take the unit eigenvectors $w_1, \dots, w_k$ corresponding to the $k$ largest eigenvalues and project each sample onto them, $z_i = W^{\top} x_i$, to obtain the new $k$-dimensional features. The output is the projection matrix $W = (w_1, \dots, w_k)$.
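The four PCA steps can be sketched in plain Python. This is only an illustration: the eigenvectors are found by power iteration with deflation, which is a simple substitute for the eigendecomposition routine a numerical library would normally provide. The function name `pca` and the parameters `iters` and `seed` are hypothetical.

```python
import math
import random


def pca(data, k, iters=500, seed=0):
    """Top-k principal components by power iteration (illustrative sketch).

    data: list of n samples, each a list of p features.
    Returns (components, eigenvalues, scores).
    """
    n, p = len(data), len(data[0])
    # (1) centre each feature
    means = [sum(row[j] for row in data) / n for j in range(p)]
    X = [[row[j] - means[j] for j in range(p)] for row in data]
    # (2) covariance matrix C = X^T X / (n - 1)
    C = [[sum(X[r][a] * X[r][b] for r in range(n)) / (n - 1) for b in range(p)]
         for a in range(p)]
    rng = random.Random(seed)
    components, eigenvalues = [], []
    for _ in range(k):
        # (3) power iteration converges to the dominant eigenvector of C
        v = [rng.random() + 0.1 for _ in range(p)]
        for _ in range(iters):
            w = [sum(C[a][b] * v[b] for b in range(p)) for a in range(p)]
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        lam = sum(v[a] * sum(C[a][b] * v[b] for b in range(p)) for a in range(p))
        components.append(v)
        eigenvalues.append(lam)
        # deflation: remove this component so the next loop finds the next one
        C = [[C[a][b] - lam * v[a] * v[b] for b in range(p)] for a in range(p)]
    # (4) project the centred samples onto the k components
    scores = [[sum(row[j] * comp[j] for j in range(p)) for comp in components]
              for row in X]
    return components, eigenvalues, scores
```

The explained variance of each PC, as reported in Section 4.3, is its eigenvalue divided by the trace of the original covariance matrix. For hundreds of wavelengths, a library eigensolver would be far faster than this pure-Python loop.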

3.1.3. Feature Extraction Methods

The amount of spectral data obtained from hyperspectral images is very large and contains a lot of redundant information, which will increase the burden of data processing and affect the accuracy of the model [27]. Therefore, it is necessary to select feature wavelengths to reduce input and improve the performance of models [28].

Wavelengths at which the 2nd-derivative spectra show peaks and valleys with significant differences can be selected as feature wavelengths. The 2nd derivative eliminates interference from the background and highlights useful information in the spectra [29].
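The text does not state how a "significant" peak or valley is judged. The Python sketch below therefore assumes a simple rule: keep every local maximum or minimum of a 2nd-derivative spectrum (for example, the difference between class-average 2nd-derivative spectra) whose absolute value reaches a threshold. The function name `local_extrema` and the threshold `min_abs` are hypothetical.

```python
def local_extrema(values, wavelengths, min_abs=0.0):
    """Wavelengths of local peaks/valleys with |value| >= min_abs (illustrative sketch).

    min_abs is a hypothetical significance threshold, not taken from the paper.
    """
    picked = []
    for i in range(1, len(values) - 1):
        is_peak = values[i] > values[i - 1] and values[i] > values[i + 1]
        is_valley = values[i] < values[i - 1] and values[i] < values[i + 1]
        if (is_peak or is_valley) and abs(values[i]) >= min_abs:
            picked.append(wavelengths[i])
    return picked
```

Raising `min_abs` trades fewer selected wavelengths for a higher risk of discarding weak but informative bands.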

The successive projection algorithm (SPA) compares the projections of different wavelength vectors onto one another and selects the wavelength with the largest projection vector as a feature wavelength [30], which minimizes collinearity among the selected wavelengths. The algorithm starts from an arbitrarily chosen wavelength. At each iteration, the remaining wavelength vectors are projected onto the subspace orthogonal to the most recently selected one, and the wavelength with the largest projection is added to the selected set. This cycle repeats until the required number of variables has been selected.

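The procedure of the successive projection algorithm is as follows. Let $X_{M \times J}$ be the spectral matrix of the collected samples, where $M$ is the number of samples, $J$ is the number of wavelengths, and $N$ is the number of variables (feature wavelengths) to be selected; $x_j$ denotes the $j$-th column of $X$.
(1) Before the first iteration ($n = 1$), select a column $x_{k(0)}$ of the spectral matrix as the starting wavelength.
(2) Collect the columns not yet selected into the set $S = \{ j : 1 \le j \le J,\ j \notin \{k(0), \dots, k(n-1)\} \}$.
(3) Calculate the projection of each column vector $x_j$, $j \in S$, onto the subspace orthogonal to $x_{k(n-1)}$:
$$P x_j = x_j - \frac{x_j^{\top} x_{k(n-1)}}{x_{k(n-1)}^{\top} x_{k(n-1)}}\, x_{k(n-1)}.$$
(4) Select $k(n) = \arg\max_{j \in S} \lVert P x_j \rVert$ and set $x_j \leftarrow P x_j$ for all $j \in S$.
(5) Let $n \leftarrow n + 1$; if $n < N$, return to step (2).
The final feature wavelengths are $\{ x_{k(n)} : n = 0, \dots, N-1 \}$.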
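The SPA iteration can be sketched in plain Python. This is an illustrative sketch with hypothetical names (`spa`, `n_select`, `start`); it implements the projection-and-selection loop only, for a single starting wavelength.

```python
import math


def spa(X, n_select, start):
    """Successive projections algorithm (illustrative sketch).

    X: M x J spectral matrix as a list of M rows.
    Returns n_select column indices, beginning with column `start` (k(0)).
    """
    M, J = len(X), len(X[0])
    cols = [[X[r][j] for r in range(M)] for j in range(J)]  # column vectors x_j
    selected = [start]
    for _ in range(1, n_select):
        ref = cols[selected[-1]]
        ref_sq = sum(v * v for v in ref)
        best, best_norm = None, -1.0
        for j in range(J):
            if j in selected:
                continue
            # project x_j onto the subspace orthogonal to the last selected column
            coef = sum(a * b for a, b in zip(cols[j], ref)) / ref_sq
            cols[j] = [a - coef * b for a, b in zip(cols[j], ref)]
            norm = math.sqrt(sum(v * v for v in cols[j]))
            if norm > best_norm:
                best, best_norm = j, norm
        selected.append(best)
    return selected
```

In a check with four columns where column 1 is nearly collinear with the starting column 0, the nearly redundant column is left out. Complete SPA implementations typically repeat the loop for every starting wavelength and every candidate $N$ and keep the subset with the lowest validation RMSE, which matches the RMSE-based choice of the number of wavelengths in Section 4.5.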

3.1.4. Discriminant Model

The support vector machine (SVM) maps the raw data from a low-dimensional space to a higher-dimensional one and uses hyperplanes to define the decision boundaries for classification [31]. The core of the algorithm is to determine the hyperplane that optimally separates the classes. The basic principle is illustrated in Figure 3.

This hyperplane can be described by the equation
$$w^{\top} x + b = 0,$$
where $w$ and $b$ are the normal vector and offset of the hyperplane, respectively. During training, the optimal hyperplane separates the training samples with the maximum margin while minimizing misclassification. The support vectors of the two classes lie on the two planes parallel to the optimal hyperplane, $w^{\top} x + b = \pm 1$. Finding the optimal hyperplane can therefore be formulated as the constrained optimization problem
$$\min_{w,\,b,\,\xi} \ \frac{1}{2} \lVert w \rVert^2 + C \sum_{i=1}^{n} \xi_i \quad \text{s.t.} \quad y_i \left( w^{\top} x_i + b \right) \ge 1 - \xi_i, \quad \xi_i \ge 0,$$
where $y_i \in \{-1, +1\}$ is the class label of sample $x_i$, $\xi_i$ is a slack variable measuring how far a misclassified sample lies beyond the margin plane of its class, and $C$ is the penalty coefficient, which determines the importance assigned to outliers. According to Lagrange duality [32], the normal vector of the optimal hyperplane can be expressed as a linear combination of the training samples, which gives the decision function
$$w = \sum_{i=1}^{n} \alpha_i y_i x_i, \qquad f(x) = \operatorname{sign}\left( \sum_{i=1}^{n} \alpha_i y_i\, x_i^{\top} x + b \right),$$
where $\alpha_i$ is the Lagrange multiplier of the $i$-th sample, with $0 \le \alpha_i \le C$. Samples with $\alpha_i > 0$ are the support vectors.

The kernel function is central to SVM [33]. Commonly used kernel functions include the polynomial, linear, sigmoid, and radial basis function (RBF) kernels. Among them, the RBF kernel is efficient and converges quickly [34]. Given the small feature dimension and the moderate number of samples in this data set, the RBF kernel was adopted. It replaces the inner product between samples in the SVM formulation and is defined as
$$K(x_i, x_j) = \phi(x_i)^{\top} \phi(x_j) = \exp\left( -\frac{\lVert x_i - x_j \rVert^2}{2\sigma^2} \right),$$
where $\phi(x_i)$ and $\phi(x_j)$ are the mapping functions of samples $x_i$ and $x_j$, and $\sigma$ is the width parameter of the function.
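The kernelized decision function can be illustrated in Python. This sketch only evaluates a binary SVM whose multipliers $\alpha_i$ and offset $b$ are already known; here they are hypothetical values rather than trained ones, because training requires solving the dual problem with a solver such as LIBSVM or MATLAB's SVM functions. All function names are illustrative.

```python
import math


def rbf_kernel(x, z, sigma):
    """K(x, z) = exp(-||x - z||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2 * sigma ** 2))


def decision_value(x, support_vectors, labels, alphas, b, sigma):
    """f(x) = sum_i alpha_i * y_i * K(x_i, x) + b over the support vectors."""
    return sum(a * y * rbf_kernel(sv, x, sigma)
               for sv, y, a in zip(support_vectors, labels, alphas)) + b


def predict(x, support_vectors, labels, alphas, b, sigma):
    """Binary label (+1 or -1) from the sign of the decision value."""
    return 1 if decision_value(x, support_vectors, labels, alphas, b, sigma) >= 0 else -1
```

LIBSVM writes the same kernel as $\exp(-\gamma \lVert x_i - x_j \rVert^2)$, so $\gamma = 1/(2\sigma^2)$. With four aging classes, binary SVMs are usually combined, for example by one-versus-one voting, which is the default strategy in LIBSVM.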

3.1.5. Model Evaluation Index

In this study, the models were evaluated by accuracy, root mean square error (RMSE), the coefficient of determination ($R^2$), and the receiver operating characteristic (ROC) curve.

(1) Accuracy. Accuracy is the proportion of correctly predicted samples, positive and negative, among all samples [35], and reflects the overall predictive performance of a model. It is expressed as
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN},$$
where $TP$ (true positives) is the number of positive samples predicted as positive, $TN$ (true negatives) is the number of negative samples predicted as negative, $FP$ (false positives) is the number of negative samples predicted as positive, and $FN$ (false negatives) is the number of positive samples predicted as negative.

(2) Root Mean Square Error (RMSE). RMSE is obtained by squaring the difference between each predicted value and the corresponding actual value, averaging these squared errors, and taking the square root [36]. The smaller the RMSE, the stronger the detection ability of the model. An RMSE of 0 means that every predicted value equals the true value, which is the optimal case; the larger the RMSE, the greater the prediction error of the model.

The RMSE calculation formula is as follows:
$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( \hat{y}_i - y_i \right)^2},$$
where $n$ is the number of samples, $y_i$ is the actual value, and $\hat{y}_i$ is the predicted value of the $i$-th sample.

(3) Coefficient of Determination ($R^2$). $R^2$ evaluates the stability of a model by comparing its prediction error with the variation of the actual values about their mean [37].

The calculation formula is as follows:
$$R^2 = 1 - \frac{SSE}{SST}, \qquad SSE = \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2, \qquad SST = \sum_{i=1}^{n} \left( y_i - \bar{y} \right)^2,$$
where $SSE$ is the sum of squares due to error, $SST$ is the total sum of squares, and $\bar{y}$ is the mean of the actual values. For a least-squares fit, $SST = SSR + SSE$, where $SSR$ is the sum of squares due to regression.

When $R^2 = 1$, the predicted values equal the true values and the error is 0; when $R^2 = 0$, the model performs no better than always predicting the sample mean. $R^2$ thus reflects the stability of the model: the closer $R^2$ is to 1, the more stable the model.
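The three indices can be computed as in the Python sketch below. The paper does not state how RMSE and $R^2$ are obtained for a four-class problem, so the sketch assumes the class labels are encoded as numbers (0–3 in the check, a hypothetical encoding). As an arithmetic example, with 48 test seeds, 47 correct predictions give an accuracy of $47/48 \approx 97.92\%$.

```python
import math


def accuracy(y_true, y_pred):
    """Share of samples whose predicted class equals the true class."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error, treating labels as numbers (an assumption)."""
    return math.sqrt(sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r_squared(y_true, y_pred):
    """R^2 = 1 - SSE / SST."""
    mean = sum(y_true) / len(y_true)
    sse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    sst = sum((t - mean) ** 2 for t in y_true)
    return 1 - sse / sst
```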

(4) Receiver Operating Characteristic (ROC) Curve. The ROC curve jointly reflects the sensitivity and specificity of a classifier whose output is a continuous score. Sensitivity and specificity are calculated at a series of decision thresholds, and the resulting points form the ROC curve. The larger the area under the ROC curve (AUC), the better the performance of the model.

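The formula for calculating the horizontal axis value is
$$\mathrm{FPR} = \frac{FP}{FP + TN}.$$

The formula for calculating the vertical axis value is
$$\mathrm{TPR} = \frac{TP}{TP + FN},$$
where $\mathrm{FPR}$ is the false positive rate (1 minus the specificity) and $\mathrm{TPR}$ is the true positive rate (the sensitivity).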
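The threshold sweep and the AUC can be sketched in Python as follows. This is a binary, illustrative sketch with hypothetical names. For the four aging classes, a common approach, which is assumed here rather than stated in the paper, is to compute one-versus-rest curves per class.

```python
def roc_points(scores, labels):
    """(FPR, TPR) points obtained by lowering the threshold through every distinct score.

    labels: 1 = positive, 0 = negative; a higher score means "more likely positive".
    """
    pos = sum(labels)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        # samples with tied scores cross the threshold together
        j = i
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            tp += ranked[j][1]
            fp += 1 - ranked[j][1]
            j += 1
        points.append((fp / neg, tp / pos))  # (FPR, TPR)
        i = j
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))
```

A classifier that ranks every positive above every negative reaches AUC = 1, and a classifier that cannot separate the classes stays near 0.5.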

3.2. Data Analysis Materials
3.2.1. Sample Preparation

The corn seeds used in this study belong to a waxy corn variety named Jingke 2000. Broken seeds, overly dry seeds, and impurities were removed in advance. A total of 384 seeds were selected as samples and randomly divided into four groups of 96 seeds each. Information on the samples is shown in Table 1. One group served as the control, and the other three groups were placed in an artificial aging box for 3, 6, and 9 days of aging treatment. After treatment, 48 seeds from each group were randomly selected for standard germination tests, and hyperspectral data were collected for each seed in the remaining half of the samples every 5 minutes.

3.2.2. Standard Germination Tests

Before obtaining the hyperspectral image, the selected samples were tested for germination according to the International Seed Testing Association (ISTA) standard [38]. The corn seeds were cultured for 10 days at 25°C with a relative humidity of 99%. The sprout length was manually measured, and the germination rate was calculated according to the ISTA standard.

3.2.3. Hyperspectral Imaging System

The experiment was performed using a hyperspectral imaging system (Figure 4(a)) composed of a hyperspectral imager, two halogen lamps, a mobile platform, and a computer. The hyperspectral imager SOC 710-VP manufactured by Polytec, France, was used to capture the light reflected from the seeds. It covers a spectral range of 370–1042 nm with a spectral resolution of 4.7 nm and spatial resolution of pixels. The seeds were imaged 24 at a time, each placed in one hole of a 24-hole sample holder (Figure 4(b)). The exposure time of the imaging system was set to 3 ms, and the platform moved at 20 mm/s. The raw images were first corrected with dark and white reference images to calibrate the system. Subsequent data processing and model construction were performed in MATLAB R2018a (The MathWorks, Natick, MA, USA).
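The paper does not spell out the dark and white calibration formula. The usual form, assumed in this Python sketch, converts raw intensity to relative reflectance band by band as $R = (I_{\mathrm{raw}} - I_{\mathrm{dark}}) / (I_{\mathrm{white}} - I_{\mathrm{dark}})$. The function name is hypothetical.

```python
def calibrate(raw, dark, white):
    """Per-band relative reflectance R = (raw - dark) / (white - dark).

    raw, dark, white: equal-length lists of intensities for one pixel
    (dark = lens covered, white = reference panel), an assumed layout.
    """
    return [(r - d) / (w - d) for r, d, w in zip(raw, dark, white)]
```

After this correction, a pixel as bright as the white reference has reflectance 1 and a pixel at the dark level has reflectance 0, which removes differences in lamp intensity and sensor offset across bands.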

4. Results

4.1. Germination Test Analysis

Table 2 lists the germination rate and average sprout length of Jingke 2000 waxy corn seeds under different aging periods. As the aging time increases from 0 to 9 days, both the germination rate and the sprout length decrease.

Overall, the longer the aging time, the lower the seed vitality. This confirms that varying the aging period is a valid way to prepare samples of different vitality classes in this study, and it provides experimental support for the subsequent SVM-based detection of seed vitality at different aging levels.

4.2. Spectral Profile

The raw spectral reflectance curves of all corn samples in the range of 370–1042 nm and the average spectra of the samples at the four aging times are shown in Figures 5(a) and 5(b), respectively. Seeds of the same variety show similar trends in their spectral reflectance curves under the different aging treatments. The reflectance gradually increases from 400 to 650 nm, with a clear absorption peak near 650 nm, and then decreases to a first minimum near 700 nm. The curve then rises sharply to a maximum around 750 nm; it then falls and rises again, reaching a second minimum around 850 nm, after which the reflectance continues to fall.

The average spectra of corn seeds preprocessed by S-G smoothing and the 2nd derivative are shown in Figure 6. A five-point window was applied in S-G smoothing. Compared with the unpreprocessed spectra, the S-G smoothed spectra show that the spikes in the original curves are clearly eliminated and the noise is smoothed. The 2nd-derivative algorithm also removes some high-frequency noise and mutual interference between different components. Therefore, both S-G smoothing and the 2nd derivative improve smoothness and reduce noise interference, and they can also be used to preprocess other continuous, irregular data.

4.3. PCA

PCA was conducted on the average spectra of corn seeds to obtain the weight coefficients of the PCs. The first three PCs were adopted for qualitative analysis because they contained most of the corn seed information, with 99.98% explained variance for Jingke 2000 (95.06% for PC1, 2.83% for PC2, and 0.94% for PC3). The contribution rates of PC1, PC2, and PC3 are shown in Figure 7(a), and their weight coefficients are shown in Figures 7(b), 7(c), and 7(d), respectively. The first three PCs were used as the input of the models for further processing.

4.4. Classification Models Based on Full Spectra

An SVM model was constructed to detect corn seeds with different aging periods. The sample data of each grade were randomly divided into a training set and a testing set at a ratio of 3:1 (36 training seeds and 12 testing seeds per grade). The RBF kernel function was applied, and the parameters were optimized by grid search. The penalty coefficient was 100 and the regularization coefficient was 10.
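The per-grade 3:1 split can be sketched in Python as follows. This is an illustrative sketch: the function name, the sample identifiers, and the fixed random seed are hypothetical. Each aging grade is shuffled separately, so every grade contributes exactly 36 training and 12 testing seeds.

```python
import random


def split_per_class(samples_by_class, n_train, seed=0):
    """Shuffle each class separately; the first n_train go to training, the rest to testing."""
    rng = random.Random(seed)  # fixed seed for a reproducible split (an assumption)
    train, test = [], []
    for label, samples in samples_by_class.items():
        shuffled = list(samples)
        rng.shuffle(shuffled)
        train.extend((s, label) for s in shuffled[:n_train])
        test.extend((s, label) for s in shuffled[n_train:])
    return train, test
```

Splitting within each grade keeps the class proportions identical in both sets. A grid search over the penalty coefficient and kernel parameter would then normally be scored by cross-validation on the training set only, so that the testing set stays unseen.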

The overall classification results are shown in Table 3. The overall classification accuracy of the training set for waxy corn with PCA is higher than that obtained with the raw data: the training accuracy is 93.75% and the testing accuracy is 89.5833%. This suggests that PCA reduces the dimension of the data and increases the accuracy of the classification model. For the PCA-SVM model, RMSE is 0.0262 and $R^2$ is 0.834, indicating stable detection errors. Because the PCA-processed spectral data represent as many data features as possible with only a few components, the calculation time of the subsequent model is significantly reduced, and redundant information in the raw data is removed at the same time.

4.5. Feature Wavelength Selection

In this study, the 2nd derivative and SPA were adopted to select feature wavelengths. Figure 8 shows the RMSE for different numbers of wavelengths selected by SPA, together with the selected variables. The number of wavelengths at which the RMSE reaches a local minimum is taken as the number of feature wavelengths. The selected feature wavelengths are listed in Table 4. In total, 20 and 8 feature wavelengths were finally obtained, which greatly reduces the data volume.

4.6. Classification Models Based on Feature Wavelengths

Table 5 shows the detection results of the SVM models constructed on the feature wavelengths selected by the 2nd derivative and SPA. The classification accuracies of both the training and testing sets exceed 95%, which is significantly higher than the results based on the full spectra. The highest accuracy is 98.75% for the training set and 97.1111% for the testing set. For the SPA-SVM model, RMSE is 0.0238 and $R^2$ is 0.9435. Feature wavelength selection highlights the effective information and thus improves the accuracy of the model, which shows the value of classification models based on feature wavelengths. In addition, SPA appears better suited to feature wavelength selection than the 2nd derivative.

Figure 9 shows the ROC curves of the two classification models. The AUC of the SPA-SVM model is 0.9783, higher than the 0.9416 of the 2nd-derivative-SVM model. Since both the classification accuracy and the AUC of the SPA-SVM model are higher, the SPA-SVM model has better performance and generalization ability in classifying corn seeds.

Table 6 lists the overall results of the SVM models built on the feature wavelengths selected by SPA from spectra preprocessed with different methods.

The training-set accuracies of these models are all over 90%, higher than that of the model based on the raw spectra, and the testing-set accuracies are all over 85%. This indicates that SVM classification models constructed from hyperspectral data can effectively detect corn seeds with different vitality levels. MSC gives better results than the other preprocessing methods, with a training accuracy of 100% and a testing accuracy of 97.9167%, followed by the 2nd derivative, with 96.5278% on the training set and 95.8333% on the testing set. These two methods also have the lowest RMSE and the highest $R^2$ among all methods: for the MSC-SPA-SVM model, RMSE is 0.018 and $R^2$ is 0.875; for the 2nd-derivative-SPA-SVM model, RMSE is 0.0343 and $R^2$ is 0.888. This shows that the SVM model based on feature wavelengths performs better after preprocessing with MSC or the 2nd derivative. The classification accuracy is significantly improved compared with models based on the raw spectral data, and the robustness of the models is also improved. Preprocessing is therefore important for improving the accuracy of the classification models.

5. Discussion

As described above, several preprocessing and feature wavelength selection methods were employed to construct SVM models based on the spectral information of Jingke 2000 waxy corn. As Table 3 shows, the SVM model based on PCA achieves significantly higher accuracy on both the training and testing sets than the model based on the raw spectra. PCA-processed spectral data represent the features as comprehensively as possible with fewer PCs, which removes redundant information from the raw data and significantly reduces the time of the subsequent model calculation. Tables 3 and 5 show that the models based on feature wavelengths outperform the model based on the full wavelengths. This indicates that feature extraction eliminates some irrelevant information in the spectral data and makes the remaining features more distinct. The ROC curves of the classification models based on the 2nd derivative and SPA show that the classifiers generalize well across different thresholds. The ROC curve of the SPA-SVM model lies closer to the upper-left corner and has a larger AUC, indicating higher accuracy. SPA compares the projections of different wavelength vectors onto one another and extracts the wavelengths corresponding to the largest projection vectors, so it minimizes the correlation among the selected wavelengths and extracts features better than the other methods. After the feature wavelengths were extracted by SPA, different preprocessing methods were compared. The results suggest that preprocessing further improves model accuracy, with MSC performing best, followed by the 2nd derivative. MSC removes noise and stray light interference from the raw spectra.
The 2nd derivative eliminates noise and baseline drift and highlights the features of the spectral curve, whereas the other methods only smooth the noise and perform worse. The SVM model for classifying waxy corn seed vitality with MSC preprocessing and SPA feature wavelength extraction achieves the best performance: the training accuracy is 100%, the testing accuracy is 97.9167%, and few samples are misclassified. RMSE is 0.018 and $R^2$ is 0.875, so the deviation is within an acceptable range, which supports the robustness of the model. The method is faster than traditional and chemical methods and is nondestructive. Compared with other ML methods such as PLS, the SVM model offers improved accuracy and robustness. Therefore, the MSC-SPA-SVM model is of great significance for classifying Jingke 2000 waxy corn seeds.

6. Conclusions

The SVM model based on hyperspectral imaging is quite effective for detecting the waxy corn seeds with different vitality levels. The detection results of the SVM model based on the feature wavelengths combined with different preprocessing methods are generally better than the model based on the full spectra. The MSC-SPA-SVM model could achieve the highest accuracy of 97.9167% in the testing set.

Through this study, it can be concluded that it is feasible to use ML algorithms to detect the vitality of waxy corn seeds. They are fast and nondestructive during the classification compared with traditional methods. Besides, they can achieve a higher accuracy, showing great potential in the future.

However, the results have some limitations owing to the experimental environment and other conditions. The spectral imager in this study covered only 370–1042 nm, and the wavelength range should be expanded to enrich the spectral features related to seed vitality. The application of SVM and other ML algorithms in crop detection therefore requires further exploration, and the structure and parameters of the model should be optimized to improve its classification accuracy and versatility. In the future, its potential in data modeling, predictive analytics, and deep learning should be explored [39–43]. We will also explore applications in other fields, including crop identification and food hazard detection [44–49].

Data Availability

All data included in this study are available upon request by contacting the corresponding author.

Conflicts of Interest

The authors declare no conflict of interest.

Authors’ Contributions

Conceptualization was overseen by Jinghua Wang and Lei Yan; methodology was overseen by Jinghua Wang; software was handled by Jinghua Wang; validation was overseen by Jinghua Wang and Fan Wang; formal analysis was handled by Lei Yan and Fan Wang; investigation was overseen by Lei Yan and Fan Wang; resources were overseen by Lei Yan; data curation was handled by Jinghua Wang and Lei Yan; original draft preparation was handled by Jinghua Wang; review and editing were handled by Jinghua Wang and Shanshan Qi; visualization was overseen by Jinghua Wang; supervision was handled by Lei Yan; project administration was overseen by Lei Yan; funding acquisition was handled by Lei Yan. All authors have read and agreed to the published version of the manuscript.

Acknowledgments

This research was funded by the National Key Research and Development Program of China (No. 2021YFD2100605), the National Natural Science Foundation of China (Nos. 62006008, 62173007, and 31770769), and the Fundamental Research Funds for the Central Universities (No. 2015ZCQ-GX-03).