Abstract

Locality preserving projection (LPP) retains only local information and ignores the category information of samples, which leads to misclassification after feature extraction. An improved locality preserving projection algorithm is proposed to optimize the extraction of crop growth characteristics. First, preliminary dimensionality reduction of the sample data is performed with two-dimensional principal component analysis (2DPCA) to retain the spatial information. Then, two optimized subgraphs are defined to describe the neighborhood relations between data of different categories. Finally, a feature parameter set is obtained by extracting the local information of the samples with the improved LPP algorithm. Experiments show that the improved LPP algorithm has good adaptability, and the highest SVM classification accuracy of this method exceeds 96%. Compared with other methods, the improved LPP offers superior performance for the analysis and optimization of multidimensional data.

1. Introduction

Nowadays, precision agriculture has become a new trend in agricultural development both at home and abroad. It places higher requirements on the intelligence, real-time performance, and accuracy of greenhouse crop monitoring. Moreover, it requires all-around, real-time monitoring and analysis of crop condition together with a good grasp of the growth environment. Large amounts of data are generated when crop growth characteristics are collected and extracted. These feature parameters are not independent of each other. Furthermore, the high dimensionality and complexity of the data cause difficulties in data processing, such as a large amount of computation, increased storage space, interference, and noise. These factors adversely affect the accurate judgment of crop growth [14]. Thus, it is necessary to optimize the crop growth data collected by various methods.

There are many well-developed methods for the optimization of characteristic data, and some algorithms perform well in image processing, pattern recognition, machine vision, and other fields. These methods mainly include kernel principal component analysis (KPCA) [5], two-dimensional principal component analysis (2DPCA) [6], independent component analysis (ICA) [7], locality preserving projection (LPP) [8], and many fusion algorithms based on the algorithms mentioned above. The LPP algorithm is able to effectively maintain local features and has the nonlinear manifold learning ability which other algorithms do not have [9, 10]. However, this unsupervised feature extraction algorithm, which does not consider the category information of samples, easily causes misclassification. In [11], SVD (singular value decomposition) is used to achieve the optimization of the null space, and discriminant locality preserving projections of the element space and null space are proposed to solve small-sample problems. In [12], the image structural information is incorporated into the objective function of the LPP algorithm during dimensionality reduction, which acquires the image structural information and achieves the projection optimization at the same time; thus the method is able to increase the recognition rate. An algorithm is proposed in [13] to improve the recognition accuracy and overcome the disadvantage of the locality preserving projection algorithm through a novel dimensionality reduction algorithm that uses the information of the original feature samples.

To address the disadvantages of LPP, an optimization method based on an improved LPP model is proposed in this paper. This method is able to preserve the local information of crop feature samples and cover the overall spatial information at the same time. By using the category information of the samples, the relationship matrix is regenerated and the objective function is redefined. Hence, eigenvectors with higher discrimination ability and better data optimization performance are obtained. The method is able to meet the demands for information perception in new agriculture as well as the optimization of crop growth characteristic parameters.

2. Growth Characteristics

Nowadays, the evaluation for crop growth is based on various growth characteristics. Thus, feature extraction plays an important role in the process.

On the basis of various studies of crops (most of them leafy crops) done by our lab, we can conclude that the main growth characteristics of crops include the altitude feature, area feature, shape feature, color feature, and texture feature. Every feature contains several second-level indexes. Some of the crop characteristics are shown in Figure 1.

On the basis of the multiple eigenvalues extracted from the 5 main characteristics mentioned above, better eigenvalues can be obtained by using the improved LPP algorithm to achieve feature optimization. Moreover, this method is able to acquire the effective crop growth information while reducing the amount of calculation. Therefore, it can improve the efficiency of both data processing and information perception.

3. Improved LPP Method

3.1. Basic LPP Method

LPP is used to achieve the dimensionality reduction of high-dimensional data while keeping its internal local structure invariant. The LPP mapping is based on the nearest neighbor graph; thus, the LPP algorithm has a manifold learning ability that general linear algorithms do not possess. The basic theory of dimensionality reduction can be described as follows.

It is assumed that a high-dimensional data set can be described as $X = [x_1, x_2, \ldots, x_N]$, $x_i \in \mathbb{R}^{D}$ ($i = 1, 2, \ldots, N$). Find the transformation matrix $A$ according to the necessary performance target. After the projection $y_i = A^{T}x_i$, the sample $x_i$ is mapped to a low-dimensional space $\mathbb{R}^{d}$ ($d \ll D$), which means the accomplishment of dimensionality reduction. Then, the original feature $x_i$ is described as $y_i$ in this low-dimensional space.

The objective function of the LPP algorithm is defined as
$$\min_{A} \sum_{i,j} \left\| y_i - y_j \right\|^{2} W_{ij}, \tag{1}$$
where $y_i = A^{T}x_i$, $y_j = A^{T}x_j$, and $W$ represents a similarity matrix, which is used to describe the relationship between $x_i$ and $x_j$. $W_{ij} \in [0, 1]$, and the value is determined as follows:
$$W_{ij} = \begin{cases} \exp\left(-\left\| x_i - x_j \right\|^{2} / t\right), & x_j \in N_k(x_i) \text{ or } x_i \in N_k(x_j), \\ 0, & \text{otherwise}, \end{cases} \tag{2}$$
where $N_k(x_i)$ denotes the $k$-nearest neighbors of $x_i$ and $t$ is the heat-kernel parameter.

Under the constraint condition of $A^{T}XDX^{T}A = I$, the projection matrix $A$ minimizing (1) is the projection transformation matrix to be acquired. It can be proved that the problem $\arg\min_{A} A^{T}XLX^{T}A$ can be solved by the generalized eigenvalue problem given in the following formula:
$$XLX^{T}a = \lambda XDX^{T}a, \tag{3}$$
where $D$ is a diagonal matrix with $D_{ii} = \sum_{j} W_{ij}$ and $L = D - W$ is the Laplacian matrix.

The eigenvectors corresponding to the $d$ ($d \ll D$) smallest nonzero eigenvalues in formula (3) constitute the projection matrix $A = [a_1, a_2, \ldots, a_d]$, and the dimension of matrix $A$ is $D \times d$. After the projection $Y = A^{T}X$, the high-dimensional data set can be described in the new characteristic space as $Y = [y_1, y_2, \ldots, y_N]$.
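To make the procedure concrete, a minimal NumPy/SciPy sketch of this basic LPP computation is given below. The function name `basic_lpp`, the heat-kernel parameter `t`, and the small regularization term are illustrative assumptions, not the paper's implementation.

```python
import numpy as np
from scipy.linalg import eigh
from scipy.spatial.distance import cdist

def basic_lpp(X, d, k=5, t=1.0):
    """Basic LPP sketch: X is D x N (columns are samples), d is the target dimension."""
    N = X.shape[1]
    dist = cdist(X.T, X.T)                       # pairwise Euclidean distances
    # k-nearest-neighbor graph: W_ij = exp(-||xi - xj||^2 / t) for neighboring pairs
    W = np.zeros((N, N))
    for i in range(N):
        idx = np.argsort(dist[i])[1:k + 1]       # skip the sample itself
        W[i, idx] = np.exp(-dist[i, idx] ** 2 / t)
    W = np.maximum(W, W.T)                       # x_j in N_k(x_i) or x_i in N_k(x_j)
    D = np.diag(W.sum(axis=1))                   # degree matrix, D_ii = sum_j W_ij
    L = D - W                                    # Laplacian matrix
    # Generalized eigenvalue problem (3): X L X^T a = lambda X D X^T a
    M1, M2 = X @ L @ X.T, X @ D @ X.T
    vals, vecs = eigh(M1, M2 + 1e-9 * np.eye(M2.shape[0]))  # small ridge for stability
    # eigh returns eigenvalues in ascending order; in practice eigenvectors tied to
    # numerically zero eigenvalues are discarded ("smallest nonzero eigenvalues")
    A = vecs[:, :d]                              # projection matrix A (D x d)
    return A                                     # low-dimensional data: Y = A.T @ X
```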

The purpose of the LPP algorithm is to find a linear subspace that can keep the local features of the original high-dimensional data. After projection via the algorithm, adjacent samples maintain their original neighboring state, while originally distant samples do not keep theirs. Obviously, this result is not satisfactory for data optimization [14]. With the help of the improved LPP algorithm, the misrecognition caused by the above-mentioned reason can be reduced as far as possible. Thus, it is feasible to reach better results in both dimensionality reduction and classification.

3.2. Improved LPP Method

Category information is ignored when the basic LPP seeks the adjacency relations between samples, and this makes it more difficult to describe the sparseness of the data distribution when only the overall neighborhoods are considered [15]. In order to strengthen the separation between categories, two subgraphs, instead of the original nearest neighbor graph, are used in this paper to describe the relationships between homogeneous and heterogeneous data. The optimized global matrix and the improved objective function are then provided, the method for feature extraction is designed, and the category information of the sample data is added to the LPP algorithm. The nearest neighbor graph is thereby divided into a gathered subgraph and a separated subgraph. The relationship matrices and objective functions of these subgraphs are defined as follows.

3.2.1. Gathered Subgraph

The gathered subgraph is improved on the basis of the $k$-nearest neighbor graph. The addition of category information makes homogeneous nonadjacent samples stay closer to each other after projection. The relationship matrix of two samples is defined as $W^{g}$:
$$W_{ij}^{g} = \begin{cases} \exp\left(-\left\| x_i - x_j \right\|^{2} / t\right), & l(x_i) = l(x_j) \text{ and } x_j \in N_k(x_i), \\ \beta, & l(x_i) = l(x_j) \text{ and } x_j \notin N_k(x_i), \\ 0, & l(x_i) \neq l(x_j), \end{cases} \tag{4}$$
where $l(x_i)$ denotes the category label of $x_i$ and $\beta$ represents the homogeneous weight factor, which is determined by experience.

According to the relationship matrix defined above, the following objective function needs to be minimized to achieve the aggregation of the data:
$$J_{g}(A) = \sum_{i,j} \left\| A^{T}x_i - A^{T}x_j \right\|^{2} W_{ij}^{g}. \tag{5}$$

Formula (5) can be simplified as follows:
$$J_{g}(A) = 2\,\mathrm{tr}\left(A^{T} X L^{g} X^{T} A\right), \tag{6}$$
where $D_{ii}^{g} = \sum_{j} W_{ij}^{g}$, $L^{g} = D^{g} - W^{g}$, and $L^{g}$ represents a Laplacian matrix.
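A minimal sketch of how the gathered-subgraph weights of formula (4) and the Laplacian $L^{g}$ might be assembled is given below, assuming a label array `labels` and a homogeneous weight factor `beta`; this illustrates the construction described above rather than the paper's exact implementation.

```python
import numpy as np
from scipy.spatial.distance import cdist

def gathered_weights(X, labels, k=5, t=1.0, beta=0.1):
    """Gathered-subgraph weights W^g (sketch): X is D x N, labels holds category indices."""
    labels = np.asarray(labels)
    N = X.shape[1]
    dist = cdist(X.T, X.T)
    knn = np.zeros((N, N), dtype=bool)
    for i in range(N):
        knn[i, np.argsort(dist[i])[1:k + 1]] = True
    knn |= knn.T                                  # symmetric k-NN adjacency
    same = labels[:, None] == labels[None, :]     # homogeneous (same-category) pairs
    Wg = np.where(same & knn, np.exp(-dist ** 2 / t), 0.0)
    Wg[same & ~knn] = beta                        # pull homogeneous nonadjacent samples closer
    np.fill_diagonal(Wg, 0.0)
    Dg = np.diag(Wg.sum(axis=1))                  # D^g with D^g_ii = sum_j W^g_ij
    Lg = Dg - Wg                                  # Laplacian L^g used in formula (6)
    return Wg, Lg
```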

3.2.2. Separated Subgraph

The application of the $k$-nearest neighbor graph alone may reduce the accuracy of classification when the data are projected into the low-dimensional space, because heterogeneous samples that are adjacent in the original space remain adjacent after projection. The separated subgraph is constructed to solve this problem.

The relationship matrix of two samples is defined as $W^{s}$:
$$W_{ij}^{s} = \begin{cases} \gamma, & l(x_i) \neq l(x_j) \text{ and } x_j \in N_k(x_i), \\ 0, & \text{otherwise}, \end{cases} \tag{7}$$
where $\gamma$ represents the heterogeneous weight factor, which is determined by experience.

According to the relationship matrix defined in formula (7), the objective function in formula (8) needs to be maximized to achieve the segregation of the data:
$$J_{s}(A) = \sum_{i,j} \left\| A^{T}x_i - A^{T}x_j \right\|^{2} W_{ij}^{s}. \tag{8}$$

Formula (8) can be simplified as follows:
$$J_{s}(A) = 2\,\mathrm{tr}\left(A^{T} X L^{s} X^{T} A\right), \tag{9}$$
where $D_{ii}^{s} = \sum_{j} W_{ij}^{s}$, $L^{s} = D^{s} - W^{s}$, and $L^{s}$ represents a Laplacian matrix.

Combining the gathered subgraph and the separated subgraph, we obtain the overall objective function shown as follows:
$$J(A) = \min_{A} \left[ J_{g}(A) - J_{s}(A) \right]. \tag{10}$$

Formula (10) can be simplified:
$$J(A) = \min_{A} 2\,\mathrm{tr}\left(A^{T} X \left(L^{g} - L^{s}\right) X^{T} A\right). \tag{11}$$

The projection matrix $A$ that minimizes objective function (11) is exactly the required projection matrix. The proximity relations among the data can be maintained when the high-dimensional sample information is projected into the low-dimensional space by the improved LPP algorithm. Moreover, this method is able to bring homogeneous nonadjacent samples closer together while pushing heterogeneous adjacent samples farther apart. Thus, the accuracy of optimization and classification can be improved.
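Under the same assumptions, and reusing the `gathered_weights` helper from the previous sketch, the separated-subgraph weights of formula (7) and the generalized eigenproblem behind formula (11) could be sketched as follows; the constraint matrix $X D^{g} X^{T}$ is an assumption, since the extracted text does not state the constraint explicitly.

```python
import numpy as np
from scipy.linalg import eigh
from scipy.spatial.distance import cdist

def separated_weights(X, labels, k=5, gamma=1.0):
    """Separated-subgraph weights W^s (sketch): heterogeneous k-NN pairs get weight gamma."""
    labels = np.asarray(labels)
    N = X.shape[1]
    dist = cdist(X.T, X.T)
    knn = np.zeros((N, N), dtype=bool)
    for i in range(N):
        knn[i, np.argsort(dist[i])[1:k + 1]] = True
    knn |= knn.T
    diff = labels[:, None] != labels[None, :]        # heterogeneous pairs
    Ws = np.where(diff & knn, gamma, 0.0)
    Ls = np.diag(Ws.sum(axis=1)) - Ws                # Laplacian L^s used in formula (9)
    return Ws, Ls

def improved_lpp(X, labels, d, k=5, t=1.0, beta=0.1, gamma=1.0):
    """Solve min tr(A^T X (L^g - L^s) X^T A) via a generalized eigenproblem (sketch)."""
    Wg, Lg = gathered_weights(X, labels, k, t, beta) # from the earlier sketch
    _, Ls = separated_weights(X, labels, k, gamma)
    Dg = np.diag(Wg.sum(axis=1))
    M1 = X @ (Lg - Ls) @ X.T
    M2 = X @ Dg @ X.T + 1e-9 * np.eye(X.shape[0])    # assumed constraint A^T X D^g X^T A = I
    _, vecs = eigh(M1, M2)                           # eigenvalues in ascending order
    return vecs[:, :d]                               # projection matrix A (D x d)
```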

4. Optimizing Process Based on Improved LPP

The optimization model for crop growth characteristics based on improved LPP algorithm is introduced as follows.

It is assumed that matrix $X$ represents the sample matrix reflecting the crop growth characteristics:
$$X = [x_1, x_2, \ldots, x_N] \in \mathbb{R}^{D \times N}. \tag{12}$$

The dimensionality reduction based on the improved LPP algorithm is achieved by mapping the sample information of the $D$-dimensional space $\mathbb{R}^{D}$ into the $d$-dimensional space $\mathbb{R}^{d}$ ($d \ll D$). The map is shown as follows:
$$Y = A^{T}X, \quad y_i = A^{T}x_i, \quad x_i \in \mathbb{R}^{D} \rightarrow y_i \in \mathbb{R}^{d}. \tag{13}$$

The optimizing process for multidimensional crop growth characteristics based on improved LPP algorithm is shown as follows.

Step 1. Standardize the multidimensional crop growth characteristics data.

Step 2. Obtain the characteristic matrix $X$ after completing preliminary dimensionality reduction using the 2DPCA algorithm.

Step 3. Construct the two types of optimized subgraphs. Then obtain the matrices $W^{g}$, $W^{s}$, and $L^{g} - L^{s}$.

Step 4. Acquire the projection matrix $A$ by solving the generalized eigenvalue problem $X\left(L^{g} - L^{s}\right)X^{T}a = \lambda X D^{g} X^{T}a$.

Step 5. Via the projection $Y = A^{T}X$, the characteristic sample set is expressed in the new characteristic space as $Y = [y_1, y_2, \ldots, y_N]$. Thus, the samples are projected into the low-dimensional space $\mathbb{R}^{d}$.

The process of the algorithm proposed in this paper is shown in Figure 2.
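To tie the five steps together, a minimal end-to-end sketch is given below. The helper `preliminary_reduction` stands in for the paper's 2DPCA stage (an ordinary PCA is used here for brevity), and `improved_lpp` is assumed to be the routine sketched in Section 3.2; all names and default dimensions are illustrative.

```python
import numpy as np

def preliminary_reduction(X, d_pre):
    """Stand-in for the paper's 2DPCA stage: ordinary PCA keeping d_pre components."""
    Xc = X - X.mean(axis=1, keepdims=True)
    cov = Xc @ Xc.T / X.shape[1]
    vals, vecs = np.linalg.eigh(cov)                 # eigenvalues in ascending order
    P = vecs[:, ::-1][:, :d_pre]                     # top d_pre principal directions
    return P.T @ X                                   # d_pre x N reduced data

def optimize_growth_features(X_raw, labels, d_pre=30, d_final=20, k=5):
    """End-to-end sketch of the optimization process (Steps 1-5); X_raw is D x N."""
    # Step 1: standardize each characteristic (zero mean, unit variance)
    mu = X_raw.mean(axis=1, keepdims=True)
    sigma = X_raw.std(axis=1, keepdims=True) + 1e-12
    X_std = (X_raw - mu) / sigma
    # Step 2: preliminary dimensionality reduction (2DPCA in the paper;
    # an ordinary PCA stand-in is used here for brevity)
    X_pre = preliminary_reduction(X_std, d_pre)
    # Steps 3-4: build the subgraphs and solve for the projection matrix A
    A = improved_lpp(X_pre, labels, d=d_final, k=k)  # routine sketched in Section 3.2
    # Step 5: project into the low-dimensional space
    Y = A.T @ X_pre                                  # d_final x N optimized features
    return Y, A
```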

5. The Experimental Results and Analysis

5.1. The Effect of Dimensionality Reduction

In order to evaluate the performance of the improved LPP algorithm in the dimensionality reduction and optimization of crop growth characteristics, a set of data from pakchoi is chosen as the test sample. The data come from 30 pakchoi plants, and the feature data of each plant include 30 eigenvalues.

When the 2DPCA algorithm is used to process the data, dimensionality reduction is carried out according to the criterion that the cumulative contribution rate is greater than 95%, which yields the new low-dimensional characteristic matrix. Dimensionality reduction is performed on the test sample using the 2DPCA algorithm: in descending order of cumulative contribution rate, the top 30 principal components, whose total contribution rate exceeds 96.48%, are chosen to generate the characteristic matrix $X$. The improved LPP algorithm is then used for further dimensionality reduction, with the number of nearest neighbors $k$ set to 5. The 30-dimensional feature data obtained by the 2DPCA algorithm are optimized to acquire the final 20-dimensional crop growth characteristics. The efficiency comparison among different algorithms is shown in Table 1.
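As a small illustration of the cumulative-contribution criterion, the snippet below counts how many leading components are needed to reach a given threshold; the eigenvalue array `eigvals` is assumed to come from the covariance decomposition of the 2DPCA (or PCA) stage.

```python
import numpy as np

def components_by_contribution(eigvals, threshold=0.95):
    """Return how many leading components reach the cumulative contribution threshold."""
    vals = np.sort(eigvals)[::-1]                    # eigenvalues in descending order
    contribution = np.cumsum(vals) / vals.sum()      # cumulative contribution rate
    return int(np.searchsorted(contribution, threshold) + 1)

# Example: keep components until at least 95% of the variance is explained
# n_components = components_by_contribution(eigvals, 0.95)
```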

According to Table 1, when different algorithms are used to process the same set of data, the runtime of the LPP algorithm is the shortest, while 2DPCA, ICA, and PCA take much more time. Moreover, considering the complexity of these algorithms, the runtime of the improved LPP algorithm is not much longer than that of the LPP algorithm. Hence, the improved LPP algorithm meets the requirements of data processing.

5.2. Classification Test by SVM

To further analyze the performance of the improved LPP algorithm for dimensionality reduction, data from pakchoi and lettuce samples are chosen as test data. Contrast experiments using different algorithms (PCA, 2DPCA, ICA, KPCA, LPP, and improved LPP) for dimensionality reduction are carried out. Dimensionality reduction is applied to every test sample in the database via the above-mentioned algorithms, and the reduced data are then classified by SVM. Crop growth characteristics are described by multiple eigenvalues of the altitude, area, shape, color, and texture features. The description of the test data is shown in Table 2.

The SVM classifier is an implementation of the structural risk minimization principle in statistical learning theory; it can solve not only linearly separable problems but also nonlinearly separable problems. At the same time, it generalizes well in small-sample, high-dimensional pattern recognition situations and gives a unique solution.

The optimal classification decision function is
$$f(x) = \mathrm{sgn}\left( \sum_{i=1}^{s} \alpha_i y_i K\left(x_i^{*}, x\right) + b \right), \tag{14}$$
where $\mathrm{sgn}(\cdot)$ represents the symbol function; $y_i$ represents the sample label; $\alpha_i$ represents the Lagrange factor corresponding to the support vector; $x$ represents the input sample vector; $x_i^{*}$ represents the support vector; $K(\cdot,\cdot)$ represents the kernel function; $b$ represents the classification threshold; $s$ represents the number of support vectors.

The kernel function of the classifier, using the radial basis kernel function, is shown as follows:
$$K\left(x_i, x\right) = \exp\left( -\frac{\left\| x - x_i \right\|^{2}}{2\sigma^{2}} \right). \tag{15}$$

The sample set obtained from formula (13) is sent to train the SVM classifier; the support vectors obtained are denoted as $x_i^{*}$ ($i = 1, 2, \ldots, s$); formula (14) can then be written as
$$f(x) = \mathrm{sgn}\left( \sum_{i=1}^{s} \alpha_i y_i \exp\left( -\frac{\left\| x - x_i^{*} \right\|^{2}}{2\sigma^{2}} \right) + b \right). \tag{16}$$

The SVM algorithm applied in this paper employs the LIBSVM software package. The test results for these two crops are shown as follows.
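For reference, the classification stage could be reproduced along the following lines with scikit-learn's `SVC`, which wraps LIBSVM; the train/test split, the penalty `C`, and the RBF parameter `gamma` are illustrative settings, not those reported in the paper.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

def classify_reduced_features(Y, labels, test_size=0.3, C=1.0, gamma="scale"):
    """Train an RBF-kernel SVM (LIBSVM backend) on the reduced features Y (d x N)."""
    X = Y.T                                           # scikit-learn expects N x d
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, labels, test_size=test_size, stratify=labels, random_state=0)
    clf = SVC(kernel="rbf", C=C, gamma=gamma)         # decision function as in formula (16)
    clf.fit(X_tr, y_tr)
    return clf.score(X_te, y_te)                      # classification accuracy
```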

The classification results of the pakchoi samples obtained by the 6 algorithms mentioned above are shown in Figure 3. The dimensionality ranges from 0 to 60. According to Figure 3, the accuracy and stability of the PCA algorithm are poor when the dimensionality is low, after which it tends towards stability. Although ICA is relatively stable, its accuracy is relatively low across the whole dimension range. The fluctuation of KPCA is large in both the low and high dimensions, and its overall accuracy is unsatisfactory. The overall trend of the 2DPCA algorithm is the same as that of the LPP algorithm, but the stability of its accuracy is poor when the dimensionality ranges from 34 to 47; there are still fluctuations after reaching 47 dimensions, but they are small. For the LPP algorithm, the stability of accuracy is always poor, and the amplitude of fluctuation is not significantly reduced even after reaching 40 dimensions. In contrast, the accuracy of the improved LPP algorithm reaches a high level when the dimensionality is greater than 20 and then tends towards stability. Thus, its performance is better than that of the other five algorithms.

The classification results of the lettuce samples obtained by the 6 algorithms mentioned above are shown in Figure 4. The dimensionality ranges from 0 to 50. According to Figure 4, the classification accuracy of the 2DPCA algorithm is lower than that of the PCA algorithm when the dimensionality ranges from 0 to 17, and the accuracies of both algorithms fluctuate to some degree when the dimensionality ranges from 23 to 45; the 2DPCA algorithm performs better than the PCA algorithm after reaching 23 dimensions. The dimensionality reduction trends of ICA and KPCA are similar, but KPCA has higher accuracy than ICA; both fluctuate drastically around 20 dimensions and then level off. The accuracy of the LPP algorithm reaches a high level and tends towards stability when the dimensionality is low, and it fluctuates acutely only when the dimensionality ranges from 10 to 20. Therefore, its performance is better than that of the PCA and 2DPCA algorithms. Compared with the LPP algorithm, the improved LPP algorithm reaches its optimal performance when the dimensionality is about 24 and does not fluctuate acutely on the whole. In general, the improved LPP algorithm has the best performance.

Table 3 compares the 4 methods for dimensionality reduction. The biggest differences lie in the optimal dimensionality (O-D) and the accuracy. The results of the pakchoi samples are much better because more data were collected for them in the experiments.

For the data from the pakchoi samples, the LPP algorithm and the improved LPP (2DPCA-LPP) algorithm have significant advantages. The accuracy of the PCA algorithm is low; the accuracy of the nonlinear mapping of KPCA is higher than that of the linear PCA, 2DPCA, and ICA, while the accuracies of the LPP algorithm and the improved LPP algorithm are both higher than 96%. Moreover, their optimal dimensionality is much lower than that of the ICA and PCA algorithms: the optimal dimensionality of the improved LPP algorithm accounts for only 67.3% of that of the PCA algorithm, 74% of that of ICA, and 77.1% of that of 2DPCA. For the data from the lettuce samples, the accuracy of the improved LPP algorithm is only 0.01% below that of the LPP algorithm, while its dimensionality is much lower than those of the others (accounting for only 67.3% of the dimensionality of the PCA algorithm). By comparing the data, it is obvious that the improved LPP algorithm has excellent performance in terms of dimensionality reduction, and it is able to reach the optimal classification accuracy at a low feature dimension.

6. Conclusion

An optimization method for crop growth characteristics based on an improved locality preserving projection is studied in this paper. Combined with the novel monitoring system, a series of characteristic groups for evaluating the growth conditions of leafy crops in a greenhouse is generated. Preliminary dimensionality reduction of the sample data is achieved via the 2DPCA algorithm before the LPP step. Meanwhile, the relationship matrix of the LPP algorithm is optimized by using the two types of optimized subgraphs, which further describe the neighborhood relations among different categories of data, yielding the improved LPP algorithm for optimizing crop characteristic data. Both local and global characteristics are taken into consideration in the improved algorithm. Through comparison and analysis, it is concluded that the improved LPP algorithm has efficient optimization performance, good stability, and adaptability. Therefore, the performance of the improved LPP algorithm is better than that of the other algorithms when it is applied to the data optimization and analysis of crop growth characteristics.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.