Abstract

To improve the accuracy of mango grading, a mango grading system was designed using deep learning. The system mainly comprises CCD camera image acquisition, image preprocessing, model training, and model evaluation. Because traditional deep neural network training needs large sample data sets, a convolutional neural network is proposed that grades mangoes efficiently through continuous adjustment and optimization of the hyperparameters and batch size. The ultra-lightweight SqueezeNet algorithm is introduced. Compared with AlexNet and related algorithms at the same accuracy level, it has the advantages of a small model size and fast operation speed. The experimental results show that the convolutional neural network model with optimized hyperparameters performs very well on deep learning image processing of a small sample data set. Two hundred thirty-four Jinhuang mangoes from Panzhihua were picked in the natural environment and tested. The analysis results meet the requirements of the agricultural industry standards of the People’s Republic of China for mango and for mango grade specification. The average accuracy rate was 97.37%, the average error rate was 2.63%, and the average loss value of the model was 0.44. The processing time of an original image with a resolution of 500 × 374 was only 2.57 milliseconds. This method has theoretical and application value and can provide a powerful means for automatic mango grading.

1. Introduction

Mango is an important economic crop in southeast China. It is native to tropical areas, its shape resembles an egg or a kidney, and it is rich in vitamins. As the second largest mango producer in the world, China has established mango plantations in Sichuan, Yunnan, and Hainan, where they have become supporting local industries. With the rapid development of the mango planting industry and people’s increasing demand for mango quality, the quality of mango directly affects its market competitiveness. Mango grading has therefore become an indispensable step. China has formulated standards for mangoes based on shape, color, and surface defects. Based on the standards of the People’s Republic of China “NY/T492-2002 [1]” and “NY/T3011-2016 [2],” mangoes are divided into three grades according to the characteristics of their surface defects.

At present, mango is mainly classified by manual inspection or by chemical extraction, but manual inspection is costly and inaccurate, and classification by chemical extraction damages the appearance quality of the mango [3, 4]. In recent years, with the rapid development of machine vision and deep learning theory, it has become possible to use machine learning to classify and sort fruits in large quantities, which both reduces labor and improves accuracy [5–8]. Li and Eng [9] took apple images as the research object, improved the deep learning target detection framework, and built the corresponding learning model. After training and testing, the accuracy rate reached 97.6%. To address the difficulty of sample acquisition in supervised learning methods for fruit quality, Li et al. [10] took green plum as the research object and proposed an intelligent algorithm based on deep learning. The simulation analysis showed that the accuracy rate was 98.2%. Li et al. [11] proposed a mango quality grading algorithm based on computer vision and an extreme learning machine neural network. Compared with the traditional back propagation neural network, the proposed algorithm has higher grading accuracy.

Great progress has been made in applying machine vision to the detection and classification of globular fruits [12–17]. Applying convolutional neural networks with transfer learning can effectively solve problems in agriculture, because features do not need to be extracted manually and the sample images are classified automatically [18–22]. Saad et al. [23] presented an improved algorithm for mango grading and mango weight measurement, and the accuracy of weight grading was 95%. He et al. [24] used image processing technology to automatically detect the shape of mango fruit. Evaluation indexes and methods for mango fruit shape were put forward, and a cluster analysis of 50 mango fruit shape indexes was carried out to determine the classification basis of each evaluation index. The results showed that the accuracy rate of mango shape evaluation can reach 92%. To address the time and cost of traditional mango grading, Mohd et al. [25] proposed using computer vision to recognize the shape and irregularity of mango in order to grade it. The experimental results showed that the average success rate of mango grading was 94%. To sum up, compared with spherical fruit, mango is ellipsoidal in shape and soft in texture. Existing studies have detected and graded mango from mango images by combining machine vision and image processing, but they did not refer to Chinese standards. At present, there is no research on mango grading according to Chinese national standards.

Therefore, this paper takes the Panzhihua mango as the research object and grades it according to Chinese national standards. A convolutional neural network (CNN) deep learning model is established and trained in the HDevelop development environment. Mango surface features are recognized automatically, the identified image features are fed into the CNN model for training and testing, and the trained model can then grade mangoes quickly. The model is compared with ResNet-50, MobileNetV2, and other models after repeated adjustment of the “batch size” and “epoch” parameters. The results show that this model is better than the other convolutional neural networks in processing speed and accuracy.

2. Data Set Preparation and Image Preprocessing

2.1. Hardware and Software Structure

The mango images are captured with a Daheng Mercury MER-500-14GC camera, whose main technical parameters are shown in Table 1. The collected mango images are saved in BMP format and loaded into the image processing software.

The overall image acquisition system is shown in Figure 1. The data sets were processed on a computer with an Intel Core i7 CPU at 2.2 GHz, 8 GB of RAM, and an NVIDIA GeForce GTX 1050 Ti graphics card with 4 GB of memory.

The traditional method of image processing with machine vision is generally divided into three steps: image acquisition, feature extraction, and image recognition. Because the deep learning classification method is used for image processing here, a large number of images need to be preprocessed, trained, verified, and tested in order to get the expected classification results. The mango image processing flowchart is shown in Figure 2.

2.2. Image Preprocessing

During the experiment, a total of 234 natural mango samples were collected, and the deep learning tool was used to label them. There were 55 first-grade mangoes, 60 second-grade mangoes, 58 third-grade mangoes, and 61 inferior mangoes. The labeled mango images are exported in the “hdict” data set format and loaded into the HDevelop integrated development environment. The data set is divided into approximately 70% training set, 15% verification set, and 15% test set; that is, 161 training samples, 35 verification samples, and 38 test samples are randomly assigned before pretraining of the deep learning model starts.
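The random assignment described above can be sketched in a few lines of Python. This is only an illustrative sketch: the integer image ids, the function name `split_dataset`, and the fixed seed are assumptions, and the actual split in the paper is performed inside HDevelop.

```python
import random

def split_dataset(items, n_train, n_val, seed=0):
    """Shuffle items and split them into train / verification / test lists."""
    if n_train + n_val > len(items):
        raise ValueError("split sizes exceed the number of items")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]      # whatever remains is the test set
    return train, val, test

# 234 labeled images, identified here by hypothetical integer ids.
image_ids = list(range(234))
train, val, test = split_dataset(image_ids, n_train=161, n_val=35)
print(len(train), len(val), len(test))  # 161 35 38
```

Fixing the sample counts rather than the percentages keeps the three subsets at exactly 161, 35, and 38 images, which is what the text reports.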

To meet the input requirements for training the CNN classifier, the following steps were carried out.

Step 1. The imported mango image is 500 × 374 pixels, whereas the network input must be 224 × 224 × 3, so the mango image is scaled.

Step 2. The image is enhanced so that the gray value of each image lies between 0 and 255, and the gray level of each single-channel image is then converted so that the pixel values lie between −127 and 128. The specific method for gray conversion is shown in the following formula, where g(x, y) is the output image, Gmax is the maximum gray value, Gmin is the minimum gray value, and f(x, y) is the input image.

Step 3. The preprocessed result image is obtained by connected-component processing and threshold segmentation. The image preprocessing process is shown in Figure 3.
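Steps 1 and 2 can be illustrated with a small pure-Python sketch. It rests on two assumptions that are not taken from the paper: nearest-neighbour interpolation for the scaling, and a linear stretch of the gray range [Gmin, Gmax] to [0, 255] followed by a shift of −127. The paper's own conversion formula may differ, and the function names are hypothetical.

```python
def resize_nearest(img, new_h, new_w):
    """Nearest-neighbour resize of a 2-D list of gray values (assumed method)."""
    h, w = len(img), len(img[0])
    return [[img[r * h // new_h][c * w // new_w] for c in range(new_w)]
            for r in range(new_h)]

def stretch_and_shift(img):
    """Stretch gray values to [0, 255], then shift them to [-127, 128]."""
    g_min = min(min(row) for row in img)
    g_max = max(max(row) for row in img)
    span = (g_max - g_min) or 1  # guard against a constant image
    return [[255.0 * (v - g_min) / span - 127.0 for v in row] for row in img]

# Synthetic 374-row x 500-column gradient standing in for one image channel.
src = [[(r + c) % 200 + 20 for c in range(500)] for r in range(374)]
out = stretch_and_shift(resize_nearest(src, 224, 224))
print(len(out), len(out[0]))
```

In practice the same mapping would be applied to each of the three channels before the image is passed to the network.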

3. Construction of the Mango Deep Learning Training Model

3.1. Convolutional Neural Network

A convolutional neural network (CNN) is a feedforward neural network for deep learning that is based on the convolution operation [26–28]. As one of the most classic deep learning algorithms, it is widely used in image recognition. When processing large image data, it differs from a fully connected neural network (FCNN) [29] in that convolution replaces matrix multiplication. A complete convolutional neural network mainly includes the following parts. The input layer is the input of the whole neural network; for images, it is usually the pixel matrix of the image. The convolution layer is the most important part of the network. Its function is to analyze every small local region of its input in depth to obtain more abstract features. The pooling layer reduces the size of the matrix, that is, it converts a higher-resolution image into a lower-resolution one. The fully connected layer gives the classification results from the extracted features, and it is reached after the convolution and pooling layers have been applied repeatedly.

In this paper, the convolutional neural network is optimized based on the ultra-lightweight network SqueezeNet. The basic module used is called fire, as shown in Figure 4. The fire module consists of three convolution layers: one squeeze layer and two expand layers. In the expand part, the results of the two different kernel sizes are combined and output through concat. The squeeze convolution kernel size is set to 1 × 1, and the expand convolution kernel sizes are set to 1 × 1 and 3 × 3, respectively.

In Figure 4, k represents the side length of the convolution kernel and c represents the number of channels. The spatial dimensions of the input and output are the same, the number of input channels is unrestricted, and the number of output channels is e1 + e2. In the SqueezeNet structure used in this paper, e1 = e2 = 4 × s1, where s1 is the number of squeeze channels.

The structure of the whole convolutional neural network is shown in Figure 5. To improve the training effect, and considering the size and number of input samples and the number and size of convolution kernels, a 12-layer CNN structure based on the ultra-lightweight network is constructed. The specific structure is as follows. The first layer is a convolution layer, which reduces the input image and extracts 64-dimensional features. The second to ninth layers are fire modules; each module first reduces the number of channels internally and then expands it, and the number of channels increases after every two modules. Downsampling MaxPooling is added after layers 1, 3, and 5 to halve the feature-map size. The tenth layer is a convolution layer that predicts each pixel. Finally, to reduce the amount of calculation, global average pooling replaces the fully connected layer, and the SoftMax function normalizes the output to probabilities. To improve the generalization ability of the model, dropout with a probability of 0.2 is used to avoid overfitting. The ReLU function is used as the activation function throughout the training process.
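To make the channel arithmetic of the fire module concrete, the following sketch computes the number of output channels and learnable parameters for one module. It assumes e1 = e2 = 4 × s1 as stated above, one bias per filter, and a hypothetical example with 64 input channels and 16 squeeze filters; the function name `fire_module_shape` is illustrative only.

```python
def fire_module_shape(in_channels, s1):
    """Return (output_channels, parameter_count) for one fire module.

    Assumes a 1x1 squeeze layer with s1 filters, 1x1 and 3x3 expand layers
    with e1 = e2 = 4 * s1 filters each, and one bias per filter.
    """
    e1 = e2 = 4 * s1
    squeeze = in_channels * s1 * 1 * 1 + s1   # 1x1 squeeze weights + biases
    expand_1x1 = s1 * e1 * 1 * 1 + e1         # 1x1 expand weights + biases
    expand_3x3 = s1 * e2 * 3 * 3 + e2         # 3x3 expand weights + biases
    return e1 + e2, squeeze + expand_1x1 + expand_3x3

def plain_conv3x3_params(in_channels, out_channels):
    """Parameter count of an ordinary 3x3 convolution, for comparison."""
    return in_channels * out_channels * 3 * 3 + out_channels

# Hypothetical example: 64 input channels, 16 squeeze filters.
out_channels, params = fire_module_shape(64, 16)
print(out_channels, params)              # 128 11408
print(plain_conv3x3_params(64, 128))     # 73856
```

Under these assumptions the fire module produces the same 128 output channels as a plain 3 × 3 convolution with roughly one sixth of the parameters, which is the source of the small model size.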

The convolution layer is the most important component of a CNN. Its input is a set of two-dimensional feature maps. Each convolution kernel acts as a filter on the local data at a neuron node, and the result is the two-dimensional feature map output by the convolution layer. The principle of the convolution layer can be expressed by the following formula:

$$x_j^{l} = f\left(\sum_{i \in M_j} x_i^{l-1} * k_{ij}^{l} + b_j^{l}\right),$$

where $x_j^{l}$ represents the jth output feature map of layer l, $M_j$ represents the index set of the input feature maps corresponding to the jth output feature map of layer l, $x_i^{l-1}$ represents the ith output feature map of layer l − 1, $*$ represents the convolution operation, $k_{ij}^{l}$ and $b_j^{l}$ represent the convolution kernel and bias term, respectively, and $f(\cdot)$ is the activation function.
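The computation of a single output feature map can be sketched in pure Python as follows. This is a minimal illustration, assuming "valid" borders, stride 1, ReLU as the activation, and the sliding-window product without kernel flipping (cross-correlation) that most CNN libraries implement under the name convolution. The function names and the toy inputs are hypothetical.

```python
def conv2d_valid(feature_map, kernel):
    """Sliding-window product of one feature map with one kernel (no padding)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(feature_map) - kh + 1
    out_w = len(feature_map[0]) - kw + 1
    return [[sum(feature_map[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)] for r in range(out_h)]

def conv_layer_output(inputs, kernels, bias):
    """One output map: ReLU(sum over input maps of (map * kernel) + bias)."""
    partial = [conv2d_valid(x, k) for x, k in zip(inputs, kernels)]
    h, w = len(partial[0]), len(partial[0][0])
    return [[max(0.0, sum(p[r][c] for p in partial) + bias)
             for c in range(w)] for r in range(h)]

# Two toy 3x3 input maps, each with its own 2x2 kernel, and a shared bias.
inputs = [[[1, 2, 3], [4, 5, 6], [7, 8, 9]],
          [[1, 1, 1], [1, 1, 1], [1, 1, 1]]]
kernels = [[[1, 0], [0, 1]],
           [[-1, 0], [0, 0]]]
result = conv_layer_output(inputs, kernels, bias=0.5)
print(result)  # [[5.5, 7.5], [11.5, 13.5]]
```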

A convolutional neural network model depends on the activation function that is applied to the data as it passes through the network. In a convolutional neural network, the activation function must be nonlinear. The sigmoid, SoftMax, and ReLU functions are commonly used. The ReLU function has been shown to alleviate problems such as the vanishing gradient. The ReLU function is given by the following formula:

$$f(x) = \max(0, x).$$

A pooling layer is usually inserted between successive convolution layers. It follows a convolution layer and gradually reduces the spatial size (width and height) of the data representation. In essence, the pooling layer further selects among the features of the convolved data and reduces the feature dimension using pooling windows of different sizes, applied independently to each depth slice of the input. If layer l − 1 is a convolution layer and layer l is a pooling layer, the pooling layer is expressed by the following formula:

$$x_j^{l} = f\left(\beta_j^{l}\,\mathrm{down}\left(x_j^{l-1}\right) + b_j^{l}\right),$$

where down(·) represents the downsampling function, $\beta_j^{l}$ represents the multiplicative bias, and $b_j^{l}$ represents the additive bias.
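As an illustration of the downsampling function, the sketch below implements 2 × 2 max pooling with stride 2, which halves the width and height of a feature map. The window size and stride are assumptions chosen to match the halving described for the MaxPooling layers, and the function name is hypothetical.

```python
def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2; odd trailing rows/columns are dropped."""
    h, w = len(feature_map) // 2, len(feature_map[0]) // 2
    return [[max(feature_map[2 * r][2 * c], feature_map[2 * r][2 * c + 1],
                 feature_map[2 * r + 1][2 * c], feature_map[2 * r + 1][2 * c + 1])
             for c in range(w)] for r in range(h)]

fmap = [[1, 3, 2, 0],
        [4, 2, 1, 5],
        [0, 1, 7, 2],
        [3, 2, 1, 4]]
pooled = max_pool_2x2(fmap)
print(pooled)  # [[4, 5], [3, 7]]
```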

After the repeated alternation of convolution layers and pooling layers comes the fully connected layer. It serves as the output layer and computes the scores for the output categories of the network. The fully connected layer has both ordinary layer parameters and hyperparameters. It transforms the input data volume as a function of the activations and of the parameters (the weights and biases of the neurons). The SoftMax function is used, as shown in the following formula:

$$S_i = \frac{e^{x_i}}{\sum_{j=1}^{n} e^{x_j}},$$

where $S_i$ represents the output of the ith neuron, n represents the number of neurons, and $x_i$ represents the input signal.

In this paper, mangoes are divided into four categories: first-grade mango, second-grade mango, third-grade mango, and NG (inferior).
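A minimal sketch of the SoftMax normalization for the four output categories is given below. Subtracting the maximum score before exponentiation is a standard numerical-stability step that does not change the result. The raw scores and the label strings are made-up example values.

```python
import math

def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    m = max(scores)                          # shift for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

GRADES = ["first-grade", "second-grade", "third-grade", "NG"]
scores = [2.0, 1.0, 0.1, -1.0]               # hypothetical network outputs
probs = softmax(scores)
predicted = GRADES[probs.index(max(probs))]  # class with highest probability
print(predicted, [round(p, 3) for p in probs])
```

The predicted grade is simply the category with the largest probability, and that probability can be read as the confidence of the classification.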

3.2. Model Building Based on Convolutional Neural Networks

The pretrained model provided with HDevelop in the HALCON software is used. The input layer of the convolutional neural network is a three-channel image, and the size of the original image is 500 × 374 pixels. Based on the pretrained CNN model, a convolutional neural network is constructed by stacking convolution layers, pooling layers, and a fully connected layer. The processed image is then output, and finally mango grade classification is carried out. The specific convolution process is shown in Figure 6.

3.3. Hyperparameter Setting

In machine learning, besides the parameters of the model itself, there are parameters that can be tuned to make the network train better and faster. These tuning parameters are called hyperparameters. They control the choice of optimization function and model during the training of the learning algorithm. The key point in selecting hyperparameters is to ensure that the model neither underfits nor overfits the training data set and that it learns the structure of the data as quickly as possible.

The learning rate is the amount by which the parameters are adjusted at each optimization step in order to minimize the prediction error of the neural network. A large learning rate makes the parameters jump, while a small learning rate (e.g., 0.000001) makes the parameters change slowly. The selection of the learning rate is therefore particularly important. Momentum helps the learning algorithm escape regions of the search space in which the whole system would otherwise stagnate. A suitable momentum value helps to build a higher quality model. To prevent overfitting, in which the parameters grow out of control, regularization is needed to find a suitable fit and keep the feature weights low.

Because the number of samples is relatively small, the network is trained on small batches of samples. The mini-batch stochastic gradient descent algorithm with momentum is used, and the loss function combines the cross-entropy error with a regularization term. The calculation method is shown in the following formulas:

$$v^{t+1} = \alpha v^{t} - \sigma \nabla_{n} L\left(f(z, n^{t})\right), \qquad n^{t+1} = n^{t} + v^{t+1},$$

$$L\left(f(z, n)\right) = -\sum_{a} y_a^{\top} \log f(z_a, n) + \frac{\beta}{2} \sum_{i=1}^{k} \left|n_i\right|^{2},$$

where α represents the momentum, σ represents the learning rate, f(z, n) represents the classification results, L(·) represents the loss function, n represents the weight parameters, z represents the input batch, $y_a$ represents the encoding of the ath image $z_a$, β represents the regularization parameter, k represents the number of weights, v represents the update vector, and t is the iteration index.
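The update rule and loss described above can be sketched as follows. This is a hedged illustration rather than the HALCON implementation: it assumes the common form in which the update vector is momentum times the previous update minus the learning rate times the gradient, and a loss equal to the summed cross-entropy over the batch plus half the regularization parameter times the sum of squared weights. The toy quadratic objective and all function names are assumptions.

```python
import math

def cross_entropy_l2(probs, onehots, weights, beta):
    """Summed cross-entropy over a batch plus (beta / 2) * sum of squared weights."""
    ce = -sum(math.log(p[y.index(1)]) for p, y in zip(probs, onehots))
    return ce + 0.5 * beta * sum(w * w for w in weights)

def momentum_step(weights, velocity, grads, lr=0.001, momentum=0.9):
    """One update: v <- momentum * v - lr * grad, then w <- w + v."""
    new_v = [momentum * v - lr * g for v, g in zip(velocity, grads)]
    new_w = [w + v for w, v in zip(weights, new_v)]
    return new_w, new_v

# Loss for one image predicted with probability 0.7 for its true class.
loss = cross_entropy_l2([[0.7, 0.1, 0.1, 0.1]], [[1, 0, 0, 0]],
                        weights=[1.0, -2.0], beta=0.0005)

# Toy objective 0.5 * (w - 3)^2, whose gradient is (w - 3).
w, v = [0.0], [0.0]
for _ in range(2000):
    w, v = momentum_step(w, v, grads=[w[0] - 3.0])
print(round(loss, 4), round(w[0], 4))
```

With the values chosen in the experiments (learning rate 0.001, momentum 0.9), the toy weight moves steadily toward the minimum at 3, which shows how momentum accumulates small gradient steps.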

4. Experiment and Data Analysis

4.1. Parameter Test and Results

During the experiments, the parameter “batch_size” is set to 8 and “epoch” is set to 30. To avoid overfitting, the regularization parameter is set to 0.0005. The experiment examines how the learning rate (0.1, 0.01, and 0.001) and the momentum (0.5, 0.7, and 0.9) affect the accuracy on the verification set. The experimental data are shown in Table 2.

Based on the experimental data, the learning rate is finally set to 0.001 and the momentum to 0.9. However, the resulting accuracy is still not ideal. It is therefore necessary to adjust the parameters “batch_size” and “epoch” to achieve the ideal training effect. Because of the limits of the experimental hardware and the number of samples, batch sizes at intervals of 4 (12, 16, and 20) and epoch counts at intervals of 20 (80, 100, and 120) are tested. The experimental data are shown in Table 3.

Obviously, when the “batch_size” is set as 16 and the cycle of “epoch” is set as 120, the accuracy is significantly higher than that of other groups. After a total of 720 iterations, the training time is 73 s, which means that it takes only 0.23 s on average to process a 500 × 374 image, achieving the expected effect. So far, the whole model preprocessing and training process parameter adjustment have been completed.

4.2. Model Evaluation and Performance Analysis

With the hyperparameters of the neural network tuned as described above, the loss function curve gradually converges as the number of iterations and epochs increases, as shown in Figure 7. With suitable hyperparameters, the average error rates of the training set and the verification set also converge as the number of epochs increases, as shown in Figure 8.

It can be seen from Figures 7 and 8 that when training reaches about 35 epochs, the accuracy of the whole CNN improves significantly. At the same time, the overall loss drops rapidly to less than 0.5 and the overall error drops to less than 0.21, which indicates good model performance.

4.3. Robustness and Comparative Analysis

To verify the robustness of the network model, 35 samples were tested. Mango grading follows the mango standard of the People’s Republic of China “NY/T3011-2016,” as shown in Table 4.

The test results are shown in Table 5, which gives the confusion matrix of the 35-mango verification set. The experimental results show that only one first-grade mango is misjudged as third grade by the machine.

The confusion matrix, also known as the error matrix, is a standard format for accuracy evaluation. It is a matrix with n rows and n columns, from which evaluation indexes such as overall accuracy and user accuracy can be derived to characterize the accuracy of image classification. In deep learning, the confusion matrix is a visual tool for monitoring the learning process. Each column of the confusion matrix represents a predicted category, and each row represents the true category of the data.

The 38 mango images used in the test are imported into the convolutional neural network model, which has been pretrained and run in the HDevelop environment, for classification. The result is shown in Table 6. Only one first-grade mango is judged as a positive sample when it is in fact a negative sample, that is, a false positive (a type I error in statistics). The precision is 90.91%. There is a certain similarity between first-grade and second-grade mangoes, and their F1-score is 95.24%. The F1-score is defined as follows:

$$F_1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}},$$

where Precision represents the fraction of samples predicted as a given class that truly belong to it, and Recall represents the fraction of samples truly belonging to a class that are correctly identified.
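The reported precision and F1-score can be reproduced with a short calculation. The counts used here (10 true positives, 1 false positive, 0 false negatives) are an assumption chosen because they yield the reported 90.91% and 95.24%; the exact per-class counts are those of Table 6.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from per-class counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)

# Assumed counts for the class that receives the single misjudged mango.
p, r, f1 = precision_recall_f1(tp=10, fp=1, fn=0)
print(f"{p:.2%} {r:.2%} {f1:.2%}")  # 90.91% 100.00% 95.24%
```

By symmetry, a class with 10 true positives, no false positives, and 1 false negative has a precision of 100% and a recall of 90.91%, and therefore the same F1-score of 95.24%.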

According to the mango quality grade index NY/T3011-2016 corresponding to Table 4, the accuracy of mango grade classification is analyzed. The analysis results are shown in Table 7. The overall accuracy of the test results reaches 97.37%, and only one first-grade mango is misjudged as second grade, which achieves the expected accuracy.
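The overall accuracy follows directly from a confusion matrix whose rows are true grades and whose columns are predicted grades. In the sketch below, the per-grade totals (11, 10, 9, and 8) are hypothetical placeholders that add up to the 38 test images; only the single misjudged first-grade mango and the total of 38 come from the text.

```python
def confusion_matrix(true_labels, predicted_labels, classes):
    """Rows are true classes, columns are predicted classes."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(true_labels, predicted_labels):
        matrix[index[t]][index[p]] += 1
    return matrix

def overall_accuracy(matrix):
    """Fraction of samples on the diagonal of the confusion matrix."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

CLASSES = ["first", "second", "third", "NG"]
# Hypothetical per-grade totals; they sum to the 38 test images.
true_labels = ["first"] * 11 + ["second"] * 10 + ["third"] * 9 + ["NG"] * 8
predicted_labels = list(true_labels)
predicted_labels[0] = "second"   # one first-grade mango misjudged as second grade
matrix = confusion_matrix(true_labels, predicted_labels, CLASSES)
accuracy = overall_accuracy(matrix)
print(matrix[0], f"{accuracy:.2%}")  # [10, 1, 0, 0] 97.37%
```

Whatever the true per-grade totals are, 37 correct classifications out of 38 give 97.37% accuracy and a 2.63% error rate.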

A heat map, also known as a thermal map, is a graphic representation that shows the features of an object through special highlighting. A heat map makes the prominent features of an image directly visible and helps to capture the target features accurately. Using heat maps, the HDevelop development environment combines the deep learning confidence results with the visual features, which makes the whole classification process more intuitive. Two mango images are randomly selected from each grade of the test results, and the unprocessed images are analyzed with the heat map to find the target defect location, as shown in Figure 9.

The belief propagation algorithm updates the label state of the whole Markov random field (MRF) by passing information between nodes; it is an approximate calculation based on the MRF. Belief propagation is an iterative algorithm that is mainly used to solve probabilistic inference problems in probabilistic graphical models, and all messages can be propagated in parallel. In this experiment, some confidence data from the mango grade recognition results are randomly selected, as shown in Table 8.

The confidence level is expressed by the belief formula of the probability distribution, as shown in the following formula:

$$b_i(x_i) = \frac{1}{z}\,\Phi_i(x_i, y_i) \prod_{j \in N(i)} m_{ji}(x_i),$$

where $b_i(x_i)$ represents the probability distribution (belief) of node i, $m_{ji}(x_i)$ represents the message that hidden node j passes to hidden node i, $\Phi_i(x_i, y_i)$ represents the local evidence of node i, 1/z is the normalization factor that makes the beliefs sum to 1, and N(i) represents the MRF first-order neighborhood of node i. The message propagation is expressed by the following formula:

$$m_{ji}(x_i) = \sum_{x_j} \Psi_{ji}(x_j, x_i)\,\Phi_j(x_j, y_j) \prod_{k \in N(j)\setminus i} m_{kj}(x_j),$$

where $N(j)\setminus i$ represents the MRF first-order neighborhood of node j with the target node i excluded, $x_i$ and $x_j$ represent the hidden nodes, $\Psi_{ji}(x_j, x_i)$ represents the compatibility between neighboring hidden nodes j and i, and $m_{ji}$ represents the propagated message.
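The belief and message formulas can be illustrated on a tiny example. The sketch below runs sum-product belief propagation on a three-node chain with two states per node and checks the beliefs against brute-force marginals. The potentials, the graph, and the function names are all made-up illustrations and are not taken from the mango experiment. A single symmetric pairwise table is assumed for every edge.

```python
from itertools import product

def belief_propagation(phi, psi, edges, n_states, sweeps=10):
    """Sum-product belief propagation with synchronous message updates."""
    n = len(phi)
    nbrs = {i: set() for i in range(n)}
    for a, b in edges:
        nbrs[a].add(b)
        nbrs[b].add(a)
    # msgs[(j, i)][x_i] is the message from node j to node i, initialised to 1.
    msgs = {(j, i): [1.0] * n_states for i in nbrs for j in nbrs[i]}
    for _ in range(sweeps):
        new = {}
        for (j, i) in msgs:
            out = []
            for xi in range(n_states):
                total = 0.0
                for xj in range(n_states):
                    term = phi[j][xj] * psi[xj][xi]
                    for k in nbrs[j] - {i}:      # all neighbours of j except i
                        term *= msgs[(k, j)][xj]
                    total += term
                out.append(total)
            s = sum(out)
            new[(j, i)] = [v / s for v in out]   # normalise for stability
        msgs = new
    beliefs = []
    for i in range(n):
        b = list(phi[i])
        for j in nbrs[i]:
            b = [b[x] * msgs[(j, i)][x] for x in range(n_states)]
        z = sum(b)
        beliefs.append([v / z for v in b])       # the 1/z normalisation
    return beliefs

def brute_force_marginals(phi, psi, edges, n_states):
    """Exact marginals by enumerating every joint state (tiny graphs only)."""
    n = len(phi)
    marg = [[0.0] * n_states for _ in range(n)]
    for xs in product(range(n_states), repeat=n):
        w = 1.0
        for i, x in enumerate(xs):
            w *= phi[i][x]
        for a, b in edges:
            w *= psi[xs[a]][xs[b]]
        for i, x in enumerate(xs):
            marg[i][x] += w
    return [[v / sum(row) for v in row] for row in marg]

phi = [[0.9, 0.1], [0.5, 0.5], [0.3, 0.7]]   # local evidence per node
psi = [[2.0, 1.0], [1.0, 2.0]]               # symmetric pairwise compatibility
edges = [(0, 1), (1, 2)]
bp = belief_propagation(phi, psi, edges, n_states=2)
exact = brute_force_marginals(phi, psi, edges, n_states=2)
print(bp[1], exact[1])
```

On a graph without loops, such as this chain, the beliefs coincide with the exact marginals; on graphs with loops the result is generally only an approximation.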

4.4. Algorithm Comparison

To verify the soundness of the proposed algorithm, it is compared with other models available in HDevelop. Because of hardware limitations, the same data set is used for all models, with the batch size set to 8, the number of epochs set to 60, the momentum set to 0.9, the learning rate set to 0.001, and the regularization parameter set to 0.0005. The comparison results are shown in Table 9. The recognition accuracy of the proposed method is the highest, reaching 96.94%, and the recognition time for a single image is only 2.57 ms. The ResNet-50 model is usually used for more complex tasks, and it has no obvious advantage on small data sets such as the one in this paper. The recognition accuracy of the Enhanced model is almost the same as that of the proposed method, but its processing speed is lower. MobileNetV2 performs poorly on this small classification task, with low recognition accuracy and a top1_error rate of 26.67%.

5. Conclusions

In this paper, the deep learning method based on a convolutional neural network effectively improves the recognition accuracy of mango grade classification and is more robust and efficient than traditional feature recognition algorithms. By adjusting the hyperparameters, batch size, and number of epochs, the CNN model achieves a high recognition rate on small data sets. The errors throughout the experiments converge to the expected range. The algorithm comparison experiment shows that the ultra-lightweight network SqueezeNet saves memory and running time in grading classification tasks and that this model is well suited to nondestructive deep learning classification of small mango data sets. The optimized CNN in this paper is used to classify mangoes and is compared with current models such as AlexNet and ResNet-50. Its accuracy is 97.37%, its average error rate is only 2.63%, and the processing time of an original image with a resolution of 500 × 374 is only 2.57 milliseconds. Slight blemishes or color spots on the mango surface have a certain influence on mango grading, and reducing this influence remains to be optimized. More training samples are needed to improve the application value of the system. The system can be applied in industrial production, but some modifications are required first.

Data Availability

The data can be obtained from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the “Seed Fund” of Science and Technology Park of Panzhihua University (no. 2020-3) and the Panzhihua Guiding Science and Technology Plan (no. 2019ZD-N-2).