Abstract

Traditional digital image processing technology has clear limitations: it requires manually designed features, which consumes considerable manpower and material resources, and it can only identify a single crop type, with poor results. Finding an efficient and fast real-time disease image recognition method is therefore very meaningful. Deep learning is a machine learning approach that automatically learns representative features and achieves strong results in image recognition. The purpose of this paper is accordingly to use deep learning methods to identify crop pests and diseases and to find an efficient, fast, real-time disease image recognition method. Deep learning is a recently developed discipline whose goal is to study how to obtain feature representations automatically from data samples: relying on data-driven methods, a series of nonlinear transformations carries the original data from specific to abstract, from general to specific semantics, and from low-level to high-level feature forms. Based on deep learning theory, this paper analyzes classical and recent neural network structures. Because networks designed for natural image classification are not well suited to crop pest and disease identification tasks, this paper improves the network structure so that it balances recognition speed and recognition accuracy. We discuss the influence of the pest and disease feature extraction layer on recognition performance and, after comparing the advantages and disadvantages of the inner (fully connected) layer and the global average pooling layer, adopt the inner layer as the main structure of the feature extraction layer. We analyze loss functions such as Softmax Loss, Center Loss, and Angular Softmax Loss for pest identification. To address the difficulty of training, convergence, and operation of these loss functions, we improve the loss function so that the within-class distance becomes smaller and the between-class distance becomes larger, and we introduce techniques such as feature normalization and weight normalization. The experimental results show that the method effectively enhances the feature expression ability for pests and diseases and thus improves the recognition rate. Moreover, the method makes training the pest identification network simpler.

1. Introduction

Crop diseases can lead to the abuse of pesticides while causing a decline in the yield and quality of agricultural products [1, 2]. This not only increases the cost of agricultural production but also brings food safety and environmental pollution problems [3]. The area of cultivated land currently polluted by pesticides in China has exceeded 16 million hectares. Practice has shown that the rational application of pesticides is the most effective means of controlling crop diseases: it not only effectively controls the occurrence of crop diseases but also reduces the pollution of pesticides to the environment and agricultural products. It is well known that rational pesticide application requires accurate information on crop growth status, the most critical of which is the rapid and accurate identification of the types of crop diseases. Traditional crop disease identification mainly relies on the field experience of farmers or professional technicians. Limited by professional knowledge, farmers often cannot judge, or misjudge, some crop diseases [4]. In addition, because some crop diseases show subtle or similar symptoms at the onset, even professional technicians find it difficult to identify them accurately [5]. By the time the symptoms are clearly identifiable, the best prevention window has passed, making accurate and timely control of crop diseases difficult. Therefore, traditional crop disease identification is not only time consuming and laborious but also inefficient [6].

Deep learning is derived from artificial neural networks. In the mid-20th century, as humans made major breakthroughs in the field of neuroscience, researchers began to imitate the structure of the human brain and proposed the idea of artificial neural networks. The field of research then gradually expanded, and scholars began to pay attention to how to make machines as intelligent as human beings. As a special neural network that was constructed and implemented early on, the perceptron gradually attracted more attention to neural network research. However, further research found that a perceptron with a two-layer structure can only learn linear functions over prebuilt features and cannot solve linearly inseparable problems [7]. Limited by the theoretical and technical level of the time, research on artificial neural networks became less and less optimistic. Toward the end of the 20th century, the backpropagation algorithm was proposed, bringing renewed attention to artificial neural networks. Compared with the perceptron, the BP neural network adds hidden-layer units, which gives it the advantage of being able to model complicated data and express features flexibly; but precisely because of the added layers, learning becomes harder and the network easily falls into local minima. Moreover, when calculating the gradient at each layer's nodes, the gradient weakens toward the lower layers of the network. These shortcomings limited the performance of the BP neural network, so that only shallow learning models could be built, and its scalability suffered [8].

Considering that the depth of a multilayer network in deep learning can easily increase training complexity and cause gradients to vanish, the ResNet convolutional neural network model is used to train the crop disease dataset. The residual learning unit in the ResNet network simplifies the convolutional layer: the learning process only learns the difference between the input and output, which deepens the network and accelerates training. Alvaro et al. proposed a tomato pest detection method based on deep learning, which uses images captured by camera devices of different resolutions. Their goal was to find a deep learning framework more appropriate for the task, so they considered three main detectors: a faster region-based convolutional neural network, a region-based fully convolutional network, and a single-shot multibox detector. These three detectors are called "deep learning meta-architectures." They combined these meta-architectures with "deep feature extractors," demonstrated the performance of the meta-architectures and feature extractors, and proposed a method of local and global class annotation and data augmentation to improve accuracy during training and reduce the number of false positives [9]. Convolutional neural network models and deep learning methods have also been used to detect and diagnose simple leaf images of healthy and diseased plants [10, 11]. The model was trained using an open database of 87,848 images containing 25 different plants in 58 different (plant, disease) combinations, including healthy plants. Several model architectures were trained, achieving a best performance of 99.53% in identifying the corresponding (plant, disease) combination (or healthy plant) [12]. Zhang et al. proposed a plant disease identification method based on plant leaf images: first the spots are segmented to extract the disease feature vector, and then a K-nearest-neighbor classifier identifies the plant disease from the extracted features [13]. Kyosuke et al. applied a super-resolution method to low-resolution images of tomato diseases to restore the detailed appearance of plant organ damage; they then used high-resolution, low-resolution, and super-resolution images for disease classification to assess the effectiveness of super-resolution in disease classification [14]. The symptoms of crop diseases and insect pests are often not obvious in the early stage; if appropriate control measures are not taken early, irreversible losses to yield and crop survival result, affecting the yield, quality, and productivity of agricultural products.

Based on the theory of deep learning, this paper analyzes classical and recent neural network structures [15]. Because networks designed for natural image classification are not suitable for crop pest identification tasks, this paper improves the network structure so that it balances recognition speed and recognition accuracy. The research objects of this article are wheat powdery mildew and leaf rust, peanut black spot and brown spot, and tobacco wildfire, brown spot, and mosaic disease, all collected in a field environment; deep convolutional neural networks are used to identify the diseases against complex backgrounds. The influence of the pest and disease feature extraction layer on recognition performance is discussed, and after weighing the advantages and disadvantages of the inner (fully connected) layer and the global average pooling layer, the former is adopted as the structure of the feature extraction layer. Loss functions such as Softmax Loss, Center Loss, and Angular Softmax Loss are analyzed for pest identification. Plant diseases and insect pests cause national economic losses. Pests and diseases in farming are a complex, time-sensitive issue: from sowing through growth and harvest, the pathogens differ at different stages, which in turn affects the final agricultural products and their quality. To address the difficulty of training, convergence, and operation of existing loss functions, we improve the loss function so that the within-class distance becomes smaller and the between-class distance becomes larger, and we introduce techniques such as feature normalization and weight normalization. The experimental results show that the method effectively enhances the feature expression ability for pests and diseases and thus improves the recognition rate. Moreover, the method makes training the pest identification network simpler.

2. Proposed Method

2.1. Deep Learning
2.1.1. Overview of Deep Learning

With the development of big data and general-purpose parallel computing units, applications based on deep learning have developed rapidly. Deep learning is a neural network with multiple hidden layers. Like shallow neural networks, deep neural networks can model complex nonlinear systems, but networks with many hidden layers provide a higher level of abstraction for the model, thus improving its capabilities. The biggest difference between deep learning and other machine learning methods is how features are found. Feature extraction is a process of abstraction, and deep learning's abstraction simulates the way human neurons communicate information. In theory, many classification and prediction problems can be solved this way. The convolutional neural network is a typical deep neural network and has achieved great success in many image classification and recognition tasks.

2.1.2. Artificial Neural Network

Crops suffer from many diseases and insect pests. Both internal factors, such as the plant's own attributes, and external factors, such as weather, other plants like weeds in the growing environment, and pests in the natural environment, affect crops and have a huge impact on their yield and quality.

Research on artificial neural networks is largely inspired by bionics. An artificial neural network consists of a series of simple artificial neurons connected to each other, and each neuron has three parts: input, an artificial nerve cell, and output. As shown in Figure 1, the input signal of a neuron can come from an externally given initial value or it can be the output of another neuron. The artificial nerve cell integrates these input signals and performs a threshold operation: if the integrated stimulus value exceeds a certain threshold, the neuron enters an active state; otherwise, the neuron is in a suppressed state.

Just as the brain's neurons constantly adjust the connections between them to achieve continuous learning and progress, the ANN can adapt to the training set by adjusting the weight values on its connections as inputs are continuously presented.

(1) Perceptron

A perceptron is a simple artificial neuron with only two outputs. Inputs $x_1, x_2, \ldots, x_n$ are fed into the perceptron in turn, and each input value $x_i$ corresponds to a weight $w_i$, in addition to an offset $b$, as shown in equation (1):

$$z = \sum_{i=1}^{n} w_i x_i + b. \quad (1)$$

If $\mathbf{w} = (w_1, w_2, \ldots, w_n)^T$ and $\mathbf{x} = (x_1, x_2, \ldots, x_n)^T$, then equation (1) can be expressed as

$$z = \mathbf{w}^T \mathbf{x} + b.$$

After the thresholding process, the output of the perceptron is as shown in equation (2):

$$y = \operatorname{sign}(\mathbf{w}^T \mathbf{x} + b) = \begin{cases} 1, & \mathbf{w}^T \mathbf{x} + b \geq 0, \\ -1, & \text{otherwise.} \end{cases} \quad (2)$$

The perceptron works in much the same way as a neuron, and it can classify two types of samples: samples on one side of the decision boundary output 1 and samples on the other side output −1. The training process continuously adjusts the weights $w_i$.
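As an illustration, the following is a minimal NumPy sketch of such a perceptron trained with the classical perceptron learning rule; the toy data, learning rate, and epoch count are illustrative assumptions, not values from the paper.

```python
import numpy as np

def perceptron_train(X, t, lr=0.1, epochs=100):
    """Train a perceptron with outputs in {1, -1}.

    X: (m, n) sample matrix, t: (m,) targets in {1, -1}.
    Returns the learned weights w and offset b.
    """
    m, n = X.shape
    w = np.zeros(n)
    b = 0.0
    for _ in range(epochs):
        for xi, ti in zip(X, t):
            y = 1 if xi @ w + b >= 0 else -1   # equation (2)
            if y != ti:                         # adjust weights only on error
                w += lr * ti * xi
                b += lr * ti
    return w, b

# Toy linearly separable example (illustrative only)
X = np.array([[2.0, 1.0], [1.5, 2.0], [-1.0, -1.5], [-2.0, -0.5]])
t = np.array([1, 1, -1, -1])
w, b = perceptron_train(X, t)
print(np.sign(X @ w + b))  # -> [ 1.  1. -1. -1.]
```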

(2) Linear Units

Perceptrons with only 1 and −1 outputs limit the classification problems that can be handled. A simple generalization is to cancel the thresholding module: a linear unit with input $\mathbf{x}$ has output $o = \mathbf{w}^T \mathbf{x} + b$, a linear combination of the inputs weighted by $\mathbf{w}$ plus the offset $b$. The core task of training a linear unit is to adjust the weights $\mathbf{w}$ so that the output is as close as possible to the target output of the training samples.

(3) Error Criteria

A metric is defined to measure the training error of the ANN relative to the training examples under the current weight vector $\mathbf{w}$. A common training error criterion is the squared error

$$E(\mathbf{w}) = \frac{1}{2} \sum_{d \in D} (t_d - o_d)^2, \quad (3)$$

where $D$ is the set of training samples, $t_d$ is the target output of training sample $d$, and $o_d$ is the actual output of the linear unit for training sample $d$. It is necessary to find the direction in which the error falls fastest, so the direction of the maximum directional derivative is obtained by calculating the partial derivative of $E$ with respect to each component of the vector $\mathbf{w}$, denoted

$$\nabla E(\mathbf{w}) = \left( \frac{\partial E}{\partial w_1}, \frac{\partial E}{\partial w_2}, \ldots, \frac{\partial E}{\partial w_n} \right). \quad (4)$$

Since $\nabla E(\mathbf{w})$ is itself a vector that represents the largest directional derivative, it corresponds to the direction in which $E(\mathbf{w})$ rises fastest, and the fastest falling direction of the error is simply its negation, $-\nabla E(\mathbf{w})$. The training rule should therefore be

$$\Delta \mathbf{w} = -\eta \nabla E(\mathbf{w}), \quad (5)$$

$$\mathbf{w} \leftarrow \mathbf{w} + \Delta \mathbf{w}. \quad (6)$$

In equation (6), the learning rate $\eta$ is a positive hyperparameter that sets the step size of the gradient descent search; $\mathbf{w}$ is the current search point in the solution space; $\Delta \mathbf{w}$ represents a displacement in the current fastest descent direction; and the update moves a short distance from the current point along the fastest descent direction and makes that position the new current point.
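A minimal NumPy sketch of this gradient descent rule applied to a linear unit, assuming batch updates on the squared error of equation (3); the learning rate and epoch count are illustrative:

```python
import numpy as np

def linear_unit_train(X, t, lr=0.01, epochs=500):
    """Batch gradient descent on E(w) = 1/2 * sum_d (t_d - o_d)^2."""
    m, n = X.shape
    w = np.zeros(n)
    b = 0.0
    for _ in range(epochs):
        o = X @ w + b                 # linear unit outputs o_d
        grad_w = -(t - o) @ X         # dE/dw = -sum_d (t_d - o_d) * x_d
        grad_b = -np.sum(t - o)       # dE/db
        w -= lr * grad_w              # equation (6): step against the gradient
        b -= lr * grad_b
    return w, b
```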

All neurons are of Boolean type, that is, state 0 means inhibition and state 1 means activation. The energy of an RBM is then defined as

$$E(\mathbf{v}, \mathbf{h}) = -\sum_{i} a_i v_i - \sum_{j} b_j h_j - \sum_{i} \sum_{j} v_i w_{ij} h_j,$$

where $v_i$ and $h_j$ are the states of the visible and hidden units, $a_i$ and $b_j$ are their biases, and $w_{ij}$ is the weight between them.

The probability distribution of the visible layer is as follows, where $Z$ is the normalization factor:

$$P(\mathbf{v}) = \frac{1}{Z} \sum_{\mathbf{h}} e^{-E(\mathbf{v}, \mathbf{h})}, \qquad Z = \sum_{\mathbf{v}, \mathbf{h}} e^{-E(\mathbf{v}, \mathbf{h})}.$$

The cost (loss) function of the Softmax regression algorithm is

$$J(\theta) = -\frac{1}{m} \left[ \sum_{i=1}^{m} \sum_{j=1}^{k} \mathbf{1}\{y^{(i)} = j\} \log \frac{e^{\theta_j^T x^{(i)}}}{\sum_{l=1}^{k} e^{\theta_l^T x^{(i)}}} \right],$$

where $m$ is the number of training samples, $k$ is the number of classes, and $\mathbf{1}\{\cdot\}$ is the indicator function.

It can be shown that this cost function is convex but not strictly convex, so its minimum is not unique and the optimal parameters to be solved are not unique. The proof process is as follows: for any vector $\psi$, replacing every parameter vector $\theta_j$ with $\theta_j - \psi$ leaves the hypothesis unchanged, since

$$\frac{e^{(\theta_j - \psi)^T x^{(i)}}}{\sum_{l=1}^{k} e^{(\theta_l - \psi)^T x^{(i)}}} = \frac{e^{\theta_j^T x^{(i)}}\, e^{-\psi^T x^{(i)}}}{\sum_{l=1}^{k} e^{\theta_l^T x^{(i)}}\, e^{-\psi^T x^{(i)}}} = \frac{e^{\theta_j^T x^{(i)}}}{\sum_{l=1}^{k} e^{\theta_l^T x^{(i)}}},$$

so infinitely many parameter settings reach the same minimum cost.

Usually, a regularization term is added to modify the cost function, as follows:

$$J(\theta) = -\frac{1}{m} \left[ \sum_{i=1}^{m} \sum_{j=1}^{k} \mathbf{1}\{y^{(i)} = j\} \log \frac{e^{\theta_j^T x^{(i)}}}{\sum_{l=1}^{k} e^{\theta_l^T x^{(i)}}} \right] + \frac{\lambda}{2} \sum_{j=1}^{k} \|\theta_j\|^2.$$

$\lambda$ is the weight of the regularization term. The cost function can still be minimized with the gradient descent method; the gradient after differentiation is

$$\nabla_{\theta_j} J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ x^{(i)} \left( \mathbf{1}\{y^{(i)} = j\} - p\big(y^{(i)} = j \mid x^{(i)}; \theta\big) \right) \right] + \lambda \theta_j.$$

The parameters are updated with the help of the partial derivatives:

$$\theta_j \leftarrow \theta_j - \alpha \nabla_{\theta_j} J(\theta), \qquad j = 1, \ldots, k.$$

The parameter update of Softmax regression is completed through repeated iteration.
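The following NumPy sketch implements one regularized gradient descent step of this kind for Softmax regression; the function names, shapes, and hyperparameters are illustrative assumptions:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)      # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def softmax_regression_step(theta, X, y, lam=1e-4, alpha=0.1):
    """One gradient descent step on the regularized Softmax cost.

    theta: (k, n) parameters, X: (m, n) samples, y: (m,) labels in 0..k-1.
    """
    m, k = X.shape[0], theta.shape[0]
    p = softmax(X @ theta.T)                  # p[i, j] = p(y_i = j | x_i; theta)
    onehot = np.eye(k)[y]                     # indicator 1{y_i = j}
    grad = -(onehot - p).T @ X / m + lam * theta
    return theta - alpha * grad               # theta_j <- theta_j - alpha * grad_j
```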

(4) Backpropagation

The weight parameters of a neural network need to be adjustable, just as the connections between human nerve cells are, so an automatic parameter adjustment mechanism is required; the backpropagation algorithm can adjust the weights according to the error.

(5) Many-Layer Artificial Neural Networks

A single neuron or linear unit can only solve linear problems, yet real-life problems are often nonlinear, and here single units show great deficiencies. In the brain, cells are interconnected with other cells; mimicking these cellular connections, artificial neurons can likewise be composed into a many-layer neural network. The characteristics of pests and diseases in a specific area are periodicity, seasonality, and repetitiveness: the same area shows different pests and diseases in different seasons, multiple pests and diseases recur in the same season year after year, and several fixed pests and diseases appear in turn. The emergence of many-layer neural networks makes it possible to solve nonlinear problems. In a layered feedforward network, as shown in Figure 2, there are generally at least three layers: one input layer, one output layer, and one or more hidden layers; neurons in adjacent layers are interconnected.

2.1.3. Convolutional Neural Network

Before studying crop disease image recognition, one needs to be familiar with the relevant basic theory, such as the recognition models used in deep learning, the image recognition process, the convolutional neural network structure, the deep learning framework, the experimental dataset, preprocessing, and the labeling of categories. As the most representative deep learning network model, the convolutional neural network has benefited from the development of computer networks and computing power. A two-dimensional image is stored in a computer in the form of a matrix, and the relative position between pixels is fixed; a convolution operation with a fixed kernel size can therefore extract the features of the two-dimensional image. It is this property of two-dimensional images that enables convolutional neural networks to be successfully applied to image recognition and target detection. The CNN is an important network model for deep learning; its connection method and network structure are very suitable for dealing with image problems, as shown in Figure 3 (picture from the network, http://www.baidu.com/).

In the convolutional neural network, the convolution kernels represent local features of the picture. The image is processed by multiple layers of convolution and pooling to convert the image information into higher-level features; at the same time, the convolution kernel parameters are constantly adjusted so that each kernel specializes in extracting a certain image feature. In image recognition or classification tasks, features are extracted from the image by convolution operations; the size of the feature map is then reduced through the pooling layer, and the classification is output through the fully connected layer and finally through activation functions. The main process of precise crop management is shown in Figure 4.

(1) Convolutional Layer

The function of the convolutional layer is to extract features of the original image. There may be many convolution kernels in the same layer, each of which extracts one feature. The difference between convolution and full connection is that convolution uses local connections for feature extraction in order to reduce computation. In a local connection, only an image region the size of the convolution kernel is multiplied at a time; sliding the kernel smoothly over the input image yields the feature map, so a pixel that originally had to be connected to the entire image is now connected only to a kernel-sized region. Although local connection reduces the computation, there would still be many parameters if each position the kernel slides to learned its own parameters. Keeping the weight parameters of the same convolution kernel equal at every position further reduces the amount of computation; this is called weight sharing. Weight sharing applies the information learned from one local area to other positions of the image; that is, the same convolution kernel convolves the entire image, acting as one complete filter. The local statistical properties of an image are repetitive (position independent) across the image: if a basic pattern exists in the image, it may appear anywhere, and sharing the same weights at different locations enables the same pattern to be detected wherever it occurs in the data. For example, if the feature obtained after convolution in one window is an edge, then the convolution kernel corresponds to an edge feature extractor, and the same kernel can be used to extract edge features in other regions.
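To illustrate local connection and weight sharing, the following NumPy sketch slides a single shared kernel over an image (valid padding, stride 1); the Sobel-like edge kernel and random image are illustrative assumptions:

```python
import numpy as np

def conv2d(image, kernel, stride=1):
    """Slide one shared kernel over the image: every output pixel
    is connected only to a kernel-sized local region."""
    kh, kw = kernel.shape
    oh = (image.shape[0] - kh) // stride + 1
    ow = (image.shape[1] - kw) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = image[i * stride:i * stride + kh, j * stride:j * stride + kw]
            out[i, j] = np.sum(patch * kernel)   # same weights at every location
    return out

# A vertical-edge kernel (Sobel-like), applicable anywhere in the image
kernel = np.array([[1, 0, -1], [2, 0, -2], [1, 0, -1]], dtype=float)
image = np.random.rand(8, 8)
feature_map = conv2d(image, kernel)   # shape (6, 6)
```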

In the input layer, some preprocessing is performed on the original data, so that the original data can be more adapted to the neural network for training and feature extraction. The quality of the original dataset varies due to different collection methods. Therefore, it is necessary to perform a series of processing on the collected and labeled raw data.

(2) Pooling Layer

In a CNN, a special operation called pooling usually follows the convolution operation. Pooling reduces the input feature dimension, reduces the number of training parameters, and helps prevent overfitting. The most commonly used pooling methods are average pooling and maximum pooling, whose specific operation is to take the mean or maximum of the elements in the pooled region as the output. Pooling can be seen as a fusion of local information; stacking multiple pooling operations fuses information over a larger extent and yields more complete features.
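A minimal NumPy sketch of non-overlapping max and average pooling, assuming the input dimensions are multiples of the window size:

```python
import numpy as np

def pool2d(x, size=2, mode="max"):
    """Non-overlapping pooling: each output value summarizes a size x size region."""
    h, w = x.shape
    x = x[:h - h % size, :w - w % size]              # trim to a multiple of size
    blocks = x.reshape(h // size, size, w // size, size)
    if mode == "max":
        return blocks.max(axis=(1, 3))               # maximum pooling
    return blocks.mean(axis=(1, 3))                  # average pooling

x = np.arange(16, dtype=float).reshape(4, 4)
print(pool2d(x, 2, "max"))    # [[ 5.  7.] [13. 15.]]
print(pool2d(x, 2, "mean"))   # [[ 2.5  4.5] [10.5 12.5]]
```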

(3) Regularization Layer

For convolutional neural networks, network training is a complex process. Due to internal covariate shift, changes in earlier training parameters are amplified as training progresses: the probability distribution of the current layer's features becomes inconsistent with the distribution the layer was initialized for, and the previous training parameters no longer apply. Machine learning and deep learning assume that the distribution of the training data should be the same as the distribution of the features. Batch normalization (BN) is therefore applied to the input data of the network, as shown in equation (7). For a mini-batch $\{x_1, \ldots, x_m\}$, the batch mean and variance are

$$\mu_B = \frac{1}{m} \sum_{i=1}^{m} x_i, \qquad \sigma_B^2 = \frac{1}{m} \sum_{i=1}^{m} (x_i - \mu_B)^2,$$

and each input is normalized as

$$\hat{x}_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}}. \quad (7)$$

This makes the input data of each layer have a mean of 0 and a variance of 1. However, constraining the output to mean 0 and variance 1 would weaken the generalization ability of the model. It is therefore necessary to add learnable parameters $\gamma$ and $\beta$ that scale and translate the data:

$$y_i = \gamma \hat{x}_i + \beta.$$

The values of $\gamma$ and $\beta$ are constantly updated during model training. BN thus optimizes the data distribution while the model is trained in batches; that is, given batched input data, the distribution of the generated features is kept within a controllable range.
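A minimal NumPy sketch of the batch normalization transform in equation (7), applied to a batch of feature vectors; this shows training-mode statistics only, with the running averages used at inference omitted:

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """x: (batch, features). Normalize per feature, then scale and shift."""
    mu = x.mean(axis=0)                       # batch mean
    var = x.var(axis=0)                       # batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)     # equation (7): zero mean, unit variance
    return gamma * x_hat + beta               # learnable scale gamma and shift beta

x = np.random.randn(32, 64) * 5 + 3
y = batch_norm(x, gamma=np.ones(64), beta=np.zeros(64))
print(y.mean(axis=0).round(6)[:3], y.std(axis=0).round(3)[:3])  # ~0 and ~1
```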

(4) Activation Function

When the network runs, a certain part of the neurons in the neural network is activated, and the activation information is transmitted to the next layer of the network. The reason neural networks can solve nonlinear problems is that the activation function adds nonlinear factors, making up for the limited expressive ability of the linear model, and passes the features of the activated neurons on to the next layer. Because neural network training relies on differentiation, the activation function is chosen so that the mapping from input to output remains differentiable.
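For concreteness, here is a small sketch of common activation functions and the derivatives used in backpropagation; the specific choice of sigmoid and ReLU is an illustrative assumption, as the paper does not fix one:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)          # derivative used during backpropagation

def relu(z):
    return np.maximum(0.0, z)     # adds the nonlinearity; keeps active neurons

def relu_grad(z):
    return (z > 0).astype(float)  # subgradient: 1 where active, 0 where suppressed
```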

2.2. Pest Detection

With the successful application of deep learning in target detection, recognition, and segmentation, many different deep-learning-based pest detection algorithms have been proposed. Among them, the cascaded multitask network MTCNN and the single-stage detection algorithm SSH are the most representative: the former not only detects well but is also outstanding at aligning pests and diseases, while the most prominent feature of the latter is scale invariance, with a smaller memory footprint and higher speed.

2.2.1. MTCNN Algorithm

The MTCNN pest detection algorithm turns the pest detection task into a coarse-to-fine process through a cascade of convolutional neural networks. The specific process can be divided into three stages: in the first stage, a shallow convolutional neural network quickly generates candidate windows, removing a large number of negative samples; in the second stage, a more complex convolutional neural network further refines the candidate windows, using nonmaximum suppression to discard a large number of repeated windows; in the third stage, an even more powerful convolutional neural network decides which candidate windows to retain, completes the detection, and outputs the key points of the pest image (the type of pest or disease, the degree of infestation, and the affected crop).

(1) Pnet is a fully convolutional network structure. After three convolutional layers with a convolution kernel size of 3 and a pooling layer, convolution is used to generate the classification, the bounding box regression, and the key points. Pnet classifies the samples and, for each candidate region, regresses the four bounding box parameters of the target region; the candidate regions are then corrected by these four parameters to obtain a more accurate target box, and nonmaximum suppression filters the selection.

(2) Rnet has an inner (fully connected) layer compared to Pnet. Through the inner layer, more detailed features are obtained, and a large number of candidate pest and disease regions that do not meet the requirements are eliminated. The remaining pest and disease regions are calibrated and finally merged by nonmaximum suppression.

(3) Onet's network structure has one more convolutional layer than Rnet, which yields more complicated and accurate image characteristics of pests and diseases and plays an important role in the bounding box regression and key point regression.

2.2.2. Nonmaximum Suppression

Nonmaximum suppression (NMS) refers to suppressing elements that are not maximal values; in essence, it is a local search. In target detection, it is generally used in the later stage to filter out repeated bounding boxes for the same target.

Target detection generates a number of bounding boxes in a region when positioning. Each bounding box has a classification score, and most of them are redundant. NMS performs a local maximum search over all detected bounding boxes, searching for the maximum value within a neighborhood to filter out part of the bounding boxes. The main process is as follows (see the sketch after this list):

(1) First, obtain the bounding boxes of the object (the same object may have many bounding boxes) and their classification scores.
(2) Sort all bounding boxes by score and select the highest score and its corresponding bounding box.
(3) Traverse all the remaining boxes; if a box's IOU with the current highest-scoring box is greater than the preset threshold, delete it.
(4) Continue to select the box with the highest score among the unprocessed bounding boxes and repeat step (3) until all retained bounding boxes are found.
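A minimal NumPy sketch of this procedure, assuming boxes in (x1, y1, x2, y2) format and an illustrative IOU threshold of 0.5:

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.5):
    """boxes: (n, 4) as (x1, y1, x2, y2); scores: (n,). Returns kept indices."""
    order = scores.argsort()[::-1]          # step (2): sort by score, descending
    keep = []
    while order.size > 0:
        best = order[0]                     # current highest-scoring box
        keep.append(best)
        rest = order[1:]
        # Intersection of the best box with all remaining boxes
        x1 = np.maximum(boxes[best, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[best, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[best, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[best, 3], boxes[rest, 3])
        inter = np.maximum(0, x2 - x1) * np.maximum(0, y2 - y1)
        area_best = (boxes[best, 2] - boxes[best, 0]) * (boxes[best, 3] - boxes[best, 1])
        area_rest = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_best + area_rest - inter)
        order = rest[iou <= iou_thresh]     # step (3): drop highly overlapping boxes
    return keep
```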

2.2.3. Difficult Sample Mining

Difficult sample mining is an important strategy for improving the quality of models, with a wide range of applications in image classification and general target detection. The main idea is to find some difficult positive samples and difficult negative samples at each step and then use these difficult samples for further training to enhance the performance of the model.

In recent years, with the continuous improvement of target detection frameworks, online hard example mining (OHEM) and Focal Loss have been widely used to screen difficult samples. OHEM selects the most difficult samples in each mini-batch to compute the neural network gradient, while Focal Loss focuses more on the difficult, wrongly classified samples by adding a modulating factor to the standard cross entropy.

(1) OHEM

The idea goes back to hard negative mining when optimizing an SVM: samples in the training set that the current model already distinguishes easily are removed, samples that cannot be correctly judged are added, and training is then restarted. OHEM forward-propagates all samples, sorts the losses produced by each sample, and selects the top portion for backpropagation. For example, only positive and negative samples at a ratio of 1:3 are taken to calculate the loss, and the weights of the remaining negative samples are all set to zero. A simple sample with a small loss value contributes little to network training, while a difficult sample with a large loss value enhances the generalization ability of the network. Unlike Focal Loss, OHEM takes only 1:3 positive and negative samples and resets the remaining negative sample weights to zero, whereas Focal Loss trains on all negative samples, giving positive and negative samples different weights based on the loss value.
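As an illustration, a minimal sketch of the OHEM selection step in NumPy, assuming per-sample losses have already been computed in the forward pass; the 1:3 ratio follows the text, while the function name and batch layout are illustrative assumptions:

```python
import numpy as np

def ohem_mask(losses, labels, neg_pos_ratio=3):
    """Keep all positives and the hardest negatives at a 1:neg_pos_ratio ratio.

    losses: (n,) per-sample loss; labels: (n,) with 1 = positive, 0 = negative.
    Returns a 0/1 mask; masked-out negatives contribute no gradient.
    """
    pos = labels == 1
    n_neg_keep = neg_pos_ratio * int(pos.sum())
    neg_idx = np.where(~pos)[0]
    hardest = neg_idx[np.argsort(losses[neg_idx])[::-1][:n_neg_keep]]  # largest losses
    mask = np.zeros_like(losses)
    mask[pos] = 1.0
    mask[hardest] = 1.0
    return mask
```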

(2) Focal Loss

In one-stage target detection algorithms, the ratio of positive to negative samples can reach 1:1000, which makes the positive and negative samples extremely unbalanced, so that the negative samples dominate the loss function. Although the negative samples are classified with high accuracy and each produces a small loss value, the large number of negative samples together sometimes produces a loss greater than that produced by the positive samples. Focal Loss first tries to solve the problem of sample imbalance by introducing a weight coefficient into the cross entropy loss function: when a sample is detected as foreground, the cross entropy is multiplied by the weight $\alpha$; when it is background, the cross entropy is multiplied by the weight $1 - \alpha$. The loss function then becomes

$$CE(p_t) = -\alpha_t \log(p_t),$$

where $p_t$ is the predicted probability of the true class and $\alpha_t$ equals $\alpha$ for foreground and $1 - \alpha$ for background.

After solving the problem of sample imbalance, the author also multiplies the cross entropy loss function by $(1 - p_t)^\gamma$, which means that the higher the prediction accuracy for a sample, the smaller $(1 - p_t)^\gamma$ becomes and the smaller the overall loss value. The loss of high-accuracy samples is attenuated more, while low-accuracy samples are attenuated less, so the overall loss function comes to be dominated by the low-accuracy samples. The final loss function is

$$FL(p_t) = -\alpha_t (1 - p_t)^\gamma \log(p_t).$$
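A minimal NumPy sketch of this final loss for binary foreground/background classification; the defaults α = 0.25 and γ = 2 are the values commonly used with Focal Loss and are assumed here for illustration:

```python
import numpy as np

def focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-7):
    """p: (n,) predicted foreground probability; y: (n,) labels, 1 = foreground.

    Implements FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t).
    """
    p = np.clip(p, eps, 1 - eps)
    p_t = np.where(y == 1, p, 1 - p)              # probability of the true class
    alpha_t = np.where(y == 1, alpha, 1 - alpha)  # foreground/background weight
    return -alpha_t * (1 - p_t) ** gamma * np.log(p_t)

# An easy background (p = 0.01) is attenuated far more than a hard one (p = 0.6)
print(focal_loss(np.array([0.01, 0.6]), np.array([0, 0])))
```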

3. Experiments

3.1. Experimental Platform

At present, the input to convolutional neural networks is three-dimensional data, reflecting the two-dimensional pixel grid and the RGB channels of a flat image. The input is image pixel information in matrix form: the length and width of the image and the depth of the color channels constitute a three-dimensional matrix. The depth of a black-and-white image is 1, and the depth of a color image is 3.

Due to the batch processing of image data, a large amount of memory is required, and deep learning computation relies on general-purpose GPU acceleration. Therefore, the experimental hardware platform used in this paper is NVIDIA's DGX-1 deep learning server. The experimental platform configuration is shown in Table 1.

3.2. Training Sample Generation

Considering that the depth of a multilayer network in deep learning can easily increase training complexity and cause gradients to vanish, the ResNet convolutional neural network model is used to train the crop disease dataset. The residual learning unit in the ResNet network simplifies the convolutional layer: in the learning process, only the difference between input and output is learned, deepening the network and accelerating model training.

For the training samples, an offset value can be taken at random around the real pest region, and a candidate pest and disease window is selected in the original image according to the offset value and the real pest region. The IOU between the candidate window and the real pest region then determines the sample type: negative samples, suspect samples, and positive samples can all be generated. A window with IOU greater than 0.7 is specified as a positive sample (if no window exceeds 0.7, the window with the highest IOU is taken as the positive sample); a window with IOU less than 0.3 is a negative sample; and a window with IOU between 0.3 and 0.7 is a suspect sample. The windows are interpolated and scaled to the specified size. The IOU is the ratio of the overlap between the real region and the candidate window to their union, as shown in Figure 5 (picture from the network, http://www.baidu.com/):

The calculation formula is

$$\mathrm{IOU} = \frac{\operatorname{area}(A \cap B)}{\operatorname{area}(A \cup B)},$$

where $A$ is the real pest region and $B$ is the candidate window.

The ratio of positive samples, suspect samples, and negative samples is about 1:1.5:3. The negative samples are only responsible for detecting whether pests and diseases are present, which is a two-class problem; the suspect samples and positive samples participate in the pest bounding box regression and key point regression. In order to enhance the robustness of the network, the samples obtained by random cropping are sent to the training network to obtain an initial network model, which is then used to generate further training samples. After training Pnet, the original image is input into the trained Pnet model, the IOU between each detected pest region and the real pest region is computed, and the Rnet training samples are generated: as described above, a region with IOU less than 0.3 is regarded as a negative sample, a region with IOU greater than 0.7 as a positive sample, and a region with IOU between 0.3 and 0.7 as a suspect sample. Repeating this step, the original images are sent to the trained Rnet model to generate the training samples of Onet. In this way, the samples sent to Rnet and Onet contain both samples generated by offset values and samples predicted by the network, which enhances the generalization of the model. The IOU computation and labeling rule are sketched below.
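A minimal sketch of the IOU computation and the 0.3/0.7 labeling rule described above, assuming boxes in (x1, y1, x2, y2) format; the function names are illustrative:

```python
def iou(a, b):
    """IOU of two boxes in (x1, y1, x2, y2) format: intersection over union."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def label_candidate(candidate, gt_box):
    """Label a candidate window against the real pest region."""
    v = iou(candidate, gt_box)
    if v > 0.7:
        return "positive"
    if v < 0.3:
        return "negative"
    return "suspect"   # 0.3 <= IOU <= 0.7: used for box/keypoint regression
```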

4. Discussion

4.1. Pest Image Recognition
4.1.1. Pretreatment of Crop Disease Images

The training process of using the ResNet network to recognize crop disease images is as follows: the weight parameters of the network model are initialized; the convolutional layers in the ResNet network extract the features of the training dataset images; the features are pooled to obtain the output; and the error between the output value and the target value is used to update the weight parameters of the network, yielding a trained crop disease image recognition model.

The purpose of preprocessing is to remove the noise and redundant information in the sample pictures, which simplifies the image and raises the proportion of valuable information, thereby improving the performance of model identification. Classification and identification with deep learning methods also demand a large amount of data; this paper collects only about 5,000 images, so data augmentation is needed later. There may be content in an image that has nothing to do with the recognition target; such content is not only useless for extracting image features but may even reduce the effect of feature extraction. Therefore, the region of interest should be cropped out before image recognition. In this paper, ACDSee 9.0 software is used to manually crop the pictures: for a disease image, the crop keeps a single leaf, mainly retaining the diseased part. The cropping effect is shown in Figure 6.

4.1.2. Pathological Leaf Lesion Extraction

By multiplying the background part with the preprocessed image, the green leaf part can be removed and the leaf lesions extracted; the treatment effect is shown in Figure 7. Processing the collected diseased-leaf images yields clear lesion images, but the color, shape, and texture characteristic parameters of the lesion images still need to be extracted separately to identify the disease.

4.2. Simulation Experiment of Disease Image Fusion

Under the complex background of the field, in the same scene, the collection equipment shoots diseased leaves at different focal lengths and under different lighting conditions. In some images the foreground is clear and the background blurred; in others the foreground is blurred and the background clear. A disease image fusion method based on the wavelet transform is therefore adopted: the disease image is decomposed by a wavelet, and a schematic diagram of the wavelet decomposition is shown in Figure 8.

After that, disease image fusion is performed on the two blurred disease images of wolfberry leaves, and the result of the fusion is shown in Figure 9.

It can be seen from the figure above that there are blurred parts in the first two disease images; after fusion, the fused image obtained is clearer. In the image fusion method based on wireless network communication and deep learning, the coefficient with the larger absolute value is selected as the wavelet coefficient to retain details; this not only corresponds to the significant brightness changes in the image but also retains the salient features, making the disease image clearer.

For the diseased images of Lycium barbarum leaves collected under the complex background of the field, the Sobel operator, Roberts operator, Prewitt operator, Cannon operator, Laplacian-of-Gaussian operator, Canny operator, and a phase-based edge detection method were used for edge detection on the diseased leaves; the detection results are shown in Figure 10.

For blurred disease images of the same diseased leaf, six objective evaluation measures, namely contrast, sharpness, average blur, blur, color aggregation vector, and hue count, are used to evaluate image quality; the value taken for the color aggregation vector is the histogram feature value of its aggregated pixels. Quality evaluation is performed on one of the blurred images, and the evaluation results are shown in Table 2.

The objective quality evaluation of the fused disease image region is carried out with the same six indicators: contrast, sharpness, average blur, blur, color aggregation vector, and hue count. The evaluation results are shown in Table 3.

In order to prove that the segmentation method based on wireless network communication and deep learning can effectively segment disease images under a complex background, this paper compares it with traditional segmentation algorithms. When the segmentation results are basically the same, the segmentation algorithm based on wireless network communication and deep learning has obvious advantages; the comparison of the number of iterations and the segmentation time is shown in Table 4.

When the segmentation results are basically the same, the number of iterations in the segmentation experiment on the standard image set is reduced by about 13% and the segmentation time by about 27%; on the diseased image set, the number of iterations is reduced by about 12% and the segmentation time by about 30%. As the number of iterations increases, the segmentation time also continues to increase.

For five different disease types (no disease, gray spot, powdery mildew, gall mites, and anthracnose) in the small-sample wolfberry disease image data, the segmentation method based on wireless network communication and deep learning proposed in this paper is compared with a traditional segmentation algorithm and the fuzzy C-means clustering segmentation algorithm; the average error rates of the methods are shown in Table 5.

As can be seen in Table 5, the average error rate of the fuzzy C-means clustering segmentation algorithm is large, which produces a large amount of redundant background; the average error rate of the traditional segmentation algorithm is lower than that of the fuzzy C-means clustering algorithm but still exceeds 10%, while the average error rate of the deep learning algorithm used in this paper is below 6%.

In order to verify the segmentation effectiveness of the deep learning algorithm on the fine-grained category dataset, it is compared with the saliency-map-based GrabCut segmentation method; the average accuracy of the two segmentation methods is shown in Table 6.

Finally, in order to prove the segmentation effect of the deep-learning-based algorithm in a light-affected environment, for diseased images in the normal field environment and the light-affected environment, the segmentation algorithm proposed in this paper is evaluated mainly in terms of the average error rate and the average recall rate, as shown in Table 7.

As can be seen in Table 7, for the disease images in the normal field environment and the light-affected environment, the average recall rate of the deep learning algorithm proposed in this paper can reach 90% under normal lighting conditions, while a strong lighting environment degrades the performance of the segmentation algorithm; lighting conditions are therefore an important factor in evaluating segmentation performance.

4.3. Analysis of the Accuracy of Image Recognition of Pests and Diseases
4.3.1. Accuracy of the Test Set under Different Settings of the First Layer of Convolution

After the background is removed, crop pest and disease images at a moderate resolution satisfy the crop pest identification task well. Therefore, in all relevant experiments, the detected crop pest and disease images are aligned to a fixed size, and the pixels of the three-channel images are normalized by subtracting 127.5 and dividing by 128. Like most convolutional neural networks, the networks in this article were originally designed around the ImageNet dataset, so the network input size is generally large and the first convolutional layer generally uses a large kernel with a stride of 2; but for crop pest images, a greater receptive field also means smaller feature maps. Therefore, two settings of the first convolutional layer are compared on a ResNet-50 network: the original large-kernel setting with stride 2, named A, and a smaller-kernel setting, named B, giving AResNet-50 and BResNet-50. The loss function is the standard Softmax loss function, the training dataset is CASIA-Web, and testing is performed on the LFW and YTF test sets; the experimental results are shown in Table 8 and Figure 11.

We find that setting B performs better than setting A. The reason may be that the large convolution kernel makes the feature map size drop too quickly, so the network cannot effectively learn features with strong expressive ability. With setting B, the first-layer output feature map is twice the size of that produced by setting A, improving the ability and speed of pest identification.
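As an aside, a minimal sketch of the input normalization described above, assuming detection and alignment have already produced a fixed-size three-channel image:

```python
import numpy as np

def normalize_input(img):
    """img: (H, W, 3) uint8 pest image after detection and alignment.

    Maps pixel values from [0, 255] to roughly [-1, 1]: subtract 127.5, divide by 128.
    """
    return (img.astype(np.float32) - 127.5) / 128.0
```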

The experiment uses BResNet-50 as the baseline model, and the experiments are carried out separately. The loss function is the standard Softmax Loss, the training dataset is CASIA-Web, the test sets are LFW and YTF, and the Dropout parameter is set to 0.4; Dropout has a slight regularization effect that can alleviate network overfitting. The results of the six groups of experiments are shown in Figure 12. We can see that the 5th setting has the best recognition effect; that is, Dropout, the inner layer, and batch normalization together can effectively enhance the feature expression of pests and diseases and improve the ability and speed of pest identification.

In the feature extraction layer setting, the final output pest characteristic dimension has multiple choices, and which dimension is most suitable for expressing pest and disease characteristics still depends on the experience of researchers; experimenting with every dimension would make the cost of experiments very high. Past studies of pest and disease identification also used different feature dimensions. Generally speaking, the higher the feature dimension of pests and diseases, the stronger the expression ability, but increasing the number of feature channels also increases the computational burden of the network, resulting in inefficiency. Therefore, it is particularly important to find a suitable pest and disease feature dimension, both to reduce network parameters and to accelerate the network training process.

The 7 Hu moment features of each lesion image are shown in Table 9.

It can be seen in Table 9 that the parameter values of the Hu moment feature between different lesions have obvious differences, which can effectively distinguish 4 different diseases.

For each kind of lesion image (one image chosen per kind), with the size unchanged and rotations at different angles such as −30, −15, 0, 15, and 30 degrees, the Zernike moment values are calculated, and the four kinds of Lycium barbarum diseases are analyzed, as shown in Table 10.

It can be seen in Table 10 that for each type of lesion image, when the image size is unchanged and the image is rotated at different angles, the feature parameters are relatively stable and represent the shape characteristics of the lesion well.

In order to verify the stability of the features, for one kind of lesion image, the calculated feature values under different rotation degrees and different zoom magnifications are compared and analyzed for two diseases, as shown in Table 11.

It can be seen that combining the Zernike moment feature with the Hu moment feature for lesion feature extraction can be better applied to the recognition of disease images, and the recognition effect of the combined features is better than that of the Hu moment feature used alone.

The average accuracy, precision, recall, and F1 value are used to evaluate the three models as a whole, where the average accuracy is the average of the last five results of each model. The specific indicators are shown in Table 12. The data in the table show that the DCNN performs better: with its strong representation ability, it achieves excellent results on the crop disease dataset in this article.

5. Conclusions

(1) Based on the theory of deep learning, this paper analyzes classical and recent neural network structures. Networks designed for natural image classification are not suitable for crop pest identification tasks, so this paper improves the network structure to balance recognition speed and recognition accuracy. The influence of the pest and disease feature extraction layer on recognition performance is discussed, and after weighing the advantages and disadvantages of the inner (fully connected) layer and the global average pooling layer, the former is adopted as the structure of the feature extraction layer.

(2) With the Softmax cross entropy loss function, the requirements of intraclass compactness and interclass separation cannot be satisfied at the same time. From the metric learning perspective, the interclass distance is increased by adding a margin, and the Softmax cross entropy loss function is changed to a cosine distance loss function. Forcing the loss function to shrink the intraclass distance turns the decision interface into a margin region and increases the interclass distance, which significantly improves the recognition rate of pests and diseases. Unlike the multiplicative margin, which is difficult to train and to tune, this paper proposes an additive margin loss function. The experimental results show that not only does training become simpler, but the recognition rate of pests and diseases is also improved.

(3) There may be overlap between the pest and disease datasets, which is not conducive to the generalization of the final model. This paper carefully screens the commonly used pest and disease datasets; after removing the overlapping data, the final accuracy on the test set drops slightly, but the generalization ability of the model improves. Using the improved pest network and loss function, the interface of a pest identification system was developed, with the TensorFlow deep learning framework as the forward inference tool, and the effectiveness of the system was verified under pose variation and occlusion. In later work, the diseased leaves and diseased parts were identified in crop disease samples with complex environments; at the same time, recognizing pictures taken directly in the natural environment raises the recognition accuracy of the model, giving the article as a whole more practical guiding significance.

Data Availability

This article does not cover data research. No data were used to support this study.

Conflicts of Interest

The author declares that they have no conflicts of interest.

Acknowledgments

This work was supported by the Department of Higher Education of the Ministry of Education of the People’s Republic of China for its financial support for the cooperative education project of industry-university cooperation under the contract 201702038005.