Complexity and Robustness Trade-Off for Traditional and Deep Models (Special Issue, 2022)
End-to-End Semantic Leaf Segmentation Framework for Plants Disease Classification
Pernicious insects and plant diseases threaten the food and agriculture sector, so diagnosing and detecting such diseases is essential. Plant disease detection and classification is a well-developed research area owing to the enormous progress in machine learning (ML), and over the last ten years computer vision researchers have proposed many ML algorithms for plant disease identification. This paper proposes an end-to-end semantic leaf segmentation model for plant disease identification. The model uses a deep convolutional neural network (CNN) based on semantic segmentation (SS). The proposed algorithm highlights diseased and healthy parts and allows the classification of ten different diseases affecting a specific plant leaf. Through SS, the model separates the foreground (leaf) from the background (nonleaf) and identifies regions as healthy or diseased. Since the proposed method assigns a semantic label to each pixel, it also estimates how much of a leaf's area is affected by a disease. We use tomato plant leaves as a test case. We test the proposed CNN-based model on the publicly available PlantVillage database and, in addition, on a dataset of twenty thousand images that we collected ourselves. Our model obtains an average accuracy of 97.6%, a substantial improvement over previous results on the same dataset.
Plant diseases in crops and fruits adversely affect agricultural production. If these diseases are not identified and treated on time, food insecurity can increase. Certain crops, such as wheat, rice, and maize, are vital for ensuring the food supply as well as agricultural production. Early warnings and forecasting are very effective in preventing and controlling plant diseases, and they play an essential role in adequately managing agricultural production. Until now, however, visual observation by producers has been the only approach to identifying plant diseases in most rural areas, specifically in less developed countries. Continuous monitoring by experts is needed, which might be prohibitively expensive on large farms.
Similarly, to consult experts, farmers may have to travel long distances, which also makes the consultation expensive and time consuming. We argue that this conventional approach is not practical for large farming areas, given the demands of the crop production industry. Therefore, automatic plant disease recognition and classification remain crucial topics in computer vision (CV).
Diseases seriously affect the health of every living organism, including plants and animals. State-of-the-art (S.O.A.) algorithms in the CV and ML domains have enabled us to identify diseases beyond human accuracy. Interestingly, CV and ML algorithms can be applied across domains, including plants and humans, with almost no difference in implementation. Modern technology enables human society to generate sufficient food to fulfill the population's requirements. Conversely, numerous factors still affect food safety, including plant diseases, climate change, and the decline of pollinators. Plant diseases are the primary danger to food safety, and at the same time, the lack of essential infrastructure makes it hard to identify these diseases quickly in many parts of the world. With the latest developments in CV algorithms, the ML paradigm paves the way for agile, on-the-spot disease diagnosis.
When not taken seriously, plant diseases have caused declines in agricultural productivity in several countries worldwide. Disease symptoms have a detrimental effect on crop growth, restraining yields and rendering agricultural goods unfit for consumption. Therefore, their early detection, modeling, and recognition are essential. This article explores the modeling, detection, and recognition of plant diseases that lend themselves to appearance-based analysis and can be captured and modeled using ML. Since plant leaves provide expressive appearance-based modeling, our interest is directed toward disease detection in tomato plants using deep learning (DL).
Determining the health quality of a plant is essential. Several models have been created to deter the loss of crops to pests and diseases. Plant disease symptoms are usually noticeable when the leaves change color or shape. Traditionally, the identification of pests and diseases was done using the naked eye and was supported by agronomic organizations. Presently, detection of plant diseases and pests can be done through machine vision. Plant disease identification using ML is not a new research field. CV experts have reported many good papers worth mentioning in [1–6]. The extensive penetration of smartphones, high-density cameras, and high-performance processors have made it possible for diseases to be detected by automated image recognition.
This paper proposes an end-to-end (E2E) segmentation model for plant disease identification and classification. The model performs semantic leaf segmentation (SLS) using an optimized CNN. By successfully separating the foreground and background regions, the proposed model classifies them into healthy and diseased parts. The model encodes high-density maps and classifies tomato plant leaves into ten categories covering various diseases. Our model outperforms previous approaches in an evaluation on the PlantVillage database. The significant contributions of the proposed work are as follows:
(i) A new CNN-based algorithm for plant disease recognition and classification is presented. Although ML and DL experts have already proposed numerous methods for plant disease recognition, the novelty of our model is that it provides information about each pixel of a leaf image, indicating whether that pixel belongs to a diseased or a healthy part.
(ii) We contribute a new dataset for tomato leaf disease classification. We collected these images from the Internet and labeled them manually as healthy or diseased. The database will be made available for research purposes after the publication of this work.
(iii) The proposed model also provides information about how much leaf area is affected by a specific disease, which most previous ML-based approaches do not provide.
The presentation of the remaining paper is arranged as follows. In Section 2, we discuss previous research work on the topic. Both conventional ML and DL-based methods are discussed in this part. The proposed CNN-based method is discussed in Section 3. Section 4 presents the experimental setup, obtained results, and comparison of obtained results with previous results. Finally, the conclusion is presented in Section 5 with some promising future directions.
2. Related Work
Plant diseases typically affect the growth of crops at all stages of development and can sometimes lead to the death of the plant. They affect food security globally, particularly for small subsistence farmers who depend on their crops for food and livelihood. Determining the health quality of a plant is therefore very important, and several models have been created to deter the loss of crops to pests and diseases. Plant disease identification using ML is a well-researched area, and CV experts have reported many good papers on the topic [1–9].
In precision agriculture, the segmentation of crops in agricultural images is vital. Various techniques have been deployed for the segmentation process, such as SS. SS marks the multiple features in an image as semantically meaningful items and classifies each item into a class; in plants, for example, the classes can be leaf, stalk, or flower. Several studies have used different SS techniques to distinguish plants from nonplants. For example, Sodjinou et al. suggest a method based on a combination of SS and K-means for detecting weeds in images, where the K-means algorithm groups items that belong to similar clusters. According to the study results, the proposed technique provided a more accurate segmentation of weeds and plants. Miao et al. used several approaches to semantically segment hyperspectral images of sorghum plants, such as manual pixel annotation and classifying each pixel as either nonplant or plant; they further classified plant pixels as belonging to a panicle, leaf, or stalk of the sorghum plant. They could separate the plant pixels from the background, but they could not classify which organ a plant pixel belonged to. In another study, Li et al. used region-based segmentation to detect crops, specifically cotton, in images taken in a natural field. The model was successful, as it could even detect the boll-opening stage of the cotton plant.
While identifying a plant from a nonplant can be an easy task using SS, identifying plant diseases through an image is tough since plants are complex environments. Through the developmental stages of crops, their flowers, fruits, and leaves change constantly. During the day, solar radiation affects plants’ spectral response, so their appearance also changes slightly. Additionally, different shapes, layouts, and colors of plant diseases make them difficult to recognize. Regardless, several successful techniques improve detection methods for diseases in plants, both in a controlled environment and in natural conditions.
Chen et al. proposed BLSNet, a segmentation network based on UNet, to recognize rice bacterial leaf streak (BLS), a threatening disease usually found on rice leaves that affects the yield and quality of rice. BLSNet used large-scale extraction and an attention mechanism to increase the precision of lesion segmentation.
One technique that has been widely successful in identifying plant diseases is SS through CNNs. The layers of a CNN can be viewed as filters learned directly from the input data; CNNs extract a hierarchy of visual features adjusted for a precise task. The accuracy of CNNs in detecting objects such as plant diseases and in image classification has grown incredibly over time. CNN-based classification networks are the most regularly used pattern for categorizing plant diseases and pests, owing to CNNs' strong feature extraction capability. Zabawa et al. used SS with convolutional neural networks to extract phenotypic traits in grapevine berries.
According to Bhatt et al., CNN-based methods have achieved extraordinary results in supervised segmentation of leaf images. Usually, these methods work under fully controlled conditions, whereas deep CNN models must cope with various changing parameters: in an environmental setting, images have different backgrounds, lighting conditions, obstructions, and overlaps, and plants show a reasonable amount of variation across growth stages. The authors propose unsupervised machine learning algorithms to segment leaf images so that the approach can be applied to various crops and regions. Afterward, the specific segments are assessed for texture, size, and color to measure any change, such as the presence of a pest or disease.
Unsupervised feature learning, with fully convolutional networks (FCN) followed by conditional random fields, makes it possible to segment images into an optimal number of clusters devoid of any prior training. The real-time performance of this technique allows easy distribution of devices such as cameras and mobile phones in the fields. In addition, Shao et al.  propose using localization and DL-based method to recognize dense rice images. The proposed model can be used to determine rice diseases. The results from the study show that better results can be obtained compared to conventional ML methods. The SS method grounded on deep CNNs can also identify crops from the compound and natural field environments . According to Martins et al. , it can also detect tree canopies in an urban setting.
The SS method based on DL also demonstrates great precision in remote sensing categorization, although it necessitates vast sets of data in controlled learning. The simple notion of DL is using a neural network to analyze information and learn image features. In their study estimating sorghum panicles, which are critical phenotypic data for the improvement of sorghum crops, Malambo et al. applied an image analysis method founded on a SegNet framework. The study results demonstrated that DL combined with SS shows excellent precision with large data. On the other hand, Pena et al. suggest using data fusion to enrich images used in remote sensing.
In very recent works [23–26], plant disease recognition models have been improved for better results and performance. Manjula et al. used the ResNet-50 architecture, a variant of the ResNet model with 48 convolution layers; the accuracy of the developed system is around 97-98%. Chen et al. improved a plant disease recognition model based on the original YOLOv5 network, which accurately identified plant diseases under natural conditions. Hassan and Maji proposed a novel deep learning model based on the inception layer and residual connections, using depthwise separable convolutions to reduce the number of parameters, which led the model to achieve higher accuracy.
3. Materials and Methods
3.1. Dataset Description and Data Annotation
Typical DL-based methods require sufficient data for the training phase, whereas conventional ML can also be trained in limited-data scenarios; the need for a large amount of data is one of the main drawbacks of DL techniques. In this research work, we used an already available dataset and also collected our own database. The details of the two kinds of data used in our experiments are provided next.
PlantVillage database: The PlantVillage database is publicly available for download and research purposes. It is an open-access repository with more than 54K images covering various plants' leaves and related material. Most of the data in this database were collected under controlled laboratory conditions; exposure to real-world scenarios is significantly limited, which is why most researchers using only the PlantVillage database obtain nearly perfect results. The database includes images of 14 crops, including grape, corn, tomato, and soybean, organized into 10 folders: one for healthy leaves and the remaining nine for the different diseases listed in Tables 1 and 2. We use the subset of images for the tomato plant, which consists of around 16012 leaf images. The number of classes is limited to ten: nine classes of tomato leaf diseases and one class of healthy leaves. We keep the resolution of each image at 250 × 250 pixels. Some sample images from the database are shown in Figure 1.
We use all images of the tomato plant contained in PlantVillage. As Table 1 shows, the number of diseased leaf images per class varies from 373 to 5357, and the total number of healthy tomato images is 1591. Table 1 makes clear that the ten classes in the dataset are not balanced in terms of the number of images, with a minimum of 373 and a maximum of 5357 images per class. We therefore use data augmentation methods to balance all classes, including adjusting the contrast, flipping images vertically and horizontally, and changing brightness levels.
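The augmentation operations above can be sketched as follows. This is a minimal NumPy illustration (the function names and jitter ranges are our own, not taken from the paper), showing how a minority class could be oversampled with flips plus brightness and contrast jitter.

```python
import numpy as np

def augment(img, seed=0):
    """Generate simple augmented variants of an H x W x 3 uint8 leaf image:
    horizontal/vertical flips plus brightness and contrast jitter."""
    rng = np.random.default_rng(seed)
    variants = [
        img[:, ::-1],  # horizontal flip
        img[::-1, :],  # vertical flip
    ]
    # brightness shift: add a random offset, then clip back to [0, 255]
    shift = rng.integers(-40, 40)
    variants.append(np.clip(img.astype(np.int16) + shift, 0, 255).astype(np.uint8))
    # contrast scaling around the mean intensity
    scale = rng.uniform(0.7, 1.3)
    mean = img.mean()
    variants.append(np.clip((img - mean) * scale + mean, 0, 255).astype(np.uint8))
    return variants

def balance_class(images, target):
    """Oversample a minority class to `target` images via augmentation."""
    out = list(images)
    i = 0
    while len(out) < target:
        out.extend(augment(images[i % len(images)], seed=i))
        i += 1
    return out[:target]
```

In practice each minority class would be grown toward the size of the largest class (5357 images here) before training.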
TomatoDB: Since the images in PlantVillage are simple and less challenging, comparatively good results are reported in the literature. To assess the framework's performance more precisely, we also tested our model on a collection of more than 20000 tomato leaf images that we gathered from the Internet. While collecting these images, real-world scenarios and more challenging conditions were taken into account, and all ten classes were considered equally. The TomatoDB database will be made available to the research community after the publication of this article.
For SLS, correctly labeled leaf data are needed for each pixel. This ground truth is created through annotation: we annotated the images manually using an interface we developed, selecting the areas of interest, applying random sketches, adjusting contrast and brightness, and assigning a label. No automatic tool is used, and such manual labeling is prone to error, since it depends heavily on the subjective perception of the person doing the labeling. Hence, errors are possible when assigning an exact label to every pixel.
Image setup for experiments: The model presented in this paper is applicable to any plant disease with visible symptoms, although manual labeling is needed to create an SS framework for training. As a test case, we select the tomato plant with ten classes. Since images in the PlantVillage dataset are not exposed to lighting and other variations, we also collected 20000 images from the Internet; some of these tomato leaf images are shown in Figure 2. We use a combination of PlantVillage images and our own collected dataset. We split the data 80 to 20 between training and testing, a commonly used strategy for DL-based models (some authors instead adopt 5-fold or 10-fold cross validation), and hold out 10% of the data for validation to monitor overfitting. We resize each image in PlantVillage and TomatoDB to 250 × 250 before training and testing. The following section discusses the hyperparameters of the deep CNN-based model.
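The split-and-resize setup can be sketched as below; a minimal NumPy illustration with hypothetical helper names, using a nearest-neighbor resize as a stand-in for whatever library resize is used in practice.

```python
import numpy as np

def resize_nearest(img, size=250):
    """Nearest-neighbor resize of an H x W (x C) array to size x size,
    a stand-in for a proper library resize call."""
    h, w = img.shape[:2]
    rows = np.arange(size) * h // size   # source row for each output row
    cols = np.arange(size) * w // size   # source column for each output column
    return img[rows][:, cols]

def split_dataset(samples, seed=42):
    """Shuffle and split into 80% train / 10% validation / 10% test."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(samples))
    n_train = int(0.8 * len(samples))
    n_val = int(0.1 * len(samples))
    train = [samples[i] for i in idx[:n_train]]
    val = [samples[i] for i in idx[n_train:n_train + n_val]]
    test = [samples[i] for i in idx[n_train + n_val:]]
    return train, val, test
```

The seed makes the split reproducible, which matters when comparing runs against the same held-out test set.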
3.2. Deep Model Learning
The performance of the visual recognition tasks is improved with the introduction of DL-based methods [28–33]. The proposed paper addresses leaf disease recognition and classification using deep CNNs. We utilize the concept of SLS in the proposed research.
Convolution layer: This layer plays a vital role in the features extraction stage. The CovL is an essential component of the CNN model. The layers consist of a set of learnable filters. These terms are also known as kernels . In this convolution process, the filter with a specific size slides over the image and is convolved with pixel values of the target image. The dot product is computed between kernel and input image pixels producing a feature map.
ReLU: We use ReLU as the activation function, which converts the input signal at a network node into its output signal. The resulting output takes the form shown in equation (1):

f(x) = max(0, x). (1)
Pooling layer: The pooling layer follows the CovL and receives its output. ML experts use three pooling strategies: random pooling, maximum pooling (MPL), and average pooling. In the proposed work, we adopt MPL, which achieves spatial invariance by reducing the size of the feature map obtained from the CovL. In this strategy, the max operation is applied as the feature map passes through the MPL, as described by equation (2):

y_{i,j} = max_{(p,q) ∈ R_{i,j}} x_{p,q}, (2)

where R_{i,j} is the pooling region associated with output position (i, j).
Classification: We use the SoftMax classifier for classification. The pooling layers provide a feature vector to the SoftMax in the output layer, where it acts as the activation function for the multiclass classification problem. The function takes a vector of k real numbers and performs normalization, converting the input values into a vector of probabilities in the range 0 to 1:

σ(z)_i = exp(z_i) / Σ_{j=1}^{k} exp(z_j).

SoftMax returns a probability value for each class, and the class with the maximum probability is taken as the target class.
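The three building blocks just described (ReLU, max pooling, and SoftMax) can be sketched in NumPy; this is an illustrative reimplementation, not the paper's code.

```python
import numpy as np

def relu(x):
    """Rectified linear unit: max(0, x) applied elementwise."""
    return np.maximum(0, x)

def max_pool2d(x, k=2):
    """k x k max pooling over an H x W feature map, assuming H and W
    are divisible by k (k=2 matches the 2 x 2 kernel used in the paper)."""
    h, w = x.shape
    return x.reshape(h // k, k, w // k, k).max(axis=(1, 3))

def softmax(z):
    """Normalize a vector of real scores into probabilities summing to 1."""
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()
```

For example, pooling a 4 × 4 map with k = 2 yields a 2 × 2 map holding the maximum of each 2 × 2 block.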
Adam: It is a standard optimizer that computes individual adaptive learning rates for each parameter . The exponential decaying average of previous gradients nt is used by this optimizer.
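A single Adam update can be sketched as follows: a standard textbook implementation in NumPy, maintaining exponentially decayed averages of the gradient and its square with bias correction. The default hyperparameters shown are the usual ones; only the learning rate of 0.0001 is confirmed by the paper.

```python
import numpy as np

def adam_step(theta, grad, state, lr=1e-4, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update. `state` is (m, v, t): the decayed gradient average,
    the decayed squared-gradient average, and the step count."""
    m, v, t = state
    t += 1
    m = b1 * m + (1 - b1) * grad            # first-moment estimate
    v = b2 * v + (1 - b2) * grad ** 2       # second-moment estimate
    m_hat = m / (1 - b1 ** t)               # bias correction
    v_hat = v / (1 - b2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, (m, v, t)
```

Because each parameter divides by its own √v_hat, parameters with consistently large gradients take proportionally smaller steps, which is the adaptive behavior described above.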
The proposed framework is presented in Figure 3.
Tables 3 and 4 summarize the various parameters of the proposed CNN framework. As the activation function, we use ReLU. The CNN-based model is constructed from three kinds of layers: CovL, MPL, and FCL. The details of these layers, with feature map descriptions, kernel sizes, and strides, are summarized in Table 3. The feature extractor extracts features from the leaf images, both healthy and affected; a fuller description of the feature extraction part is given in Figure 4. Stage 1 handles feature variation: certain environmental factors produce scaling variations in images, and its receptive fields, each with sixteen filters, absorb these variations. The output of stage 1 is given to stage 2, where we use a 2 × 2 kernel in the MPL. Each CovL is followed by a ReLU. We place a spatial pyramid module (SPD) between the CovL and the FCL; in stage 3, the output of the SPD is given to the FCL. Stages 3 and 4 extract the desired features. More details of the deep CNN parameters are presented in Tables 3 and 4 and Figure 4.
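The pyramid module placed between the CovL and FCL can be illustrated with the classic spatial-pyramid-pooling idea: pool the feature map over coarser-to-finer grids so that a variable-size map always yields a fixed-length vector for the FCL. The grid levels below are an assumption for illustration; the paper does not specify them.

```python
import numpy as np

def spatial_pyramid_pool(fmap, levels=(1, 2, 4)):
    """Pool an H x W feature map into a fixed-length vector regardless of
    input size: for each pyramid level n, split the map into an n x n grid
    and take the max of each cell, then concatenate all cell maxima."""
    h, w = fmap.shape
    feats = []
    for n in levels:
        # integer cell boundaries for an n x n grid over the map
        hs = np.linspace(0, h, n + 1).astype(int)
        ws = np.linspace(0, w, n + 1).astype(int)
        for i in range(n):
            for j in range(n):
                cell = fmap[hs[i]:hs[i + 1], ws[j]:ws[j + 1]]
                feats.append(cell.max())
    return np.array(feats)  # length = 1 + 4 + 16 = 21 for levels (1, 2, 4)
```

Whatever the spatial size of the incoming feature map, the output length depends only on the pyramid levels, which is what lets the FCL follow it with a fixed input dimension.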
Data including both training images and ground truth are given to the framework. The density map is predicted in density estimation (DE), taking supervision from the ground truth data. We combined the segmentation map and DE map, feeding the results to the CovL. Loss is added to the algorithm (Dice Coefficient) in the SS section. Additionally, we add Euclidean distance loss for optimizing the estimated density maps.
3.3. CNN Optimization
This subsection gives a complete illustration of how the hyperparameters are tuned and the optimization is performed. Overfitting is a severe problem faced by most ML models; we tackle it using the methodology suggested in . We use a combination of four different loss functions, including the Euclidean distance for better optimization. The estimated segmentation density map and its loss can be written as in (3) and (4):

D̂ = F(X; Θ), (3)

L_euc = (1 / 2M) Σ_{i=1}^{M} (D̂_i − D_i)², (4)

where D̂ in (3) is the density estimated under the supervision process, D̂_i in (4) is the estimated density at pixel i, D_i is the corresponding ground truth density value, and M is the number of pixels in the GT density map.
We also introduce a loss in the SS part of the framework, based on the Dice coefficient. The Dice coefficient is twice the overlap between the predicted segmentation and the ground truth, divided by the total number of pixels in the ground truth and the predicted mask, and it ranges between 0 and 1. We use a further loss function, the cross-entropy loss, represented as

L_CE = −(1/Q) Σ_{q=1}^{Q} Σ_{c=1}^{C} g_{q,c} log p̂_{q,c}. (5)

In (5), Q represents the total number of samples and C the number of classes; the ground truth class is denoted g_{q,c}, and the estimated output p̂_{q,c}. The final loss is a weighted combination of these individual losses, given in (6), where the value of the weighting factor λ was 0.3.
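The loss terms described above (Dice, cross-entropy, and the Euclidean density loss with a weighting factor of 0.3) can be sketched in NumPy. How the terms are weighted against each other in `total_loss` is our assumption for illustration, not the paper's exact formulation.

```python
import numpy as np

def dice_loss(pred, gt, eps=1e-7):
    """1 - Dice coefficient between a predicted soft mask and a binary
    ground-truth mask; Dice = 2 * overlap / (|pred| + |gt|)."""
    inter = (pred * gt).sum()
    return 1.0 - (2.0 * inter + eps) / (pred.sum() + gt.sum() + eps)

def euclidean_loss(pred_density, gt_density):
    """Mean squared pixel-wise error between density maps."""
    return np.mean((pred_density - gt_density) ** 2)

def cross_entropy(probs, onehot, eps=1e-12):
    """Multiclass cross-entropy averaged over the Q samples."""
    return -np.mean(np.sum(onehot * np.log(probs + eps), axis=1))

def total_loss(pred_mask, gt_mask, pred_den, gt_den, probs, onehot, lam=0.3):
    """Illustrative weighted combination of the losses, with lam = 0.3
    weighting the density term (an assumed composition)."""
    return (dice_loss(pred_mask, gt_mask)
            + cross_entropy(probs, onehot)
            + lam * euclidean_loss(pred_den, gt_den))
```

A perfect segmentation drives the Dice term to zero, while the density term penalizes disagreement between the estimated and ground-truth density maps pixel by pixel.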
4. Results and Discussion
4.1. Performance Evaluation Measures
We use different evaluation measures, including precision (Pr), recall (Rc), accuracy (Acc), F-measure (Fm), and the confusion matrix (Cmat). Most of these measures are defined in terms of false positives (FP), true positives (TP), true negatives (TN), and false negatives (FN). Pr decreases as the number of FPs grows. Rc measures correct positive predictions as the proportion of TPs to all actual positives (TP + FN). Both Pr and Rc range between 0 and 1. Fm assesses model performance as the weighted harmonic mean of Pr and Rc. Mathematically, the evaluation measures are defined in equations (7)–(10):

Pr = TP / (TP + FP), (7)
Rc = TP / (TP + FN), (8)
Acc = (TP + TN) / (TP + TN + FP + FN), (9)
Fm = 2 · Pr · Rc / (Pr + Rc). (10)
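Equations (7)–(10) translate directly into code; a small helper computing all four measures from the TP/FP/TN/FN counts.

```python
def metrics(tp, fp, tn, fn):
    """Precision, recall, accuracy, and F-measure from the four counts,
    following equations (7)-(10)."""
    pr = tp / (tp + fp)                    # (7) precision
    rc = tp / (tp + fn)                    # (8) recall
    acc = (tp + tn) / (tp + tn + fp + fn)  # (9) accuracy
    fm = 2 * pr * rc / (pr + rc)           # (10) harmonic mean of Pr and Rc
    return pr, rc, acc, fm
```

For a multiclass problem such as ours, these counts are taken per class (one-vs-rest) from the confusion matrix and then averaged.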
4.2. Experimental Setup
We perform our experiments on an Intel i7 workstation with an NVIDIA 840 GPU. All experimental work is done with TensorFlow, Keras, and Python. We train for 500 epochs with a batch size of 150, a base learning rate of 0.0001, and a dropout rate of 0.4. We use two datasets for the experimental work: PlantVillage, with more than 16K tomato images, and our own collected database of more than 20K images. We combined both datasets and performed our experiments with a training-to-testing ratio of 80 to 20.
4.3. Limitations of the Proposed Work
The numerical results reported in this paper show that the proposed method achieves good performance; however, our algorithm still has limitations. The research community has well-known concerns about DL architectures: DL-based methods are complex, require inputs at several stages, and often force researchers to rely on a trial-and-error strategy, making them time consuming and heavily engineered. Nevertheless, DL methods are currently the dominant choice for most CV tasks. Our work uses the idea of SLS, which requires ground truth data for the training and testing phases. Creating this ground truth requires manual labeling, and since a single person performed all of it, labeling errors are to be expected; this reliance on manual, human labeling is a weakness of our proposed method.
4.4. Reported Results and Its Discussion
Some conclusions that emerge from the results and experiments are summarized in the following paragraphs.
(i) Plant disease classification and identification using ML is not a new research area for CV and ML experts; the literature reports many good papers on this topic, and due to its diverse applications in agriculture, researchers have explored the field extensively. However, we notice less emphasis on interclass disease identification in particular: researchers mainly focus on recognizing a single plant disease, whereas our proposed work addresses tomato plant disease classification with ten classes.
(ii) Initially, we ran the whole experimental setup for a maximum of 14 epochs (see Figure 5) to observe how model performance varies on the training and validation sets. As Figure 5 makes clear, training and validation accuracy change very quickly up to epoch 6, after which change occurs slowly in the subsequent epochs. Both training and validation losses are also shown: loss is high in the initial stages and is gradually reduced as the epochs increase, showing that the network is fine-tuned gradually.
(iii) We address a ten-class disease problem. The class names, along with their abbreviations, are shown in Table 5, and we report Pr, Rc, and Fm for all ten classes. Table 5 shows near-perfect results for the class BS on all three evaluation measures, and good results for the classes LB, TMV, SM, and HL. The worst performance is for the class EB, with precision 0.93, recall 0.95, and F-measure 0.95, which is still acceptable. Our proposed method semantically segments leaf images into background and foreground; see the images in Figure 6, where column 1 shows the original images, column 2 the ground truth, and column 3 the segmentation results. After foreground estimation, each disease classification is performed, and the percentage of leaf area affected by a disease is also estimated.
(iv) The Cmat, which ML experts commonly use, is the best choice for multiclass evaluation problems, as it shows the correspondence between the predicted class and the true class. The Cmat for the 10-class problem is given in Table 6. The per-class results vary from 94% (lowest, for EB) to 100% (highest, for HL). The results for LM, YLCV, BS, and LB are comparatively good, with predicted accuracies of 99%, 98%, 98%, and 97%, respectively.
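The affected-area estimate described above reduces to counting labelled pixels in the semantic map; a minimal sketch, assuming a hypothetical label convention (0 = background, 1 = healthy leaf, 2 = diseased leaf) that the paper does not state.

```python
import numpy as np

# Assumed label convention for illustration:
# 0 = background, 1 = healthy leaf tissue, 2 = diseased leaf tissue.
def affected_fraction(seg_map):
    """Fraction of the leaf (foreground) area labelled as diseased."""
    leaf = np.isin(seg_map, (1, 2)).sum()   # all foreground pixels
    diseased = (seg_map == 2).sum()         # diseased pixels only
    return 0.0 if leaf == 0 else diseased / leaf
```

Because the background is excluded from the denominator, the estimate reflects the diseased share of the leaf itself rather than of the whole image.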
4.5. Performance Comparison with Previous Results
We compare the reported results with the S.O.A. in Table 7; our results are clearly better than previous ones. The comparison is for the accuracy measure only, since most papers report accuracy results. We note that some papers on plant disease classification using hand-crafted features report better results than DL-based methods; however, we believe a better understanding of DL methods is still required for a given task. For example, DL methods require a large amount of data. Traditional ML methods generally perform well on data collected in indoor scenes, but researchers report a significant drop in performance when these methods are tested in real-world scenarios. DL architectures, by contrast, extract a higher level of abstraction from the data with much better results, so the need for feature engineering is minimized to a large extent.
5. Summary and Concluding Remarks
Due to its diverse applications in the agriculture sector, plant disease identification using DL is an active research area. Plant disease recognition is more challenging when a method is exposed to real-world data, yet CV researchers have shown tremendous progress over the past 5 to 10 years. Our current research unifies and extends our previous work reported in . Our study is motivated by the human visual cortex in designing an E2E trainable neural network architecture. We propose an E2E SS framework for plant disease identification using DL, introducing the idea of SS for plant disease recognition. The proposed model predicts the nature of the disease on a tomato plant and tells how much of a specific leaf's area is affected by that disease, successfully classifying tomato plant leaves into ten distinct classes. We present a novel loss function that improves the model's performance on a state-of-the-art dataset. We evaluate our model on the standard PlantVillage dataset, obtaining much better results than previous work, and additionally on a database of more than 20000 images that we collected ourselves. We hope the research community will carry out further evaluation with better-optimized DL models for plant disease recognition. In the future, we intend to analyze further tasks to develop robust continual DL models, considering more complex combinations of neural networks along with information extraction.
The dataset used in this research, PlantVillage, is available from https://www.kaggle.com/emmarex/plantdisease.
Conflicts of Interest
The authors declare no conflicts of interest.
Acknowledgments
The authors gratefully acknowledge Qassim University, represented by the Deanship of Scientific Research, for the financial support of this research under grant no. 10269-coc-2020-1-3-I during the academic year 1441 AH/2020 AD.
L. Zabawa, A. Kicherer, L. Klingbeil, R. Töpfer, H. Kuhlmann, and R. Roscher, “Counting of grapevine berries in images via semantic segmentation using convolutional neural networks,” ISPRS Journal of Photogrammetry and Remote Sensing, vol. 164, pp. 73–83, 2020.
P. Bhatt, S. Sarangi, and S. Pappula, “Unsupervised image segmentation using convolutional neural networks for automated crop monitoring,” in Proceedings of the 8th International Conference on Pattern Recognition Applications and Methods (ICPRAM), pp. 887–893, Prague, Czech Republic, February 2019.
K. Manjula, S. Spoorthi, R. Yashaswini, and D. Sharma, “Plant disease detection using deep learning,” in Proceedings of the 2nd International Conference on Data Science, Machine Learning and Applications (ICDSMLA), pp. 1389–1396, Singapore, April 2020.
P. Pradhyumna, G. P. Shreya, and Mohana, “Graph neural network (GNN) in image and video understanding using deep learning for computer vision applications,” in Proceedings of the 2021 Second International Conference on Electronics and Sustainable Communication Systems (ICESC), pp. 1183–1189, Coimbatore, India, August 2021.
R. K. Sinha, R. Pandey, and R. Pattnaik, “Deep learning for computer vision tasks: a review,” in Proceedings of the 2017 International Conference on Intelligent Computing and Control (I2C2), Coimbatore, India, June 2017.
V. Rajesh, U. P. Naik, and Mohana, “Quantum Convolutional Neural Networks (QCNN) using deep learning for computer vision applications,” in Proceedings of the 2021 International Conference on Recent Trends on Electronics, Information, Communication & Technology (RTEICT), pp. 728–734, Bangalore, India, August 2021.
N. O’Mahony, S. Campbell, A. Carvalho et al., “Deep learning vs. traditional computer vision,” in Proceedings of the Science and Information Conference, pp. 128–144, Las Vegas, NV, USA, April 2019.
I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, MIT Press, Cambridge, 2016.
D. P. Kingma and J. Ba, “Adam: a method for stochastic optimization,” in Proceedings of the 3rd International Conference for Learning Representations (ICLR), San Diego, CA, USA, May 2015.
A. Elhassouny and F. Smarandache, “Smart mobile application to recognize tomato leaf diseases using convolutional neural networks,” in Proceedings of the 2019 International Conference of Computer Science and Renewable Energies (ICCSRE), pp. 1–4, Agadir, Morocco, July 2019.
C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, “Rethinking the inception architecture for computer vision,” in Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2818–2826, Las Vegas, NV, USA, June 2016.