Abstract

The economic prosperity of a country is highly dependent on agriculture. The use of technology in agriculture has greatly contributed to the economic prosperity of industrialized countries and is crucial for the growth of emerging countries. One major challenge in agriculture is the detection and control of plant diseases, which substantially affect plant productivity and quality and, in turn, food production and population well-being. Detecting the various types of plant disease with the naked eye is time consuming, difficult, and imprecise. Our primary concern is tomato crops, for which economic demand has grown dramatically over time. Controlling tomato infection is a complicated task that requires ongoing care throughout the crop cycle and consumes a considerable share of the total cost of production. Digital image processing can be used to monitor plant disease, and deep learning has in recent years made remarkable improvements in this area, surpassing older techniques. This article identifies tomato leaf disease using deep convolutional neural networks (CNNs) and transfer learning, with pretrained AlexNet, ResNet, VGG-16, and DenseNet as backbones. The relative performance of these networks is examined with the Adam and RMSProp optimization methods; the DenseNet model with RMSProp achieves the best accuracy, 99.9%.

1. Introduction

Edible plants have been crucial to human sustenance, and on many occasions, disease management for such crops proves to be challenging. Farming and agriculture are ancient professions, and the United Nations Food and Agriculture Organization (FAO) estimated that the worldwide population will be about 9.2 billion by 2050 [1].

Reducing the yield loss is crucial to feeding a society of over 7 billion people today. Still, factors such as climate change, pollinator decline, and plant diseases continue to threaten food security. Early-stage detection of diseases will aid in reducing treatment costs and chemical inputs, consequently reducing the environmental impact. In developing states, smallholder farmers, who are responsible for over 80% of agricultural production, face significant challenges such as pests and diseases, leading to yield losses of over 50% (International Fund for Agricultural Development, 2013). Additionally, most hungry people (50 percent) live in smallholder agricultural households, making smallholder farmers particularly vulnerable to food supply disruptions caused by pathogens.

Despite numerous techniques implemented to prevent crop loss due to disease, early detection is imperative to the larger yield goals of various farmers today. Integrated pest management (IPM) strategies have gradually supplanted traditional approaches to widespread pesticide use over the last decade. While initially, plant clinics assisted in disease control and management, more recent developments include utilizing technology, such as mobile phone-based tools, and capitalizing on mobile phone technology’s historically unprecedented rapid adoption in every corner of the globe. Disease detection techniques today are limited to specialists who must locate and identify the malady. The commercial volume of plants in tandem with the minuscule size of the disease symptoms may further complicate things.

The economic demand for tomatoes has increased significantly over time [2]. Tomato infection control is a complex operation that needs constant care during the whole crop cycle and accounts for a large portion of overall production expenses. In addition, new viral infections continue to appear alongside the viruses already known to attack tomatoes.

There are various techniques that have been implemented to detect tomato plant diseases. Direct approaches, like chemical analysis of diseased plant sections, as well as indirect techniques, for example, imaging and spectroscopy [3], are used to identify plant features and stress levels in response to disease detection, respectively.

In this study, we classify tomato diseases using pretrained deep neural networks (AlexNet, VGG16, ResNet, and DenseNet) with transfer learning. The models are trained and fine-tuned under various hyperparameter settings. The networks’ performance was tested on a well-known dataset for disease prediction, the Plant Village dataset, and promising results were obtained.

The article is structured as follows: Section 2 summarizes the advances achieved in predicting tomato plant disease. Section 3 describes the dataset, the preprocessing steps, the network architectures, and the experimental setup. Section 4 presents and discusses the results, Section 5 covers the limitations of the methodology, and Section 6 concludes the study.

2. Related Work

In the past, most studies, particularly in agriculture, depended on image processing, pattern classification, and machine-learning methods [4]. The initial step is to use cameras to capture images of the surroundings. The images are then subjected to several processing steps in order to extract important features. The main goal is to identify diseased areas in an image. A growing body of plant disease classification research has recently aimed to create efficient plant diagnostic tools for farmers [5]. Numerous artificial intelligence (AI) methods have been implemented to classify and identify abnormalities in different plants, including olive, plum, mango, pomegranate, rice, citrus, tomato, orange, cassava, tea leaf, and apple [6].

Many studies have been conducted on categorizing and preventing plant leaf diseases in general and tomato leaf diseases specifically. However, this work has certain limitations. Prior research has relied chiefly on principal component analysis (PCA), which involves comparing a single sample and using the same value as a reference filter throughout the disease detection and classification process. Moreover, only texture-based segmentation algorithms were applied during the detection procedure [7]. Although this may have yielded the intended findings, the accuracy of disease prediction under real-world conditions will be different [8]. As a result, pesticides may be misapplied, and farmers may receive inaccurate pesticide or treatment advice.

Accurate calculation of disease incidence, severity, and their effects on crop quality and yield is essential for agricultural fields, horticulture, plant breeding, improving fungicide effectiveness, and plant research. Therefore, keeping an eye on the growing environment and spotting plant illnesses are crucial for sustainable agriculture. Prior detection of suspect regions in the plant could potentially reduce financial losses and make it easier to use effective management techniques to boost production [9].

For smart agriculture to promote early detection of plant illnesses and increase crop production, deep learning algorithm advancement is crucial. Due to the need for specialized knowledge and expensive tools, data collection for machine learning applications is an expensive undertaking. The practical use of any application is often limited by the lack of expertise of its users and the capabilities of the devices used to take the images for classification. Implementing data augmentation techniques for neural network training, we seek to increase the accuracy of deep learning models on low-quality test photos [10].

Several customized feature-based picture identification algorithms were employed mainly prior to the development of deep learning in the computer vision industry. Because of the extensive preprocessing, feature extraction, and classification, these approaches are computationally expensive and time intensive. It is referred to as a manual approach since constructing this algorithm necessitates complete human understanding and the procedure comprises many variables.

Deep learning has enabled scientists to create systems that can be trained and evaluated as a whole, in contrast to handcrafted methods that require individual steps. Convolutional neural networks (CNNs), due to their excellent accuracy in extracting features for image identification tasks, have been utilized across various industries such as automation, robotics, and agriculture.

In agricultural applications, computer vision and CNNs are used for difficult, composite tasks such as plant recognition. For example, a CNN-based system outperforms local feature descriptors and bag-of-visual-words algorithms when recognizing 10 different plant species. Thanks to recent improvements in machine learning, the CNN approach has been used to diagnose plant diseases in various crops. For example, [11] uses a CNN-based LeNet and image processing to detect two health issues. Using a custom dataset and photographs taken from the Internet, [12] developed a CNN-based system that could identify 13 diseases across five crops. The suggested system has a top-1 success rate of 96.3 percent and a top-5 success rate of 99.99 percent. A rice disease recognition system based on deep convolutional neural networks was proposed and evaluated in numerous trials [13]. A CNN was trained to identify the 10 most common rice diseases using a dataset of 500 real images of healthy and damaged rice leaves and stems. The suggested model is 95.48 percent accurate on average. Tan et al. [14] suggested a CNN-based method for identifying apple pathology photos based on a self-adaptive momentum mechanism for updating CNN parameters. According to the data, the identification accuracy of the proposal was almost 96.08 percent, with rapid convergence. S. Adhikari et al. [15] attempted to automatically classify and diagnose plant diseases with a focus on tomatoes. Image capture, image preprocessing, and image corrections were performed to prepare the input. CNNs were used for feature extraction and classification. The model was tested using a dataset of web photos and yielded promising results.

This study focuses on supervised deep learning models (AlexNet, ResNet, DenseNet, and VGG-16) combined with image processing techniques to study plant disease. It also compares the results of these models to determine which one is the most effective at classifying plant diseases.

As a consequence of the fast growth of CNNs, several strong CNN designs have emerged, including AlexNet [16], GoogLeNet, VGGNet [17], Inception-V3, Inception-V4, and ResNet [18]. Training deep neural networks from scratch requires a large quantity of data as well as costly computing resources. At the same time, a classification task of interest may arise in one domain while sufficient training data exist only in another.

On the other hand, transfer learning can help deep neural networks perform better by reducing the need for time-consuming data collection and annotation. In practice, transfer learning may be divided into two types.

One approach is to fine-tune the pretrained network’s weights using our own data; in this case, the new data must be resized to match the input size of the pretrained network. Another option is to use the learned weights of the pretrained network to train the target network. K. Zhang et al. [19] used three pretrained networks (AlexNet, GoogLeNet, and ResNet) to investigate the performance of stochastic gradient descent (SGD) and adaptive moment estimation (Adam) in diagnosing tomato leaf disease. The networks were trained with both optimizers over a variety of batch sizes and iterations. The original tomato leaf dataset was obtained from an open-access image repository. After a series of experiments, ResNet with SGD, a batch size of 16, and 4,992 iterations, trained from the 37th layer to the fully connected layer, achieved the highest accuracy of 97.28 percent for diagnosing tomato leaf disease.

Similarly, pretrained networks have been used for disease prediction in tomato plants. Images of tomato leaves from the Plant Village dataset were fed into two deep learning models, AlexNet and VGG16 Net, which achieved classification accuracies of 97.29 percent and 97.49 percent, respectively.

In this research, we likewise applied transfer learning and used the Plant Village dataset (nine disease classes and one healthy class) to diagnose tomato diseases using four pretrained networks (AlexNet, ResNet, DenseNet, and VGG16).

3. Methodology

3.1. Dataset

The initial dataset for tomato leaf disease was obtained from the Plant Health Open Access Image Library [20]. Table 1 and Figure 1 present the healthy class and the nine disease classes of tomato leaves.

The ten classes are bacterial spot, early blight, healthy, late blight, leaf mold, rust spot, red spider, target spot, mosaic virus, and yellow leaf curl virus. Below are some statistics and sample photographs of tomato leaves.

As previously detailed, pictures of nine disease-affected classes and one healthy class of tomato samples were taken from the Plant Village dataset. In the segmented images, the background pixels of the three channels were set to 0, leaving only the pixels associated with a leaf. Across the nine disease-affected classes and the healthy class, 7,301 samples were taken into consideration. For the specified deep learning models (ResNet, VGG16, AlexNet, and DenseNet), the images were resized from the original 256 × 256 pixels to 224 × 224 pixels.

3.2. Preprocessing Steps
3.2.1. Image Reading

For the network to be trained and to predict on new, unseen samples, the images must match the input size of the network. Images whose size does not fit the model can be resized or cropped to the required size. We stored the path to our image collection in a variable and constructed a function that loads the image-containing folders into arrays.
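A loader of this kind might be organized as follows. This is a minimal sketch, not the authors' function: it assumes one subfolder per class (the folder layout, the file suffixes, and the name `list_images_by_class` are hypothetical) and only collects file paths. Decoding the files into pixel arrays would need an imaging library and is left out.

```python
import tempfile
from pathlib import Path

# Assumed set of image file suffixes; adjust to the actual collection.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}

def list_images_by_class(root):
    """Map each class folder under root to a sorted list of its image paths."""
    samples = {}
    for class_dir in sorted(Path(root).iterdir()):
        if class_dir.is_dir():
            samples[class_dir.name] = sorted(
                p for p in class_dir.iterdir()
                if p.suffix.lower() in IMAGE_SUFFIXES
            )
    return samples

# Demonstration on a temporary folder tree with empty placeholder files.
with tempfile.TemporaryDirectory() as root:
    for class_name, count in (("healthy", 2), ("late_blight", 3)):
        folder = Path(root) / class_name
        folder.mkdir()
        for i in range(count):
            (folder / f"leaf_{i}.jpg").touch()
    (Path(root) / "healthy" / "notes.txt").touch()  # skipped: not an image
    demo_counts = {name: len(paths)
                   for name, paths in list_images_by_class(root).items()}

print(demo_counts)
```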

3.2.2. Image Resizing

Image resizing is an important step in the deep learning workflow as it is used to format raw input into a format that the network can accept. By altering the size of the picture to match the input layer, we can prepare the image for the network.
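As an illustration of resizing to the input layer, the sketch below shrinks a 256 × 256 image to 224 × 224 with nearest-neighbor sampling. The interpolation method is an assumption made for simplicity (the text does not state which one was used), and a synthetic single-channel image stands in for a leaf photograph.

```python
def resize_nearest(image, new_h, new_w):
    """Resize a 2-D image (list of rows) with nearest-neighbor sampling."""
    old_h, old_w = len(image), len(image[0])
    return [
        [image[r * old_h // new_h][c * old_w // new_w] for c in range(new_w)]
        for r in range(new_h)
    ]

# Synthetic 256 x 256 single-channel image used as a placeholder.
source = [[(r + c) % 256 for c in range(256)] for r in range(256)]
resized = resize_nearest(source, 224, 224)
print(len(resized), len(resized[0]))
```

For a color image, the same function can be applied to each channel separately.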

3.2.3. Gaussian Blur-Removal of Noise

Preprocessing the data can enhance the desired features or eliminate artifacts that could bias the network. Here, Gaussian blurring is used to remove noise from the input data.
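The effect of a Gaussian blur on an isolated noisy pixel can be sketched as follows. The kernel size of 3 and sigma of 1.0 are illustrative assumptions, as is the border handling (edge pixels are replicated); none of these values is taken from the text.

```python
import math

def gaussian_kernel(size, sigma):
    """Return a normalized size x size Gaussian kernel (size must be odd)."""
    half = size // 2
    kernel = [
        [math.exp(-(x * x + y * y) / (2.0 * sigma * sigma))
         for x in range(-half, half + 1)]
        for y in range(-half, half + 1)
    ]
    total = sum(sum(row) for row in kernel)
    return [[v / total for v in row] for row in kernel]

def blur(image, kernel):
    """Filter a 2-D image with the kernel, replicating border pixels."""
    h, w = len(image), len(image[0])
    half = len(kernel) // 2
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            acc = 0.0
            for dy in range(-half, half + 1):
                for dx in range(-half, half + 1):
                    rr = min(max(r + dy, 0), h - 1)
                    cc = min(max(c + dx, 0), w - 1)
                    acc += image[rr][cc] * kernel[dy + half][dx + half]
            out[r][c] = acc
    return out

# A flat 5 x 5 patch with one noisy pixel in the middle.
noisy = [[10] * 5 for _ in range(5)]
noisy[2][2] = 200
kernel = gaussian_kernel(3, 1.0)
smoothed = blur(noisy, kernel)
print(round(smoothed[2][2], 2))
```

The spike is spread over its neighbors, so the center value drops well below 200 while pixels far from it are unchanged.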

3.2.4. Background Subtraction

By creating a foreground mask, the technique of background subtraction can be used to distinguish foreground objects from the backdrop. This method is commonly employed to identify moving objects from stationary cameras. Background subtraction techniques are also crucial for tracking objects.
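A foreground mask and its use for zeroing background pixels in all three channels can be sketched as below. The reference background image, the threshold of 30, and both function names are hypothetical choices for illustration; the segmentation actually used to produce the dataset is not described in the text.

```python
def foreground_mask(gray_image, gray_background, threshold=30):
    """Mark pixels that differ from the background by more than threshold."""
    return [
        [1 if abs(pixel - back) > threshold else 0
         for pixel, back in zip(img_row, back_row)]
        for img_row, back_row in zip(gray_image, gray_background)
    ]

def zero_background(rgb_image, mask):
    """Set all three channels to 0 wherever the mask is 0."""
    return [
        [pixel if keep else (0, 0, 0) for pixel, keep in zip(img_row, mask_row)]
        for img_row, mask_row in zip(rgb_image, mask)
    ]

# Tiny 2 x 3 example: bright background, darker leaf pixels.
background = [[200, 200, 200], [200, 200, 200]]
gray = [[200, 90, 205], [80, 85, 198]]
rgb = [[(200, 200, 200), (40, 120, 30), (205, 205, 205)],
       [(35, 110, 25), (38, 115, 28), (198, 198, 198)]]

mask = foreground_mask(gray, background)
segmented = zero_background(rgb, mask)
print(mask)
```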

3.2.5. Smoothing of Edges

Edge smoothing is a digital image processing method that reduces and suppresses image noise. In the spatial domain, smoothing is generally achieved by neighborhood averaging. Three types of smoothing filter are frequently used: average smoothing, Gaussian smoothing, and adaptive smoothing.

3.2.6. Rotation, Horizontal Flip, and Cropping

Rotation is the process of rotating an image by a predetermined amount. Flipping involves reversing the rows or columns of pixels in an image, whether it be in a vertical or horizontal orientation. Cropping, on the other hand, involves selecting a random section of an original image and adjusting its size to match the dimensions of the original image.
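The three geometric operations can be sketched on a small image stored as a list of rows. This is an illustrative simplification: rotation is shown only for a fixed 90 degrees clockwise, and the crop is returned at its cropped size (resizing it back to the original dimensions is omitted). The function names are hypothetical.

```python
import random

def rotate90(image):
    """Rotate a 2-D image 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]

def horizontal_flip(image):
    """Reverse the order of the pixel columns."""
    return [row[::-1] for row in image]

def random_crop(image, crop_h, crop_w, rng):
    """Cut a crop_h x crop_w window at a random position."""
    top = rng.randint(0, len(image) - crop_h)
    left = rng.randint(0, len(image[0]) - crop_w)
    return [row[left:left + crop_w] for row in image[top:top + crop_h]]

image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]

rotated = rotate90(image)
flipped = horizontal_flip(image)
cropped = random_crop(image, 2, 2, random.Random(0))
print(rotated, flipped, cropped)
```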

3.2.7. Color Correction and Saturation Adjustment

The technique of altering an image to eliminate and fix any discrepancies in how the human eye perceives things is known as color correction. With a few quick adjustments, we can digitally edit an image to make it feel, look, and appear as we saw it in its original state.

Saturation characterizes a color’s intensity, and saturation adjustment changes that intensity. The term “lightness” correspondingly describes how bright or dark a color is. A photograph in grayscale or black and white has no color saturation, whereas a full-color photograph of a field of sunlit wildflowers can have a lot of color saturation.
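One way to adjust saturation is to convert a pixel to a hue/saturation/value representation, scale the saturation, and convert back. The sketch below uses Python's standard `colorsys` module; the choice of the HSV model and the function name `adjust_saturation` are assumptions for illustration.

```python
import colorsys

def adjust_saturation(rgb, factor):
    """Scale the saturation of an (r, g, b) pixel with 0-255 channels."""
    r, g, b = (channel / 255.0 for channel in rgb)
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    s = min(1.0, max(0.0, s * factor))  # keep saturation inside [0, 1]
    return tuple(round(c * 255) for c in colorsys.hsv_to_rgb(h, s, v))

pixel = (200, 100, 50)
desaturated = adjust_saturation(pixel, 0.0)   # factor 0 removes all color
unchanged = adjust_saturation(pixel, 1.0)     # factor 1 leaves the pixel as is
print(desaturated, unchanged)
```

A factor of 0 turns the pixel gray, which matches the observation that a grayscale photograph has no color saturation.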

3.2.8. Image Preprocessing in Detail

Image resizing is a common preprocessing procedure used to meet the input requirements of deep learning (DL) systems. In previous research studies, the image size was modified for the AlexNet architecture and for the VGG16 network by resizing it to half of its original dimensions. Image segmentation was used to increase the size of the dataset, focus on specific areas of interest, and simplify data annotation, using techniques such as calyx, stalk scar, and threshold segmentation. For image enhancement, some datasets used adaptive histogram equalization [21]. In machine learning projects, models typically perform well when the class ratios in the dataset are similar, but imbalances occasionally occur. To balance an imbalanced dataset, various resampling techniques are used, such as the synthetic minority oversampling technique (SMOTE), IMPS, RUS, and ROS. Background removal was performed to minimize noise in the dataset using methods such as the Gaussian density function, histogram thresholding, and noise cancellation algorithms. Other operations include the creation of bounding boxes to aid in weed detection, fruit counting, and the classification of the objects present. Some datasets applied color space conversion, converting RGB to the HSI color model or to other color models such as HSV [22].

In addition, some papers used image feature extraction techniques to extract features such as shape and statistical features, histograms, PCA filters, Gabor filters, and GLCM features [23].

3.3. A Brief Description of Deep Learning Architecture Used
3.3.1. AlexNet

AlexNet achieved the highest recognition accuracy among the machine learning and computer vision algorithms of its time. This was a landmark moment in the history of machine learning and computer vision for visual recognition and classification tasks, and it signaled the start of a deep learning boom. The architecture of AlexNet is defined by the number of layers used for convolution and pooling. The first convolutional layer performs convolution and max pooling with local response normalization (LRN), employing 96 distinct receptive filters of size 11 × 11.

The max pooling operation uses a 3 × 3 filter with a stride of 2. The same operations are applied in the second layer with 5 × 5 filters. The third, fourth, and fifth convolutional layers produce 384, 384, and 256 feature maps, respectively. In the AlexNet architecture, a sequence of convolutional layers is followed by layers for ReLU activation, max pooling, and normalization. ReLU activation is applied to the outputs of all convolutional layers, as well as to the two fully connected layers that precede the output layer. Additionally, the final fully connected layer has been modified to output ten classes, nine representing disease classes and one representing the healthy class. AlexNet’s architecture is depicted in Figure 2.
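The spatial size of each feature map follows from the filter size and stride through the standard formula floor((W + 2P - K) / S) + 1. The sketch below applies it to the first layers described above. The 224 × 224 input comes from the dataset description; the stride of 4 and padding of 2 for the first convolution, and the padding of 2 for the second, are assumptions borrowed from common AlexNet implementations and are not stated in the text.

```python
def output_size(input_size, kernel, stride, padding=0):
    """Spatial size after a convolution or pooling layer."""
    return (input_size + 2 * padding - kernel) // stride + 1

size = 224                                                # input width and height
conv1 = output_size(size, kernel=11, stride=4, padding=2)   # 11 x 11 filters (assumed stride 4, padding 2)
pool1 = output_size(conv1, kernel=3, stride=2)              # 3 x 3 max pooling, stride 2
conv2 = output_size(pool1, kernel=5, stride=1, padding=2)   # 5 x 5 filters (assumed padding 2)
pool2 = output_size(conv2, kernel=3, stride=2)              # 3 x 3 max pooling, stride 2
print(conv1, pool1, conv2, pool2)
```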

3.3.2. ResNet

The residual network (ResNet) design, developed by Kaiming He and colleagues, was the ILSVRC 2015 winner. It was conceived as an ultradeep network that is immune to the vanishing gradient problem that had afflicted prior generations. ResNet has been built with 34, 50, 101, 152, and even 1,202 layers. The well-known ResNet50 network consists of 49 convolutional layers and a fully connected layer at the end. For this research, we utilized ResNet50, a high-performance network. In this work, the final three layers of ResNet are replaced with a new fully connected layer, softmax layer, and classification layer. For the classification of tomato leaf diseases, the fully connected layer in the ResNet architecture is replaced with a fully connected layer of ten neurons. This modification changes the structure of the ResNet. Additionally, the input image size for ResNet should be 224 × 224 pixels. The ResNet block’s fundamental block diagram is presented in Figure 3.
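The core idea of a residual block is that its output is the input plus a learned correction, y = ReLU(F(x) + x), so the signal and its gradient can pass through the skip connection even when F contributes little. The toy sketch below uses two small dense layers for F; real ResNet blocks use convolutions and batch normalization, so this is a simplified illustration with hypothetical function names.

```python
def relu(values):
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in values]

def linear(values, weights, bias):
    """Compute weights @ values + bias for a small dense layer."""
    return [
        sum(w * v for w, v in zip(row, values)) + b
        for row, b in zip(weights, bias)
    ]

def residual_block(x, w1, b1, w2, b2):
    """Return relu(F(x) + x), where F is two dense layers with a ReLU between."""
    hidden = relu(linear(x, w1, b1))
    transformed = linear(hidden, w2, b2)
    return relu([t + xi for t, xi in zip(transformed, x)])

x = [1.0, 2.0]
zeros = [[0.0, 0.0], [0.0, 0.0]]
eye = [[1.0, 0.0], [0.0, 1.0]]
no_bias = [0.0, 0.0]

# With all-zero weights, F(x) = 0 and the block passes x through unchanged.
identity_out = residual_block(x, zeros, no_bias, zeros, no_bias)
# With identity weights, F(x) = x and the output is x + x.
doubled_out = residual_block(x, eye, no_bias, eye, no_bias)
print(identity_out, doubled_out)
```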

3.3.3. DenseNet

DenseNet was introduced in 2017 by Gao Huang and his coworkers. Within a dense block composed of densely connected CNN layers, each layer’s output is passed as input to all the following layers. The network is thus formed using dense connections across layers, earning it the name “DenseNet.” This concept is very effective for feature reuse, and it results in a large reduction of network parameters. DenseNet contains a number of dense blocks with transition layers between them. Every layer accepts as input the feature maps of all preceding layers. This model demonstrates a high level of accuracy for object recognition tasks with a moderate number of network parameters. In this research, we implemented DenseNet for transfer learning by utilizing pretrained weights and adjusting the classification layer for our specific ten categories.
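Because every layer receives the feature maps of all preceding layers, the number of input channels grows linearly inside a dense block. The sketch below makes that arithmetic explicit. The values used (64 initial channels, a growth rate of 32, six layers) are illustrative assumptions and are not taken from the text.

```python
def dense_block_input_channels(initial_channels, growth_rate, num_layers):
    """Input channel count seen by each layer of a dense block."""
    return [initial_channels + growth_rate * i for i in range(num_layers)]

def dense_block_output_channels(initial_channels, growth_rate, num_layers):
    """Channels leaving the block: the input plus every layer's new maps."""
    return initial_channels + growth_rate * num_layers

inputs = dense_block_input_channels(64, 32, 6)
outputs = dense_block_output_channels(64, 32, 6)
print(inputs, outputs)
```

Each layer only has to produce `growth_rate` new feature maps, which is why the parameter count stays moderate despite the dense connectivity.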

3.3.4. VGG-16

VGG16 builds on AlexNet’s stacked architecture but adds more convolutional layers (as presented in Figure 4). It is composed of 13 convolutional layers, each followed by a ReLU layer. As in AlexNet, certain convolutional layers in the model are followed by max pooling to reduce their dimension. In contrast to AlexNet, which makes use of larger filters, the convolution layers of VGG16 use 3 × 3 filters. Using smaller filters reduces the number of parameters, and the ReLU layer after each convolution layer increases nonlinearity.
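The parameter saving from small filters can be checked with a short calculation: two stacked 3 × 3 convolutions cover the same 5 × 5 receptive field as one 5 × 5 convolution but need fewer weights. The channel count of 64 below is an illustrative assumption.

```python
def conv_params(kernel, in_channels, out_channels):
    """Weights plus biases of one convolutional layer."""
    return kernel * kernel * in_channels * out_channels + out_channels

def stacked_receptive_field(kernel, num_layers):
    """Receptive field of num_layers stacked stride-1 convolutions."""
    return 1 + num_layers * (kernel - 1)

channels = 64
two_3x3 = 2 * conv_params(3, channels, channels)
one_5x5 = conv_params(5, channels, channels)
print(two_3x3, one_5x5, stacked_receptive_field(3, 2))
```

The stacked version also applies two ReLU nonlinearities instead of one.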

3.4. Experiments

In the first phase of the experiment, data augmentation was performed. Deep neural network systems outperform traditional machine learning and computer vision approaches, although they are prone to overfitting. A poor choice of hyperparameters, insufficient regularization, or training with too few images can all cause overfitting. We employed a range of augmentation operations to increase the number and variety of images in the collection: intensity adjustments (contrast and brightness, color, and noise) as well as geometric changes (resize, rotation, horizontal flip, and cropping).

This research uses a transfer learning approach, which involves employing a pretrained deep learning model to classify new categories of objects. The corrected pictures were first fed into the pretrained AlexNet, DenseNet, ResNet, and VGG16 networks. Because the models were pretrained on the 1,000 categories of the ImageNet dataset, the final layer was replaced with an output layer whose size equals the number of categories in our task. In this scenario, there are 10 classes: nine disease classes and one healthy class. A softmax layer and a fully connected layer have also been added to the model. As a result, the final three layers of all models have been altered. The learning rate of the early layers is kept to a minimum, since these layers have already been trained for ImageNet recognition. The batch size is 32, and the maximum number of epochs is 40.
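The role of the new ten-way output layer can be illustrated with the softmax function, which turns the ten raw scores (logits) of the fully connected layer into class probabilities. This is a minimal standalone sketch: the logit values are made up, and the helper name `predict` is hypothetical.

```python
import math

CLASS_COUNT = 10  # nine disease classes plus one healthy class

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict(logits):
    """Return the index of the most probable class and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs[best]

# Made-up logits for one image; class index 3 has the largest score.
logits = [0.1, -1.2, 0.3, 4.0, 0.0, -0.5, 1.1, 0.2, -2.0, 0.4]
probabilities = softmax(logits)
label, confidence = predict(logits)
print(label, round(confidence, 3))
```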

4. Results and Discussion

The current section presents the experiment and its findings. All tests are conducted on a Windows 10 platform using the Python programming language and Kaggle, with a TPU and 16 GB of RAM. Using the pretrained networks, the initial experiment aims to determine which optimization approach, RMSProp or Adam, is best for detecting tomato leaf disease.

We set the following hyperparameters for every network in this research: the batch size is 32, the initial learning rate is 1.001, and the maximum number of epochs is 40. For the Adam optimizer, the gradient decay rate is 0.9, the squared gradient decay rate is 0.999, and the denominator offset (epsilon) is 10^-8. The accuracy of the networks with the Adam and RMSProp optimizers is shown in Table 2 and Figure 5.
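The difference between the two optimizers can be seen in their update rules, sketched below for a single scalar parameter. The decay rates of 0.9 and 0.999 and the offset of 1e-8 follow the Adam settings listed above. The learning rate of 0.001 and the RMSProp decay of 0.9 are illustrative assumptions (commonly used defaults), not values taken from the text, and the quadratic loss is a toy problem.

```python
import math

def rmsprop_step(param, grad, state, lr=0.001, decay=0.9, eps=1e-8):
    """One RMSProp update: scale the step by a running RMS of the gradient."""
    state["sq"] = decay * state["sq"] + (1 - decay) * grad * grad
    return param - lr * grad / (math.sqrt(state["sq"]) + eps)

def adam_step(param, grad, state, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update with bias-corrected first and second moments."""
    state["t"] += 1
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad * grad
    m_hat = state["m"] / (1 - beta1 ** state["t"])
    v_hat = state["v"] / (1 - beta2 ** state["t"])
    return param - lr * m_hat / (math.sqrt(v_hat) + eps)

def gradient(x):
    """Gradient of the toy loss f(x) = x^2."""
    return 2.0 * x

# The first Adam step has magnitude close to the learning rate.
adam_first = adam_step(1.0, gradient(1.0), {"t": 0, "m": 0.0, "v": 0.0})

# Run both optimizers for 200 steps from the same starting point.
x_rms, rms_state = 1.0, {"sq": 0.0}
x_adam, adam_state = 1.0, {"t": 0, "m": 0.0, "v": 0.0}
for _ in range(200):
    x_rms = rmsprop_step(x_rms, gradient(x_rms), rms_state)
    x_adam = adam_step(x_adam, gradient(x_adam), adam_state)
print(x_rms, x_adam)
```

Both methods divide the step by a running estimate of the gradient magnitude; Adam additionally keeps a running mean of the gradient itself and corrects both estimates for their zero initialization.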

To determine the classification accuracy, an equal number of images from each class is used. The smallest class in our analysis, mosaic virus, has 160 samples, while the yellow leaf curl virus class has the most. After data augmentation, we balanced all classes to 1,600 samples each. Each model is fine-tuned with a varying number of epochs. The test accuracy and training loss of all networks are shown in Figures 6–13.
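The amount of augmentation needed to reach a common class size can be worked out per class, as in the sketch below. The count of 160 for the mosaic virus class and the target of 1,600 come from the text; the count for the yellow leaf curl virus class is a hypothetical placeholder, since the text does not give it, and the function name is illustrative.

```python
import math

def augmentation_plan(class_counts, target):
    """For each class, compute how many augmented images are still needed."""
    plan = {}
    for name, count in class_counts.items():
        missing = max(0, target - count)
        plan[name] = {
            "extra_images": missing,
            # Augmented variants to generate from each original image.
            "variants_per_image": math.ceil(missing / count) if count else 0,
        }
    return plan

class_counts = {
    "mosaic_virus": 160,             # from the text
    "yellow_leaf_curl_virus": 1400,  # hypothetical placeholder value
}
plan = augmentation_plan(class_counts, target=1600)
print(plan)
```

For the smallest class, each original image has to yield nine augmented variants to reach the target.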

When comparing the four models using execution time and accuracy as evaluation metrics, we found that AlexNet required more repetitions and a longer runtime than the other pretrained networks, yet achieved lower accuracy. The VGG-16 model performed better than AlexNet in terms of both accuracy and execution time. The ResNet and DenseNet models reached an accuracy of 99% within twenty repetitions, but DenseNet took the longest to execute. With an outstanding accuracy of 99.8%, the ResNet model was the fastest, so if speed and accuracy are weighed together, ResNet is the best option among the four models. Comparing the four pretrained architectures by optimizer, we found that the Adam optimizer yielded good results for the VGG-16 and AlexNet models, while the RMSProp optimizer produced excellent results for the DenseNet and ResNet models. In terms of accuracy alone, DenseNet with RMSProp is the most appropriate model for predicting tomato leaf disease.

Accuracy is the fraction of all samples that the model classifies correctly. Precision is the fraction of samples predicted to belong to a class that actually belong to it, i.e., true positives divided by the sum of true positives and false positives. Recall is the fraction of samples that actually belong to a class that the model identifies, i.e., true positives divided by the sum of true positives and false negatives. Lastly, the F1 score is the harmonic mean of precision and recall.
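These four metrics can be computed directly from the counts of true positives (TP), false positives (FP), false negatives (FN), and true negatives (TN) for a class. The counts in the sketch below are made-up numbers for illustration and are not results from this study.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return accuracy, precision, recall, and F1 from confusion counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return accuracy, precision, recall, f1

# Hypothetical counts for one disease class treated as the positive class.
accuracy, precision, recall, f1 = classification_metrics(tp=90, fp=10, fn=5, tn=895)
print(accuracy, precision, recall, f1)
```

In a ten-class problem, the counts are obtained per class by treating that class as positive and all others as negative, and the per-class scores are then averaged.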

Healthy leaves make up only a small subset of the data, so there are many more unhealthy leaves than healthy ones. Because the dataset is not completely balanced, precision, recall, and F1 score are better evaluation metrics than accuracy here. Minimizing false positives is pivotal to the correct classification of diseased leaves, and minimizing false negatives is crucial to finding the diseased plants. Table 3 shows the accuracy, precision, recall, and F1 score of the networks with the Adam and RMSProp optimizers. The precision, recall, and F1 scores in that table are good indicators of the excellent performance of the model with both optimizers. The Adam optimizer reached the highest values for each of these key evaluation metrics.

5. Limitations of Methodology

Plant pests and diseases often greatly affect the availability and safety of plants for human consumption. To address these challenges, a new agricultural revolution is taking place, in which new digital technologies and deep neural networks play a significant role. We chose CNN models (AlexNet, ResNet, DenseNet, and VGG16) for the development of our hybrid model to identify plant diseases. We utilized the Plant Village dataset, which includes nine different types of tomato diseases.

A deficiency of samples is a common barrier because the model is then prone to overfitting. Several augmentation techniques were used to get around this problem. Motivated by state-of-the-art results and developments in generative adversarial networks (GANs), we implemented such a system (in addition to traditional techniques) to enlarge and supplement the dataset. In conclusion, the Plant Disease Net, a novel two-stage architecture for plant disease recognition, was proposed. The use of different kinds of information sources, like climate, location, and plant age, could potentially improve accuracy.

Future research should concentrate on detecting diseases in different parts of the plant and at different stages of the disease. The developed model can be used as a decision support system and can also be integrated into a mobile application for low-cost detection of plant diseases from a photograph of the leaf [24].

6. Conclusion

In this study, a deep neural network model was presented to identify tomato plant leaf diseases and classify them into specified categories. It also took into account morphological features including the plant’s color, texture, and leaf margins. The study is limited to the diseases of tomato leaves.

A comparative study in networks using pretrained VGG-16, DenseNet, ResNet, and AlexNet neural networks with transfer learning techniques was conducted, and the same dataset was used. The relative performance using the Adam and RMSProp optimization methods was measured. Our proposed model, DenseNet with transfer learning, outperformed all the others with a 99.9% accuracy rate when using RMSProp as its optimizer. Additional work can go into the detection of all types of illnesses in a myriad of plant species. This form of early detection can also aid in the development of treatment. Modifying the model with a larger number of images, including other crops, will aid in the early detection of other plant species, considering the high detection rate with tomatoes.

Data Availability

Data sharing not applicable to this article as no datasets were generated or analysed during the current study.

Conflicts of Interest

The authors declare that they have no conflicts of interest.