Abstract

Cotton is one of the economically significant agricultural products in Ethiopia, but it is exposed to various constraints that affect the leaf area. Most of these constraints are diseases and pests that are hard to detect with the naked eye. This study aimed to develop a model that improves the detection of cotton leaf diseases and pests using a deep learning technique, the convolutional neural network (CNN). To do so, the researchers considered common cotton leaf diseases and pests, namely bacterial blight, spider mite, and leaf miner. A K-fold cross-validation strategy was used for dataset splitting and to improve the generalization of the CNN model. For this research, nearly 2400 specimens (600 images in each class) were used for training purposes. The model was implemented in Python version 3.7.3 using the deep learning package Keras with a TensorFlow backend, and Jupyter was used as the development environment. The model achieved an accuracy of 96.4% in identifying classes of leaf disease and pests in cotton plants. This shows the feasibility of its use in real-time applications and the potential need for IT-based solutions to support traditional or manual identification of diseases and pests.

1. Introduction

In Ethiopia, agriculture is the basis of the national economy: 85% of livelihoods and 90% of total foreign trade come from the agricultural sector [1]. Ethiopia is believed to be suitable for many farmable crops, and one of them is cotton. Cotton (Gossypium spp.) is also called “White Gold” and “The King of Fibers.” For growers, processors, exporters, and producing countries, cotton is a key point of supply [2]. According to data from an African report, only 428,120 hectares are harvested, with a total production value of about $596,000,000 in SNNPRS. Approximately 18% of crop yield is lost due to different diseases and pests, which results in the loss of millions of dollars worldwide every year.

Even though agriculture is the backbone of Ethiopia, advanced technologies have so far not been explored for automation in agricultural science, and production and quality suffer considerably from different diseases and pests. In recent times, emerging technologies have attracted many researchers to the detection and classification of cotton leaf diseases and pests. In Ethiopia, several constraints reduce the yield and quality of the product. In particular, identification of potential diseases or pests on Ethiopian cotton still relies on traditional methods. A wide area of farmland is suitable for cotton plantation, but only limited research attention is given to cotton crop production. Traditionally, experts detect and identify plant diseases and pests with the naked eye, and naked-eye determination is considered to have low accuracy. In response to high demand, advanced technologies have been used to build systems that assist manual recognition of plant diseases and pests and increase the accuracy of corrective measures. With the help of such technologies, plant diseases can be reduced, which increases productivity and helps raise the economy by boosting production. For that reason, the implementation of information technology-based solutions in the agricultural sector is highly significant for Ethiopia’s economic, social, and environmental development through increased cotton crop production.

Among the diseases and pests that occur, about 80–90% affect the leaves of cotton [3]. In Ethiopia, an economic loss of around 16% is observed because of plant diseases. However, without control measures, the loss can reach 30%–50% [4]. Cotton diseases and pests are difficult to identify with the naked eye.

2. Statement of the Problem

The cotton plant is susceptible to several disorders (biotic and abiotic constraints) caused by temperature fluctuation, diseases, and pests. Worldwide, cotton yield is nearly 576 kg per hectare, and 10% of production is lost due to different cotton leaf diseases. The United States of America (USA) is a major exporter of cotton and earned 5.1 billion US dollars in 2016, but well-known native pests have caused the destruction of cotton farms [3]. India has 24 percent of the world’s cotton land and earned 4.6 billion dollars in 2016; about 18% of its cotton crop production was lost every year due to different diseases that attacked the cotton plants, a loss of almost nine hundred thousand Indian rupees [5]. Presently, in Ethiopia, nearly 12–15% of cotton plants are infected by different diseases [6]. In Ethiopia, the performance evaluation of GTP-I showed that these diseases and pests are the main constraints on meeting world standards in cotton quality and quantity of production. This results in economic losses for both the farmer and the country [1].

Detecting these diseases with the naked eye is complex and reduces identification accuracy. Even an expert may fail to assess and diagnose the diseases with the naked eye, and this inadequate technique leads to more wastage of cotton crops. Because of such mistaken conclusions, unnecessary pesticides that badly affect healthy cotton are often applied. Leaving a farm without production even for a short time affects the overall national GDP [7].

The researchers posed the following research questions in consideration of the issues cited in the statement of the problem: (1) What is a suitable technique for diagnosing cotton diseases and pests? (2) How can an automatic cotton disease and pest diagnosis system be developed? (3) How can the performance of the model be determined?

Deep learning incorporates image processing and data analysis as a path to further findings. Following its success in other applications, it has now entered the domain of agriculture. Today, several deep learning architectures used in computer vision, such as the CNN (convolutional neural network), RNN (recurrent neural network), DBN (deep belief network), and DBM (deep Boltzmann machine), perform tasks with high accuracy. Of these, the CNN is the most relevant for this research work [7].

Nowadays, CNN techniques are used to detect different objects and to perform automatic drawings of instructions for analysis purposes [8]. A K-fold cross-validation strategy is recommended for dataset splitting and improves the generalization of the CNN model. The model developed in this work was built from scratch rather than from a transfer learning or pretrained model.

Deep learning has drawn attention for maximizing performance on different classification tasks and shows promise for data that involve human intervention [9]. In real-world use, deep learning has attracted major interest for decoding human brain activity [10]. To address intertrial and intersubject variability in electroencephalography (EEG) signals, an approach based on attention-based bidirectional long short-term memory (LSTM) was proposed. A convolutional neural network was analyzed across different factors to classify four classes of EEG motor imagery tasks. Here, the bidirectional LSTM with the attention model extracted features from the raw EEG signals. Advancing the clinical translation of EEG motor imagery-based brain-computer interface technology supports varied applications, such as systems that assist paralyzed patients. The notable achievements include the highest accuracy and time-resolved predictions [11].

To make an efficient and effective interface system, the human plays an important role. Graph convolutional neural networks, a novel deep learning framework, addressed the problem of differentiating four-class motor imagery intentions by exploiting the similarity between electroencephalography electrodes. The four motor imagery tasks were predicted with the highest accuracy [12].

3. Scope and Limitation of the Study

This research study focused on developing an identification model for cotton leaf diseases and pests using a deep learning technique called the convolutional neural network. Three common types of diseases and pests that affect cotton productivity and quality were considered: bacterial blight, leaf miner, and spider mite. The model applies supervised learning to a dataset of 2400 images with four prime feature extraction processes. The datasets are limited to four different feature descriptors. Considering the time constraints and the reach of the regions that grow cotton, the research focused on the southern part of Ethiopia, namely Arba Minch, Shele, and Woyto. The Melka Werer agricultural research center was also proposed as a focus area because it is responsible for cotton farms in SNNPR. Deep learning techniques were used to perform automatic feature extraction from the different input datasets.

4. Literature Review

Shuyue [13] outlined different formats of the graph convolutional neural network. It was designed to process uniform electroencephalography data in order to predict four classes of motor imagery in relation to the electroencephalography electrodes. The data were transformed from a 2D to a 3D perspective, and the structure was processed through these dimensional units.

A study [14] proposed a dynamic deep learning approach for assessing short-term voltage stability. A clustering algorithm was applied to obtain the short-term voltage stability and to increase reliability.

In [7], a deep learning technique was applied to identify leaf diseases in different mango trees. The researchers considered five different leaf diseases from various specimens of mango leaves, with a dataset of nearly 1200 images. The CNN structure was trained with more than 600 images, of which 80% were used for training and 20% for testing. The remaining 600 images were used to measure the accuracy of identifying the mango leaf diseases, which showed the feasibility of its use in real-time applications. The classification accuracy could be further increased by providing more images in the dataset and tuning the parameters of the CNN model.

The research study [6] describes a mechanism for the identification and classification of rice plant diseases using a CNN model. For training, nearly 500 different images of diseased plants were collected from a rice experimental field.

In [15], detection on cotton leaves was addressed with image processing, and the K-means algorithm was used to segment the data.

The research in [16] addressed the identification of diseases that infect the leaves of banana plants. In this study, 3700 images were used for training, but the dataset was not balanced across classes. The researchers performed different experiments, for example, training with colored and grayscale image datasets and with different dataset splitting techniques. They obtained the best accuracy of 98.6% with colored images and an 80%/20% training/validation split.

5. Research Methodology

This study used design science research to build and evaluate an approach that creates innovations and defines ideas, practices, technical capabilities, and products using qualitative or quantitative data. One of the outputs of the design science research methodology (DSRM) is a model, which is a conceptual representation and abstraction of datasets. Following Hevner [16], Figure 1 represents the processing model for this research.

Among the different entry points, “problem-centered initiation” is the best fit for this design science research. The problem-centered initiation entry point is applicable because the problem is being observed by the researchers and by businesses within the cotton disease identification domain [17]. Figure 1 depicts the DSRM proposed by the research study together with the activities adapted to this research.

6. Data Collection and Sampling Technique

The sample leaf images used in this research comprise both primary and secondary data. Primary data are data collected fresh for the first time. In this study, the primary data were collected from July to August 2019 from the Arba Minch, Shele, and Woyto cotton farms, where cotton is widely planted and infection is high in SNNPR, whereas the secondary data in each class were obtained from the Melka Werer agricultural research center, located in the Afar region, and SNNPR.

For this study, the researchers used a purposive (judgmental) sampling technique, which is nonprobabilistic, selecting three infected classes and one healthy class from the population. During data collection, 2400 images were captured and distributed into four equal classes (bacterial blight, healthy, leaf miner, and spider mite) to train with a balanced dataset, as shown in Figure 2.

6.1. Cotton Images’ Sample Digitization

The data acquisition system in this research was used to generate clear, unbiased, and simplified digital images of cotton leaves for the sample database for further analysis and processing. The aim was to provide the digitizing system with uniform lighting or balanced illumination. The images captured using a smartphone camera and a digital camera were then transferred to a computer, displayed on a screen, and stored on the hard disk in PNG format as digital color images.

6.2. Image Data Preprocessing

Feeding preprocessed images into a network is the first and basic task in all image processing projects. Common image preprocessing tasks are vectorization, normalization, image resizing, and image augmentation. In this research, these tasks were carried out with the OpenCV library in Python [18] before further deep learning processing. Data augmentation was also used to generate more training samples from the real images.
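The preprocessing tasks named above can be illustrated with a minimal, library-free sketch. It is not the authors' pipeline, which used OpenCV: the image representation (nested lists of RGB tuples), the 4 × 4 target size, the helper names, and the choice of a horizontal flip as the augmentation are all assumptions made for illustration.

```python
# Library-free sketch of resizing, normalization, and flip augmentation.
# Assumption: an image is a list of rows, each row a list of (R, G, B)
# tuples with 8-bit values. All names and sizes here are hypothetical.

def resize_nearest(image, new_h, new_w):
    """Resize an image by nearest-neighbour sampling."""
    old_h, old_w = len(image), len(image[0])
    return [
        [image[r * old_h // new_h][c * old_w // new_w] for c in range(new_w)]
        for r in range(new_h)
    ]

def normalize(image):
    """Scale 8-bit channel values from [0, 255] to [0.0, 1.0]."""
    return [[tuple(v / 255.0 for v in px) for px in row] for row in image]

def flip_horizontal(image):
    """Mirror the image left to right (one simple augmentation)."""
    return [list(reversed(row)) for row in image]

def preprocess(image, size=(4, 4)):
    """Resize first, then normalize, so every sample has the same shape."""
    return normalize(resize_nearest(image, size[0], size[1]))

def augment(image):
    """Return the original image plus one flipped copy."""
    return [image, flip_horizontal(image)]

# Synthetic 8x8 image standing in for a captured leaf photograph.
raw = [[(r * 30, c * 30, 128) for c in range(8)] for r in range(8)]
samples = [preprocess(img) for img in augment(raw)]
```

In an OpenCV-based pipeline, the resizing step would typically be done with `cv2.resize`, and the division by 255 would be applied to the resulting array.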

6.3. Feature Extraction

Deep learning overcomes shortcomings of classical machine learning feature extraction, such as the need to extract features manually, by using a robust technique, the CNN [19]. Its layers learn the features: filters are applied to the data to match patterns and extract their values.

6.4. Dataset Partitioning and Model Selection Methodology

The dataset partitioning technique used is K-fold cross-validation, which partitions the dataset into K folds; in each round, K − 1 folds are used for training and the remaining fold is held out. For this research, K was set to 10 because this value is recommended for deep learning [8, 20]. With 10-fold cross-validation, the total dataset is divided into 10 folds of D = 2400/10 = 240 images each. In each round, 2160 leaf images (90%) are used for training and the remaining 240 leaf images (10%) are used for testing; in this way the system was validated.
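The fold arithmetic above can be checked with a short sketch of K-fold index splitting. This is an illustration only: the function name, the fixed random seed, and the absence of class stratification are assumptions, not details reported in the study.

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Yield (train, test) index lists for K-fold cross-validation.

    Assumption: samples are shuffled once and split without stratification.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    fold_size = n_samples // k
    folds = [indices[i * fold_size:(i + 1) * fold_size] for i in range(k)]
    # Spread any leftover samples over the first folds.
    for j, idx in enumerate(indices[k * fold_size:]):
        folds[j].append(idx)
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

# 2400 images and K = 10, as in the text: 240 test and 2160 training
# images in every round.
splits = list(k_fold_indices(2400, 10))
```

Each image appears in the held-out fold exactly once across the ten rounds, so the averaged score reflects performance on every sample.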

6.5. Tool Selection

To collect cotton leaf images for this research, two image capturing devices were used: a smartphone and a digital camera. The proposed model was implemented using Python version 3.7.3. The model was trained with the deep learning package Keras (version 2.2.4-tf) on a TensorFlow backend (version 1.14.0). To evaluate the performance, experimental setups were run with the help of a graphical user interface built with Tkinter. In terms of hardware, training and testing were carried out on a CPU rather than a GPU.

6.6. Evaluation Techniques

To evaluate the performance of the system, the researchers used various techniques at different stages, such as during development and at the end. First, the researchers evaluated the performance of the prototype on the test dataset using the confusion matrix and four metrics derived from it: F1-score, precision, recall, and accuracy. Second, for subjective evaluation, a questionnaire was used to have domain experts rate the performance of the prototype, as shown in Figures 3 and 4. An objective evaluation was made using experimental analysis to test the artifact. Finally, the result of the evaluation depicts the practical applicability of the model.
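As an illustration of how the four metrics follow from a confusion matrix, the sketch below computes them for a small matrix. The counts are invented for the example and are not the study's results; the function name and the convention that rows are true classes and columns are predicted classes are also assumptions.

```python
CLASSES = ["bacterial blight", "healthy", "leaf miner", "spider mite"]

def confusion_metrics(cm, class_names):
    """Return overall accuracy and per-class precision, recall, and F1.

    Assumption: cm[i][j] counts samples of true class i predicted as class j.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(n))
    report = {}
    for i, name in enumerate(class_names):
        tp = cm[i][i]
        fn = sum(cm[i]) - tp                        # missed samples of class i
        fp = sum(cm[r][i] for r in range(n)) - tp   # others predicted as class i
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        report[name] = {"precision": precision, "recall": recall, "f1": f1}
    return correct / total, report

# Hypothetical counts, 10 test images per class (illustrative only).
example_cm = [
    [9, 1, 0, 0],
    [1, 8, 1, 0],
    [0, 1, 9, 0],
    [0, 0, 0, 10],
]
accuracy, report = confusion_metrics(example_cm, CLASSES)
```

With a balanced test set, as in this study's four equal classes, overall accuracy equals the mean of the per-class recalls.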

7. Designing of Cotton Plant Disease and Pest Identification Model

The first task in designing this model is image acquisition from the field with a digital camera and a smartphone. Then, image preprocessing techniques are applied to prepare the acquired images for further analysis. After this, the preprocessed images are fed into the CNN for feature extraction. The features best suited to represent the image are then extracted using an image analysis technique. Based on the extracted features, the training and testing data used for identification are prepared. Finally, a trained knowledge base classifies a new image into its disease or pest class, as shown in Figure 5.

8. The Architecture of CNN for the Model

The CNN architecture consists of two broad sections: feature learning and classification. The cotton images are fed into an input layer, and the network ends with an output layer. The hidden part consists of different layers, as shown in Figure 6. Here, the input is a cotton leaf image and the output is the class name of the image, also called the label of the cotton leaf disease or pest. In this proposed architecture, the values of each cotton leaf image are combined with the weights of the neurons. The output of this process is passed on to the next layer. The output layer performs the prediction task for this research.
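The two sections of a CNN can be sketched as a tiny forward pass written in plain Python. This is not the architecture of Figure 6: the single 3 × 3 filter, the 6 × 6 single-channel input, the 2 × 2 pooling, and all weight values are hypothetical and chosen only to keep the example small.

```python
import math

def conv2d(image, kernel):
    """Valid 2D cross-correlation, as computed by CNN convolution layers."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [sum(image[r + i][c + j] * kernel[i][j]
             for i in range(kh) for j in range(kw))
         for c in range(out_w)]
        for r in range(out_h)
    ]

def relu(feature_map):
    """Replace negative activations with zero."""
    return [[max(0.0, v) for v in row] for row in feature_map]

def max_pool(feature_map, size=2):
    """Non-overlapping max pooling with a size x size window."""
    return [
        [max(feature_map[r + i][c + j]
             for i in range(size) for j in range(size))
         for c in range(0, len(feature_map[0]) - size + 1, size)]
        for r in range(0, len(feature_map) - size + 1, size)
    ]

def dense(features, weights, biases):
    """Fully connected layer: one weighted sum per output neuron."""
    return [sum(f * w for f, w in zip(features, ws)) + b
            for ws, b in zip(weights, biases)]

def softmax(scores):
    """Turn class scores into probabilities that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def forward(image, kernel, weights, biases):
    # Feature learning section: convolution, ReLU, pooling.
    fmap = max_pool(relu(conv2d(image, kernel)))
    # Classification section: flatten, dense layer, softmax over 4 classes.
    flat = [v for row in fmap for v in row]
    return softmax(dense(flat, weights, biases))

# Hypothetical input and parameters (6x6 image -> 4x4 -> 2x2 -> 4 features).
image = [[float((r * c) % 4) for c in range(6)] for r in range(6)]
kernel = [[1.0, 0.0, -1.0], [1.0, 0.0, -1.0], [1.0, 0.0, -1.0]]
weights = [[0.1 * (k + 1) * (i + 1) for i in range(4)] for k in range(4)]
biases = [0.0, 0.0, 0.0, 0.0]
probs = forward(image, kernel, weights, biases)
```

The index of the largest entry in `probs` would be the predicted class label; in a trained network the kernel and dense weights are learned from the labeled images rather than set by hand.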

9. Experimental Results

During experimentation, different experiments were carried out to obtain an efficient model by customizing various parameters that gave different results. Those parameters are dataset color, number of epochs, augmentation, optimizer, and dropout. According to Serawork Wallelign [19], augmented RGB colored images provided about 15% higher accuracy than non-augmented images.

For this new model, the researchers trained with three different numbers of epochs: 50, 100, and 150. The model achieved the best performance at 100 epochs, as shown in Figure 7. Nitish Srivastava [5] reported that adding dropout to a CNN gave additional performance (2.7%). Therefore, during the experiment, the researchers used dropout rates of 0.25 and 0.5 in each layer and achieved the best performance with a dropout rate of 0.5. Finally, an important experiment was carried out on optimization algorithms, which minimize the loss over iterations by updating the weights according to the gradient. Figure 8 shows the effects of the number of epochs and the regularization methods. For this research, two recent and widely used optimization algorithms, RMSProp and Adam, were compared; the Adam optimization algorithm reduced the loss by 2.5%, as shown in Figure 9.
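To make the difference between the two optimizers concrete, the sketch below applies the standard RMSProp and Adam update rules to a one-parameter toy loss, f(x) = (x − 3)². It does not reproduce the experiment reported here: the loss function, learning rate, step count, and function names are assumptions chosen for illustration.

```python
import math

def grad(x):
    """Gradient of the toy loss f(x) = (x - 3)^2 (hypothetical example)."""
    return 2.0 * (x - 3.0)

def rmsprop(x, steps=500, lr=0.05, rho=0.9, eps=1e-8):
    """RMSProp: scale each step by a running average of squared gradients."""
    v = 0.0
    for _ in range(steps):
        g = grad(x)
        v = rho * v + (1.0 - rho) * g * g
        x -= lr * g / (math.sqrt(v) + eps)
    return x

def adam(x, steps=500, lr=0.05, beta1=0.9, beta2=0.999, eps=1e-8):
    """Adam: RMSProp-style scaling plus a bias-corrected momentum term."""
    m = 0.0
    v = 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1.0 - beta1) * g        # first-moment estimate
        v = beta2 * v + (1.0 - beta2) * g * g    # second-moment estimate
        m_hat = m / (1.0 - beta1 ** t)           # bias correction
        v_hat = v / (1.0 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x

x_rmsprop = rmsprop(0.0)
x_adam = adam(0.0)
```

Both runs start at x = 0 and move toward the minimum at x = 3. In Keras, the corresponding optimizers are available as `keras.optimizers.RMSprop` and `keras.optimizers.Adam`.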

The researchers observed the highest training accuracy, 0.990, at the 100th epoch. The graphs show the training and validation accuracy that the network achieved during training, as shown in Figure 9, and the loss graph is shown in Figure 10.

10. Results and Discussion

To analyze the performance of the model, the final result was obtained using K-fold cross-validation with 10 folds. The RGB-colored image dataset with augmentation improved the performance of the model by 15%. The researchers in [6] used a transfer learning CNN model and a grayscale dataset and achieved 98.6% accuracy. However, color is the main and most decisive feature in cotton disease detection and classification; therefore, a colored dataset was used even though it takes longer to train and requires a more complex model. Training for 100 epochs and using the Adam optimization method boosted the model performance by 10% and 5.2%, respectively. In the end, the developed CNN model correctly classified 98% of bacterial blight, 94% of healthy, 97.6% of leaf miner, and 100% of spider mite images. Additionally, the researchers used different preprocessing techniques for noise removal. Most misclassifications occur between bacterial blight, healthy, and leaf miner. The overall performance of the model, as shown in the confusion matrix, is 96.4% accuracy for the diagnosis of leaf diseases and pests of cotton plants.

11. Prototype Development and Evaluation

For the prototype, the researchers followed the convention of the digital forensic investigation process, which is ISO and IEC, to evaluate the quality of the prototype in terms of efficiency, effectiveness, fault tolerance, helpfulness, learnability, and control. For the time being, the prototype was tested as a desktop application built with Tkinter, a graphical user interface toolkit for the Python programming language.

For the questionnaire, evaluators rated the options extremely satisfied, very satisfied, somewhat satisfied, not so satisfied, and not at all satisfied for five closed-ended questions, and answered one open-ended question. The questionnaires were distributed to Ethiopian cotton farm experts, as shown in Figure 11. The data obtained from the farmers are recorded in Table 1 to evaluate the model.

In the overall evaluation of the cotton leaf disease and pest identification prototype, evaluators selected the extremely satisfied option in 60% of responses across all questions, and 20% for the very satisfied and somewhat satisfied options. For the open-ended question, almost all experts offered constructive thoughts on the overall performance of the system and prototype. This result shows that the prototype performed well in problem-solving ability and in making correct predictions, as shown in Figure 12.

12. Conclusion

This deep learning-based model was implemented using Python and the Keras package, and Jupyter was used as the development environment. Different experiments were carried out in this research study to obtain an efficient model by customizing various parameters such as dataset color, number of epochs, augmentation, and regularization methods. The RGB-colored image dataset with augmentation improved the performance of the model by 15%. The number of epochs and the regularization methods boosted the model performance by 10% and 5.2%, respectively. The proposed prototype achieved an accuracy of 96.4% in identifying each class of leaf disease and pests in cotton plants. Such automated systems can assist farmers and experts in identifying cotton diseases and pests from visual leaf symptoms. The results obtained indicate that the designed system is helpful to farmers in reducing the complexity, time, and cost of diagnosing leaf diseases.

13. Future Works

The main challenge in developing an object detection model with deep learning was collecting a large number of high-quality training images with different shapes, sizes, backgrounds, light intensities, and orientations in the different classes. Therefore, future researchers should try to address these challenges in their work and should not only identify diseases and pests but also suggest remedies. Ethiopia launched a satellite in 2019, which gives future researchers an opportunity to access high-resolution remote sensing satellite images for training high-performance deep learning models.

Data Availability

During data collection, 3117 images were collected from those varied environments.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This research was funded by Arba Minch University, affiliated with the Ministry of Science and Higher Education, Ethiopia.