Abstract

The main purpose of this work is to evaluate deep machine learning algorithms for distinguishing between weeds and crop plants using an open database of images of a carrot field. Precision farming methods are increasingly prevalent in agriculture and can embed intelligent methods in drones and ground vehicles for real-time operation. In this work, the accuracy of weed and crop segmentation is analyzed using two different deep learning frameworks for semantic segmentation: the fully convolutional network (FCN) and ResNet. An open database with 40 images of crop plants and weeds was used for the case study. The results show a global accuracy of more than 90% on the test set for both architectures. In a second experiment, new FCN networks were trained to evaluate the impact of different image preprocessing and of different training/testing splits of the dataset on segmentation performance.

1. Introduction

The growing global demand for food requires greater agricultural production from rural areas. Advances in technology have helped to increase and improve agribusiness, and the field mechanization of recent decades is an example of the push toward large-scale production. However, making this production ecologically sustainable raises several issues. The emergence of invasive plants is considered natural in rural environments; nevertheless, it is necessary to act quickly to eliminate weeds before they consume the resources of the field, in order to prevent the decline of the cultivated plants [1]. In general, invasive plants or weeds are species that appear without human assistance; i.e., they grow spontaneously in the soil and begin to compete for light and nutrients, which inhibits the growth of the crop. In addition, weeds spread rapidly and in large numbers, competing with other plants for space [2]. Depending on climatic conditions, soil type, and the extent of the infestation [3], weeds can cause up to 70% loss in the original planting. They can cause significant losses in global food production, far more than the damage caused by harmful animals (arthropods, nematodes, rodents, birds, slugs, and snails), pathogens (fungi and bacteria), and viruses [4].

The most commonly used method of weed removal in large areas is chemical control, which involves spraying herbicides over the whole area to ensure that all weeds are removed [5]. The choice of herbicide should be based on the physical and chemical properties of the plant species and of the material in the area to be treated. However, traditional spraying methods waste a large amount of these chemicals because of the lack of selectivity during application [6], which can be economically and ecologically harmful. Frequent and improper use of herbicides can pollute the soil and, as a result, contaminate surface water and groundwater, as well as expose workers to these chemical components (Majeed, 2018). For this reason, better spraying methods should be developed and used in rural areas.

Computational methods for plant identification are important for systems embedded in spot-spraying drones and ground vehicles. Methods for separating plants from soil have been studied in detail, and some growers use them in the first stage of cultivation, which involves removing all weeds before planting begins. One of these methods is greenness-based detection [7], in which plants are identified by the characteristic green color of chlorophyll, but it is only capable of separating soil from vegetation. However, as the crop grows, the weeds grow alongside it and harm production; at this point, a method for species discrimination is needed. Several studies have been conducted to develop techniques for automatically differentiating weeds from the crop, aimed at localized spraying of herbicides. One of these techniques is based on the calculation of vegetation indices such as the NDVI [8], which uses the reflectance at red and near-infrared wavelengths. Other techniques rely on image analysis, which takes color and shape into account to detect weeds [9]. The most recent methods use artificial intelligence, where a neural network is trained to identify certain species [10]. However, there are various artificial intelligence methods that can be studied and used to detect weeds, each with advantages and disadvantages in terms of accuracy and computational performance. Therefore, two different deep learning techniques are explored in this work: the fully convolutional network (FCN) [11] and a ResNet-based segmentation network.
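As a simple illustration of the greenness-based separation mentioned above, the sketch below computes an excess-green mask with NumPy; the threshold value and the function name are illustrative assumptions, not taken from the cited works.

```python
# Minimal sketch of greenness-based vegetation/soil separation.
# The threshold value is illustrative, not taken from the cited works.
import numpy as np

def vegetation_mask(rgb: np.ndarray, threshold: float = 0.05) -> np.ndarray:
    """Return a boolean mask that is True for pixels likely to be vegetation."""
    img = rgb.astype(np.float64) / 255.0
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    total = r + g + b + 1e-8                  # avoid division by zero on dark pixels
    r_n, g_n, b_n = r / total, g / total, b / total
    exg = 2.0 * g_n - r_n - b_n               # excess-green index per pixel
    return exg > threshold
```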

The main purpose of this work is to evaluate deep machine learning algorithms for distinguishing between weeds and crop plants using an open database of images of a carrot field. The database is provided in [12] and includes labels (ground-truth annotations) for evaluating the segmentation performance of the algorithms.

2. Literature Survey

For the development of this research, a series of works in the fields of neural networks, artificial intelligence, and precision agriculture were reviewed.

In [13], the authors propose the R-CNN (region-based convolutional neural network). Region-based predictions are converted into pixel predictions, with each pixel classified according to the highest-scoring region that contains it. R-CNN first extracts a large set of object proposals using selective search, then computes CNN features, and finally classifies each region with SVMs (support vector machines). R-CNN can be built on popular architectures such as AlexNet, VGG, GoogLeNet, and ResNet. The FCN [11], a fully convolutional network, performs pixel-by-pixel learning without region partitioning. The FCN is an extension of the CNN; its idea is to make the conventional network accept images of any size. Because of the constraint imposed by the fully connected layer, CNNs only accept images of a fixed size, whereas FCNs contain only convolutional layers and subsampling operations. DeepLab [14] is a network created in response to a shortcoming of previous deep convolutional networks, which introduced some inaccuracy into object segmentation. DeepLab combines a deep convolutional neural network with a fully connected conditional random field (CRF) as a final layer, and it produces segmentations of a quality that advanced the state of the art at the time.

In the field of precision agriculture, the following works should be highlighted. Article [15] presents a computer vision system capable of distinguishing between weeds and crop plants under uncontrolled lighting in real time. The system consists of two parts: a fast image processor that delivers results in real time (fast image processing) and a slower but more accurate processor (robust crop row detection) that is used to correct previous errors. This system achieves good results under a variety of conditions and was tested on scenes of maize fields. A database for discrimination between crop plants and weeds was proposed in [16]. The database contains 60 labeled images, taken by the autonomous robot BoniRob in a carrot field during crop development; the camera used captures both visible and infrared light. In [16], a computer vision approach was proposed to classify plants without segmentation. Instead of dividing the image into individual leaves or plants, a random forest classifier was used to predict plant and weed labels for sparse pixel positions based on features extracted from their neighborhoods. The results are smoothed spatially using a Markov random field, and contiguous regions of plants and weeds are inferred at full image resolution by interpolation. In [17], the authors introduced a new method that combines appearance features with advanced linear image representations for weed recognition. The proposed method builds on the state of the art in object and image classification and explores improved performance using machine learning methods; the resulting system can be used across a variety of environments, crops, and weeds. Soybean seedlings and the associated weeds are the subject of [10]. In the cited work, a model was developed to identify weeds based on feature learning with convolutional networks: feature learning is applied as a pretraining step, replacing the random initialization of the convolutional network parameters. With this method, the parameters start from much more reasonable values, and the trained network can identify weeds with greater accuracy. In [18], owing to the difficulty of compiling a database, the authors attack the problem of the scarcity of agricultural image databases and propose a system that minimizes the human intervention required to train classification and detection methods. The approach is to create artificial databases that generalize the key characteristics of the target environment (plant and weed species, soil types, and lighting conditions). More specifically, by adjusting the model parameters to a few real scenes, realistic views of an artificial farming environment can be generated with little effort. In [19], a crop/weed classification system was proposed based on a fully convolutional network with an encoder-decoder structure that incorporates spatial information by considering image sequences. The authors provide an experimental evaluation showing that the proposed system generalizes well to previously unseen fields under a variety of environmental conditions, a fundamental property of a system intended for real use in precision agriculture. In [20], the authors propose a solution based on convolutional networks to the problem of semantic segmentation of crop fields into beet plants, weeds, and soil from RGB images. The proposed network provides real-time classification by exploiting existing vegetation indices.
A robot has been proposed in [21] that is capable of detecting weeds and treating them in real time with locally applied herbicides without harming the crop plants. The robot's algorithm uses an SVM (support vector machine) classifier and a drop-on-demand (DoD) system to apply the herbicide.

3. Methodology

To create an automated weed detector using neural networks, the first step is to select the database that will be used to train the network. Since this is an image segmentation task, the database must contain images and their labels. The label of each image is a mask that assigns each pixel to a class. Neural networks for image segmentation are trained through supervised learning, in which the labels play the role of the teacher during the training and testing of the network.
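As an illustration of such image/label pairing, the following hypothetical PyTorch dataset sketch loads RGB images and their per-pixel class masks; the directory layout and file naming are assumptions, not the paper's.

```python
# Hypothetical sketch of how image/label pairs could be loaded for supervised
# training. Directory layout and file names are assumptions, not the paper's.
import os
import numpy as np
import torch
from PIL import Image
from torch.utils.data import Dataset

class CropWeedDataset(Dataset):
    """Pairs each RGB image with its per-pixel label mask (0=soil, 1=weed, 2=crop)."""
    def __init__(self, image_dir: str, label_dir: str):
        self.image_dir, self.label_dir = image_dir, label_dir
        self.names = sorted(os.listdir(image_dir))

    def __len__(self):
        return len(self.names)

    def __getitem__(self, idx):
        name = self.names[idx]
        image = np.array(Image.open(os.path.join(self.image_dir, name)).convert("RGB"))
        label = np.array(Image.open(os.path.join(self.label_dir, name)))
        x = torch.from_numpy(image).permute(2, 0, 1).float() / 255.0  # CHW, values in [0, 1]
        y = torch.from_numpy(label.astype(np.int64))                  # HxW integer class ids
        return x, y
```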

From the preprocessed database, the data are divided into two sets: the training set and the test set. The training set is the part of the database used for the supervised training of the network. During the training phase, illustrated in Figure 1, the network receives the RGB images and their corresponding labels as input and produces the synaptic weights as output.

The test set images are processed by the trained network, which segments them as shown in Figure 2. The images segmented by the network are then compared with the test set labels using metrics that measure the performance of the trained neural network.

The use of deep learning approaches for image segmentation has increased in recent years. Image segmentation involves the pixel-by-pixel classification of an image and the detection of objects of one or more classes. Since AlexNet, convolutional neural networks (CNNs) have been widely used for image classification and segmentation tasks. In this work, the results of using fully convolutional networks (FCN) and ResNet will be presented.

3.1. Dataset

The database used in this work was taken from a rice farm at Shanmugavel farms, Thanjavur, Tamil Nadu. The database contains 40 images, all captured by a 10-megapixel cell phone camera (3264 × 2448 pixels), as proposed in [12]. The database contains a total of 311,620,608 pixels, of which 26,616,081 are rice plant pixels, 18,503,308 are weed pixels, and 266,501,219 are soil pixels. Images were taken at a distance of 1 m from the ground. The labels prepared by the authors were generated by segmenting the pictures with the ExGExR index to separate plants from the soil; the separation between crop and weed was then carried out by manual inspection. The integer value "0" is reserved for soil pixels, "1" for weed pixels, and "2" for rice plant pixels. Figure 3 shows an image/label pair. Because each pixel is represented by a byte, its value ranges from 0 (black) to 255 (white), and the pixels associated with each class are visually indistinguishable from each other. For visualization purposes, the label is multiplied by a factor of 125 so that the soil appears black (0), the weeds gray (125), and the rice plants white (250).
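The class encoding and the visualization scaling described above can be sketched as follows; this is only an illustration of the multiply-by-125 mapping, not the authors' code.

```python
# Sketch of the label visualization described above: class ids {0, 1, 2} are
# multiplied by 125 so soil appears black, weeds gray, and rice plants white.
import numpy as np
from PIL import Image

def visualize_label(label: np.ndarray) -> Image.Image:
    """label: HxW array with values 0 (soil), 1 (weed), 2 (crop)."""
    return Image.fromarray(label.astype(np.uint8) * 125)
```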

Before actually training the neural network, it is necessary to preprocess the database images.

3.2. Tests

To evaluate the method, two tests were performed. The first test compares the performance of the FCN and ResNet architectures on the database. In the second test, some modifications to the preprocessing used with the FCN architecture are proposed and implemented.

3.2.1. Test to Compare the Proposed Network Architectures

In this first test, the performance of the ResNet and FCN architectures for segmenting the images of the database is compared. The database was initially split into two subsets for training and testing. The split was made so that approximately 41% of the images formed the training set and 59% formed the test set: 16 images and their labels were selected for the training stage, and the remaining 23 images were used to test the segmentation performance of the trained networks. The preprocessing for this test is summarized as follows. Images and their labels were resized from 3264 × 2448 to 800 × 608 pixels. By reducing the images, the number of parameters to be trained is significantly lower, so training the networks takes much less time. The aspect ratio (width/height) changes from 1.3333 to 1.3158.
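A minimal sketch of this resizing step is shown below, assuming the images and labels are handled with Pillow; using nearest-neighbor resampling for the labels (so class ids are not blended) is a standard choice and an assumption here, since the paper does not state which interpolation was used.

```python
# Sketch of the resizing step: images and label masks are reduced from
# 3264x2448 to 800x608. Nearest-neighbor resampling is assumed for the labels
# so that class ids are preserved; the paper does not specify this detail.
from PIL import Image

TARGET = (800, 608)  # (width, height)

def resize_pair(image: Image.Image, label: Image.Image):
    image_small = image.resize(TARGET, resample=Image.BILINEAR)
    label_small = label.resize(TARGET, resample=Image.NEAREST)
    return image_small, label_small
```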

The implemented FCN network is based on the VGG-16 model with transfer learning, that is, 16 convolutional and subsampling layers; the ResNet-based model was also implemented in Python. Both networks were trained for 600 epochs with a batch size of one image.
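The sketch below illustrates an FCN-style segmentation head on a pretrained VGG-16 backbone in PyTorch, in the spirit of the transfer-learning setup described above; it is not the authors' exact architecture, and the layer choices are assumptions.

```python
# Minimal FCN-style sketch with a pretrained VGG-16 backbone (PyTorch).
# This is an illustrative architecture, not the authors' exact model.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import models

class SimpleFCN(nn.Module):
    def __init__(self, num_classes: int = 3):
        super().__init__()
        vgg = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)
        self.backbone = vgg.features              # convolution + pooling layers
        self.classifier = nn.Conv2d(512, num_classes, kernel_size=1)

    def forward(self, x):
        h, w = x.shape[2:]
        feats = self.backbone(x)                   # downsampled feature map
        logits = self.classifier(feats)            # per-pixel class scores
        return F.interpolate(logits, size=(h, w), mode="bilinear", align_corners=False)

# Training-loop sketch: 600 epochs with a batch size of one image, as in the text.
# model = SimpleFCN()
# optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
# criterion = nn.CrossEntropyLoss()
```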

After training, the trained networks are used to infer segmentations for the 23 images of the test set. The segmented images are compared with the corresponding labels using appropriate metrics. These experiments are referred to as test 1.

3.2.2. FCN Network with New Preprocessing

In this second test, two training runs are performed using the same FCN architecture on the database of [12]. The preprocessing is similar to that of the previous test, but with some modifications. Images and their labels are resized to 800 × 640 and then cut into 25 equal patches of dimension 160 × 128. The image patches are then filtered: if a patch contains more than 10% vegetation pixels (crop or weed), it is used for training; otherwise, it is discarded.
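A sketch of this patch extraction and filtering, under the assumption that vegetation pixels are those with label greater than zero, is given below.

```python
# Sketch of the patch-based preprocessing: each image/label pair is cut into a
# 5x5 grid of patches, and a patch is kept only if more than 10% of its pixels
# are vegetation (weed or crop, i.e., label > 0).
import numpy as np

def extract_patches(image: np.ndarray, label: np.ndarray, grid: int = 5,
                    min_vegetation: float = 0.10):
    ph, pw = label.shape[0] // grid, label.shape[1] // grid
    kept = []
    for i in range(grid):
        for j in range(grid):
            sl = (slice(i * ph, (i + 1) * ph), slice(j * pw, (j + 1) * pw))
            lab_patch = label[sl]
            if (lab_patch > 0).mean() > min_vegetation:   # fraction of weed/crop pixels
                kept.append((image[sl], lab_patch))
    return kept
```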

This was done to analyze whether focusing the training data on vegetation has a positive impact on the segmentation performance for weed and crop pixels. Table 1 shows the pixel percentages in each training set. These experiments are referred to as test 2.

In the first training run of this experiment, 16 images are used for training and 23 for testing, with the same training and test sets as in the first experiment. From its results, it is possible to analyze the impact of the new preprocessing by comparing it with the results of the FCN from test 1. In the second training run, 26 images are used for training and 13 for testing. Figure 4 shows the labels generated by the networks and the original label for one image of the test set.

4. Results

4.1. Test to Compare the Proposed Architectures

The approach used in this experiment yielded an overall accuracy of 94.2% for the trained FCN network and 93.12% for the trained ResNet network. Figure 5(a) shows one of the images of the test set, Figure 5(b) shows the corresponding label, and Figures 5(c) and 5(d) show the segmentations produced by the trained FCN and ResNet, respectively.

The confusion matrices are shown in Table 2. These results give the percentage of pixels correctly assigned to a class or incorrectly assigned to another class. For example, for the FCN network, 86.0442% of the pixels are soil pixels correctly predicted as soil, while 0.8566% are soil pixels incorrectly predicted as crop plant. The values shown in bold correspond to correctly predicted pixels.
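Such a confusion matrix, expressed as percentages of all pixels as in Table 2, could be computed as in the following illustrative NumPy sketch.

```python
# Sketch of a confusion matrix expressed as percentages of all pixels,
# computed from flattened ground-truth and prediction arrays of class ids.
import numpy as np

def confusion_percentages(y_true: np.ndarray, y_pred: np.ndarray, num_classes: int = 3):
    idx = y_true.ravel().astype(np.int64) * num_classes + y_pred.ravel().astype(np.int64)
    cm = np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)
    return 100.0 * cm / cm.sum()   # rows: true class, columns: predicted class
```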

As can be seen from Table 2, although the FCN network achieved a somewhat higher overall accuracy, ResNet correctly predicted a higher percentage of weed and crop pixels, while FCN correctly predicted a higher percentage of soil pixels. Because the database is unbalanced (soil pixels represent more than 80% of the pixels in the database), overall accuracy alone does not provide enough information to compare the performance of the networks.

Therefore, precision ($P$), recall ($R$), and intersection over union ($IoU$) were used for a better comparison. These metrics are widely used to measure image segmentation performance and are defined as

$$P = \frac{TP}{TP + FP}, \qquad R = \frac{TP}{TP + FN}, \qquad IoU = \frac{TP}{TP + FP + FN},$$

where $TP$, $FP$, and $FN$ are the numbers of true positives, false positives, and false negatives, respectively. These metrics were computed per class because their average values are dominated by the segmentation of soil pixels, which, as mentioned earlier, are far more numerous in the database. The segmentation performance of the FCN and ResNet architectures is then compared for the soil, weed, and crop classes.
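The per-class metrics defined above can be derived directly from a confusion matrix whose rows are true classes and columns are predicted classes, as in this illustrative sketch.

```python
# Sketch of per-class precision, recall, and IoU computed from a confusion
# matrix (rows: true classes, columns: predicted classes).
import numpy as np

def per_class_metrics(cm: np.ndarray):
    tp = np.diag(cm).astype(np.float64)
    fp = cm.sum(axis=0) - tp      # predicted as the class but belonging to another
    fn = cm.sum(axis=1) - tp      # belonging to the class but predicted as another
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    iou = tp / (tp + fp + fn)
    return precision, recall, iou
```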

As shown, the precision of the FCN network for weed and crop segmentation is higher than that of the ResNet network (89.2% vs. 78.0% for crop pixels and 80.7% vs. 50.1% for weed pixels). Although the ResNet network correctly predicted a larger number of weed and crop pixels (as shown in Table 2), the FCN network misclassified fewer pixels as weed or crop; i.e., the FCN network generated a smaller number of false positives. This explains the higher precision of the FCN network for the crop and weed classes.

On the other hand, the segmentation produced by the ResNet network has higher recall for weeds and crop plants (81.7% vs. 75.3% for crops and 54.8% vs. 34.5% for weeds). Recall increases with the number of true positives but decreases with the number of false negatives. The FCN network mistakenly classified a larger number of weed and crop pixels as soil than the ResNet network did, which lowered its recall for both classes.

For soil pixels, precision, recall, and IoU were all above 94% for both architectures. The main reason is that, as mentioned earlier, more than 80% of the database consists of soil pixels, so the numbers of false negatives and false positives were not significant compared to the number of true positives in either case.

Both architectures showed similar overall accuracy. The main difference between the two approaches is that the FCN-based model produced fewer false positives for the crop and weed classes, while the ResNet-based model produced fewer false negatives for these classes.

4.2. FCN Network with New Preprocessing

This section compares the segmentation performance of the three networks trained with the FCN architecture: the FCN network trained in test 1 and the two FCN networks from test 2, one trained with 16 images and tested on 23, the other trained on 26 images and tested on 13. Table 3 shows the percentage of pixels in each class predicted by the three networks, similar to Table 2 in the previous section.

Comparing FCN (16/23) of test 1 with FCN (16/23) of test 2, it can be seen that the latter correctly predicted a larger fraction of weed and crop pixels: from 2.6119% to 3.5411% for the weed class and from 7.8494% to 8.8297% for the crop class. The overall accuracy also increased, from 94.2% to 95.9%. Analyzing FCN (26/13) of test 2, the fraction of correctly predicted crop pixels increased from 8.8297% to 10.7241%; however, the fraction of correctly predicted weed pixels dropped from 3.5411% to 3.0267%, and the overall accuracy dropped from 95.9% to 95.5%. Figure 6 compares the performance of the three networks in terms of the precision, recall, and intersection over union metrics, as in the previous section.

From Figure 6, the FCN (16/23) network of test 2 shows a significant improvement in the recall and IoU metrics for crop and weed pixels compared to the FCN (16/23) network of test 1: recall increased from 33.15% to 54.54% for the weed class and from 76.23% to 82.63% for the crop class, while IoU increased from 30.15% to 47.19% for the weed class and from 68.12% to 75.27% for the crop class. In terms of precision, there is a slight decline for the weed class (from 75.62% to 74.35%), while the crop class remains around 89.36%. The changes in these values are caused only by the difference in preprocessing between the two networks. The FCN (26/13) network of test 2, compared to the FCN (16/23) network of the same test, shows a worsening of all metrics for weed pixels (precision from 74.35% to 62.36%, recall from 54.54% to 42.85%, and IoU from 47.19% to 35.37%). For the crop class, there is a slight decline in precision, from 90.12% to 88.19%, while recall improved from 82.63% to 87.29% and IoU improved to 76.39%. It must be emphasized that the test set of FCN (26/13) is different from that of the other networks, even though many of its metrics are worse.

Tables 4, 5, 6, and 7 report the segmentation performance according to the three metrics (precision, recall, and intersection over union) for each image of the test set, for the three trained FCN networks (FCN (16/23) from test 1 and FCN (16/23) and FCN (26/13) from test 2); Tables 4 and 5 refer to the crop class.

The indices 1, 2, and 3 denote the metrics of the FCN (16/23) network from test 1 and the FCN (16/23) and FCN (26/13) networks from test 2, respectively. One average is taken over all images of the test set, and a second average is taken over the last 12 images, which are common to the test sets of the three trained FCN networks. Table 5 shows that, on average over these last 12 images, recall and intersection over union improve (from 84.12% to 86.57% for recall and from 74.98% to 76.95% for IoU), while there is a slight decrease in precision for the crop class (from 87.78% to 87.63%).

Tables 6 and 7 compare the segmentation performances of the three nets for weed pixels. It is observed in these two tables that the metrics for images with less than 3% of weed pixels (images 19, 20, and 21) have the lowest segmentation performance values for the three metrics, precisely due to the low number of weed pixels to be segmented by the nets, which in itself increases the probability of error.

As mentioned above, when the networks are evaluated on their full test sets, the FCN (26/13) network of test 2 is inferior to the FCN (16/23) network of the same test according to the three metrics. On the other hand, analyzing Table 5, the performance of FCN (26/13) of test 2 averaged over the last 12 images is higher on all three metrics than that of FCN (16/23) of the same test (precision: 62.63% to 62.87%, recall: 39.72% to 43.48%, and IoU: 33.56% to 35.38%). In [12], the work in which the database used here was presented, the authors report preliminary segmentation results. They use class accuracy (CA), i.e., the average over all classes of the number of correct predictions for a class divided by the actual number of samples in that class, to measure their results. Six ResNet networks were trained with training/testing ratios of 1/38 and 38/1, using 3 color spaces: RGB, HSV, and Lab. Table 8 provides a comparison between the results.
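The class accuracy metric described above corresponds to the mean per-class recall; a minimal sketch of its computation from a confusion matrix is given below (illustrative only, not the authors' code).

```python
# Sketch of class accuracy (CA): per-class recall (correct predictions for a
# class divided by the true number of samples of that class), averaged over classes.
import numpy as np

def class_accuracy(cm: np.ndarray) -> float:
    per_class = np.diag(cm) / cm.sum(axis=1)
    return float(per_class.mean())
```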

The FCN2 networks refer to the two networks trained in the second test. Both FCN2 networks have a higher mean CA than the networks trained with the approach of [12].

5. Conclusion

Precision farming methods are increasingly prevalent in agriculture and can embed intelligent methods in drones and ground vehicles for real-time operation. In this work, two deep learning frameworks, FCN and ResNet, were used and evaluated for weed and crop segmentation. An open database with 40 images of crop plants and weeds was used for the case study. The results show a global accuracy of more than 90% on the test set for both architectures. In the second experiment, new FCN networks were trained to evaluate the impact of different image preprocessing and of different training/testing splits of the dataset on segmentation performance.

Data Availability

The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.

Conflicts of Interest

There is no conflict of interest.