Research Article | Open Access
Robust Grape Detector Based on SVMs and HOG Features
Detection of grapes in real-life images is a serious task addressed by researchers dealing with precision viticulture. In the case of white wine varieties, grape detectors based on SVM classifiers, in combination with a HOG descriptor, have proven to be very efficient. Simplified versions of the detectors seem to be the best solution for practical applications, since they offer the best known ratio of performance to time complexity. As our research showed, the conversion of RGB images to grayscale format, which is implemented at the image preprocessing level, is an ideal means of further improving the performance of the detectors. In order to enhance the ratio, we explored the relevance of the conversion in the context of the detectors' potential sensitivity to a rotation of berries. For this purpose, we proposed a modification of the conversion, and we designed an appropriate method for tuning such modified detectors. To evaluate the effect of the new parameter space on their performance, we developed a specialized visualization method. In order to provide accurate results, we formed new datasets for both tuning and evaluation of the detectors. Our effort resulted in a robust grape detector which is less sensitive to image distortion.
Detection of grapes in real-life images is a serious task addressed by many researchers dealing with precision viticulture. Grape detectors are employed in various applications, for example, in autonomous vineyard sprayers and harvesters or in the process of yield estimation [2–5]. Various types of image processing, feature extraction, and classification algorithms can be employed when detecting berries or bunches of grapes in RGB images.
A bunch detector designed by Reis et al.  employs colour mapping, morphological dilation, and stem detection. It showed correct white wine bunch classification at 90.53% and red wine at 97.14%, demonstrating an increase in complexity of detection depending on grape colour. Thus, let us focus only on solutions aimed at white variety detection.
A detector introduced by Berenstein et al. was based on the decision tree algorithm and was applicable to both bunch and berry detection. Its detection rate of bunches was 90.45%, and the detection rate of single grapes was 90.10%. An exceptional single grape detector was developed by Nuske et al. It showed overall precision of 98.00%, but recall was lower at 63.70%. Their detector utilized radial symmetry transformation, Gabor filters, and a k-nearest neighbours classifier.
A comparable method, aimed at detection of white wine varieties, considered support vector machine (SVM) classifiers in combination with histograms of oriented gradients (HOG). Specifically, its average accuracy in a 10-fold cross-validation (CV) was 98.23% for a linear kernel and 98.96% for a radial basis function (RBF) kernel. Similar results were achieved in their evaluation on real-life images.
While excellent, these detectors are computationally intensive to an extent that makes them impractical in viticulture applications. Modification of an image preprocessing (IP), where an input RGB image is first converted to a grayscale format and then adjusted with a linear contrast normalization, can be used to reduce the time complexity of the original solutions . The impact of the normalization step was shown to be negligible in the grape detectors [8, 9], but appropriateness of the grayscale conversion remains to be tested.
The evaluation of the detectors requires sets of labelled object images. These sets might significantly influence evaluation results. Škrabánek et al. [7, 9] discovered inconsistency in the performance of the detectors that depended on the evaluation set. Although the detectors have a high score in the 10-fold CV and an evaluation on real-life images, they do not achieve such results on test sets . The test sets contained distorted positive samples, where 75% of the positive samples in each test set were artificial samples created by a rotation of original samples. The detectors were tested on two categories of test sets. The poorer grape detection results were present for both categories. For example, the average accuracy of the original detector with the linear kernel was as low as 87.85% in a test set .
Here, we address both the relevance of the grayscale conversion and a potential sensitivity of the detectors to image rotation. We test a modification of the conversion and propose a parameter tuning method designed for such modified grape detectors. To evaluate the effect of the new parameter space on their performance, we developed a specialized method for visualizing the evaluation results. We applied the modified conversion and the tuning method to the grape detectors developed by Škrabánek and Majerík, and we evaluated the performance of these modified versions. In order to provide accurate results, we formed new datasets for both tuning and evaluation of the detectors. Our improvements resulted in a robust grape detector which is less sensitive to image distortion.
2. Materials and Methods
2.1. Original Work on Grape Detectors
2.1.1. Objective of the Detectors
The grape detectors were aimed at the recognition of grapes of white varieties in object images. RGB object images of dimensions px (pixel) were considered in [7–9]. The object images were square viewports of source images of a resolution px, 24 bit.
The detectors distinguished between two classes : “berry” and “not berry.” The class “berry” was called “positive” and the class “not berry” was called “negative.” Object images belonging to the class “positive” contained berries of the circle shape of a diameter ranging between 30 and 40 px. Moreover, the middle of the berries was required to be placed in the middle of the object images with a tolerance px. Object images, which did not satisfy this condition, belonged to the class “negative.”
2.1.2. Structure of the Detectors
In computer vision, detection of objects in images usually consists of four successive steps. The first step is acquiring an object image from a large real-life image; the second step is the IP, resulting in a modified image; the third one is the extraction of features; and the final step is the classification of the object image using a feature vector. However, the grape detectors introduced in [7, 9] consist of three parts only: the IP, a feature descriptor, and a classifier. Although the detectors differ in the structure of the IP, the feature descriptors used, and the settings of the classifiers, they share the same arrangement of the vision pipeline (Figure 1). The parts of the detectors are described below in further detail in the context of our previous works.
Image Preprocessing. IPs of the original detectors, labelled as , consist of two steps: the conversion of input RGB images to grayscale format followed by a linear contrast normalization on the range . Simplified versions, which were introduced in , skip either the contrast normalization (simplified version 1 or ) or both operations (simplified version 2 or ). In and , the conversion is carried out according to ITU-R recommendation BT.601 . This conversion belongs to techniques based on weighted means of all three colour channels . It means that the conversion of the input RGB image from the RGB model to the grayscale format is realized by eliminating the hue and saturation information, while retaining the luminance.
Features Extraction. Two types of features, a vector of normalized pixel intensities and HOG, were considered in ; however, only the HOG features have proven to be convenient for the detection of grapes of white varieties [7–9]. Thus, only the detectors based on the HOG features will be considered further. It means that the feature vectors are extracted from using the HOG descriptor . A standard setting of the descriptor has demonstrated to be sufficient. Specifically, a linear gradient voting into 9 bins in 0°–180°; cells of size px; blocks of cells; and 1 overlapping cell between adjacent blocks in both directions were used in [7–9].
Classifier. The aim of the classifiers in the grape detectors is to determine the classes of the object images from their feature vectors. SVM classifiers with the linear and the RBF kernel functions were used in [7, 9]. Regardless of the kernel function used, the performance of an SVM classifier is influenced by a regularization constant $C$. The performance of a classifier with the RBF kernel is further influenced by a kernel width. A grid search algorithm combined with the 10-fold CV was used to find the settings of these parameters giving the maximal recognition accuracy.
2.1.3. Training and Evaluation of the Detectors
Five training sets were introduced in . The $i$th training set was denoted as T-$i$, where $i = 1, \ldots, 5$. Each training set consisted of 288 unique “positive” and 288 unique “negative” samples. The training set T-3 was used for the training of the detectors [7–9].
Three kinds of evaluation methods were considered in the evaluation of the original detectors O [7, 8]: the 10-fold CV, an evaluation on test sets, and an evaluation on cut-outs of one vineyard photo. The simplified detectors were evaluated using the test sets and the cut-outs. In all cases, three performance measures were used for the evaluation:

$$\text{accuracy} = \frac{TP + TN}{TP + FN + FP + TN}, \tag{1a}$$

$$\text{precision} = \frac{TP}{TP + FP}, \tag{1b}$$

$$\text{recall} = \frac{TP}{TP + FN}, \tag{1c}$$

where $TP$ is the number of correctly classified “positive” samples, $FN$ is the number of misclassified “positive” samples, $FP$ is the number of misclassified “negative” samples, and $TN$ is the number of correctly classified “negative” samples. Naturally, for the cut-outs, unbiased variants of the measures have been used.
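The three measures can be computed directly from the four confusion counts. The following is a minimal sketch under the standard definitions of accuracy, precision, and recall; the function names and the example counts are illustrative assumptions, not values taken from the experiments.

```python
def accuracy(tp, fn, fp, tn):
    """Share of all samples that were classified correctly."""
    total = tp + fn + fp + tn
    return (tp + tn) / total if total else 0.0


def precision(tp, fp):
    """Share of samples labelled "positive" that really are positive."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Share of truly positive samples that were detected."""
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical counts for illustration only.
tp, fn, fp, tn = 90, 10, 5, 95
scores = {
    "accuracy": accuracy(tp, fn, fp, tn),
    "precision": precision(tp, fp),
    "recall": recall(tp, fn),
}
```

The guards against empty denominators are a design choice of this sketch; they return 0.0 instead of raising an error when a class is absent.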
While just the training sets were needed for the 10-fold CV, appropriate datasets must be created for the remaining two evaluation methods. Creation of these datasets was sufficiently described in ; however, datasets used for the evaluation on test sets should be detailed here. For this purpose, two types of datasets were created: an environment type E and a grape type G. Five sets of each type were formed. The $i$th test set of the type E was denoted as E-$i$ and the $i$th test set of the type G as G-$i$, where $i = 1, \ldots, 5$.
Each test set consisted of “positive” and “negative” samples. The sets were based on one vineyard row photo which was not used for creation of the training sets. To form a single test set, unique “positive” and unique “negative” samples were used. Each test set was further extended by artificial “positive” samples . The “positive” samples were created by turning of the images through an angle , where .
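The extension of a test set by rotated copies can be illustrated with a short sketch. The rotation angles are not reproduced in this text, so the quarter turns used below are purely an assumption for illustration, as are the function names; a square image is represented as a list of pixel rows.

```python
def rotate90(image):
    """Rotate a square image (list of rows) by 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def rotated_copies(image, quarter_turns=(1, 2, 3)):
    """Return one rotated copy of the image per requested number of quarter turns.

    The default of three quarter turns is a hypothetical choice.
    """
    copies = []
    for turns in quarter_turns:
        rotated = image
        for _ in range(turns):
            rotated = rotate90(rotated)
        copies.append(rotated)
    return copies


# Tiny 2x2 stand-in for an object image.
original = [[1, 2],
            [3, 4]]
artificial = rotated_copies(original)
share_artificial = len(artificial) / (len(artificial) + 1)
```

With three artificial copies per unique sample, the artificial share of the "positive" class in this sketch is 75%.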
The difference between these two types of test sets consisted in the selection of the “negative” samples. The “negative” samples in G were composed solely of incomplete grape berries of a diameter ranging between and px, while the “negative” samples in E were based on the environment only, and they did not capture even the smallest piece of the targeted berry. Examples of “positive” samples as well as of both types of “negative” samples are shown in Figure 2.
2.2. Robust Grape Detector
While the original, as well as the simplified, grape detectors showed excellent performance by the 10-fold CV  and by the evaluation on the cut-outs , considerably worse results were obtained by their evaluation on the test sets [7, 9]. For example, the original version of the detector with the linear kernel reached the following average score by the 10-fold CV: , , and . The following average results were obtained for the same detector on the test sets of the type E: , , and . Similar results were obtained on both types of test sets for all versions of detectors.
The drop in accuracy and recall might have several causes. Generally, the most common source of a discrepancy in evaluation results is an inadequacy of the datasets used for the evaluation. In the case of the grape detectors, a distortion of positive samples caused by the rotation might be considered as well. Indeed, the HOG features are not rotation invariant. Since a potential sensitivity of the detectors to the distortion caused by the rotation of images is undesirable, we started research aimed at enhancing the detectors' robustness.
Two principally different means might help to improve the detector’s performance in the presence of a distortion: either a more appropriate training set could be used for the training, or the vision pipeline could be modified. In our research, we focused on the modification of the pipeline (Section 2.2.1). Unfortunately, the presented modification introduced three additional tuneable parameters, which considerably complicated the search for the optimal setting of the modified detector. To overcome this disadvantage, we developed a specialized tuning methodology (Section 2.2.3). The methodology was based on a visualization method which we designed for this purpose (Section 2.2.2).
2.2.1. Modification of the Detector
Although the selection of a good feature vector is widely recognized to be fundamental when designing image recognition systems, the IP may also significantly influence the performance of a final solution. Thus, an intelligent use of the IP can provide benefits and solve problems that ultimately lead to better local and global feature detection .
IPs of the original grape detectors consisted of two operations: the conversion of RGB images to the grayscale format and the linear contrast normalization. Our later research has positively shown that the skipping of the contrast normalization does not influence the performance of the detectors at all . Nonetheless, the relevance of the conversion in the detectors was not disproved.
Importance of the conversion of RGB images to the grayscale format, in connection with the image recognition systems, was studied, for example, by Kanan and Cottrell . They have shown that a method used by the conversion of colour images to the grayscale format may significantly influence performance of the image recognition system, even when using robust descriptors. Further, they have pointed out the fact that different tasks involve different conversion methods. For object recognition tasks, they recommend techniques based on weighted means of the red, green, and blue image channels.
Considering their outcomes, we proposed to use a general formula for the conversion based on the weighted means. The main motivation for its use is that this formula allows better control of the conversion, which may in turn allow us to improve the performance of the detectors. The general formula can be written as

$$I_{\mathrm{gray}} = w_R I_R + w_G I_G + w_B I_B, \tag{2}$$

where $I_R$, $I_G$, and $I_B$ are intensity images of the red, green, and blue components of the RGB image; and $w_R$, $w_G$, and $w_B$ are weights of the colour components in the resulting grayscale image $I_{\mathrm{gray}}$. It holds that $w_R, w_G, w_B \in [0, 1]$ and $w_R + w_G + w_B = 1$. Since conversion (2) is the single operation performed within IPs of the modified detectors, the image passed to the feature descriptor is the grayscale image $I_{\mathrm{gray}}$ itself.
We modified the simplified version of the grape detector using the general formula (2); that is, the standard conversion was replaced by this formula. The rest of the vision pipeline was not changed; that is, the new detector consisted of the HOG descriptor and the SVM classifier. We named the new detector as a robust grape detector or simply R.
Considering the essence of the proposed modification, it is apparent that the modification does not increase the computational complexity of the detector. Nor should it inherently worsen the detector's performance, since the weighting coefficients given by the ITU-R recommendation BT.601 are just one of an infinite number of possible settings; the standard conversion therefore remains available as a special case.
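The weighted-mean conversion can be sketched in a few lines. The example below assumes an image stored as rows of (R, G, B) tuples; the function name and the validation tolerance are illustrative assumptions. The default weights are the BT.601 luma coefficients (0.299, 0.587, 0.114), which shows that the standard conversion is one point in the space of admissible weights.

```python
# BT.601 luma coefficients for red, green, and blue.
BT601_WEIGHTS = (0.299, 0.587, 0.114)


def to_grayscale(rgb_image, weights=BT601_WEIGHTS):
    """Convert rows of (R, G, B) pixels to grayscale by a weighted mean.

    The weights must be non-negative and sum to one.
    """
    w_r, w_g, w_b = weights
    if min(weights) < 0 or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    return [
        [w_r * r + w_g * g + w_b * b for (r, g, b) in row]
        for row in rgb_image
    ]


# A 1x2 image: one pure red pixel and one mid-grey pixel.
image = [[(255, 0, 0), (100, 100, 100)]]
standard = to_grayscale(image)                      # BT.601 weights
red_only = to_grayscale(image, (1.0, 0.0, 0.0))     # a different admissible setting
```

Because only the three constants change, the cost per pixel is the same for every admissible setting of the weights.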
2.2.2. Performance Visualization for Better Understanding of Tuning Parameter Relevance
When searching for the optimal setting of an image recognition system, an appropriate variant of the grid search algorithm is typically used. For just one or two tuneable parameters, a specialist can assess the meaningfulness of the evaluation results directly from the raw data. However, the assessment becomes challenging as the number of tuneable parameters increases.
Considering the parameters of the HOG descriptor to be nontuneable, the original and the simplified versions of the detectors have only one or two tuneable parameters (depending on the used kernel function). The fixation of the HOG descriptor setting has its origin in practical reasons. The setting of the HOG descriptor, which was summarized in Section 2.1.2, is appropriate in terms of exactitude versus computational complexity. Thus, this fixed setting is used for the robust detectors as well.
The original tuneable parameters, and , were extended by three additional parameters, , , and , in the case of the robust grape detectors. These parameters were introduced by formula (2). Thus, the tuning of a robust grape detector setting using a grid search algorithm results in a large amount of data. An analysis of such quantities of data in the raw numeric form is really not convenient for humans. In order to facilitate the analysis, we suggested a ternary diagram for a visualization of the data. Specifically, we used the diagram to show a prospective influence of the weighting coefficients, , , and , on the performance of an overall solution for a fixed setting of the remaining tuneable parameters.
The ternary diagram is a graph which consists of an equilateral triangle in which a given plotted point represents the relative proportions of three end-members ($A$, $B$, and $C$), usually expressed as percentages (do not confuse the end-member $C$ with the regularization constant $C$). Moreover, the sum of the relative proportions is equal to a given value; for example, for percentages, it holds that $a + b + c = 100\%$. The axis related to the member $A$ is the left arm of the triangle. The relative proportion of $A$, $a$, is plotted on the axis where $a$ increases downwards. The same principle is used for the remaining two axes where the bottom axis is related to component $B$ and the right one to component $C$. The relative proportion of $B$ increases in the right direction and the relative proportion of $C$ upwards. A dependent variable can be represented in different ways, for example, using a contour plot or a shaded surface. For more information, see .
In our case, the three end-members, $A$, $B$, and $C$, are the intensity images of the red, green, and blue colours, $I_R$, $I_G$, and $I_B$, respectively. The relative proportions, $a$, $b$, and $c$, are the weights $w_R$, $w_G$, and $w_B$, where $w_R, w_G, w_B \in [0, 1]$. Moreover, the weights are bounded by the condition $w_R + w_G + w_B = 1$. The ternary diagram is intended as a supporting tool for the evaluation of the grape detectors. Thus, a performance measure is the dependent variable to be displayed. Based on our previous experience, we suggest using the shaded surface for the visualization of the results, since this type of diagram gives a better idea of the influence of the weights on the performance measure.
Let us denote a setting of the weighting coefficients as an ordered triple of the weights ; that is, . Further, let us form a finite set of settings for the purpose of the graph construction, and let us call the set grid. The grid should uniformly cover the surface bounded by the triangle. It means that a step , of a fixed size, has to be used by forming the grid. Let us express the step as , where , . Thus, the weighting coefficients can take any value from ; however, a combination of the coefficients in is bounded by the condition . It means that the grid is a set of all admissible settings .
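The construction of the grid can be sketched as follows. The sketch assumes that the step is the reciprocal of a positive integer, here called `n` (a hypothetical name), so that the grid covers the triangle uniformly; exact fractions are used so that the sum condition holds without rounding error.

```python
from fractions import Fraction


def weight_grid(n):
    """Return all weight triples with step 1/n whose components sum to one.

    n is a hypothetical resolution parameter: each weight takes a value
    from {0, 1/n, 2/n, ..., 1}.
    """
    step = Fraction(1, n)
    return [
        (i * step, j * step, (n - i - j) * step)
        for i in range(n + 1)
        for j in range(n + 1 - i)
    ]


grid = weight_grid(10)
```

For a resolution `n`, this enumeration yields (n + 1)(n + 2)/2 admissible settings, which is the number of training-evaluation runs needed per diagram.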
In order to construct the ternary diagram, evaluation of a detector using a measure has to be achieved for . Naturally, the classifier has to be trained for each of these settings separately. Settings of the classifier ( for linear kernel and for RBF kernel) must not be changed within the training-evaluation process. Once the training-evaluation process is performed for all the admissible settings , construction of one diagram can be executed, where the diagram shows dependence of the used performance measure on the weights , , and for one particular setting of or . An example of the diagram is shown in Figure 3 where the recall of a grape detector is displayed.
Figure 3 can also be used to explain how to work with the diagram. Let us suppose that the recall for $w_R = 0.1$, $w_G = 0.2$, and $w_B = 0.7$ is to be determined. The recall value can be read using the auxiliary lines, which are plotted as dashed lines in Figure 3. Each line is parallel to one of the sides of the triangle. The lines are named with respect to the referred components; that is, one line is related to the intensity images of the red colour, one to the intensity images of the green colour, and one to the intensity images of the blue colour.
Positions of the lines are given by the weights. In this example, the line of the red component passes the left axis at the point 0.1, the line of the green component passes the bottom axis at the point 0.2, and the line of the blue component passes the right axis at the point 0.7. The intersection of the lines uniquely determines the recall. The numeric value can be estimated using the colour bar. In this example, the recall is approximately 0.78. Note that only two auxiliary lines are necessary for reading the dependent variable: the proportion of the third component is always uniquely determined by the constraint on their sum, in this case by $w_R + w_G + w_B = 1$.
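The position of a setting inside the triangle can also be computed rather than read off. The sketch below assumes a unit-side triangle with the red vertex at the bottom left, the green vertex at the bottom right, and the blue vertex at the top, which matches the axis directions described above; the function name and the coordinate frame are assumptions of this illustration.

```python
import math


def ternary_to_xy(w_r, w_g, w_b):
    """Map a weight triple to Cartesian coordinates in a unit-side triangle.

    Assumed vertices: red at (0, 0), green at (1, 0), blue at (0.5, sqrt(3)/2).
    """
    if abs(w_r + w_g + w_b - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    x = w_g + 0.5 * w_b
    y = (math.sqrt(3) / 2) * w_b
    return x, y


# The setting used in the reading example: (0.1, 0.2, 0.7).
x, y = ternary_to_xy(0.1, 0.2, 0.7)
```

Only two of the three weights enter the formulas, which mirrors the observation that two auxiliary lines suffice to locate a point.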
2.2.3. Tuning Methodology
The goal of our research was the development of a grape detector which would be invariant to the distortion caused by the rotation. One of the key steps within the development process is finding the setting of all tuneable parameters giving the best performance according to a criterion. The search for the optimal setting is usually executed using a variant of the grid search algorithm .
The grid search algorithms ensure a systematic evaluation of an image recognition system performance. Usually, the grid search is combined with the CV. In such a case, the evaluated system is trained on a part of a dataset and evaluated on the rest of the dataset. The training is carried out for various settings of all tuneable parameters. The settings might be assigned according to a rule or directly defined by an expert. The setting giving the best score is considered to be optimal. In order to obtain more accurate results, the search can be performed repeatedly. The search is then carried out with a finer resolution in a scaled down area. An area promising best results is selected for the finer search.
In order to develop a methodology aimed at tuning of the robust detectors, we adopted the basic principles of the grid search algorithms. Specifically, the training-evaluation process is carried out for various settings of all tuneable parameters within the search process. However, our methodology requires involvement of a computer vision specialist. Further, the evaluation of the image recognition system should be performed on a dataset affected by the target distortion. For that reason, the commonly used CV was replaced by the evaluation of the system on a specialized dataset.
In order to ensure flexibility of the method, we prepared the methodology for multiple-criteria usage. The proposed methodology allows a combination of several performance measures when searching for the optimal setting. In the case of multiple criteria, the priority of the performance measures has to be determined in advance. For the detector with the RBF kernel, the methodology consists of the following steps:
(1) Select performance measures and give them priorities. In this way, a finite set of measures $M$ is created where each measure is paired with a priority.
(2) Define the admissible settings of all tuneable parameters, that is, of the weighting coefficients, the regularization constant, and the kernel width, each taken from a finite set of admissible settings. In this way, a parameter space $P$ is formed.
(3) Perform the training-evaluation process for every setting in $P$.
(4) Display the obtained results using the ternary diagram. In this way, one graph is obtained for each performance measure and each admissible combination of the regularization constant and the kernel width.
(5) Manually evaluate the obtained results using the graphs. Identify combinations of the regularization constant and the kernel width leading to meaningless results. Eliminate all settings containing the offending combinations from further processing; that is, a new parameter space $P' \subseteq P$ is formed.
(6) For each performance measure $m \in M$, find the setting giving the best score according to

$$p^{*}_{m} = \arg\max_{p \in P'} m(p). \tag{3}$$

(7) Determine a globally optimal setting on the basis of all $p^{*}_{m}$. Within this step, the priority of the measures must be taken into account; however, the functional dependence shown in the appropriate graphs must be considered as well.
This methodology can also be applied to the detector with the linear kernel. Naturally, the kernel width is ignored in this case.
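Steps (5) and (6) of the methodology can be sketched as a filter followed by a per-measure maximization. The data layout, the function name, and all numbers below are hypothetical; the manual inspection of the diagrams in step (5) is represented only by its outcome, a set of rejected combinations of the regularization constant and the kernel width.

```python
def best_setting_per_measure(results, rejected):
    """Return, for each measure, the best setting outside the rejected region.

    results:  dict mapping (weights, c, width) -> dict of measure name -> score
    rejected: set of (c, width) combinations discarded by the specialist
    """
    kept = {
        setting: scores
        for setting, scores in results.items()
        if (setting[1], setting[2]) not in rejected
    }
    if not kept:
        raise ValueError("every setting was rejected")
    measures = list(next(iter(kept.values())).keys())
    return {
        m: max(kept, key=lambda s, m=m: kept[s][m])
        for m in measures
    }


# Hypothetical evaluation results for three settings.
example_results = {
    ((0.3, 0.6, 0.1), 1.0, 0.01): {"accuracy": 0.99, "recall": 0.99},
    ((0.3, 0.6, 0.1), 10.0, 0.1): {"accuracy": 0.95, "recall": 0.90},
    ((0.1, 0.2, 0.7), 10.0, 0.1): {"accuracy": 0.93, "recall": 0.94},
}
example_rejected = {(1.0, 0.01)}  # combination judged unreliable from the graphs

best = best_setting_per_measure(example_results, example_rejected)
```

The sketch deliberately stops before step (7): when the measures disagree, as they do in this toy example, the final choice is left to the specialist, who weighs the priorities against the trends visible in the diagrams.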
2.3. Design of Evaluation Experiments
In the experimental part, we evaluated relevance of the grayscale conversion and the potential sensitivity of the detectors to the image rotation. For this purpose, new statistically relevant datasets were created.
2.3.1. Assessment of Conversion Importance
The assessment of the conversion relevance was one of the main goals of the presented work. Two versions of the conversion were considered in the grape detectors. While the simplified detectors of version 1 employ the standard conversion according to the ITU-R recommendation BT.601, the robust detectors R are based on the generalized conversion (2). The simplified detectors of version 2 do not perform any conversion within the IP.
The performances of all three variants of the detectors have to be compared in order to assess the relevance of the conversion. To keep the results comparable, the detectors should be tuned in the same way. Thus, when tuning the two simplified versions, the methodology presented in Section 2.2.3 should be used in an appropriately modified form; that is, steps (4) and (5) have to be left out when searching for their settings.
2.3.2. Dataset for Tuning
The proposed tuning methodology requires one training and one evaluation set. In order to keep continuity with our previous research, the training set T-3 was used in the training phase. Within the evaluation phase, a specialized (tuning) dataset should be used. The tuning set should be large enough and it should be affected by the target distortion of the “positive” samples. Further, both types of the “negative” samples, E and G (Section 2.1.3), should be equally represented in the set. Searching for the optimal setting on such datasets may guarantee finding a setting which would be appropriate to given requirements on the robust detector.
For this purpose, a tuning set on ten photos was created. These photos were captured under the condition specified in . They were captured at six different locations. None of these photos were used while forming the training set. The tuning set consisted of labelled RGB object images of size px. The labelled object images were created from the photos using an editor . To create the dataset, unique “positive” and unique “negative” samples were gathered. The set was extended using artificial “positive” samples created by turning the images through the angle ; that is, it consists of “positive” and “negative” samples.
2.3.3. Datasets for Evaluation
In Section 2.1.3, the evaluation test sets, E and G, were mentioned. Each of these sets consisted of only 400 samples, which seems to be insufficient for a credible assessment of the relevance of the conversion. In order to get meaningful results, we formed new expanded test sets. These sets were created in the same spirit as the original test sets. Continuity of the labelling was also maintained; that is, expanded test sets of the environment type were labelled as EX and expanded test sets of the grape type as GX. Two expanded test sets of each type were formed. In addition, a new type of test set was introduced. This set was labelled as the expanded standard test set, or SX. The set was not affected by the distortion.
All the expanded test sets comprised labelled RGB object images of the size px. The labelled object images were created from vineyard row photos using the editor . The sets EX and GX were based on a collection of ten unique vineyard row photos. Ten different photos were used for SX creation. The photos did not match with the photos used when creating the training and the tuning set. The photos were captured at six different locations under the conditions specified in .
An expanded test set, either EX or GX, consisted of unique “positive” and unique “negative” samples. The sets were extended using artificial “positive” samples created by turning the images through the angle ; that is, they consisted of “positive” and “negative” samples. The selection of the “positive” and of the “negative” samples followed the criteria stated in Section 2.1.3. Sets EX- and GX- with the same index shared the same collection of “positive” samples. The standard set SX consisted of unique “positive” and unique “negative” samples.
2.3.4. Inquiry into the Sensitivity to the Rotation
The potential sensitivity of the detectors to image rotation was the second issue to be investigated. The suspicion of sensitivity arose from the discrepancy between the results acquired by the 10-fold CV or by the evaluation on the cut-outs and the results obtained by the evaluation on the test sets. The new expanded datasets, EX, GX, and SX, allowed us to explore this issue in detail. The key is a comparison of the evaluation results obtained on the expanded datasets with those obtained on the original datasets. In order to get comparable results, the two simplified versions of the detector and R, tuned according to the proposed methodology, must also be evaluated on the sets E and G.
A high sensitivity of an assessed detector to the rotation would be noticeable from a comparison of evaluation results obtained on EX (GX) with evaluation results obtained on E (G). Similarity of these results would indicate the high sensitivity. Worse evaluation results obtained on EX (GX), rather than on E (G), would also confirm the high sensitivity. A low sensitivity of the detector would be visible from the comparison of evaluation results obtained on EX and GX with evaluation results obtained on SX. A worse performance on EX or GX, rather than on SX, would confirm the low sensitivity. All other results would signify its insensitivity to the rotation. In order to eliminate a potential influence of the proposed tuning methodology on the evaluation results, the new results obtained for E and G were compared with the original results.
3. Results and Discussion
3.1. Optimal Settings of Detectors
Optimal settings of the detectors were determined using the presented methodology. In order to keep the continuity of our work, the performance measures (1a), (1b), and (1c) were used for the evaluation of the detector performance. It means that . The accuracy was used for the tuning of all former versions of the detectors. For this reason, the accuracy was chosen as the primary measure with the highest priority. Detection of all grapes in a photo is essential for applications such as yield estimation. Thus, the recall was taken as the secondary measure; and consequently, the precision was considered to be the tertiary one.
Depending on the version, the detectors have up to five tuning parameters. While the weighting coefficients are bounded by the conditions and , the regularization constant , likewise the kernel width , must be positive. Based on our previous experience, the following settings of the parameters were used within the search process: , , and . In such a way, sets of all admissible settings were formed.
While the search for the optimal settings of the two simplified versions according to the proposed methodology does not require any human intervention, the search for the optimal setting of R cannot be performed without a computer vision expert.
3.1.1. Optimal Setting of Robust Detector with RBF Kernel
On the basis of data obtained within the training-evaluation process, 72 graphs were created. On the basis of their mutual comparison, we discovered that has much higher influence on the performance measured by accuracy and recall than . Further, we found that both and influence the performance only slightly from the perspective of precision. The main trends captured by the graphs can be demonstrated on graphs obtained for an arbitrary chosen setting of and . For this purpose, we chose results obtained for these values of and . The results are shown for all three measures in Figures 4–8.
The analysis of the 72 diagrams pointed out an abnormality in the results obtained for and . The abnormality is clearly visible in Figure 4, where the performance of the robust detector with the RBF kernel is shown for and . For example, accuracy (Figure 4(a)) is 0.5 for the majority of the settings ; nevertheless, excellent scores were achieved for some of them. It is apparent that a very small change in the setting of the weighting coefficients would lead to a drastic drop in accuracy. The diagrams for the other measures (Figures 4(b) and 4(c)) show similar discrepancies. Very similar graphs were obtained for in combination with .
The seriousness of the discrepancies is even more apparent when comparing graphs obtained for with graphs obtained for higher values of . Graphs obtained for and (Figures 5–8) show consistent trends for all performance measures. Such trends were obtained for and . It is clear that cannot guarantee robustness of the final solution. Thus, all evaluation results, obtained for , were eliminated from the further processing; that is, . On the basis of formula (3), results summarized in Table 1 were obtained.
Diverse results were obtained for the measures . Although the measures have predefined priorities, the meaningfulness of the results should always be considered before the globally optimal setting is determined. It is apparent from the graphs (Figures 5–8) that the parameters had negligible influence on precision. When comparing all obtained graphs, we found that also hardly influenced precision. Thus, precision was excluded from the final assessment.
According to the remaining two measures, the setting providing the best performance was and . However, the obtained results did not allow the direct determination of the optimal value of . Since accuracy had the higher priority, the optimal setting of was determined according to this measure. As is apparent in Figures 7 and 8, there was only an insignificant difference in the performance measured using recall for and . Thus, we selected to be the globally optimal setting for the robust detector with the RBF kernel.
3.1.2. Optimal Setting of Robust Detector with Linear Kernel
The analysis of the data obtained within the training-evaluation process was much simpler in the case of the robust detector with the linear kernel. First, only 12 graphs were created on the basis of the obtained data. Second, the regularization constant did not influence the performance of the detector; that is, identical graphs were obtained for . Thus, for the following explanation, we selected results obtained for (Figure 9).
It is apparent in the graphs (Figure 9) that there was no anomaly in the evaluation results, and the detector reached high scores for all used measures. It means that . According to formula (3), settings summarized in Table 2 were determined to be optimal for .
As is apparent in Figure 9, the weighting coefficients had almost no influence on precision. Thus, just accuracy and recall were taken into account when the final decision was made. Since identical optimal settings were obtained for both measures, we can positively recommend the setting as the globally optimal setting for the robust detector with the linear kernel. However, we should point out here that any might be used as well.
3.1.3. Summary of Optimal Settings
Herein, we provide a summary of optimal settings for various versions of the detectors. The summary can be found in Table 3.
3.2. Evaluation of Detectors
In order to investigate the open issues, the detectors were evaluated on the expanded as well as on the original datasets. The results obtained on the expanded sets, EX, GX, and SX, are summarized in Table 4. The results obtained on the original sets, E and G, are stated in Tables 5 and 6, respectively.
3.3. Discussion of Evaluation Results
Two main issues were raised in this article. Our findings are presented in the following text.
3.3.1. Assessment of Conversion Importance
The importance of the conversion in the grape detectors was assessed on the basis of results obtained on EX, GX, and SX (Table 4). Let us first consider the results obtained for the detectors with the linear kernel. From comparison of and with , it follows that the standard conversion according to the ITU-R recommendation, as well as the generalized conversion according to (2), enhanced scores obtained using accuracy and recall on EX and GX. The improvement is more evident for recall. Precision was slightly better for for these datasets; however, a small downturn in precision was registered for . Considering the significant improvement in recall, the conversions seemed to be valuable parts of the detectors. However, the results obtained on SX did not confirm these outcomes. Thus, the use of a conversion for detectors with the linear kernel cannot be positively recommended.
In the case of the detectors with the RBF kernel, the results spoke clearly in favour of the conversions. The detectors and outperformed the detector in accuracy, and especially in recall, on all the expanded sets, that is, on EX, GX, and SX. Within all the experiments, the changes in precision were marginal. Let us focus now only on the detectors and . From the results, it is apparent that always outperformed in accuracy and recall. We considered the downturn of precision of to be marginal. Indeed, it never fell below 0.9800. Thus, from the perspective of the performance, the robust detector with the RBF kernel can be positively recommended as the most reliable solution. With , , and , the robust grape detector with the RBF kernel is fully comparable with the state-of-the-art solutions aimed at the detection of single grapes of white varieties.
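For readers who want to experiment with the two conversions compared in this subsection, a minimal Python sketch follows. The default weights are the ITU-R BT.601 luma coefficients (0.299, 0.587, 0.114); passing other weights gives a weighted-sum conversion in the spirit of the generalized conversion (2), whose exact form and optimal coefficients are not restated here. The function name `to_grayscale`, the nested-list image representation, and the alternative weights in the usage example are choices made purely for this illustration.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[float, float, float]

# ITU-R BT.601 luma coefficients for the R, G and B channels.
BT601_WEIGHTS = (0.299, 0.587, 0.114)


def to_grayscale(
    image: Sequence[Sequence[Pixel]],
    weights: Tuple[float, float, float] = BT601_WEIGHTS,
) -> List[List[float]]:
    """Convert an RGB image (rows of (r, g, b) pixels) to a weighted-sum grayscale."""
    w_r, w_g, w_b = weights
    return [[w_r * r + w_g * g + w_b * b for r, g, b in row] for row in image]


# A tiny 1x2 image: one pure-green pixel and one pure-blue pixel.
image = [[(0, 255, 0), (0, 0, 255)]]
standard = to_grayscale(image)
green_only = to_grayscale(image, weights=(0.0, 1.0, 0.0))  # hypothetical weights
print(standard, green_only)
```

Because the conversion is a per-pixel weighted sum, changing the weights changes which colour contrasts survive into the grayscale image that the HOG descriptor later sees.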
3.3.2. Inquiry into the Sensitivity to the Rotation
As the first step, we compared the results obtained for the detectors and (Tables 5 and 6), tuned according to the proposed methodology, with the original results . We found that the new results are almost identical to the original ones. We came to the conclusion that the tuning methodology did not influence the results significantly.
In the second step, we compared the results achieved by the detectors on type-matching sets; that is, results obtained on E (G) were compared with results obtained on EX (GX). We found that all detectors tuned according to the proposed methodology achieved significantly better results on the expanded sets EX and GX (Table 4) than on the original sets E (Table 5) and G (Table 6). It is evident that the poor performance of the detectors, which was reported in [7, 9], was mainly due to the inappropriateness of the sets E and G. Thus, we recommend using exclusively the new expanded datasets for the evaluation.
In the third step, we compared the results achieved by the detectors on the expanded sets (Table 4). For both types of the kernels, we observed a connection between the performance of the detectors and the grayscale conversion. While the detectors (without the conversion) had considerably lower recall on EX and GX than on SX, the robust detectors reached almost identical results on all these sets. The detectors showed results on the borderline between and R. Thus, we came to the conclusion that the detectors are sensitive to the image rotation; however, the sensitivity can be suppressed by the grayscale conversion. We further found that the robust detector with the RBF kernel is almost resistant to this distortion.
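The comparison made in the third step can be expressed as a simple check on the spread of recall across the expanded sets. The Python sketch below is only an illustration: the function names, the tolerance, and the numbers in the usage example are hypothetical and are not taken from Table 4.

```python
from typing import Mapping


def recall_spread(recall_by_set: Mapping[str, float]) -> float:
    """Difference between the highest and lowest recall over the given sets."""
    values = list(recall_by_set.values())
    if not values:
        raise ValueError("at least one dataset is required")
    return max(values) - min(values)


def looks_rotation_sensitive(
    recall_by_set: Mapping[str, float], tolerance: float = 0.02
) -> bool:
    """Flag a detector whose recall varies across the sets by more than `tolerance`.

    The tolerance is an arbitrary placeholder, not a value from the article.
    """
    return recall_spread(recall_by_set) > tolerance


# Made-up recall values for two imaginary detectors on EX, GX and SX.
without_conversion = {"EX": 0.80, "GX": 0.82, "SX": 0.95}
with_conversion = {"EX": 0.97, "GX": 0.97, "SX": 0.98}
print(looks_rotation_sensitive(without_conversion))
print(looks_rotation_sensitive(with_conversion))
```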
The grape detectors based on SVMs classifiers and HOG features were appropriate solutions for the detection of single grapes of white varieties, as supported by excellent results in the 10-fold CV and in the evaluation on real-life images. However, results obtained by the evaluation on the test sets prompted a thorough examination of the detector performance to confirm its expected merits. Our results showed that the grayscale conversion should be excluded when the SVM classifier with the linear kernel is used in combination with the HOG features. Using the linear kernel and skipping the entire image preprocessing ensure a low time complexity of the final solution, while keeping excellent performance on standard datasets. Such a solution might be used in applications where worse performance under unfavourable conditions is not critical, for example, in autonomous vineyard sprayers.
The robust grape detector with the RBF kernel is fully comparable with the state-of-the-art solutions aimed at detection of single grapes of white varieties. The detector has greater time complexity than a detector without the grayscale conversion, but this disadvantage is counterbalanced by its excellent performance. Since the robust grape detector with the RBF kernel provides excellent results under standard conditions, as well as under unfavourable conditions, we recommend its use in applications where high accuracy and recall are essential, for example, in yield estimation.
In the presented application, the modification of the grayscale conversion proved to be valuable. We believe that its usage is not limited to the grape detection. The modification, together with the tool set for parameter optimization, as well as the novel visualization method introduced in this contribution, might be used in various application areas.
This research article is an extended version of a conference paper presented at the 5th Computer Science Online Conference 2016.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
The work has been supported by EEA and Norway Grants (no. EHP-CZ07-INP-2-0942014). This support is gratefully acknowledged. The authors would like to offer special thanks to Víno Sýkora s.r.o. company which enabled them to perform experiments in its vineyards. The authors also want to express their thanks to the government of India for the support of their studies about image processing at the Indian Institute of Remote Sensing in Dehradun via the ITEC Scholarship. The knowledge gained within the studies was very helpful for solving the problem described in the article.
- J. Arnó, J. A. Martínez-Casasnovas, M. Ribes-Dasi, and J. R. Rosell, “Review. Precision viticulture. Research topics, challenges and opportunities in site-specific vineyard management,” Spanish Journal of Agricultural Research, vol. 7, no. 4, pp. 779–790, 2009.
- R. Berenstein, O. B. Shahar, A. Shapiro, and Y. Edan, “Grape clusters and foliage detection algorithms for autonomous selective vineyard sprayer,” Intelligent Service Robotics, vol. 3, no. 4, pp. 233–243, 2010.
- S. Nuske, S. Achar, T. Bates, S. Narasimhan, and S. Singh, “Yield estimation in vineyards by visual grape detection,” in Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems: Celebrating 50 Years of Robotics (IROS '11), pp. 2352–2358, September 2011.
- M.-P. Diago, C. Correa, B. Millán, P. Barreiro, C. Valero, and J. Tardaguila, “Grapevine yield and leaf area estimation using supervised classification methodology on RGB images taken under field conditions,” Sensors, vol. 12, no. 12, pp. 16988–17006, 2012.
- S. Liu and M. Whitty, “Automatic grape bunch detection in vineyards with an SVM classifier,” Journal of Applied Logic, vol. 13, no. 4, pp. 643–653, 2015.
- M. J. C. S. Reis, R. Morais, E. Peres et al., “Automatic detection of bunches of grapes in natural environment from color images,” Journal of Applied Logic, vol. 10, no. 4, pp. 285–290, 2012.
- P. Škrabánek and T. P. Runarsson, “Detection of grapes in natural environment using support vector machine classifier,” in Proceedings of the 21st International Conference on Soft Computing MENDEL 2015, pp. 143–150, Brno University of Technology, Brno, Czech Republic, June 2015.
- P. Škrabánek and F. Majerík, “Evaluation of performance of grape berry detectors on real-life images,” in Proceedings of the 22nd International Conference on Soft Computing MENDEL 2016, pp. 217–224, Brno University of Technology, Brno, Czech Republic, June 2016.
- P. Škrabánek and F. Majerík, “Simplified version of white wine grape berries detector based on SVM and HOG features,” in Artificial Intelligence Perspectives in Intelligent Systems: Proceedings of the 5th Computer Science On-line Conference (CSOC '16), vol. 1, pp. 35–45, Springer, 2016.
- ITU-R Recommendation BT.601. Studio encoding parameters of digital television for standard 4:3 and wide screen 16:9 aspect ratios, March 2011.
- C. Kanan and G. W. Cottrell, “Color-to-grayscale: Does the method matter in image recognition?” PLoS ONE, vol. 7, no. 1, Article ID e29740, 7 pages, 2012.
- N. Dalal and B. Triggs, “Histograms of oriented gradients for human detection,” in Proceedings of 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR 2005, pp. 886–893, June 2005.
- C. J. C. Burges, “A tutorial on support vector machines for pattern recognition,” Data Mining and Knowledge Discovery, vol. 2, no. 2, pp. 121–167, 1998.
- J. Bergstra and Y. Bengio, “Random search for hyper-parameter optimization,” Journal of Machine Learning Research, vol. 13, pp. 281–305, 2012.
- M. Sokolova and G. Lapalme, “A systematic analysis of performance measures for classification tasks,” Information Processing and Management, vol. 45, no. 4, pp. 427–437, 2009.
- C. H. Lampert, “Kernel methods in computer vision,” Foundations and Trends® in Computer Graphics and Vision, vol. 4, no. 3, pp. 193–285, 2009.
- S. Krig, Computer Vision Metrics: Survey, Taxonomy, and Analysis, Apress, Berkeley, Calif, USA, 1st edition, 2014.
- R. J. Howarth, “Sources for a history of the ternary diagram,” British Journal for the History of Science, vol. 29, no. 3(102), pp. 337–356, 1996.
- P. Škrabánek, “Editor for marking and labeling of object images for binary supervised classification in matlab environment,” in Proceedings of the 21st International Conference on Soft Computing MENDEL 2015, pp. 151–158, Brno University of Technology, Brno, Czech Republic, June 2015.
Copyright © 2017 Pavel Škrabánek and Petr Doležel. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.