Destemming fresh chilli fruit (Capsicum) at high throughput is necessary, especially in the Mekong Delta region. Several studies have addressed this problem with highly applicable solutions, but a certain percentage of the output consisted of cracked fruits, which reduced the quality of the product. Manual sorting results in high costs and low quality, so automatic grading should be performed after destemming. This research focused on developing a method to identify and classify chilli fruits cracked by the destemming process. A convolutional neural network (CNN) model was built and trained to identify cracks; then, appropriate control signals were sent to the actuator for classification. Image processing operations were supported by the OpenCV library, TensorFlow provided the data structures, and the Keras application programming interface supported the construction and training of the neural network models. Experiments were carried out in both static and working conditions, which achieved accurate identification rates of 97% and 95.3%, respectively. In addition, a success rate of 93% was found even when the chilli body was wrinkled due to drying after 120 hours of storage. Practical results demonstrate that the model is reliable enough to be useful in practice.

1. Introduction

Chilli (Capsicum) is an important crop and is considered an almost indispensable spice in daily life [1]. The trace elements, minerals, and nutrients in its fruits have beneficial health properties [2–5]. The crop is grown year-round, independent of season [6]; in the Mekong Delta (MD) in particular, high-yield varieties such as ChanhPhong F1, ChanhPhong F4, DongtienVang, and MuiTen 207 are commonly grown.

Chilli products on the market are very diverse [7], and their value in Vietnam’s agricultural value chain is enhanced through processing [8, 9]. Drying is usually done using solar energy because of its low cost and ease of implementation [10, 11]. Stem-removing systems have been studied with the goal of increasing productivity and automation and reducing the dependence on manual labor [12–18]. These automated systems often crack the fruit body, and manual sorting of the damaged fruits reduces quality and increases cost, so an automatic system for identifying and sorting damaged fruits is needed.

Several works have addressed the detection of outer surface defects of fruits and seeds [19–22]. In recent years, computer vision combined with artificial neural networks (ANN) has become a very successful method in many fields [23–30]. With structures resembling biological neural networks, ANN can be trained to identify desired objects [31–33]. Using ANN, models were built to detect, locate, and estimate the sizes of fruits [34–38]. Combined with the image threshold method to detect the direction of the chilli stem, an ANN model was developed on CUDA (GPU); it ran 79 times faster than on the CPU, which allows real-time deployment [17].

Mango cultivation and grading have also been studied [39, 40], and these works demonstrated the strength of combining computer vision with ANN in complex identification applications. Moreover, nondestructive classification and size-based sorting technologies using computer vision and artificial neural networks were built [41–43]. Using YOLOv3 and deep learning, a model was built to detect litchi fruits in low-light environments at night [44]. Another algorithm using an RGB-depth camera together with DeepLabv3 was built to detect and locate litchi fruits on the branch [45].

Computer vision has also been developed as a multivision solution for complex and dynamic environments [46–48]. General reviews have evaluated fruit classification and the identification of spoiled agricultural products using computer vision in combination with feedforward neural networks and deep learning [49–51]. They all show that ANN has many strengths and is well suited to identification applications.

In order to remove the stem with less damage, it is better to rely on the separation method, in which the stems are clamped and pulled away from the fruit body until they separate [52]. This method obtained good results. However, it also caused fruits to crack at the position adjacent to the stem, as shown in Figure 1, thus reducing quality as well as productivity. Cracked chilli in the finished product reduces its quality and affects other intact fruits during storage.

This research focuses on developing an identification model to classify fresh chilli fruits that have cracks caused by the destemming process (Figure 1). The system is placed right after the stem separation system to remove cracked fruits, increasing the quality and uniformity of the finished chilli group. Computer vision was used to process input images, and a feedforward artificial neural network was trained to determine whether cracks appear on the chilli fruit. Then, the appropriate signals were sent to the actuator to push the fruits into the appropriate group.

2. Materials and Methods

2.1. Convolutional Neural Network, Input Library, and Training Process

To carry out the crack identification process, a feedforward convolutional neural network model was chosen because its simplicity helps speed up processing in real-time applications. The network defined in this study consisted of three layers, as shown in Figure 2:

(i) One input layer included 1024 neurons. An original image passed through convolutions with 32, 64, 128, and 256 kernels, each with dropout rate ½, which reduced its size stage by stage. The final result was a matrix with a single column, which was the input of the neural network. To match it, the number of input neurons was selected as 1024

(ii) One hidden layer included 64 neurons. All of them were fully connected to all outputs in the next layer

(iii) One output layer had 2 neurons corresponding to 2 outputs: one for input images without cracks and one for input images with cracks

Python is one of the most popular languages for image processing, identification, and object detection applications. With its friendly programming interface, large collection of open-source libraries, and strong communication features, it was chosen as the main programming language for this study.

OpenCV and TensorFlow were used as the main libraries for image processing tasks and data structures, respectively. Convolutional layers were used to process and extract features of the input image, while the dropout method was used to reduce the size of the input image or, in other words, to reduce the physical memory required during processing. The Keras application programming interface (API) was used to build and train a sequential neural network model.
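As a rough illustration of the network described in this section, the sketch below builds a comparable model with the Keras Sequential API and trains it with the batch size of 32 and the limit of 20 epochs used in this study. Several details are assumptions rather than the authors' settings: the 32 × 32 input size (chosen only because it yields 1024 flattened features), the 3 × 3 kernels with same padding, the ReLU and softmax activations, the Adam optimizer, the loss, and the file name. The ½ size reduction is sketched with 2 × 2 max pooling, because a Keras Dropout layer leaves tensor shapes unchanged and cannot halve an image by itself.

```python
# Only the kernel counts, layer widths, batch size, epoch limit and .h5 format
# come from the text; every other setting below is an assumption.
CONV_FILTERS = (32, 64, 128, 256)
HIDDEN_UNITS = 64
NUM_CLASSES = 2
INPUT_SIDE = 32  # assumed input size, not stated in the text


def dense_head_params(n_inputs=1024, n_hidden=HIDDEN_UNITS, n_out=NUM_CLASSES):
    """Weights plus biases of the fully connected part (1024 -> 64 -> 2)."""
    return (n_inputs * n_hidden + n_hidden) + (n_hidden * n_out + n_out)


def build_model(input_side=INPUT_SIDE):
    """Four conv + halving stages, flatten, one hidden layer, two outputs."""
    # Imported here so the helper above runs without TensorFlow installed.
    from tensorflow import keras
    from tensorflow.keras import layers

    model = keras.Sequential([keras.Input(shape=(input_side, input_side, 1))])
    for filters in CONV_FILTERS:
        model.add(layers.Conv2D(filters, 3, padding="same", activation="relu"))
        model.add(layers.MaxPooling2D(pool_size=2))  # assumed 1/2 size reduction
    model.add(layers.Flatten())
    model.add(layers.Dense(HIDDEN_UNITS, activation="relu"))
    model.add(layers.Dense(NUM_CLASSES, activation="softmax"))
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model


def train_and_save(model, x_train, y_train, x_val, y_val, path="crack_model.h5"):
    """Train with the reported batch size and epoch limit, then save as .h5."""
    model.fit(x_train, y_train, validation_data=(x_val, y_val),
              batch_size=32, epochs=20)
    model.save(path)


print(dense_head_params())
```

The helper `dense_head_params` only counts the parameters of the fully connected part, which gives a feel for how small the classifier is once the convolution stages have reduced the image.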

The image processing sequence and neural network model training are depicted in Figure 3. The original data included 6000 grayscale images. Four transformation operations (rotate 45°, flip, enlarge, and stretch) were applied to create a dataset of 30,000 images, which were then resized. After four convolution operations with pixel reduction ratio ½ and one flatten operation, each image was transformed into a single-column data matrix. The neural network was trained with a batch size of 32 images, and the maximum number of epochs was set to 20. The training result was a model saved in .h5 format.
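The four transformations can be sketched with OpenCV as follows. This is an illustrative sketch only: the scale factors (1.25), the horizontal flip direction, and the center crop back to the original size are assumptions, since the text names the operations but not their parameters. The count of 30,000 images is consistent with keeping each original alongside one copy per transformation.

```python
def center_crop(img, height, width):
    """Cut the central height x width window out of a larger array."""
    top = (img.shape[0] - height) // 2
    left = (img.shape[1] - width) // 2
    return img[top:top + height, left:left + width]


def augment(gray):
    """Return the original grayscale image plus four transformed copies."""
    import cv2  # imported here so the counting helper runs without OpenCV

    h, w = gray.shape[:2]
    rotation = cv2.getRotationMatrix2D((w / 2.0, h / 2.0), 45, 1.0)
    rotated = cv2.warpAffine(gray, rotation, (w, h))
    flipped = cv2.flip(gray, 1)  # 1 = mirror around the vertical axis (assumed)
    # Assumed factors: enlarge both axes, stretch the horizontal axis only.
    enlarged = center_crop(cv2.resize(gray, None, fx=1.25, fy=1.25), h, w)
    stretched = center_crop(cv2.resize(gray, None, fx=1.25, fy=1.0), h, w)
    return [gray, rotated, flipped, enlarged, stretched]


def dataset_size(n_originals, n_transforms=4):
    """Originals plus one copy per transformation."""
    return n_originals * (1 + n_transforms)


print(dataset_size(6000))
```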

To train the neural network, a set of 6000 images was selectively collected and divided into two groups:

(i) The crack group consisted of 3000 images, of which 2400 were used for training and the other 600 for validation

(ii) The noncrack group also consisted of 3000 images, divided in the same proportion
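A per-class split of this kind could be done as in the sketch below. The file names are placeholders, and the shuffling with a fixed seed is an assumption; the text gives only the group sizes. The labels follow the convention used for the CNN output in this paper, where "1" means crack and "0" means noncrack.

```python
import random


def split_class(items, n_train, seed=0):
    """Shuffle one class and cut it into training and validation parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


# Placeholder file names; the real image paths are not given in the text.
crack = [f"crack_{i:04d}.png" for i in range(3000)]
noncrack = [f"noncrack_{i:04d}.png" for i in range(3000)]

crack_train, crack_val = split_class(crack, n_train=2400)
noncrack_train, noncrack_val = split_class(noncrack, n_train=2400)

# Label 1 = crack, 0 = noncrack.
train = [(path, 1) for path in crack_train] + [(path, 0) for path in noncrack_train]
val = [(path, 1) for path in crack_val] + [(path, 0) for path in noncrack_val]
print(len(train), len(val))
```

Splitting each class separately keeps the 80/20 proportion identical in both groups, so neither class is over-represented in validation.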

The image conversion process is described in Figure 4. A noncrack image passed through the first convolution with 32 kernels created a three-dimensional data matrix, as presented in Figure 4. Similarly, the convolutions with 64, 128, and 256 kernels, each with dropout rate ½, generated successively smaller tensors. Finally, the flattening step converted the three-dimensional matrix into a one-dimensional one, which was suitable for the neural network input.
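The shape bookkeeping behind this conversion can be traced with a few lines of plain Python. The sketch assumes that each convolution keeps the spatial size and that each reduction step halves it; the starting side of 32 pixels is a hypothetical value, picked because it leads to the 1024 inputs used by the network.

```python
def trace_shapes(side, filters=(32, 64, 128, 256)):
    """Return (height, width, channels) after each conv + 1/2 reduction stage."""
    shapes = [(side, side, 1)]  # single-channel grayscale input
    for n_kernels in filters:
        side //= 2  # convolution assumed size-preserving; the reduction halves it
        shapes.append((side, side, n_kernels))
    return shapes


def flattened_length(shape):
    """Number of values once a 3-D tensor is flattened into one column."""
    height, width, channels = shape
    return height * width * channels


shapes = trace_shapes(32)  # 32 is an assumed input size
for shape in shapes:
    print(shape)
print(flattened_length(shapes[-1]))
```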

2.2. Hardware Setup

A block diagram of the crack identification process is shown in Figure 5. Details of the parts are presented in Figure 6:

Conveyor belt: used 8 mm pitch timing belt;

V-shaped slots: positioned and kept chilli fruits in the grooves, with a properly calculated pitch of 24 mm;

Main motor and transmission belt: drive the conveyor belt;

Camera: 5.5 mm in diameter, with LED lighting attached;

Raspberry Pi 4 computer (or personal computer);

Arduino UNO R3 microcontroller: received control signals from the Raspberry Pi (or personal computer) and controlled the stepper motors via a TB6600 driver;

Cover box: eliminated external light, stabilizing the brightness of the internal working environment around the camera. The inner surface of the cover box was white and translucent to reduce the reflective effect;

Pushing motor: rotated the elastic rods to push cracked chilli out of the conveyor belt. The size and elasticity of the rods, as well as the diameter, elevation, and rotation speed of the rods, were carefully selected.

V-shaped slots were made of plastic and manufactured by 3D printing. They were fixed onto the conveyor belt with both countersunk head screws and adhesive. The hardware assembly is described in Figure 7. The motor position was designed to be adjustable in two linear directions, while the camera could be moved in two translational directions and one rotational direction.

2.3. Control Flowchart

The identification flowchart of the process is depicted in Figure 8. After starting (Start), the neural network model in .h5 format (obtained through the training process described in Section 2.1) was loaded into the working memory. After that, the “Set start position” operation was carried out for both the conveyor belt and the pushing motor. Every time the key “1” was entered on the numeric keypad, followed by “Enter,” the main motor rotated through an angle of 1.8°. This step was repeated until the plane of symmetry of the V-shaped slot, the center of the camera, and the position of the pushing rods lay in the same plane, as in the A-A section in Figure 6. At this position, a fruit appeared completely in the target area, as described in Figure 9. Every time the key “2” was entered, followed by “Enter,” the pushing motor rotated through an angle of 1.8°. This step was repeated until the pushing rods were horizontal, as shown in Figure 7(a).

Once the initial position had been set correctly, the “S” key was pressed, followed by “Enter,” and the identification process began with the camera capturing an image and saving it to the working memory. This image was converted to grayscale and then cropped to the target area, where cracks appeared. Next, its size was reduced and it was transformed into a matrix compatible with the CNN input. Then, it was evaluated by the crack model that had been loaded into memory. The CNN returned the value “1” if there was a crack in the input image; otherwise, it returned “0.” The actuators were operated as follows:

(i) If the CNN returned “1,” a control signal was sent to rotate the pushing motor 180° to push the chilli fruit off the conveyor belt; then, the main motor was driven to move the conveyor forward one 24 mm step in order to bring a new fruit into the target area

(ii) If the CNN returned “0,” the pushing motor rotation was skipped, and only the main motor control was performed

The capturing, identifying, and pushing steps were repeated until a stop signal (Stop) was received.
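The loop described above can be sketched as follows. This is not the authors' program: the function names, the command tuples, the region-of-interest format, the 32-pixel input side, the scaling to the range 0 to 1, and the position of the crack neuron in the output are all hypothetical. The sketch also assumes that the pushing motor moves in the same 1.8° increments that were used to set the start position, so a 180° push corresponds to 100 increments.

```python
STEP_ANGLE_DEG = 1.8   # motor increment used when setting the start position
PUSH_ANGLE_DEG = 180   # rotation of the pushing motor for a cracked fruit
SLOT_PITCH_MM = 24     # distance between neighbouring V-shaped slots


def preprocess(frame_bgr, roi, side=32):
    """Grayscale, crop the target area, shrink, and shape for the CNN."""
    import cv2  # imported here so the dry run below needs no OpenCV

    x, y, w, h = roi  # hypothetical (x, y, width, height) of the target area
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    small = cv2.resize(gray[y:y + h, x:x + w], (side, side))
    return small.reshape(1, side, side, 1).astype("float32") / 255.0


def crack_label(scores, crack_index=1):
    """Return 1 if the crack neuron has the larger score, else 0.

    crack_index=1 is an assumption about the order of the two outputs.
    """
    return 1 if scores[crack_index] > scores[1 - crack_index] else 0


def plan_actions(label):
    """Translate the CNN result into actuator commands for one fruit."""
    actions = []
    if label == 1:
        actions.append(("push", round(PUSH_ANGLE_DEG / STEP_ANGLE_DEG)))
    actions.append(("advance_mm", SLOT_PITCH_MM))
    return actions


def run(capture, classify, send, stop_requested):
    """Repeat capture -> identify -> act until a stop signal arrives."""
    while not stop_requested():
        label = crack_label(classify(capture()))
        for command in plan_actions(label):
            send(command)


# Dry run with stand-ins for the camera, the model and the serial link.
fake_scores = iter([(0.1, 0.9), (0.8, 0.2)])
remaining = [2]
sent = []


def stop_after_two():
    remaining[0] -= 1
    return remaining[0] < 0


run(capture=lambda: None,
    classify=lambda _image: next(fake_scores),
    send=sent.append,
    stop_requested=stop_after_two)
print(sent)
```

Passing the camera, classifier, and serial link in as functions keeps the decision logic testable on a desk, without the conveyor or the microcontroller attached.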

2.4. Experimental Setup

To assess this model, three experiments were set up to test the identification ability under different working conditions.

2.4.1. Experiment I: Static Identification

This first experiment verified the model’s identification ability under the static condition. First, destemmed fruits (with or without cracks) were placed in the target area on the V-shaped slot. Then, the recognition algorithm was started while the entire conveyor and the chilli were completely still. The display observed on the computer is shown in Figure 9. The screen showed the word “Crack” if there were cracks on the fruit; otherwise, it showed “Non_Crack.” There were 200 fruits, each of which was identified five times, giving a total of 1000 tests, of which 500 were on cracked fruits and 500 on noncracked fruits.

2.4.2. Experiment II: On-Working Identification

This second test verified the identification ability while the conveyor belt was working, when moving parts could affect the recognition accuracy.

A program was written to identify fruits in batches of 10. Initially, 10 fruits were placed in the V-shaped slots. Once the initial position had been set, the first fruit was moved to the target area, and the conveyor was stopped for 0.3 seconds to carry out identification and pushing. Then, the next fruits were taken in turn to the working position until the whole batch was finished.

This experiment was done on 200 fruits, of which 50% had cracks and the rest had none. Each fruit was identified five times, giving a total of 1000 tests, equivalent to 100 batches.

2.4.3. Experiment III: Identification Ability during Storage Time

Due to practical processing conditions, chilli fruits were destemmed after a certain storage period from harvest. This third experiment was designed to evaluate the model’s identification ability at storage periods from 0 to 168 hours.

There were 8 samples harvested at the same time, stored in nylon bags, and placed in a well-ventilated room with temperatures ranging from 25 to 30°C. Each sample of 50 fruits was identified in the same way as in Experiment II (Section 2.4.2). The first sample was identified right after harvest (0 hours). After every 24 hours, the next sample was tested, until 168 hours.

Each fruit was identified four times; thus, there were a total of 200 identifications in every sample. Results were recorded and analyzed to assess the effect of storage time on the identification quality of the algorithm.

3. Results and Discussions

3.1. Static Identification Results

Experimental results in the static condition are presented in Table 1. The accurate recognition rate reached 97.6% for cracked fruits, higher than the 96.4% for noncracked ones. The difference was relatively small at 1.2 percentage points; thus, the model was considered to have the same identification ability for both crack and noncrack fruits.

3.2. On-Working Identification Results

A comparison of the identification rates in the working and static conditions is presented in Figure 10. The accurate recognition rate decreased slightly from 97.6 to 96.6% for cracked chilli fruits, while it fell more markedly from 96.4 to 94% for noncrack ones. The cause was presumed to be vibrations that blurred the input images and thus increased the crack-like features in images of noncrack fruits, which led to less accurate identification. Images of cracked fruits already had those features, so vibration had less effect on them.

When the identification algorithm was running, the conveyor belt and the fruits were stationary. However, the preceding movement caused both the system and the camera to keep vibrating for a while after stopping. Proposed solutions are to mount the camera off the machine frame or to use a camera with a high shutter speed to improve the input image quality.

In this study, cracked fruits were detected and pushed away. The accurate recognition rate of 96.6% in the crack group means that 3.4% of cracked fruits were recognized as noncrack. These fruits were not pushed away; they stayed on the conveyor belt and were mixed into the noncrack group.
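The overall rates quoted for the two conditions can be recovered from the per-class rates, as the short sketch below shows. It assumes 500 tests per class in both experiments, which follows from the 1000 tests split evenly between cracked and noncracked fruits; the function name is only illustrative.

```python
def overall_rate(crack_rate, noncrack_rate, n_crack=500, n_noncrack=500):
    """Combine per-class recognition rates (in %) into one overall rate."""
    correct = crack_rate / 100 * n_crack + noncrack_rate / 100 * n_noncrack
    return 100 * correct / (n_crack + n_noncrack)


static = overall_rate(97.6, 96.4)    # Experiment I, static condition
working = overall_rate(96.6, 94.0)   # Experiment II, working condition
missed_cracks = 100 - 96.6           # cracked fruits left on the belt, in %

print(round(static, 1), round(working, 1), round(missed_cracks, 1))
```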

In fact, the destemming system had a relatively low crack rate, and adding this grading system reduces it significantly. Assume that for every 100 kg of destemmed chilli, 5% (equivalent to 5 kg) was cracked. After passing through this system, only 3.4% of the 5 kg, or 0.17 kg, remains in the finished group together with the 95 kg of intact fruits. Thus, the error rate improves from 5% initially to 0.17/95.17 ≈ 0.1786% after treatment.
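The arithmetic of this estimate can be written out as a small function. The default arguments reproduce the figures in the text; the `noncrack_keep_rate` parameter is an added assumption that lets the reader see what happens when some intact fruits are also pushed away, which the estimate above does not consider.

```python
def residual_crack_percent(total_kg=100.0, crack_fraction=0.05,
                           miss_rate=0.034, noncrack_keep_rate=1.0):
    """Share of cracked fruit (in %) left in the finished group."""
    cracked_kg = total_kg * crack_fraction
    intact_kg = total_kg - cracked_kg
    cracked_left = cracked_kg * miss_rate          # cracks the model missed
    intact_left = intact_kg * noncrack_keep_rate   # intact fruit not pushed off
    return 100 * cracked_left / (cracked_left + intact_left)


print(round(residual_crack_percent(), 4))
```

Calling `residual_crack_percent(noncrack_keep_rate=0.94)`, which uses the 94% noncrack rate from the working condition, gives a slightly higher residual of about 0.19%, so the conclusion is not sensitive to this simplification.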

3.3. Effects of Storage Time on Identification Ability

The identification ability over storage time is presented in Figure 11. The accurate detection rate was stable in the early period of storage and declined slightly to 93% at 120 hours. After that, it decreased rapidly to 90% at 144 hours and to only 85% at 168 hours of storage. This decline was explained by wrinkling of the fruit during drying and dehydration, which produced regions of the input image with crack-like features. Those wrinkles were misidentified as cracks and lowered the accuracy rate. During the first period of storage, from 0 to 96 hours, the shrinkage of the fruit body was not large enough to cause false identification, so the accuracy rate was almost unchanged. Over time, the wrinkles grew large enough to be recognized as cracks. From these results, when the crack identification model is used, the destemming operation should be done within 0 to 96 hours of storage to ensure good identification and classification.

A noncracked chilli sample was stored and examined for wrinkling of the fruit body. As shown in Figure 12(a), there was no wrinkle up to 24 hours of storage. By 72 hours, small deformations appeared, as shown in Figure 12(b), but they were so small that they did not confuse the identification. By 120 hours, the deformations had become large enough to be easily seen by visual inspection. As shown in Figure 12(c), concave and convex regions appeared with sizes and features similar to cracks, thus reducing the accuracy. The wrinkles continued to expand after 120 hours and were especially pronounced at 168 hours, as shown in Figure 12(d).

In the production process, to ensure the quality of the fresh chilli, stem separation was usually done during the first 36 hours of storage. Therefore, the decrease in the accurate identification rate was negligible under these practical conditions.

4. Conclusions

A CNN model has been developed to identify and classify fresh chilli fruits with cracks caused by the destemming process, thus increasing the efficiency of processing. The accurate recognition rate of 95.3% showed that the model was reliable and suitable for application in practice. In addition, for fruits with wrinkled skin after 120 hours of storage, the tests still yielded an accuracy of 93% and met the required reliability. This model helps to improve the level of automation of the destemming and sorting system. By adding this grading method, the destemming system has higher applicability in real production. Because the test was performed in step mode, increasing the accuracy of real-time identification will be studied in further work.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Additional Points

Highlights. Manual sorting of cracked chilli after destemming results in high costs and low quality; thus, automatic grading is necessary. A convolutional neural network (CNN) model was built and trained to identify cracked fruits in order to separate them into one group. The OpenCV and TensorFlow libraries and the Keras API were used to support this work. Experiments achieved an accurate identification rate of 95.3%. Moreover, a success rate of 93% was found even for chilli bodies wrinkled by drying and dehydration.

Conflicts of Interest

The authors declare that there is no conflict of interest regarding the publication of this paper.


Acknowledgments

Quoc-Khanh Huynh was funded by the Vingroup Joint Stock Company and supported by the Domestic Master/Ph.D. Scholarship Programme of Vingroup Innovation Foundation (VINIF), Vingroup Big Data Institute (VINBIGDATA). The authors also would like to thank Chi-Tinh Vo, Huu-Quan La, and Minh-Tri Nguyen for their enthusiastic participation in this research.