Abstract

Deep learning has recently been hailed as the most advanced computer vision technology for image classification, and the invention of the convolutional neural network (CNN) has simplified the effort of feature engineering. Classifying the maturity stages of fruit with machine learning algorithms is a difficult task because the visual features of fruits at different maturity stages are hard to distinguish. Fruit ripeness is critical in agriculture since it affects the quality of the fruit. Determining the maturity of fruit manually has various flaws: it takes a long time, needs a lot of labor, and can lead to inconsistencies. In developing countries, agriculture is one of the most important economic sectors. The developed system can be employed in the food processing business and in real-life applications where the intelligent system’s accuracy, cost, and speed will improve the production rate and help satisfy consumer demand. With a small number of image samples, the system is capable of automating assembly-line work for classifying bananas with sufficient overall accuracy. The noninvasive method will also be used to classify other clustered fruits or horticultural crops in the future. The system can either replace or aid human operators, who can then focus their efforts on fruit selection. The present study highlights the combined merits of RGB imaging and hyperspectral imaging (HSI) for the classification of bananas, with possible application as a model for classifying several types of horticultural produce. The multi-input model’s quick processing time makes it a useful and handy technique in the farm field during postharvest procedures. Using a combination of CNN and multilayer perceptron (MLP) applied to data collected with RGB and hyperspectral imaging, the multi-input model reliably classifies bananas with an accuracy of 98.4 percent and an F1-score of 0.97. The AI algorithm predicted the size (large, medium, and small) and view (front or rear half) of banana classes with 99 percent accuracy. In comparison with previous studies that employed only RGB imaging, the presented model demonstrates the value of integrating RGB imaging and HSI approaches.

1. Introduction

Technological advancements have a tremendous impact on human lives, and technology is employed in a variety of industries to assist humans in doing various tasks. Agriculture is one of the sectors affected, and the employment of modern technology has become one of its most important parts. Computer vision and machine learning are two of the most recent technologies [1]. Crop diseases are one of the primary causes of famine and food insecurity throughout the world. Plant diseases are expected to account for up to 16% of global agricultural output losses each year. Furthermore, current disease-fighting tactics need the extensive use of crop protection chemicals, which are dangerous to the environment [2, 3]. In agriculture, computer vision and machine learning are utilized for a variety of tasks, including fruit detection, fruit classification, ripeness level estimation, and fruit fault diagnosis [4–6]. Plant diseases are important factors to consider since they diminish the quality and quantity of agricultural products. As a result, recognizing and diagnosing these diseases is crucial [7, 8]. Agriculture is the world’s most important source of food, and it is one of the most important factors in determining a country’s economy. Agriculture is a primary source of earnings for the majority of developing nations [9]. The quality of processed fruits is particularly essential in the food sector. Meeting consumer demands and generating high-quality fruits on the production line at a rapid rate necessitate the use of high-performance technologies [10]. Plant diseases cause considerable productivity and economic losses in agriculture and forestry. Recent advances in agricultural technology have necessitated the development of a new generation of automated, nondestructive plant disease detection devices [11, 12]. Furthermore, because of its reliance on the weather and the labor market, the food business is one of the few fields that operate under such tight constraints. For example, if fruits are not harvested at the optimal time owing to weather circumstances, the quality and quantity of the crop may suffer as a result of poor weather and excessive ripening. In underdeveloped nations, more than 40% of food losses are reported at the postharvest and food processing stages, while in developed countries, 40% of food losses are observed at the retail and consumer stages [13].

In the Philippines, key food crops face major postharvest losses estimated at between 27 and 42%. Incorrect maturity estimation, mechanical damage during transportation, weight loss due to improper care, diseases, and rotting due to lack of proper management are all typical causes of losses in the banana sector, which can reach up to 30% [14]. Consumption of bananas provides good nutrition, since they contain high levels of potassium, serotonin, and iron. They are highly recommended because they help to control blood pressure and avoid depression and are beneficial to anaemic people [15]. They are also utilized as a semisolid meal supplement for newborns [16]. In crop monitoring and disease surveillance, front-line remote sensing methods combined with machine learning (ML) are commonly employed. Crop type classification and disease early warning systems are two satellite image applications that provide exact, dependable, and cost-effective information at various spatial, temporal, and spectral resolutions [17, 18]. Banana farming is an important sector of worldwide agribusiness since bananas are rich in minerals such as potassium, calcium, magnesium, manganese, and iron. People all around the world consume bananas because they contain so many vitamins, and bananas are also considered an instant energy booster [15]. According to Wikipedia, approximately 15% of worldwide banana production is traded to western countries. The United States, being the world’s largest importer of bananas, accounts for approximately 18% of worldwide imports. Diseased and infected banana trees, as well as climatic changes, can result in a complete loss of banana production and export in all countries. The major diseases that affect bananas include Black Sigatoka, Fusarium Wilt (also known as Panama disease), and Xanthomonas Wilt. The symptoms, appearance, and effects of the numerous diseases found in banana leaves are discussed elsewhere [19].

According to banana production and export figures, India produces roughly 25.7 percent of the world’s bananas [20], with Brazil, Ecuador, Indonesia, and the Philippines together contributing 20% of global banana production. To the best of our knowledge, most disease monitoring systems rely on single-sensor solutions and lag behind in integrating several data sources. Furthermore, because it is difficult to monitor larger landscapes with unmanned aerial vehicles (UAVs), sufficiently detailed satellite imagery, analyzed with advanced machine learning (ML) models and accessed via smartphone apps, could aid in the detection and identification of banana plants and provide relevant data on their overall health status [21, 22]. In many food processing sectors, picking the right ripeness stage is crucial, and the ripeness of the banana is a major consideration in the production of banana products. The reputation of products, as well as of their industries, is ensured by their quality. In the fried banana chips industry, unripe and partially ripe banana pulp slices are employed as raw materials. Banana flour is made from both ripe and unripe bananas. Ripe bananas are used to make banana jam and banana puree, and overripe bananas are used to make banana jelly. The delicate peel of the banana is highly prone to damage and decay during postharvest processing, packaging, and transportation, all of which reduces quality and influences consumer choice [23]. The manual classification of bananas accounts for between 3% and 30% of postharvest losses because limited human vision makes it difficult to make quick, accurate, consistent, and unbiased observations. Furthermore, misclassification of fruits due to discrepancies in human categorization might result in postharvest losses, lowering production profit margins [24].

These delicate fruits are prone to injuries such as bruises, splits, abrasions, punctures, and cuts. Because of unreliable handling during harvesting, grading, packaging, and transportation, they get damaged, lose a lot of nutritional value, and become more susceptible to decay and microbial growth, which influences customer preferences [25]. Damage during harvest and subsequent handling, as well as the rising expense of finding experienced farm laborers or fruit graders, may result in limited profit margins in the agriculture industry. In agricultural research, emerging technologies and algorithms have drawn particular attention to food losses and provide alternative ways of achieving higher precision [26]. Studies involve fruit classification for quantifying fruit quality (distinguishing edible from defective fruits) or maturity stages. Fruits are generally analyzed and categorized individually using a variety of imaging and machine learning techniques [27]. Various research projects have recently used computer vision and machine learning techniques for inspecting and grading fruits. Common applications include fruit classification and sorting, fruit defect identification, ripeness detection, and food security estimation [28], as shown in Figure 1.

Different machine learning methods have recently been designed to solve various engineering and image processing challenges. Machine learning, particularly deep learning, has advanced significantly in recent years, with notable improvements in domains such as medicine, agriculture, and food engineering [29–32]. Table 1 shows the fruit classification methods found in the literature.

The majority of banana classification methods in the literature used only RGB imaging, which is part of visible light imaging and captures only morphological changes and exterior aspects of fruits [38]. A fruit’s interior qualities are associated with spectral signatures and reflectance values, which are extracted using near-infrared (NIR) spectroscopy and partially by HSI [39]. The fruit’s intrinsic flavor components were extracted using NIR. The remainder of this paper is organized as follows. Section 2 presents the details of the proposed method, including the neural network topology, the experimental design, and model training. Section 3 evaluates the performance of the proposed method and compares it with other approaches in the literature. Section 4 provides the discussion, followed by the conclusion in Section 5.

2. Experimental Details

The bananas used as data samples in this experiment were collected from Taiwan. The bananas were photographed from both the front and the back, for a total of 900 photos, and the samples were divided into three classes. The RGB (red, green, and blue) color model employs distinct red, green, and blue channels that are mixed in varied proportions to generate a broad spectrum of colors. A convolutional neural network was used to obtain additional attributes from the RGB images. To offer more image information, the HSI approach employs a broad range of ultraviolet and visible light and breaks down the light striking each pixel into different spectral bands. More experimental details are given below.

2.1. Experimental Design

Data samples consisting of images of banana varieties from one country, namely, Taiwan, were obtained and used in this investigation. Experts acquired the samples and preclassified them in accordance with international banana classification criteria. The bananas were photographed from both the front and rear, for a total of 900 instances. Samples were divided into three categories: export quality, local market quality, and reject quality, with 100, 500, and 200 samples, respectively. The lengths of the banana fingers were measured and divided into three categories, namely, small, medium, and large. These measurements, together with the images, were utilized to train the banana size classification model. Only the RGB photos provided to the AI models were used to classify banana sizes automatically. Two cameras were used to image the banana samples. The RGB camera (a regular digital single-lens reflex camera) and the hyperspectral camera were used to extract the external and interior properties of bananas, respectively.

2.2. RGB Image Data Collection

The RGB (red, green, and blue) color model uses distinct red, green, and blue channels that are combined in various amounts to produce a wide range of colors. This device-dependent color model is primarily utilized for image sensing, representation, and display in electronic systems, as well as for image processing. A banana’s hue indicates ripeness, while its texture reveals discoloration or bruising on the peel.

Samples were photographed at the same camera height, which was considered crucial for estimating sample size. Throughout RGB image acquisition, the camera distance and lighting were kept fixed, and a white backdrop was used to normalize all the collected images. The resolution of the image samples was reduced from 5184 × 3456 to 320 × 213 pixels so that they could be used as input to the CNN. After the image samples are captured and preprocessed, the key features retrieved are color and texture. Each pixel can be used to extract a variety of characteristics.
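The reduction from 5184 × 3456 to 320 × 213 pixels keeps the 3:2 aspect ratio of the camera frames. The following is a minimal sketch of that arithmetic only; the function name `scaled_size` is hypothetical, and the study does not state which resampling routine or library was actually used.

```python
def scaled_size(width, height, target_width):
    """Return (target_width, new_height) preserving the aspect ratio.

    Illustrative helper only; the paper does not describe its resizing code.
    """
    if width <= 0 or height <= 0 or target_width <= 0:
        raise ValueError("all dimensions must be positive")
    # Scale the height by the same factor as the width, then round to pixels.
    new_height = round(height * target_width / width)
    return target_width, max(1, new_height)


original = (5184, 3456)          # full-resolution DSLR frame from the text
resized = scaled_size(*original, target_width=320)
print(resized)                   # (320, 213)
```

An image library would then resample each photo to the returned size before it is passed to the network.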

Export-quality bananas must have a green peel color with no or few black spots and defects, whereas reject-quality bananas tend to have darker and grayer colors; these color attributes were derived from the RGB color space. The photos were input to the CNN, which used a sequence of convolutions to automatically extract additional exterior properties of the banana. Haralick texture features were used to extract further exterior properties; this approach can tell the difference between bananas with a smooth texture and those with flaws. The photographs were also used to extract the banana size and view. The banana size classification model was implemented as a neural network that takes the RGB image as input and uses the recorded sizes as training labels.
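Haralick texture features are statistics of a grey-level co-occurrence matrix (GLCM), which counts how often pairs of grey levels occur at a fixed pixel offset. The sketch below computes three of the classic features in plain Python on toy patches. The choice of features, the single horizontal offset, and the four grey levels are assumptions made for illustration; the study does not say which Haralick features or parameters it used.

```python
def glcm(image, levels, dx=1, dy=0):
    """Normalised, symmetric grey-level co-occurrence matrix for one offset."""
    counts = [[0.0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1.0
                counts[j][i] += 1.0  # count each pair in both directions
    total = sum(sum(row) for row in counts)
    if total == 0:
        raise ValueError("image too small for the chosen offset")
    return [[v / total for v in row] for row in counts]


def haralick_features(p):
    """Contrast, angular second moment and inverse difference moment."""
    n = len(p)
    cells = [(i, j) for i in range(n) for j in range(n)]
    return {
        "contrast": sum(p[i][j] * (i - j) ** 2 for i, j in cells),
        "asm": sum(p[i][j] ** 2 for i, j in cells),
        "idm": sum(p[i][j] / (1 + (i - j) ** 2) for i, j in cells),
    }


# Toy 3 x 3 patches quantised to 4 grey levels (hypothetical data).
smooth = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]      # uniform peel
spotted = [[1, 1, 1], [1, 3, 1], [1, 1, 1]]     # one dark blemish

smooth_features = haralick_features(glcm(smooth, levels=4))
spotted_features = haralick_features(glcm(spotted, levels=4))
```

A uniform patch gives zero contrast, while the blemished patch gives a positive contrast and a lower angular second moment, which is the kind of signal that separates smooth peel from flawed peel.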

2.3. Hyperspectral Image (HSI) Data Collection

HSI is a technology that involves hundreds of channels and high-dimensional data. The HSI technique uses a broad spectrum of ultraviolet and visible light, breaking down the light striking every pixel into distinct spectral bands in order to deliver more image information. With hundreds of spectral bands, HSI delivers additional imaging information, making it appropriate for a diverse range of imaging applications. Reflectance values of the banana can also be collected with a hyperspectral camera, which is useful information for classification.

To provide extra visual information, every pixel is divided into several spectral bands. A SWIR (short-wave infrared) imaging system comprising 288 bands over a wavelength range of 1000 to 2500 nm is used in the present study. The liquid content of the banana, along with its soluble solid content, influences the reflectance in specific wavelength ranges. A CNN was used to process images, while a multilayer perceptron (MLP) network was used to process numerical and categorical data. The features extracted from HSI and RGB images are presented in Table 2.
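As a rough illustration of how reflectance features can be read from a hypercube, the sketch below maps a wavelength to a band index and averages the reflectance of a region band by band. It assumes the 288 band centres are evenly spaced between 1000 and 2500 nm, which the text does not state; the function names and the sample values are hypothetical.

```python
def band_index(wavelength_nm, n_bands=288, start_nm=1000.0, stop_nm=2500.0):
    """Nearest band index for a wavelength, ASSUMING evenly spaced bands."""
    if not start_nm <= wavelength_nm <= stop_nm:
        raise ValueError("wavelength outside the sensor range")
    step = (stop_nm - start_nm) / (n_bands - 1)
    return round((wavelength_nm - start_nm) / step)


def mean_spectrum(pixels):
    """Average reflectance per band over all pixels of a region.

    `pixels` is a list of spectra, one list of band reflectances per pixel.
    """
    if not pixels:
        raise ValueError("region contains no pixels")
    n_bands = len(pixels[0])
    return [sum(p[b] for p in pixels) / len(pixels) for b in range(n_bands)]


# Hypothetical 2-pixel region with 4 bands, just to show the data layout.
region = [
    [0.20, 0.40, 0.35, 0.10],
    [0.40, 0.80, 0.45, 0.30],
]
spectrum = mean_spectrum(region)            # per-band mean reflectance
overall_mean = sum(spectrum) / len(spectrum)

# Band nearest to 1450 nm, an illustrative wavelength in the SWIR range.
water_band = band_index(1450.0)
```

The per-band means and the reflectance at selected bands are numerical features of the kind that can be passed to an MLP alongside size and view.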

Figure 2 shows the architecture of the neural network employed in the present investigation.

The size classification of bananas was performed using a convolutional neural network model, with the VGG16 CNN architecture taken as the base for the banana view and size classification tasks. The input layer for size classification takes images with a resolution of 224 × 224 pixels in 3 RGB channels, i.e., 224 × 224 × 3. A sequential model, built with the open-source Keras library, was developed on top of the CNN model. RGB and hyperspectral camera images were used as input to multiple CNN models to see which model would make the best use of the data for banana grade classification. Of all the CNN models tested, VGG16 yielded the most encouraging results and was therefore chosen as the study’s backbone CNN. The comparative results of the various CNN models are shown in Figure 3.

The overall structure of the banana classification system is shown in Figure 4. Images collected with the two cameras were used as inputs. Size and view classifications, Haralick texture features, and reflectance values were used as inputs to the MLP for the HSI data. Different CNN models were given RGB and HSI images. RGB images are processed using a CNN architecture built from Keras Conv2D convolutional layers, a ReLU activation function, and MaxPooling2D layers. To construct the CNN architecture (CNN2) used for hyperspectral images, a sequential model was applied on top of the VGG16 CNN framework. The grading system was created for use in a fruit (e.g., banana) processing facility that requires quality monitoring and sorting. Human operators have traditionally graded raw material and product to evaluate whether they are appropriate for production or sale. This manual operation is considered very slow, and the operators’ decisions on the product may also be inconsistent.
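The data flow of a multi-input model of this kind can be sketched without a deep learning library: image features from a CNN branch are concatenated with the output of an MLP branch, and a final dense layer with softmax yields class probabilities. The plain-Python forward pass below is only an illustration of that fusion step. All weights are made-up numbers, the two-element image feature vector stands in for a real CNN output, and the layer sizes are assumptions rather than the architecture used in the study.

```python
import math

CLASSES = ["export", "local market", "reject"]


def dense(x, weights, bias):
    """Fully connected layer: one output per weight row."""
    if any(len(row) != len(x) for row in weights):
        raise ValueError("weight rows must match the input length")
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def softmax(z):
    """Numerically stable softmax."""
    peak = max(z)
    exps = [math.exp(v - peak) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_and_classify(image_features, tabular_features, params):
    # MLP branch for numerical/categorical inputs (size, view, reflectance).
    hidden = relu(dense(tabular_features, params["mlp_w"], params["mlp_b"]))
    # Fusion: concatenate the image features with the MLP output.
    joint = list(image_features) + hidden
    return softmax(dense(joint, params["out_w"], params["out_b"]))


# Hypothetical, untrained parameters chosen only so the example runs.
params = {
    "mlp_w": [[0.5, -0.2, 0.8], [0.1, 0.4, -0.6]],   # 3 inputs -> 2 hidden
    "mlp_b": [0.0, 0.1],
    "out_w": [[0.9, -0.3, 0.4, 0.2],                 # 4 joint -> 3 classes
              [-0.2, 0.7, 0.1, 0.5],
              [0.1, 0.2, -0.6, 0.3]],
    "out_b": [0.0, 0.0, 0.0],
}
image_features = [0.6, 0.2]            # stand-in for a CNN feature vector
tabular_features = [1.0, 0.0, 0.45]    # e.g. size code, view code, reflectance
probs = fuse_and_classify(image_features, tabular_features, params)
predicted = CLASSES[probs.index(max(probs))]
```

In a Keras implementation the same structure would typically be expressed with separate input branches joined by a concatenation layer, with the weights learned during training instead of fixed by hand.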

Eighty percent of the samples were used for training, while the remaining 20% were used for validation. The AI model was iteratively trained and validated on multiple sets of data using several features acquired from HSI and RGB images, which compose the whole dataset. The validation accuracy was computed at every epoch, and the highest accuracy was recorded. The hyperparameters of the neural networks used in the present study were fine-tuned to obtain appropriate outputs. After running the training numerous times, the best validation accuracy score was recorded.
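An 80/20 split and the bookkeeping of the best validation accuracy can be sketched as follows. The split here is stratified by class so that each grade keeps the same proportion in both subsets; that is an assumption, since the text does not say whether stratification was used. The class counts reuse the 100, 500, and 200 figures given in Section 2.1, the label strings and function names are hypothetical, and the accuracy history is invented for illustration.

```python
import random
from collections import defaultdict


def stratified_split(labels, val_fraction=0.2, seed=0):
    """Split sample indices into train/validation, class by class."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    rng = random.Random(seed)        # fixed seed for a reproducible split
    train, val = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        n_val = round(len(indices) * val_fraction)
        val.extend(indices[:n_val])
        train.extend(indices[n_val:])
    return sorted(train), sorted(val)


def best_epoch(val_accuracies):
    """Return (epoch number starting at 1, accuracy) of the best epoch."""
    return max(enumerate(val_accuracies, start=1), key=lambda pair: pair[1])


labels = ["export"] * 100 + ["local market"] * 500 + ["reject"] * 200
train_idx, val_idx = stratified_split(labels)

# Invented validation accuracies for three epochs of one training run.
history = [0.81, 0.957, 0.94]
epoch, accuracy = best_epoch(history)
```

Keeping the weights from the best epoch, rather than the last one, is a common way to use such a record when the validation accuracy fluctuates between epochs.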

The multi-input deep learning (DL) model was tested and implemented in an online grading system for banana quality classification. The performance of the proposed grading model, the classifiers created, and the multi-input framework was assessed, and the performance of the various approaches investigated is presented in the Results.

3. Results

The Banana Grading System (BGS) collected RGB image attributes and HSI features, including signatures created from banana hypercube data and reflectance values. The overall classification performance was determined using all of the test images as input to the AI model with the weights saved in the previous run, producing a high accuracy of roughly 98.9%. The results show that the proposed model produces very few false positives. All three classes had a recall of more than 97 percent, indicating that the classifiers can find nearly all the correct examples in each class, with only a small deviation from the accuracy scores. A detailed discussion is given below.

3.1. Grading System Performance

The presented AI model is able to construct a variety of banana classifiers, such as size classification and view classification (front and rear view), along with the system’s principal purpose of grading the banana. Because export-quality bananas must meet a size guideline, the proposed framework takes size into account as well. The banana size and view were predicted by the automated system.

3.2. Classification Accuracy

Classification accuracy in the multi-input model is the percentage of image samples correctly predicted by the BGS.

The highest validation accuracy attained after multiple training runs was approximately 95.7 percent. The overall classification accuracy was calculated using all of the test images as input to the AI model with the weights saved in the last run, yielding a high accuracy of around 98.9%. The models did extremely well in banana size and view classification, with an accuracy of 99 percent. All of the small bananas were correctly predicted. Medium bananas had a 94 percent accuracy rate, and large bananas for export had a 99 percent accuracy rate. The reject class was accurately identified by the BGS model, whereas the classifier for class 2 bananas showed an accuracy of 98%. The export class (class 1) showed 94% accuracy, which could be attributed to its small sample size, as class 1 has the fewest images of all the classes. The BGS metrics report for the three classifiers is shown in Figure 5.

Because the number of samples per class is imbalanced, the classifiers’ performance is better analyzed through the averages of precision, recall, and F1-score. The precision of the three classifiers was greater than 0.97, indicating that they recognized almost only correct instances of each designated class; that is, the presented model produces very few false positives. The recall of the three classes was also high, at more than 97%, suggesting that the classifiers can find nearly all the correct cases in each class. With a minor variance from the accuracy ratings, the F1-score, which combines precision and recall, was above 97 percent for all classifiers. The multi-input method performs very well, achieving an accuracy of 98.4 percent and a macro-average F1-score of 0.97. The BGS scored similarly across these criteria, which can be attributed to the evenly distributed number of images per class in the BGS framework.

3.3. Comparative Analysis of Different Methods

To gauge the strength of the multi-input framework, which integrates RGB and HSI, several approaches and algorithms were built on the same set of banana images, using several features as inputs with distinct modalities. Precision, recall, F1-score, and accuracy were used as performance measurements. Figure 6 summarizes the results of the various AI models. Precision is the ratio of correctly predicted positive instances to all predicted positive instances. Recall, or sensitivity, is the ratio of correctly predicted positive instances to all actual positive instances. Because the F1-score is the harmonic mean of precision and recall, it accounts for both false positives and false negatives. The F1-score is considerably more informative than accuracy when data has an uneven class distribution.
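The four measurements can be computed directly from true and predicted labels. The sketch below treats each class in turn as the positive class, computes precision, recall, and F1 (the harmonic mean of precision and recall), and then takes the unweighted macro average of the per-class F1-scores. The label names and the six toy predictions are invented for illustration and are not results from the study.

```python
def per_class_metrics(y_true, y_pred, classes):
    """Precision, recall and F1 for each class, one-vs-rest."""
    report = {}
    pairs = list(zip(y_true, y_pred))
    for cls in classes:
        tp = sum(1 for t, p in pairs if t == cls and p == cls)
        fp = sum(1 for t, p in pairs if t != cls and p == cls)
        fn = sum(1 for t, p in pairs if t == cls and p != cls)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report[cls] = {"precision": precision, "recall": recall, "f1": f1}
    return report


def accuracy(y_true, y_pred):
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def macro_f1(report):
    """Unweighted mean of the per-class F1-scores."""
    return sum(m["f1"] for m in report.values()) / len(report)


# Toy labels (hypothetical): one export banana is misgraded as class 2.
classes = ["export", "class 2", "reject"]
y_true = ["export", "export", "class 2", "class 2", "class 2", "reject"]
y_pred = ["export", "class 2", "class 2", "class 2", "class 2", "reject"]

report = per_class_metrics(y_true, y_pred, classes)
overall_accuracy = accuracy(y_true, y_pred)
overall_macro_f1 = macro_f1(report)
```

In this toy case the single error lowers the recall of the small export class to 0.5 while the accuracy stays at 5/6, which shows why per-class precision, recall, and F1 are reported alongside accuracy when classes are uneven.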

The numerical and categorical characteristics of bananas were fed to ML approaches. The best accuracy of every method was recorded and compared with that of the multi-input framework in the present study.

Several ML methods were implemented and tested using diverse characteristics (RGB colour values, texture, and reflectance values) collected from the system, including KNN, Gaussian Naïve Bayes, random forest (RF), decision tree (DT), SVM, and logistic regression (LR). These methods, however, did not perform well. The method that used only hyperspectral images had an accuracy of 82 percent, whereas the method that used only RGB photos had an accuracy of 89 percent. In terms of accuracy, precision, recall, and F1-score, the proposed system outperformed all of the other methods.

The running time was measured to determine the performance of the grading system using the integrated multi-input DL framework. The average time taken to predict the banana class through the user interface was only 15 seconds. With 516 samples used for training and validation over 200 epochs, training the AI models took an average of 2650 seconds.

4. Discussion

The multi-input model used a framework of CNN architectures for analyzing images from the RGB and hyperspectral cameras, as well as an MLP network for analyzing the extracted categorical and numerical information, namely, banana size, banana view, and reflectance values. The CNN section of this study used a fine-tuned VGG16 network pretrained on ImageNet. Comparable banana grading models were used to evaluate the performance of the proposed AI model for banana classification. Despite using a new image dataset, this study’s overall accuracy of 98.4 percent puts it ahead of the compared models. In comparison with previous studies that relied solely on RGB imaging, the presented model demonstrated the value of combining RGB imaging and HSI approaches; classification of bananas from the same RGB image datasets using RGB imaging alone reached an overall accuracy of 91.0 percent. Using two well-established and successful imaging approaches, RGB and hyperspectral imaging, the multi-input framework in this work can classify bananas with greater accuracy than a single imaging approach. Reflectance values obtained through hyperspectral imaging were also used to enhance the key characteristics of bananas. Bananas were classified using hyperspectral imaging based on the mean reflectance values as well as the reflectance values at given wavelengths.

The banana size classification worked well too. By selecting one band as the input to the CNN, and extracting the essential HSI features (reflectance values and texture) from the hyperspectral image data, the multi-input framework presented in this study reduces the high computing cost and eases the problem of resource allocation. Model training and actual classification by the grading system are rapid and cost-effective, as evidenced by the running times obtained, which is required for automating agricultural procedures. This research proposes a novel strategy for combining RGB and hyperspectral features of images for quality grading of bananas, as well as a new approach for automatic size classification of banana tiers. Other techniques that used only RGB photos of bananas performed worse than the model developed.

The model for banana grade classification, however, can still be improved to reduce overfitting. In future investigations, it is suggested that the sample sizes for export and class 2 bananas be increased. The final results comply with the grading standards for bananas. Agriculture’s future lies in the automation of postharvest procedures. Fruit and vegetable inspection can be done quickly and accurately using machine vision technologies and spectral imaging techniques, and food losses can be decreased while productivity and profitability are increased using a nondestructive technology. The proposed multi-input model was used to evaluate banana tiers, which are a typical example of clustered fruit; as a result, the model has good potential for grading other clustered fruits, such as grapes and cherries, and a variety of clustered vegetables.

5. Conclusion

The presented framework uses RGB imaging, hyperspectral imaging, and DL techniques to create a novel multi-input deep learning model. The RGB technique can quantify exterior features of the banana, such as size, color, and texture, whereas the HSI technique can extract spectral signatures, which are strongly linked to the internal qualities of fruits. Using a combination of CNN and MLP applied to features extracted from RGB and hyperspectral imagery, the multi-input model correctly classifies bananas with an overall accuracy of 98.4% and an F1-score of 0.97. The AI system predicted the size (large, medium, and small) as well as the view (front or back half) of banana classes with 99 percent accuracy. Using a small number of samples, this study demonstrated the advantages of combining RGB and HSI for several agricultural applications. The multi-input model’s quick processing time makes it a useful and handy technique in the farm field during postharvest procedures. In the future, data augmentation on RGB and HSI images may be used to relieve the models’ potential overfitting. The model can also be trained and tested using RGB and HSI images of various other fruits and agricultural products. The multi-input technique, along with robotics and mechatronics, can be employed in an assembly line setting to construct an automated system for real-time sorting and noninvasive classification of a range of vegetables and fruits on big farms.

Data Availability

The data are available on request from the corresponding author.

Conflicts of Interest

The authors declare that they have no conflicts of interest.