Abstract

Garbage classification is an urgent problem. To address the low accuracy of garbage classification and recycling and the difficulty of manual recycling, this paper designs a 23-layer convolutional neural network (CNN) model with an emphasis on real-time garbage classification. First, depthwise separable convolution is used to reduce the number of parameters (Params) of the model. Then, an attention mechanism is used to improve the accuracy of the garbage classification model. Finally, model fine-tuning is used to further improve its performance. We compared the model with the classic image classification models AlexNet, VGG16, and ResNet18 and with the lightweight classification models MobileNetV2 and ShuffleNetV2 and found that the proposed model, GAF_dense, has a higher accuracy rate, fewer Params, and fewer FLOPs. To further check its performance, we tested it on the CIFAR-10 data set, where the accuracy of GAF_dense is 0.018 and 0.03 higher than that of ResNet18 and ShuffleNetV2, respectively. On the ImageNet data set, the accuracy of GAF_dense is 0.225 and 0.146 higher than that of ResNet18 and ShuffleNetV2, respectively. Therefore, the model proposed in this paper is suitable for garbage classification and for other classification tasks, and it can be applied in areas such as environmental science, children’s education, and environmental protection.

1. Introduction

As early as the 1960s, as global garbage pollution increased sharply, many countries proposed green treatment schemes to classify garbage and protect the environment [1]. Among them, Japan was at the forefront of garbage disposal. It began by classifying recyclable garbage, and its domestic environment improved significantly [2]. Because manual garbage classification is labor-intensive, has low accuracy, and gives poor results, this paper proposes to use a CNN to classify garbage in real time on the conveyor belt, in order to promote garbage recycling and environmental protection. Building on the research of Bobulski et al. [3, 4], this paper also uses a CNN to classify plastic waste. To ensure economic value, we classified recyclables into five categories: cardboard, glass, metal, newspaper, and plastic.

The CNN was first proposed by LeCun et al. [5]. After continuous development, it was first used for classification tasks by Krizhevsky et al. [6], with excellent results. The skip connection was introduced in ResNet [7] in 2016 and quickly became very popular. Building on the skip connection, we improved and optimized the DenseNet [8] classification model and used it for real-time garbage classification on the garbage conveyor. We also compared it in real time with the classic image classification networks AlexNet [6], VGG [9], and ResNet and with the lightweight classification networks MobileNetV2 [10] and ShuffleNetV2 [11]. The experimental data show that the recyclable garbage classification method proposed in this paper (garbage classification DenseNet, GC-DenseNet) has good generalization ability, real-time performance, and high accuracy. The network is shown in Figure 1.

2. Algorithm Description

2.1. Depthwise Separable Convolution

Depthwise separable convolution [12] consists of a depthwise convolution followed by a pointwise convolution [13]. We used it to replace the 3 × 3 standard convolution in DenseNet, setting the number of groups equal to the number of output channels. This reduces the number of parameters (Params) and floating point operations (FLOPs) of the model and so improves the efficiency and real-time performance of the garbage classification model. The depthwise convolution and the standard convolution are shown in Figure 2.
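To make the two steps concrete, the following is a minimal pure-Python sketch of a depthwise separable convolution on nested lists. It is an illustration rather than the implementation used in this paper: the function names, the toy input, and the choice of no padding and stride 1 are all assumptions made for the example.

```python
def depthwise_conv(x, dw_kernels):
    """Depthwise step: channel c of x is filtered only by dw_kernels[c].

    x: list of C feature maps, each an H x W list of lists.
    dw_kernels: list of C kernels, each a k x k list of lists.
    Assumes no padding and stride 1, so each output is (H-k+1) x (W-k+1).
    """
    out = []
    for fmap, kernel in zip(x, dw_kernels):
        k = len(kernel)
        rows = len(fmap) - k + 1
        cols = len(fmap[0]) - k + 1
        out.append([
            [
                sum(fmap[i + a][j + b] * kernel[a][b]
                    for a in range(k) for b in range(k))
                for j in range(cols)
            ]
            for i in range(rows)
        ])
    return out


def pointwise_conv(x, pw_weights):
    """Pointwise step: a 1 x 1 convolution that mixes channels.

    pw_weights[o][c] is the weight from input channel c to output channel o.
    """
    rows, cols = len(x[0]), len(x[0][0])
    return [
        [
            [sum(w_out[c] * x[c][i][j] for c in range(len(x)))
             for j in range(cols)]
            for i in range(rows)
        ]
        for w_out in pw_weights
    ]


def depthwise_separable_conv(x, dw_kernels, pw_weights):
    """Depthwise filtering followed by pointwise channel mixing."""
    return pointwise_conv(depthwise_conv(x, dw_kernels), pw_weights)


# Hypothetical toy input: 2 channels of size 3 x 3, mapped to 3 output channels.
x = [
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    [[1, 0, 1], [0, 1, 0], [1, 0, 1]],
]
dw_kernels = [
    [[1, 0, 0], [0, 1, 0], [0, 0, 1]],  # sums the main diagonal of channel 0
    [[1, 1, 1], [1, 1, 1], [1, 1, 1]],  # sums every entry of channel 1
]
pw_weights = [[1, 0], [0, 1], [1, 1]]
y = depthwise_separable_conv(x, dw_kernels, pw_weights)
```

Each depthwise kernel touches a single channel, and only the 1 × 1 pointwise step mixes information across channels, which is why the combined cost is lower than that of a standard convolution with the same input and output sizes.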

Depthwise separable convolution is used to reduce the number of Params and FLOPs of the model. The Params and FLOPs of a convolution layer are given by formulae (1) and (2), respectively:

$$\text{Params} = K_h \times K_w \times C_{in} \times C_{out} \tag{1}$$

$$\text{FLOPs} = H \times W \times K_h \times K_w \times C_{in} \times C_{out} \tag{2}$$

where $H$ represents the height of the output feature map; $W$ represents the width of the output feature map; $K_h$ represents the height of the convolution kernel; $K_w$ represents the width of the convolution kernel; $C_{in}$ represents the number of input channels; and $C_{out}$ represents the number of output channels.
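As a rough illustration of where the saving comes from, the sketch below counts Params and FLOPs for one standard convolution layer and for its depthwise separable replacement. It rests on several assumptions that are not taken from this paper: biases are ignored, one multiply-accumulate counts as one FLOP, the depthwise step uses one kernel per input channel, and the layer sizes are hypothetical. The function names are illustrative only.

```python
def standard_conv_cost(h_out, w_out, k_h, k_w, c_in, c_out):
    """Params and FLOPs of a standard convolution (no bias, 1 MAC = 1 FLOP)."""
    params = k_h * k_w * c_in * c_out
    flops = h_out * w_out * params
    return params, flops


def depthwise_separable_cost(h_out, w_out, k_h, k_w, c_in, c_out):
    """Params and FLOPs of a depthwise conv followed by a 1 x 1 pointwise conv."""
    depthwise_params = k_h * k_w * c_in   # one k_h x k_w kernel per input channel
    pointwise_params = c_in * c_out       # 1 x 1 kernels that mix the channels
    params = depthwise_params + pointwise_params
    flops = h_out * w_out * params
    return params, flops


# Hypothetical layer: 56 x 56 output, 3 x 3 kernel, 128 input and 32 output channels.
std_params, std_flops = standard_conv_cost(56, 56, 3, 3, 128, 32)
sep_params, sep_flops = depthwise_separable_cost(56, 56, 3, 3, 128, 32)
ratio = sep_params / std_params  # equals 1/c_out + 1/(k_h * k_w) under these assumptions
```

For this hypothetical layer the parameter count falls from 36864 to 5248. These are per-layer figures for an invented layer, not the whole-network results reported in the tables, because only the 3 × 3 convolutions of the network are replaced.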

2.2. Attention Mechanism

The principle of the attention mechanism is to make the model allocate its limited resources to the important parts of the input and ignore the unimportant parts, thereby improving the accuracy of the model. At present, the most commonly used attention mechanisms are the channel attention mechanism [14] and the spatial attention mechanism [15].

The channel attention mechanism makes the model pay more attention to the channel information of the feature map. First, adaptive max pooling and adaptive average pooling are applied to the feature map. The pooled results are then sent to a multilayer perceptron (MLP) [16] for channel splicing, and the two resulting sets of weights are combined. Finally, the original features are multiplied channel by channel by the new channel weights, which completes the recalibration of the original features along the channel dimension of the feature map. The network structure of the channel attention mechanism is shown in Figure 3.
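The steps above can be sketched in pure Python as follows. This is a minimal illustration under stated assumptions, not the code used in this paper: following the CBAM formulation [17], the two MLP outputs are combined by element-wise addition and passed through a sigmoid, the MLP has two layers without biases, and the weights `w1` and `w2` and the toy input are hypothetical.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def shared_mlp(vec, w1, w2):
    """Two-layer perceptron with a ReLU in between (no biases)."""
    hidden = [max(0.0, sum(w * v for w, v in zip(row, vec))) for row in w1]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w2]


def channel_attention(x, w1, w2):
    """x: list of C feature maps (H x W lists). Returns (weights, rescaled x)."""
    flat = [[v for row in fmap for v in row] for fmap in x]
    avg_desc = [sum(f) / len(f) for f in flat]  # global average pooling per channel
    max_desc = [max(f) for f in flat]           # global max pooling per channel
    avg_out = shared_mlp(avg_desc, w1, w2)      # the same MLP is applied to both
    max_out = shared_mlp(max_desc, w1, w2)
    # Assumption: combine by addition, then squash to (0, 1) with a sigmoid.
    weights = [sigmoid(a + m) for a, m in zip(avg_out, max_out)]
    # Recalibrate: multiply every value in channel c by weights[c].
    scaled = [[[w_c * v for v in row] for row in fmap]
              for w_c, fmap in zip(weights, x)]
    return weights, scaled


# Hypothetical toy input: 2 channels of size 2 x 2, one hidden unit in the MLP.
x = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[0.0, 0.0], [0.0, 8.0]],
]
w1 = [[0.5, -0.25]]    # hidden layer: 2 channels -> 1 unit
w2 = [[1.0], [-1.0]]   # output layer: 1 unit -> 2 channels
weights, scaled = channel_attention(x, w1, w2)
```

The output has the same shape as the input; only the relative scale of each channel changes, so the module can be placed after a convolution without altering the rest of the network.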

The spatial attention mechanism makes the model pay more attention to the spatial information of the feature map. First, the input feature map undergoes adaptive average pooling and adaptive max pooling in the spatial dimension. Then, the pooled results are passed through a 7 × 7 convolution to form a new feature map. Finally, the output feature map is obtained by multiplying the attention module's feature map by the new feature map through the scale operation. The network structure of the spatial attention mechanism is shown in Figure 4.
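A minimal pure-Python sketch of this module is given below. It assumes the CBAM formulation [17], in which the average and the maximum are taken across channels at each spatial position, the two resulting maps are convolved together with zero padding, and a sigmoid turns the result into a mask. The helper names, the kernels, and the toy input are hypothetical and are not taken from this paper.

```python
import math


def conv2d_same(fmaps, kernels):
    """Sum of per-map 2D correlations with zero padding (output keeps H x W)."""
    rows, cols = len(fmaps[0]), len(fmaps[0][0])
    k = len(kernels[0])
    pad = k // 2
    out = [[0.0] * cols for _ in range(rows)]
    for fmap, kernel in zip(fmaps, kernels):
        for i in range(rows):
            for j in range(cols):
                for a in range(k):
                    for b in range(k):
                        r, c = i + a - pad, j + b - pad
                        if 0 <= r < rows and 0 <= c < cols:
                            out[i][j] += fmap[r][c] * kernel[a][b]
    return out


def spatial_attention(x, kernels):
    """x: list of C feature maps (H x W). kernels: two k x k kernels (avg, max)."""
    channels, rows, cols = len(x), len(x[0]), len(x[0][0])
    # Assumption (CBAM): pool across channels at every spatial position.
    avg_map = [[sum(x[c][i][j] for c in range(channels)) / channels
                for j in range(cols)] for i in range(rows)]
    max_map = [[max(x[c][i][j] for c in range(channels))
                for j in range(cols)] for i in range(rows)]
    logits = conv2d_same([avg_map, max_map], kernels)
    mask = [[1.0 / (1.0 + math.exp(-v)) for v in row] for row in logits]
    # Scale operation: every channel is multiplied by the same spatial mask.
    scaled = [[[mask[i][j] * x[c][i][j] for j in range(cols)]
               for i in range(rows)] for c in range(channels)]
    return mask, scaled


def make_kernel(size, center_value):
    """Hypothetical helper: a size x size kernel that is zero except at its center."""
    kernel = [[0.0] * size for _ in range(size)]
    kernel[size // 2][size // 2] = center_value
    return kernel


# Hypothetical toy input: 2 channels of size 2 x 2 and two 7 x 7 kernels.
x = [
    [[1.0, -2.0], [0.5, 0.0]],
    [[0.0, 3.0], [-1.0, 0.0]],
]
kernels = [make_kernel(7, 0.0), make_kernel(7, 1.0)]  # use only the max map
mask, scaled = spatial_attention(x, kernels)
```

With these particular kernels the mask is simply the sigmoid of the channel-wise maximum, which makes the behaviour easy to check by hand; trained 7 × 7 kernels would instead aggregate a neighbourhood around each position.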

This paper combines the channel attention mechanism and the spatial attention mechanism as in CBAM [17] and adds them in series after the 3 × 3 convolution in DenseNet. The network model thus attends to both channel and spatial information in the feature map, which improves its feature extraction ability and the accuracy of the garbage classification model.

2.3. Model Fine-Tuning

Model fine-tuning is a commonly used method for optimizing network models. It improves the performance of a model by repeatedly adjusting the parameters of the network. First, because the experiment uses the DenseNet network model, a .pkl parameter file trained on the garbage data is used as the parameter model for transfer learning [18], which improves the feature extraction and classification capabilities of the garbage classification model and therefore its accuracy. Second, because the shallow layers of the network have a stronger ability to extract garbage information, and to reduce the number of network parameters so that classification remains real time, we set the first block of DenseNet to 3 and the others to 1. Finally, given the randomness of the garbage information and the unstable pixels of the pictures, and to prevent some of the garbage data from exploding during feature extraction, we used hard-Swish [19] instead of ReLU [20], which indirectly improved the robustness of the garbage classification model. Hard-Swish is defined as follows:

$$\text{h-swish}(x) = x \cdot \frac{\text{ReLU6}(x + 3)}{6} \tag{3}$$
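As a small illustration of the activation swap, the sketch below implements hard-Swish in the form given in [19], x · ReLU6(x + 3) / 6, next to ReLU for comparison. The function names are illustrative, and the scalar form is a simplification: in a network the function is applied element-wise to a feature map.

```python
def relu(x):
    """Standard rectifier: zero for negative inputs, identity otherwise."""
    return max(0.0, x)


def relu6(x):
    """ReLU clipped at 6."""
    return min(max(0.0, x), 6.0)


def hard_swish(x):
    """Piecewise-linear approximation of Swish: x * ReLU6(x + 3) / 6."""
    return x * relu6(x + 3.0) / 6.0


# Compare the two activations on a few sample inputs.
samples = [-4.0, -1.0, 0.0, 1.0, 3.0, 6.0]
comparison = [(x, relu(x), hard_swish(x)) for x in samples]
```

Unlike ReLU, hard-Swish is not exactly zero on the interval (-3, 0), so small negative activations are attenuated rather than discarded, while inputs of 3 or more pass through unchanged.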

3. Experiment and Result Analysis

In our experiment, we used a computer running the Ubuntu 20.04 operating system, with an Intel(R) Core(TM) i-9750H CPU @ 2.60 GHz 2.59 GHz and an NVIDIA GeForce RTX 2070 graphics card with 8 GB of memory. The garbage data set in this experiment was provided by the Institute of Automation, Chinese Academy of Sciences. Each garbage image has a width of 512 pixels, a height of 84 pixels, a horizontal and vertical resolution of 96 dpi, and a bit depth of 24. DenseNet was used for training. The training set contained 2000 images, 400 for each category, and the test set contained 300 images, 60 for each category. To keep the experimental data comparable, all of the following results were obtained on these 300 test images, and only this one test set was used. The running time (Time), accuracy rate (Acc, Top1), number of Params, and FLOPs were used as evaluation indexes.
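For reference, Top1 accuracy can be computed as in the following sketch. The five class names come from the categories used in this paper, but the function name and the score vectors are hypothetical and serve only to show the calculation.

```python
CLASSES = ["cardboard", "glass", "metal", "newspaper", "plastic"]


def top1_accuracy(scores, labels):
    """Fraction of samples whose highest-scoring class equals the true label.

    scores: one list of per-class scores per sample.
    labels: the true class index of each sample.
    """
    if not labels or len(scores) != len(labels):
        raise ValueError("scores and labels must be non-empty and equally long")
    correct = 0
    for row, label in zip(scores, labels):
        predicted = max(range(len(row)), key=lambda k: row[k])
        if predicted == label:
            correct += 1
    return correct / len(labels)


# Hypothetical model outputs for four test images.
scores = [
    [0.70, 0.10, 0.10, 0.05, 0.05],  # highest score: cardboard
    [0.10, 0.60, 0.10, 0.10, 0.10],  # highest score: glass
    [0.20, 0.10, 0.10, 0.10, 0.50],  # highest score: plastic
    [0.10, 0.10, 0.20, 0.50, 0.10],  # highest score: newspaper
]
labels = [0, 1, 2, 3]  # true classes: cardboard, glass, metal, newspaper
acc = top1_accuracy(scores, labels)
```

On a balanced test set such as the one described above, with the same number of images in every category, this overall Top1 accuracy equals the mean of the per-class accuracies.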

3.1. Depthwise Separable Convolution

We used depthwise separable convolution to replace the 3 × 3 standard convolution in DenseNet121 to reduce the Params and FLOPs of the model. Table 1 shows that, after depthwise separable convolution is applied (Dense + Group), the accuracy of the garbage classification model remains stable, the Params are reduced by 1/3, the FLOPs are reduced by a factor of 1.71, and the running time is reduced by 3 min, all of which benefit the garbage classification project.

3.2. Attention Mechanism

Building on the previous step, we further optimized the model with the attention mechanism, which makes the garbage classification model pay more attention to garbage information and improves its accuracy. Table 2 shows that the accuracy reaches 0.9221, although the running time, Params, and FLOPs increase correspondingly. This garbage classification model is therefore better than DenseNet121 and preferable for garbage classification tasks.

3.3. Fine-Tuning

To improve the effectiveness of the garbage classification model on the task, we fine-tuned the model. We then compared the fine-tuned model (Dense + Group + Atten + Funing, GAF_dense) with AlexNet, VGG16, ResNet18, MobileNetV2, and ShuffleNetV2. Table 3 shows that the FLOPs and Params of GAF_dense are much lower than those of the other models in the same environment. Although its running time is slightly higher than that of AlexNet, AlexNet has a low accuracy rate and is not suitable for garbage classification tasks. At the same time, the accuracy of GAF_dense is better than that of the other classification network models. We conclude that the model proposed in this paper is well suited to garbage classification tasks.

Figure 5 shows the loss and accuracy curves during the training and validation of the garbage classification model proposed in this paper. The curves show that the model has converged by epoch 100, which indicates that it is suitable for garbage classification tasks.

To further show that the classification model generalizes well, we tested it on the CIFAR-10 data set and compared it with ResNet18, which performed well above, and with the lightweight model ShuffleNetV2. The data in Table 4 show that the GAF_dense model proposed in this paper is superior to ResNet18 and ShuffleNetV2 in both time and accuracy.

We also carried out a test on a subset of the ImageNet data set [21]. Table 5 shows that the GAF_dense model is again more accurate than ResNet18 and ShuffleNetV2. We conclude that the GAF_dense model not only performs well on the garbage classification task but also generalizes well. In particular, it can be transferred to other classification tasks while retaining good accuracy.

4. Conclusions

This paper designed a model (GAF_dense) for the task of garbage classification and reduced the Params and FLOPs of the network model by using depthwise separable convolution in DenseNet. The attention mechanism was used to improve the accuracy of the classification model, and model fine-tuning was used to ensure its real-time performance. We compared its accuracy with that of the classic classification models on the garbage data set, CIFAR-10, and a small ImageNet data set, and we compared its real-time performance on garbage classification with that of the lightweight classification models. The garbage classification model proposed in this paper has higher accuracy, better real-time performance, and better generalization capability. The model is therefore suitable for classification on the conveyor belt, where it addresses the low classification accuracy and high labor intensity of the garbage classification process while protecting the environment and benefiting the country and the people.

Data Availability

The raw/processed data required to reproduce these findings cannot be shared at this time because the data also form part of an ongoing study. Some of the data are internal to our laboratory, so we cannot share all of them at present.

Conflicts of Interest

The authors declare that they have no conflicts of interest.