Trash classification is an effective measure to protect the ecological environment and improve resource utilization. With the development of deep learning, it has become possible to use deep convolutional neural networks for trash classification. To classify the trash in the TrashNet dataset, which consists of six classes of garbage images, this paper proposes a hybrid deep learning model based on deep transfer learning, which includes an upper and a lower stream. First, the upper stream assigns the input garbage image to category MPP (metal, paper, and plastic classes) or category CGT (cardboard, glass, and trash classes). Then, the lower stream predicts the exact class of trash according to the result of the upper stream. The proposed hybrid deep learning model achieves an accuracy of 98.5 %, outperforming the state-of-the-art approaches. Verification with the CAM (class activation map) shows that the proposed model makes reasonable use of the image features for classification, which explains its superior performance.

1. Introduction

The increasing focus on protecting the ecological environment and improving resource utilization has led to changes in trash classification around the world. Billions of tons of garbage are generated annually across the globe, which can have a major impact on the environment [1]. Meanwhile, garbage contains a huge amount of recyclable resources. A sensitivity analysis shows that an increase in the waste collection rate, the fraction of paper and plastic waste, and the recycling rate in 2025 will boost the energy and environmental benefits of recyclable resources [2]. However, classifying various wastes into different categories, such as metal, glass, plastic, and paper, is essentially a complex and expensive process. At present, some of the existing solutions rely almost entirely on manual sorting, which incurs substantial labor costs. Therefore, with the rise of deep learning, replacing manual waste classification with machines is a feasible direction.

Deep learning allows computational models that are composed of multiple processing layers to learn representations of data with multiple levels of abstraction. These methods have dramatically improved the state of the art in speech recognition, visual object recognition, object detection, and many other domains such as drug discovery and genomics [3]. Because of the success in these fields, trash classification has become a promising application of deep learning. Transfer learning is a new machine learning paradigm that has gained increasing attention lately. In situations where the training data in a target domain are not sufficient to learn predictive models effectively, transfer learning leverages auxiliary source data from other related source domains for learning [4]. Meanwhile, a transfer learning model can effectively extract image features and significantly reduce the number of model parameters that must be trained.

Manual sorting, mainly implemented in developing countries, proves to be more efficient than mechanical sorting [5]. Nevertheless, handpicking involves the manipulation of hazardous materials and can cause various adverse effects. Therefore, utilizing deep learning to classify trash has positive environmental effects and is also beneficial for the health of the workers. Recently, Salimi et al. [6] developed a visual waste separation system based on classification with a pretrained neural network. This approach is suitable for the problem of waste separation with material-based and object-based class divisions. Meanwhile, Ramsurrun et al. [7] introduced a deep learning approach using computer vision to automatically identify the type of waste as part of an automated recycling bin. In 2016, Yang and Thung [8] released the TrashNet dataset, which consists of six main classes (glass, paper, metal, plastic, cardboard, and trash), and some subsequent studies are based on this dataset. However, the relevant studies at present pay little attention to the unique characteristics of the garbage in the TrashNet dataset. Different deep neural networks have different abilities to extract these characteristics. To make full use of the features of the image, this paper develops a hybrid deep learning model, which consists of two networks, the upper network and the lower network. The upper network divides the images into two broad categories, and the lower network assigns each image to one class according to the upper result. Experiments were conducted to compare the performance and the CAM (class activation map) [9] of our proposed model with those of the state-of-the-art methods for trash classification on the TrashNet dataset to prove the effectiveness of the proposed model.

The main contributions of this paper are as follows:
(1) We proposed a hybrid framework for waste separation with a two-stream structure, which presents a unique perspective and approach for trash classification using deep learning
(2) This study fills a gap in the research on hard-to-classify classes in the TrashNet dataset, such as the “trash” class, which has been overlooked by most studies
(3) Experiments showed that the proposed model is very effective and outperforms the state-of-the-art classification methods on TrashNet

2. Related Work

Yang and Thung [8] released the TrashNet dataset and achieved an accuracy of 63 % using an SVM (support vector machine) and 25 % using a CNN model named AlexNet [10] on it. Some attempts to evaluate other models using TrashNet are summarized as follows.

First, Arda Aral and Recep Keskin [11] compared several CNN models for image classification on TrashNet, including Xception [12], MobileNet [13], Inception-ResNet-V2 [14], and DenseNet-121 and DenseNet-169 [15]. Data augmentation was applied to increase classification accuracy in their experiments. According to their experimental results, a transfer learning model based on DenseNet-121 achieved the best accuracy of 95 %, followed by the Inception-ResNet-V2 model with an accuracy of 94 %. In addition, they demonstrated the feasibility of transfer learning for this task.

Next, Bircanoğlu [16] developed a model named RecycleNet, a carefully optimized deep convolutional neural network architecture for the classification of selected recyclable object classes. This model altered the connection patterns of the skip connections inside the dense blocks. Although RecycleNet only achieved an accuracy of 81 % on the TrashNet dataset, it reduced the number of parameters from 7 million to about 3 million.

Then, Vo et al. [17] designed a DNN-TC framework based on ResNeXt, which can automatically classify the trash in smart waste sorting machines. The standard ResNeXt-101 model was modified by adding two fully connected layers to reduce redundancy. During the training process, the pretrained weights of ResNeXt-101 [18] obtained on the ImageNet dataset were loaded, and the model achieved the best accuracy of 94 % on the TrashNet dataset. In their experiment, the DNN-TC method outperforms ResNeXt-101 for classifying glass and cardboard. Nevertheless, for the metal, paper, and plastic classes, the performance of DNN-TC is lower than that of ResNeXt-101, which is not explained by the authors.

Recently, Shi et al. [19] proposed a novel network improvement method based on channel expansion for trash image classification, named M-b Xception. Its best accuracy was 94.34 % on the TrashNet dataset. Although their method improves the robustness and classification accuracy compared with the previous methods, it also increases the computational cost and the number of parameters.

Currently, the world is still plagued by COVID-19, so using AI to deal with medical waste has become a major challenge. Kumar et al. [20] developed a classification model based on the support vector machine (SVM) that achieved 96.5 % classification accuracy on an experimental dataset that they proposed. Their method provides a new way to classify medical waste. However, due to some limitations of the dataset, their experimental approach faces greater challenges, which is a common problem in the task of trash classification. In the experiment analysis section, we discuss this method further.

Among the above models, we found that ResNeXt is the best model for transfer learning to classify trash. Meanwhile, we observed that the DNN-TC model added only two fully connected layers to the original ResNeXt model, yet the classification performance of the two models differed across categories. Based on this phenomenon, we believe that images can be classified more effectively according to their unique characteristics. Therefore, according to the classification results of the DNN-TC and ResNeXt models on the TrashNet dataset and the characteristics of the garbage images, we divide the training dataset into category MPP (metal, paper, and plastic) and category CGT (cardboard, glass, and trash). At the same time, a multilevel approach was designed based on the ResNeXt model.

3. Proposed Work

3.1. Dataset

Classification of recycling materials is very important for humanity and civilization. The TrashNet dataset is a representative trash classification dataset, and several studies use it to evaluate their proposed approaches. Therefore, improving the accuracy on this dataset is essential for trash classification. Each object in this dataset was placed on a white background and photographed under sunlight and/or room lighting. Every image in this dataset was resized to 512 × 384 pixels, and the original dataset is nearly 3.5 GB in size. The number of images in each class is presented in Table 1, and Figure 1 shows some sample images of the dataset.

3.2. ResNeXt Model and DNN-TC Model
3.2.1. ResNeXt Model

In 2017, Xie et al. [18] presented a simple, highly modularized network architecture for image classification named ResNeXt. The model exposes a new dimension named “cardinality” (the size of the set of transformations) as an essential factor in addition to the dimensions of depth and width. ResNeXt adopts a highly modularized design following VGG [21]/ResNet [22]. It consists of a stack of residual blocks that share the same topology. The template module uses the split-transform-merge strategy to approach the representational power of large and dense layers. The structures of the residual block and the ResNeXt block are compared in Figure 2.

The inner product computed by the simplest neuron can be considered as a form of aggregating transformation:

$$\sum_{i=1}^{D} w_i x_i, \tag{1}$$

where, as shown in Figure 3, the input to the neuron $x = [x_1, x_2, \ldots, x_D]$ is a $D$-channel vector and $w_i$ is a filter’s weight for the $i$-th channel.

From the above analysis of a simple neuron, the elementary transformation $w_i x_i$ can be replaced with a more generic function, which can itself be a network. Hence, the aggregated transformation can be written as follows:

$$\mathcal{F}(x) = \sum_{i=1}^{C} \mathcal{T}_i(x), \tag{2}$$

where $\mathcal{T}_i(x)$ can be an arbitrary function and $C$ is in a position similar to $D$ in equation (1); $C$ is normally set to 32. The cardinality $C$ is the size of the set of transformations. In addition, all $\mathcal{T}_i$ have the same topology, which helps isolate a few factors and extend to any large number of transformations.

The aggregated transformation of each ResNeXt block in equation (2) serves as the residual function:

$$y = x + \sum_{i=1}^{C} \mathcal{T}_i(x), \tag{3}$$

where $y$ is the output.
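The aggregated residual transformation described above (the block input plus the sum of C same-topology transformations) can be illustrated with a small framework-free sketch. The helper `make_branch` and its scalar weights are hypothetical stand-ins for the convolutional paths of a real ResNeXt block; only the split-transform-merge structure with an identity shortcut is taken from the text.

```python
from typing import Callable, List

Vector = List[float]
Transform = Callable[[Vector], Vector]


def make_branch(scale: float) -> Transform:
    """Hypothetical stand-in for one path T_i: a fixed element-wise scaling."""
    def branch(x: Vector) -> Vector:
        return [scale * v for v in x]
    return branch


def aggregated_residual(x: Vector, branches: List[Transform]) -> Vector:
    """Compute y = x + sum_i T_i(x) over C branches with the same topology."""
    merged = [0.0] * len(x)
    for t in branches:                                   # split and transform
        out = t(x)
        merged = [m + o for m, o in zip(merged, out)]    # merge by summation
    return [xi + mi for xi, mi in zip(x, merged)]        # identity shortcut


# Cardinality C = 32 as stated in the text; the branch weights are illustrative only.
C = 32
branches = [make_branch(1.0 / C) for _ in range(C)]
y = aggregated_residual([1.0, 2.0, 3.0], branches)
```

Because every branch has the same form, changing the cardinality only changes the length of `branches`, which mirrors the point that the shared topology makes it easy to extend to any number of transformations.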

3.2.2. DNN-TC Model

DNN-TC (deep neural networks for trash classification) was introduced by Vo et al. [17] in 2019. For the TrashNet dataset, the DNN-TC model modifies the original ResNeXt model by adding two fully connected layers after the global average pooling layer, with output dimensions of 1024 and N (the number of classes), respectively, to decrease the redundancy, as illustrated in Figure 4.

The DNN-TC model uses the log softmax function to compute the confidence for each label as follows:

$$o_i = \log \frac{\exp(z_i)}{\sum_{j=1}^{N} \exp(z_j)}, \tag{4}$$

where $N$ is the number of classification labels, $o_i$ is the output of each label, and $z_1, \ldots, z_N$ are the final hidden outputs.
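As a minimal sketch of the log softmax confidence described above, the following function computes it in a numerically stable way by subtracting the maximum score before exponentiating. The score values are hypothetical and do not come from the paper.

```python
import math
from typing import List


def log_softmax(z: List[float]) -> List[float]:
    """Return log(exp(z_i) / sum_j exp(z_j)) for each i, computed stably."""
    m = max(z)                                          # shift to avoid overflow
    log_sum = m + math.log(sum(math.exp(v - m) for v in z))
    return [v - log_sum for v in z]


# Hypothetical final hidden outputs for the six TrashNet classes.
scores = [2.0, 0.5, -1.0, 0.0, 1.5, -0.5]
log_probs = log_softmax(scores)
predicted = max(range(len(log_probs)), key=log_probs.__getitem__)
```

The logarithm is monotonic, so the predicted label is the same as with a plain softmax; the log form is mainly convenient when the outputs feed a negative log-likelihood loss.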

3.3. Proposed Model Structure

In this section, we propose a hybrid deep learning model for trash classification. In the experimental results of the ResNeXt and DNN-TC models, we observed an interesting phenomenon. The DNN-TC model modifies the original ResNeXt model by adding two fully connected layers after the global average pooling layer, with output dimensions of 1024 and N, respectively. The test set accuracy of the two models in Table 2 shows that DNN-TC outperforms the ResNeXt-101 model for classifying glass, cardboard, and trash, whereas for metal, paper, and plastic, DNN-TC is slightly less accurate than the ResNeXt-101 model.

Based on this observation, we designed a hybrid deep learning model based on deep transfer learning. The model developed in this paper consists of two CNN streams. First, we split the TrashNet dataset into two parts. One part contains category MPP (metal, paper, and plastic), and the other part contains category CGT (cardboard, glass, and trash). Then, a primary CNN architecture was designed to classify each image into one of the two categories, MPP or CGT. This upper stream of the model contains five convolution layers and three fully connected layers. Among the three fully connected layers, the final layer has two neurons with the softmax function to perform the classification. The secondary CNN model (lower stream) was designed to classify the image into a subcategory of trash based on the result of the upper stream. The lower stream uses the ResNeXt-101 model to extract the features. In detail, we added two classifiers after the global average pooling layer, one for each of the two categories. Furthermore, the pretrained parameters are frozen through deep transfer learning, so the model only needs to train very few parameters. The specific model architecture is shown in Figure 5.
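The two-stage decision described above can be sketched as a simple routing function. This is an illustrative outline only: the stub classifiers return fixed, hypothetical scores in place of the trained upper-stream CNN and the ResNeXt-101-based lower-stream classifiers, and the names `hybrid_predict`, `upper_stub`, and `lower_stubs` are not from the paper.

```python
from typing import Callable, Dict, Sequence

MPP = ("metal", "paper", "plastic")
CGT = ("cardboard", "glass", "trash")

# A classifier maps an image (any object here) to one score per output.
Classifier = Callable[[object], Sequence[float]]


def argmax(scores: Sequence[float]) -> int:
    return max(range(len(scores)), key=lambda i: scores[i])


def hybrid_predict(image, upper: Classifier, lower: Dict[str, Classifier]) -> str:
    """Route the image through the upper stream, then the matching lower classifier."""
    group = ("MPP", "CGT")[argmax(upper(image))]       # upper stream: 2-way decision
    labels = MPP if group == "MPP" else CGT
    return labels[argmax(lower[group](image))]         # lower stream: 3-way decision


# Hypothetical stub models returning fixed scores, for illustration only.
upper_stub = lambda img: [0.2, 0.8]                    # favours CGT
lower_stubs = {
    "MPP": lambda img: [0.1, 0.7, 0.2],                # favours "paper"
    "CGT": lambda img: [0.1, 0.2, 0.7],                # favours "trash"
}
label = hybrid_predict("example.jpg", upper_stub, lower_stubs)
```

One consequence of this design is that a wrong decision in the upper stream cannot be corrected by the lower stream, so the accuracy of the upper stream bounds the accuracy of the whole pipeline.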

4. Experiments, Results, and Analysis

4.1. Experiment Settings

The experimental models were implemented on an 11th-generation Intel Core i7 CPU with 16 GB RAM and an Nvidia GeForce RTX 3070 GPU with 8 GB of memory. This paper uses the PyTorch framework, which is an open-source deep learning library for Python.

Because the TrashNet dataset contains few images, data augmentation is used to suppress overfitting during training. It expands the data samples in the training dataset through operations including cropping, scaling, and rotation. With data augmentation, the amount of data expands to approximately three times the original amount.
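The following sketch shows one way cropping, scaling, and rotation could triple a training set. It operates on tiny grayscale images stored as nested lists so that it runs without any image library. The specific transforms (a half-size center crop rescaled to the original size, and a 90 or 180 degree rotation) are assumptions made for illustration; the text does not specify the exact augmentation recipe.

```python
import random
from typing import List

Image = List[List[int]]  # grayscale image as rows of pixel values


def center_crop(img: Image, h: int, w: int) -> Image:
    top = (len(img) - h) // 2
    left = (len(img[0]) - w) // 2
    return [row[left:left + w] for row in img[top:top + h]]


def scale_nearest(img: Image, h: int, w: int) -> Image:
    """Resize with nearest-neighbour sampling."""
    src_h, src_w = len(img), len(img[0])
    return [[img[r * src_h // h][c * src_w // w] for c in range(w)] for r in range(h)]


def rotate90(img: Image) -> Image:
    """Rotate 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def augment(dataset: List[Image], rng: random.Random) -> List[Image]:
    """Keep each original and add two transformed copies (about 3x the data)."""
    out = []
    for img in dataset:
        h, w = len(img), len(img[0])
        cropped = scale_nearest(center_crop(img, max(1, h // 2), max(1, w // 2)), h, w)
        rotated = rotate90(img) if rng.random() < 0.5 else rotate90(rotate90(img))
        out.extend([img, cropped, rotated])
    return out


sample = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
augmented = augment([sample], random.Random(0))
```

In a PyTorch pipeline the same idea is usually expressed with image transforms applied on the fly during loading rather than by materializing the copies.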

We replicated state-of-the-art models for trash classification, including RecycleNet [16], ResNet-50 [11], DNN-TC [17], ResNeXt-101 [18], and M-b Xception [19], to compare with the proposed model. Specifically, we used the ResNet-50 and RecycleNet models with the same configurations as described in the original works. It is worth mentioning that the learning rate setting of our model is partially taken from the DNN-TC model. For the pretrained model of this paper, we compared the impact of the SGD and Adam algorithms on training and found experimentally that Adam performs better in our model. Therefore, the hyperparameters of the Adam optimizer are set up with the learning rate , and two momentum parameters and are utilized for the first 10 epochs. Then, the learning rate decreases by for every 10 epochs.
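The schedule described above, a learning rate held for 10 epochs and then reduced every 10 epochs, can be sketched as a step decay. The sketch assumes the reduction is multiplicative, and `BASE_LR` and `GAMMA` are placeholder values chosen only for illustration; they are not the paper's settings.

```python
def step_decay_lr(epoch: int, base_lr: float, gamma: float, step: int = 10) -> float:
    """Learning rate for a 0-indexed epoch: constant for `step` epochs, then scaled by `gamma`."""
    return base_lr * gamma ** (epoch // step)


# Placeholder values, NOT the paper's settings.
BASE_LR = 1e-3
GAMMA = 0.1
schedule = [step_decay_lr(e, BASE_LR, GAMMA) for e in range(50)]
```

In PyTorch, the same behaviour is available through `torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=...)` wrapped around a `torch.optim.Adam` optimizer.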

In addition, the experiments use a minibatch size of 12 and 50 epochs. We found that the accuracy and loss of our model change very little after 50 epochs, so, considering the training time and the risk of overfitting, we set the number of epochs to 50. In detail, we use a model pretrained on ImageNet and retain the parameters before the fully connected layers. Therefore, the model only needs to learn the parameters of the fully connected layers, which greatly reduces the training time of the model. Meanwhile, for the fully connected layers, we set the dropout rate to 0.2. Then, we train the upper and lower streams of the model separately. The upper stream performs a simple binary classification between the two categories of the TrashNet dataset. The lower stream contains two classifiers for subclassification, so the two classifiers need to be trained separately.

4.2. Experiment Results

Using the TrashNet dataset, the proposed model was compared with the state-of-the-art methods introduced in Section 4.1. The accuracy of each model is shown in Table 3. The proposed model outperformed the other approaches on the TrashNet dataset. The hybrid deep learning model achieved 97.9 % in category MPP, 99.2 % in category CGT, and 98.5 % over all classes, while ResNet-50, ResNeXt-101, DNN-TC, M-b Xception, and RecycleNet obtained 90.5 %, 91.7 %, 92.1 %, 94.3 %, and 71 %, respectively.

Figure 6 shows the loss and accuracy of the hybrid deep learning model on the training and test sets over 50 epochs for the TrashNet dataset. Because the model only needs to train very few parameters, its training time is short. The figure indicates that the hybrid deep learning model obtained a high accuracy and a small loss value after 50 epochs. Meanwhile, the model quickly became stable and generalized well on the TrashNet dataset.

4.3. Experiment Analysis

In the experiments, we found that almost all state-of-the-art models performed poorly on the trash class. Table 2 shows the number of objects predicted correctly in the trash class by each model. Unfortunately, none of the previous papers mentioned above explained this. We hold that there are two main reasons for this phenomenon. First, the number of images in the trash class is smaller than that in the other categories, which means the data are not balanced. Second, the existing models cannot extract valid features of this category. We use the CAM (class activation map) to show the importance of each image location for a class, which helps to understand which part of the image drives the model's final decision. Figure 7 shows the CAM of the DNN-TC model and of the model of this paper for images of the trash class and the paper class. The highlighted part in the figure represents the focus of the model: orange indicates the most attention, and blue indicates the least attention.
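For readers unfamiliar with the CAM, the sketch below follows the formulation of [9], in which the map for a class is the weighted sum of the last convolutional feature maps, using that class's weights in the layer after global average pooling. The feature maps and weights are tiny hypothetical values; a real map would be computed from the network's activations and upsampled to the image size.

```python
from typing import List

FeatureMap = List[List[float]]  # one channel: rows of activations


def class_activation_map(features: List[FeatureMap], class_weights: List[float]) -> FeatureMap:
    """Weighted sum of feature maps: M_c(x, y) = sum_k w_k^c * f_k(x, y)."""
    h, w = len(features[0]), len(features[0][0])
    cam = [[0.0] * w for _ in range(h)]
    for fmap, wk in zip(features, class_weights):
        for r in range(h):
            for c in range(w):
                cam[r][c] += wk * fmap[r][c]
    return cam


def normalize(cam: FeatureMap) -> FeatureMap:
    """Rescale to [0, 1] so the map can be rendered as a heat map."""
    flat = [v for row in cam for v in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0                 # avoid division by zero on flat maps
    return [[(v - lo) / span for v in row] for row in cam]


# Hypothetical 2-channel, 2x2 feature maps and class weights.
features = [[[1.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 2.0]]]
weights = [0.5, 1.0]
heat = normalize(class_activation_map(features, weights))
```

Locations with values near 1 correspond to the regions a heat map would render in the warmest color, and values near 0 to the coolest.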

As the experimental results show, for the image of the trash class, our model pays more attention to the important parts of the object. For the paper class, the DNN-TC model and the hybrid deep learning model have similar performance. Through CAM analysis of misclassified samples of the trash class, we found that most models focus on the background. How to make the model direct its attention to the more important parts will become a focus of the authors’ future research.

Although our model has achieved good performance on the TrashNet dataset, as the world is still affected by COVID-19, it is necessary to consider how our model would perform on the classification of medical trash. However, due to the lack of a medical trash dataset, we cannot quantitatively evaluate the classification performance of our model on medical trash. Therefore, we choose to qualitatively discuss the scenario studied by Kumar et al. [20]. Following the classification idea of our model, waste can be divided into domestic waste and medical waste in the upper stream of our model. Domestic waste includes the metal, paper, and glass categories, and medical waste includes the polyethylene terephthalate (PET) category mentioned by Kumar et al. Then, according to the classification result of the upper stream, the accurate subclassification is carried out in the lower stream of our model. In addition, our model has the advantage that the upper stream alone can be used to separate waste into domestic waste and medical waste when the waste does not need to be sorted into subcategories. In summary, we believe that our model has some applicability to the classification of medical waste. This will be one of our future research directions.

5. Conclusion

This paper proposed a hybrid deep learning model for trash classification. The model consists of two streams of CNN. Firstly, we split the experimental dataset named TrashNet into two parts: category MPP (metal, paper, and plastic class) and category CGT (cardboard, glass, and trash class). The upper stream of the model was designed to classify the image based on the category of MPP or CGT. Then, we proposed the lower stream of the model to perform a classification based on the subcategory of the trash from the results of the upper stream. To demonstrate the validity of the proposed framework, we compared the predictive performance of the method and the state-of-the-art models. The hybrid deep learning model achieved 97.9 % in category MPP, 99.2 % in category CGT, and 98.5 % in total class. We used CAM to verify the effectiveness of the hybrid deep learning model. Experiments showed that the model could effectively extract image features.

The authors acknowledge the current limitations of this paper. For example, our model is not currently deployed in real systems. Also, due to the limitations of the dataset, our model does not take into account garbage classification against complex backgrounds. The problem of trash classification in complex scenes will be an important part of our future research.

In future studies, a novel framework will be designed to enhance the model’s attention to the important parts of the image. Besides, the authors will also evaluate the model on more difficult datasets and apply it to a real system.

Data Availability

The data used to support the findings of this paper are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.


This paper was financially supported by the Ningxia Natural Science Foundation (no. 2021AAC03084).