Abstract

With the development of the Internet of Things, computer vision applications on mobile phones are becoming increasingly widespread, and users expect both faster recognition results and stronger on-device image processing. However, the processing and storage capabilities of user terminal equipment cannot meet the needs of identifying and storing a large number of pictures, and transmitting the data consumes considerable energy on the terminal. At the same time, multisource deep transfer learning performs very well in computer vision and image classification, but the heavy computation of deep network models makes it impractical to run existing high-performing models for image recognition and classification on a mobile terminal. To solve these problems, we propose a multisource mobile transfer learning algorithm based on dynamic model compression. The algorithm performs multisource transfer learning with multiple mobile devices acting as source domains, and it preserves data privacy and security for each device (source domain). Extensive experiments show that our method achieves competitive classification accuracy on popular image classification datasets.

1. Introduction

Machine learning achieves good results in computer vision largely under two assumptions: the training dataset contains enough samples to train a high-precision classifier, and the training and test data come from the same feature space and the same distribution. However, obtaining sufficient labeled data for practical applications is often expensive and time-consuming. In this case, transfer learning [1] is a promising approach: it transfers knowledge from a labeled source domain to the target domain. The emergence of convolutional neural networks has also accelerated the development of transfer learning models. Transfer learning generally assumes that training and test data come from similar but different distributions [2]. For example, images of objects taken at different angles, against different backgrounds, and under different lighting may follow different marginal or conditional distributions. Existing transfer learning methods mainly focus on distribution adaptation to reduce the domain gap. For example, several unsupervised transfer learning methods [3–5] incorporate a maximum mean discrepancy (MMD) loss into neural networks to reduce domain differences; other models introduce different learning schemes to align source and target domains, including aligning second-order correlations [6, 7]. In practical applications, we are often faced with multiple source domains. Common multisource transfer learning [8] maps the data of the source and target domains into a common feature space and learns features that are invariant across the source and target domains by minimizing the difference between the domain distributions [5, 9–12]. With the development of deep learning, many scholars have proposed transfer learning models based on deep learning [13–17].

At present, the global Internet of Things has entered its third wave of development, and mobile terminals generate a large amount of data. In the traditional computing architecture, the data must be transmitted to the cloud for centralized processing, which increases the network load and the data transmission time. Mobile computing emerged in response. Running deep learning on mobile devices has natural advantages over cloud computing: the entire workflow does not need to upload data to the cloud, so it can run offline and avoids the privacy risks caused by data transmission and off-site storage. Although deep learning models perform very well, the convolutional layers in CNNs consume a lot of computation and energy, which poses severe challenges to devices. AlexNet [19], VGG [20], GoogleNet [21], and ResNet [22] are all excellent ILSVRC models, but these models usually have tens of millions of parameters and require hundreds of megabytes or even 1 GB of memory. Multisource learning usually requires multiple neural networks to work together, so deploying multisource deep learning models on mobile devices is a serious challenge in terms of both computation and memory overhead. GoogleNet and ResNet face the same problem.

Deploying multisource transfer learning on mobile devices faces two challenges: (1) how to deploy transfer learning models effectively on the mobile device side, and (2) how to combine the advantages of the cloud and mobile terminals to optimize computation, given that the models and the computation volume of multisource transfer learning are often very large.

In this paper, we propose a novel Multisource Mobile Transfer Learning algorithm based on Dynamic Model Compression (MMTLDMC), which combines the advantages of mobile computing, convolutional neural networks, and multisource transfer learning. MMTLDMC first performs BN pruning on the convolutional neural network. The classification loss and the MMD loss are then computed on the device side (source domain), and the classifier alignment loss is calculated on the server side. Next, the parameters are updated by minimizing the objective function. Finally, the trained model is obtained, and the data classification is completed.

Compared with previous work, the contributions of this work are as follows:
(1) Unlike previous multisource transfer learning algorithms, we take the needs of mobile computing into account and propose a new multisource transfer learning algorithm, MMTLDMC, which combines the advantages of both to accelerate the multisource transfer learning model.
(2) The data on each device is used as a source domain. For samples on multiple devices, calculations are performed only on the device side, and the server and devices synchronize only parameters and models, which protects the security of device data.
(3) Experiments on real datasets show that the proposed algorithm outperforms, or is at least comparable to, state-of-the-art benchmark algorithms in classification accuracy, while running much faster than existing transfer learning algorithms.

The rest of the paper is organized as follows: Section 2 reviews related work on convolutional neural network pruning, mobile computing, and multisource transfer learning; Section 3 proposes a multisource mobile transfer learning algorithm based on dynamic model compression; Section 4 verifies the effectiveness of the algorithm on SVHN, USPS, MNIST, Office-31, Caltech-256, and DomainNet; Section 5 summarizes the main work of this paper.

2.1. Convolutional Neural Network Pruning and Mobile Computing

The powerful feature extraction capability of convolutional neural networks comes from a large number of parameters and a complex multilayer structure. Modern convolutional neural networks keep deepening and widening the classical structure to improve accuracy. For example, VGG [20] increases the number of layers from 8 to 19 relative to AlexNet [21]; GoogleNet [22] grows to 22 layers, adopts the inception structure, and uses average pooling in place of the fully connected layer. ResNet [23] learns only the residual with respect to the input through residual identity mappings, which yields a model that is easier to optimize; its depth has grown from the standard 152 layers to more than a thousand layers. The traditional pruning method is divided into three steps: baseline training, pruning, and fine-tuning. In terms of pruning granularity, methods fall into two categories: unstructured pruning [24] and structured pruning. Unstructured pruning removes individual weights directly. The resulting weight matrix is sparse, so compression and acceleration are hard to obtain on general-purpose hardware, and deployment is difficult. Structured pruning removes whole convolution kernels [25], channels [26, 27], or layers [28]. For convolution kernel pruning, [29] improved the pruning strategy based on ranking kernel weights and used the L1 norm of each kernel (the sum of the absolute values of its weights) as the pruning criterion. Layer pruning removes complete layers, which offers little flexibility and carries a high risk of accuracy loss; it is generally used to shorten very deep network structures [30]. Channel pruning is one of the most studied and widely used structured pruning methods. Reference [31] proposed a channel pruning method based on the BN layer for model compression. Using the scaling weights of the BN layer to score the input channels, the authors set a threshold to filter out the channels with low scores. Neurons in channels whose scores are too small do not participate in the connections of the pruned network, and pruning proceeds layer by layer. Pruning thus reduces the cost of deploying convolutional neural networks on the mobile terminal. Figure 1 shows the pruning model based on the BN channel factor.
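The threshold-based channel selection described above can be illustrated with a small sketch. It assumes a single global threshold taken from the sorted absolute BN scaling factors, which is one common way to turn a pruning rate into a threshold; the function name `select_channels`, the layer names, and the safeguard that keeps at least one channel per layer are hypothetical and not taken from the paper.

```python
from typing import Dict, List


def select_channels(gammas: Dict[str, List[float]], prune_ratio: float) -> Dict[str, List[int]]:
    """Return, for each BN layer, the indices of the channels to keep.

    gammas maps a (hypothetical) layer name to its BN scaling factors.
    A global threshold is chosen so that roughly `prune_ratio` of all
    channels have |gamma| below it.
    """
    if not 0.0 <= prune_ratio < 1.0:
        raise ValueError("prune_ratio must be in [0, 1)")
    scores = sorted(abs(g) for layer in gammas.values() for g in layer)
    cut = int(len(scores) * prune_ratio)
    threshold = scores[cut]  # smallest score that survives pruning

    keep: Dict[str, List[int]] = {}
    for name, layer in gammas.items():
        kept = [i for i, g in enumerate(layer) if abs(g) >= threshold]
        if not kept:
            # Assumed safeguard: never remove a layer completely.
            kept = [max(range(len(layer)), key=lambda i: abs(layer[i]))]
        keep[name] = kept
    return keep


if __name__ == "__main__":
    factors = {"bn1": [0.9, 0.01, 0.5, 0.02], "bn2": [0.03, 0.7]}
    print(select_channels(factors, prune_ratio=0.5))
```

After selection, the convolution filters feeding the removed channels and the matching input slices of the next layer would be dropped, and the smaller network fine-tuned.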

2.2. Multisource Transfer Learning

Multisource transfer learning, as a research direction within transfer learning, has considerable practical value. In real-life applications, there are often multiple source domains. Although each source domain has a different similarity to the target, all of them can still be used for knowledge transfer. Moreover, multiple sources contain more knowledge, which can improve the model. Multisource transfer learning also has a theoretical basis. Crammer [28] first proposed an expected loss bound for multisource transfer learning. Later, Mansour [29] proved that a distribution-weighted combination rule can reduce the loss incurred when transferring from the source domains to the target domain. Ben-David [30] gave two learning bounds for empirical risk minimization by introducing the distance between the target domain and the source domain.

In recent years, a lot of work has centered on multisource transfer learning with deep learning. Xu et al. [31] proposed the Deep Cocktail Network (DCTN), which uses a domain discriminator and a classifier for each pair of source domain and target domain. The domain discriminator is used to align the feature distributions, and the classifier outputs the predicted probability distribution. Based on the output of the domain discriminators, DCTN combines the predictions of the multiple classifiers by weighted voting. Peng et al. [32] proposed the moment matching for multisource domain adaptation (M3SDA) method, which not only aligns the source domains with the target domain but also aligns the feature distributions of the different source domains with each other. Zhu et al. proposed a framework named aligning domain-specific distribution and classifier for cross-domain classification from multiple sources (MFSAN) [33]. However, current deep multisource transfer learning algorithms often consider only the marginal probability distribution, or consider the marginal and conditional probability distributions separately. In this paper, multisource transfer learning is based on balanced distribution adaptation, which considers the joint probability distribution to improve the accuracy of the algorithm. Moreover, multisource deep transfer learning has not yet been combined with mobile computing because of its large amount of computation. This paper proposes a multisource mobile transfer learning algorithm based on dynamic model compression, which aims to train the multisource transfer model on the mobile terminal.

2.3. Problem

In mobile computing or edge computing, we can take the data collected by each device as a source domain, but for privacy and security reasons, the data of each device cannot be completely uploaded to the server for model construction. Under this restriction, we redefine multisource transfer learning. In multisource transfer learning, there are $N$ source domains (clients), and their labeled sample data can be represented as $\{(x_i^{s_j}, y_i^{s_j}, \delta_i^{s_j})\}_{i=1}^{n_{s_j}}$ for $j = 1, \ldots, N$, where $x_i^{s_j}$ represents the $i$-th sample in the $j$-th source domain, $y_i^{s_j}$ represents the label of the $i$-th sample in the $j$-th source domain, and $\delta_i^{s_j} \in \{0, 1\}$ represents whether the $i$-th sample in the $j$-th source domain can be synchronized to the server: 0 indicates that it cannot be synchronized, and 1 indicates that it can. The joint probability distribution of the $j$-th source domain can be expressed as $P_{s_j}(x, y)$, where the marginal probability can be expressed as $P_{s_j}(x)$ and the conditional probability can be expressed as $P_{s_j}(y \mid x)$. Similarly, the samples of the target domain can be expressed as $\{(x_i^{t}, \delta_i^{t})\}_{i=1}^{n_t}$, and its probability distribution can be expressed as $P_t(x, y)$. This paper mainly considers the case in which the source domain data cannot be shared with the server but the target domain data can, which means $\delta_i^{s_j} = 0$ and $\delta_i^{t} = 1$.
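The synchronization flag in this definition can be pictured as a field on each sample record. The following is a minimal sketch of that data structure; the `Sample` record and the `uploadable` helper are hypothetical names introduced here for illustration and are not part of the paper.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class Sample:
    features: Sequence[float]
    label: Optional[int]  # None for unlabeled target-domain samples
    sync: int             # 1: may be sent to the server, 0: must stay on the device


def uploadable(samples: List[Sample]) -> List[Sample]:
    """Keep only the samples whose flag allows synchronization to the server."""
    return [s for s in samples if s.sync == 1]


if __name__ == "__main__":
    # Setting considered in the text: source samples stay local, target samples are shared.
    source = [Sample([0.1, 0.2], label=3, sync=0), Sample([0.4, 0.1], label=7, sync=0)]
    target = [Sample([0.3, 0.3], label=None, sync=1)]
    print(len(uploadable(source)), len(uploadable(target)))
```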

In recent years, some papers have defined the objective function of multisource deep transfer learning. They first map all domains to the same target space and then learn from the domain-invariant representation in this common feature space. Zhu [33] gave a definition of the loss function:

$$\mathcal{L}_{total} = \mathcal{L}_{cls} + \lambda_1 \mathcal{L}_{mmd} + \lambda_2 \mathcal{L}_{disc},$$

where $\lambda_1$ and $\lambda_2$ are trade-off weights.

The first term represents the classification loss, which is generally the cross-entropy loss, and the second term measures the statistical discrepancy between the source domain and the target domain. Commonly used discrepancy measures are the MMD loss [34, 35], the CORAL loss [5], and the confusion loss [14, 15]. Zhu defines the third term as a domain-specific classifier difference loss. At present, no method combines multisource transfer learning with mobile computing. Most of these algorithms place high demands on the GPU and CPU of the computing device and cannot run on the mobile end. At the same time, because of user privacy concerns, synchronizing user data to the cloud is undesirable. Therefore, our concern is how to carry out the computation of a large-scale model, such as multisource transfer learning, on the mobile terminal.

To solve the above problems, we first map the multiple source domains and the target domain into the same subspace, and then prune the BN channels of the convolutional neural network to reduce the amount of computation at the mobile end. To ensure data security, we calculate the classification loss and the MMD loss on the mobile terminal. The calculated losses are then synchronized to the server, and the server calculates the classifier alignment loss. Finally, we optimize the parameters by minimizing the loss function to obtain the optimal solution.

3. Multisource Mobile Transfer Learning Algorithm Based on Dynamic Model Compression

To address the limits that the memory and CPU of mobile devices impose on existing multisource transfer learning algorithms, this section introduces the multisource mobile transfer learning algorithm based on dynamic model compression. We use the model compression strategy proposed in [31] to compress the deep learning model.

3.1. Algorithm Structure

Our algorithm structure consists of two parts. The first part is the server side, which minimizes the loss function and updates the parameters. The second part is the client (source domain), which mainly calculates the loss functions and updates its parameters according to the results from the server, as shown in Figure 2.

3.1.1. Preparation-BN Channel Pruning

We extract the source domain features into the same feature space by sharing parameters. Before extraction, we prune the BN channels of the feature extraction network according to the previously trained transfer learning model. The purpose is to reduce the number of parameters of the network model on the mobile terminal. We prune according to [31]; the main idea is to use the gamma parameters of the BN layer directly as the pruning criterion.

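The loss function after introducing the channel factor of the BN layer is as follows:

$$L = \sum_{(x, y)} l\big(f(x, W), y\big) + \lambda \sum_{\gamma \in \Gamma} g(\gamma),$$

where $(x, y)$ represents the training data and labels, and $W$ represents the trainable parameters. The first term is the CNN's original loss function, and the second term is the introduced penalty on the BN scaling factors $\gamma$, with $g(\gamma) = |\gamma|$ as in [31]. $\lambda$ is the sparsity factor used to balance the two terms.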
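As a small numerical illustration of the penalized loss, the sketch below adds an L1 penalty on a list of BN scaling factors to a given task loss and returns the corresponding subgradient. The function names and the flat list of scaling factors are assumptions made for illustration; in practice the penalty would be added inside the training loop of a deep learning framework.

```python
from typing import List


def sparsity_regularized_loss(task_loss: float, gammas: List[float], lam: float) -> float:
    """Task loss plus lam times the L1 norm of the BN scaling factors."""
    return task_loss + lam * sum(abs(g) for g in gammas)


def l1_subgradient(gammas: List[float], lam: float) -> List[float]:
    """Subgradient of lam * |gamma| for each factor (0 is used at gamma == 0)."""
    return [lam * ((g > 0) - (g < 0)) for g in gammas]


if __name__ == "__main__":
    gammas = [0.5, -0.25, 0.0]
    print(sparsity_regularized_loss(1.0, gammas, lam=0.1))
    print(l1_subgradient(gammas, lam=0.1))
```

The constant-magnitude subgradient is what pushes unimportant scaling factors toward zero during training, so that they can later be removed by thresholding.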

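The mean and variance of the BN activation values over a mini-batch $B = \{z_1, \ldots, z_m\}$ are calculated as follows:

$$\mu_B = \frac{1}{m} \sum_{i=1}^{m} z_i, \qquad \sigma_B^2 = \frac{1}{m} \sum_{i=1}^{m} (z_i - \mu_B)^2.$$

Then, the calculation process of the BN layer output can be expressed as

$$\hat{z}_i = \frac{z_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}}, \qquad z_i^{out} = \gamma \hat{z}_i + \beta,$$

where $\gamma$ and $\beta$ are the trainable linear transformation parameters (scale and shift) of the BN layer, and $\epsilon$ is a small constant for numerical stability. Following the method given in [31], we pretrain and fine-tune an existing ResNet network to obtain the initial convolutional neural network parameters.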
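The role of the BN scale parameter in pruning can be seen in a short sketch of the batch normalization computation for one channel. This is an illustrative pure-Python version under the usual definition of batch normalization; the function name `batch_norm` and the default `eps` value are assumptions.

```python
import math
from typing import List


def batch_norm(values: List[float], gamma: float, beta: float, eps: float = 1e-5) -> List[float]:
    """Normalize one channel over a mini-batch, then scale by gamma and shift by beta."""
    m = len(values)
    mean = sum(values) / m
    var = sum((v - mean) ** 2 for v in values) / m  # biased (population) variance
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in values]


if __name__ == "__main__":
    batch = [1.0, 2.0, 3.0]
    print(batch_norm(batch, gamma=2.0, beta=0.5))
    # With gamma == 0 the channel output no longer depends on its input.
    print(batch_norm(batch, gamma=0.0, beta=0.5))
```

When gamma is close to zero, the channel output is close to the constant beta regardless of the input, which is why the magnitude of gamma can serve as a channel importance score.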

Client: To ensure the data security of the client (source domain), we separate the loss computations of the multisource transfer learning objective function proposed in [17] and place the classification loss and the MMD loss on the client side. Each client is given a batch of images $x^{s_j}$ from source domain $(X_{s_j}, Y_{s_j})$ and a batch of images $x^{t}$ from target domain $X_t$. The features of these specific domains are mapped to the same feature space through a common feature extractor $F(\cdot)$, which gives the source domain features $F(x^{s_j})$ and the target domain features $F(x^{t})$. On top of the common extractor, we obtain the feature extractor $H_j(\cdot)$ corresponding to the $j$-th source domain. We use the pruned convolutional neural network as our classifier and define $C_j$ as the classifier of the $j$-th source domain. Following common practice, our classification loss is the cross-entropy loss, denoted $J(\cdot, \cdot)$.

Server: As Figure 2 shows, the server calculates the alignment loss of the domain-specific classifiers proposed in [17]. The server also minimizes the objective function and updates the nonshared parameters of the neural network for each client.

3.1.2. Objective Function

According to Figure 2, the objective function of the algorithm is built from the following three loss terms.

Classification loss: the loss caused by a domain-specific classifier. According to Figure 2, a sample in a source domain (client) undergoes a three-step transformation: it first passes through the common feature extractor $F$, then through the domain-specific feature extractor $H_j$, and finally through the pruned CNN classifier $C_j$. The final classification loss of the $j$-th client is:

$$\mathcal{L}_{cls}^{j} = \mathbb{E}_{x \sim X_{s_j}} J\big(C_j(H_j(F(x_i^{s_j}))), y_i^{s_j}\big).$$
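The client-side classification loss is a mean cross-entropy over the local batch. The sketch below computes it from raw classifier outputs (logits), assuming the three-step transformation has already produced those logits; the function names are hypothetical, and the log-sum-exp form is used only for numerical stability.

```python
import math
from typing import List


def cross_entropy(logits: List[float], label: int) -> float:
    """Cross-entropy of softmax(logits) against an integer class label."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def client_classification_loss(batch_logits: List[List[float]], labels: List[int]) -> float:
    """Mean cross-entropy over one client's labeled batch."""
    if len(batch_logits) != len(labels) or not labels:
        raise ValueError("batch and labels must be non-empty and of equal length")
    return sum(cross_entropy(z, y) for z, y in zip(batch_logits, labels)) / len(labels)


if __name__ == "__main__":
    logits = [[2.0, 0.5, -1.0], [0.0, 0.0, 0.0]]
    print(client_classification_loss(logits, [0, 2]))
```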

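MMD loss: the domain-specific distribution alignment loss. Maximum mean discrepancy (MMD) is a commonly used measure of the difference between distributions and a standard two-sample test in statistics. Given two probability distributions $p$ and $q$, we first assume the null hypothesis $p = q$ and then accept or reject this hypothesis according to the value of the MMD computed from the observed samples. Generally speaking, the value of the MMD measures the difference between the two distributions:

$$D_{\mathcal{H}}(p, q) = \left\| \mathbb{E}_{x \sim p}[\phi(x)] - \mathbb{E}_{x' \sim q}[\phi(x')] \right\|_{\mathcal{H}}^{2},$$

where $\mathcal{H}$ is the reproducing kernel Hilbert space (RKHS) endowed with a characteristic kernel $k$. Here, $\phi(\cdot)$ denotes a feature map that maps the original samples to the RKHS, and the kernel is defined as $k(x, x') = \langle \phi(x), \phi(x') \rangle$, where $\langle \cdot, \cdot \rangle$ represents the inner product of vectors. The main theoretical result is that $p = q$ if and only if $D_{\mathcal{H}}(p, q) = 0$. In practice, an estimate of the MMD compares the squared distance between the empirical kernel mean embeddings, which gives the MMD loss of the client:

$$\hat{D}_{\mathcal{H}}(p, q) = \left\| \frac{1}{n_s} \sum_{x_i \in \mathcal{D}_s} \phi(x_i) - \frac{1}{n_t} \sum_{x_j \in \mathcal{D}_t} \phi(x_j) \right\|_{\mathcal{H}}^{2}, \tag{8}$$

where $\mathcal{D}_s$ and $\mathcal{D}_t$ are the source and target samples drawn from $p$ and $q$, and $n_s$ and $n_t$ are their sizes.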

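Equation (8) defines the estimated difference between a single source domain and the target domain. Therefore, we can give the MMD loss on the server side as the average over the $N$ clients:

$$\mathcal{L}_{mmd} = \frac{1}{N} \sum_{j=1}^{N} \hat{D}_{\mathcal{H}}\big(H_j(F(X_{s_j})), H_j(F(X_t))\big).$$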
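The empirical MMD between one source batch and one target batch can be computed with the kernel trick, without forming the feature map explicitly. The sketch below uses the biased estimator with a single Gaussian kernel; the kernel choice, the bandwidth `sigma`, and the function names are assumptions for illustration, since the text does not fix a particular kernel.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def gaussian_kernel(x: Vector, y: Vector, sigma: float = 1.0) -> float:
    """k(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def mmd_squared(source: List[Vector], target: List[Vector], sigma: float = 1.0) -> float:
    """Biased estimate of the squared MMD between two sets of feature vectors."""
    def mean_kernel(a: List[Vector], b: List[Vector]) -> float:
        return sum(gaussian_kernel(x, y, sigma) for x in a for y in b) / (len(a) * len(b))

    return (mean_kernel(source, source)
            + mean_kernel(target, target)
            - 2.0 * mean_kernel(source, target))


if __name__ == "__main__":
    src = [[0.0, 0.0], [1.0, 1.0]]
    near = [[0.1, 0.0], [1.0, 0.9]]
    far = [[5.0, 5.0], [6.0, 6.0]]
    print(mmd_squared(src, near), mmd_squared(src, far))
```

Expanding the squared norm in the empirical estimate gives exactly the three kernel averages summed here, so a larger value indicates a larger gap between the source and target feature distributions.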

Classifier alignment loss: target samples near the class boundary are more likely to be misclassified by the classifiers learned from the source samples. The classifiers are trained on different source domains, so their predictions for target samples may differ, especially near the class boundary. Intuitively, the same target sample should receive the same prediction from different classifiers. Therefore, we implement the classifier alignment strategy between domains on the server. Inspired by [17], we define the server-side classifier alignment loss as follows:

$$\mathcal{L}_{disc} = \frac{2}{N(N-1)} \sum_{j=1}^{N-1} \sum_{i=j+1}^{N} \mathbb{E}_{x \sim X_t} \Big[ \big| C_i(H_i(F(x))) - C_j(H_j(F(x))) \big| \Big].$$
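The alignment idea, that all domain-specific classifiers should agree on each target sample, can be sketched as the mean absolute difference between the predicted probability vectors of every pair of classifiers. The nested-list input layout and the function name `alignment_loss` are assumptions for illustration.

```python
from itertools import combinations
from typing import List


def alignment_loss(probs: List[List[List[float]]]) -> float:
    """Average pairwise disagreement between classifiers on target samples.

    probs[j][i][c] is the probability that classifier j assigns to class c
    for target sample i.
    """
    n_clf = len(probs)
    if n_clf < 2:
        return 0.0
    n_samples = len(probs[0])
    total = 0.0
    for a, b in combinations(range(n_clf), 2):
        pair = 0.0
        for pa, pb in zip(probs[a], probs[b]):
            # Mean absolute difference over classes for one target sample.
            pair += sum(abs(u - v) for u, v in zip(pa, pb)) / len(pa)
        total += pair / n_samples
    n_pairs = n_clf * (n_clf - 1) / 2
    return total / n_pairs


if __name__ == "__main__":
    agree = [[[0.9, 0.1]], [[0.9, 0.1]], [[0.9, 0.1]]]
    disagree = [[[1.0, 0.0]], [[0.0, 1.0]]]
    print(alignment_loss(agree), alignment_loss(disagree))
```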

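The final objective function is

$$\mathcal{L}_{total} = \sum_{j=1}^{N} \mathcal{L}_{cls}^{j} + \lambda_1 \mathcal{L}_{mmd} + \lambda_2 \mathcal{L}_{disc},$$

where $\mathcal{L}_{cls}^{j}$ is the classification loss of the $j$-th client, $\mathcal{L}_{mmd}$ is the MMD loss, $\mathcal{L}_{disc}$ is the classifier alignment loss, and $\lambda_1$ and $\lambda_2$ are trade-off weights.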

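In summary, the specific steps of the MMTLDMC algorithm are shown in Algorithm 1, where $w$ denotes the model parameters, $\mathcal{L}_{cls}^{j}$ and $\mathcal{L}_{mmd}^{j}$ are the classification and MMD losses of client $j$, $\mathcal{L}_{disc}$ is the classifier alignment loss, and $\lambda_1$ and $\lambda_2$ are trade-off weights.

Server:
(1)initialize $w$ with the BN pruning model
(2)for each round do
(3) for each client $j$ do
(4)  $(\mathcal{L}_{cls}^{j}, \mathcal{L}_{mmd}^{j}) \leftarrow$ Client($j$, $w$)
(5) end for
(6) calculate $\mathcal{L}_{disc}$
(7) $\mathcal{L}_{total} = \sum_{j} \mathcal{L}_{cls}^{j} + \lambda_1 \frac{1}{N} \sum_{j} \mathcal{L}_{mmd}^{j} + \lambda_2 \mathcal{L}_{disc}$
(8) Update $w$ by minimizing $\mathcal{L}_{total}$
(9)end for
Client: Run on client $j$
(1)Update $w$ from server
(2)for each local epoch do
(3) Calculate $\mathcal{L}_{cls}^{j}$, $\mathcal{L}_{mmd}^{j}$
(4)end for
(5)return $\mathcal{L}_{cls}^{j}$, $\mathcal{L}_{mmd}^{j}$ to server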
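The message flow of one round can be sketched as follows: each client evaluates its two losses locally and returns only scalars, and the server combines them with the alignment loss. This is a schematic under stated assumptions, not the authors' implementation: the `Client` class, the stubbed `local_losses` callables, the averaging of the client MMD losses, and the weights `w_mmd` and `w_disc` are all hypothetical, and the parameter update itself is omitted.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Weights = Dict[str, float]


@dataclass
class Client:
    """A device holding private data; only scalar losses leave the device."""
    name: str
    local_losses: Callable[[Weights], Tuple[float, float]]

    def update(self, weights: Weights) -> Tuple[float, float]:
        # Work on a copy so the server's parameters are not mutated locally.
        return self.local_losses(dict(weights))


def server_round(weights: Weights, clients: List[Client], disc_loss: float,
                 w_mmd: float, w_disc: float) -> float:
    """Collect per-client losses and form the total objective for one round."""
    if not clients:
        raise ValueError("at least one client is required")
    cls_total, mmd_total = 0.0, 0.0
    for client in clients:
        l_cls, l_mmd = client.update(weights)
        cls_total += l_cls
        mmd_total += l_mmd
    return cls_total + w_mmd * mmd_total / len(clients) + w_disc * disc_loss


if __name__ == "__main__":
    clients = [Client("phone-a", lambda w: (0.5, 0.2)),
               Client("phone-b", lambda w: (0.3, 0.4))]
    total = server_round({"w": 0.0}, clients, disc_loss=1.0, w_mmd=0.5, w_disc=0.1)
    print(total)
```

In a real system the server would differentiate this total with respect to the parameters and send the updated parameters back to the clients at the start of the next round.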

4. Experimental Results

To test the effectiveness and generalization of the MMTLDMC algorithm, we evaluate it on three types of image datasets, which are shown in Tables 1 and 2. The first type is digit classification datasets, including the SVHN [36], USPS [37], and MNIST [38] datasets; the second type is image classification datasets, including the Office-31 [39] and Caltech [40] datasets; the third type is the network dataset DomainNet [16].

The experiments compare MMTLDMC with the multisource transfer learning algorithms DCTN, MFSAN, and MultiSource TrAdaBoost and report results under different pruning rates. For fairness, a 5-fold cross-validation strategy is used for all experiments, and the results of repeating this strategy twice are used as the final comparison results. We use the average classification accuracy [41] and recall rate of each algorithm over these 10 runs as the evaluation criteria. The recall rate reflects how many of the positive examples in the sample are predicted correctly. Classification accuracy and recall are defined as follows:

Classification accuracy:

$$\mathrm{Accuracy} = \frac{\left| \{ x : x \in X \wedge \hat{y}(x) = y(x) \} \right|}{|X|}$$

Recall rate:

$$\mathrm{Recall} = \frac{TP}{TP + FN}$$

Among them, TP represents the number of positive samples that are correctly classified as positive, FP represents the number of negative samples that are incorrectly classified as positive, TN represents the number of negative samples that are correctly classified as negative, and FN represents the number of positive samples that are incorrectly classified as negative.

$X$ represents the test dataset of the target domain, $\hat{y}(x)$ is the class label of sample $x$ predicted by the classifier, and $y(x)$ is the true class label of sample $x$.
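The two evaluation measures can be computed directly from the true and predicted labels. The sketch below is a minimal version; the function names and the choice to return 0.0 when a class has no positive samples are assumptions made for illustration.

```python
from typing import Hashable, Sequence


def accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Fraction of test samples whose predicted label equals the true label."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("label sequences must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true: Sequence[Hashable], y_pred: Sequence[Hashable], positive: Hashable) -> float:
    """TP / (TP + FN) for the class treated as positive."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return tp / (tp + fn) if (tp + fn) else 0.0


if __name__ == "__main__":
    truth = [1, 0, 1, 1, 0]
    preds = [1, 0, 0, 1, 1]
    print(accuracy(truth, preds), recall(truth, preds, positive=1))
```

For multiclass tasks such as digit recognition, recall would be computed per class with each class in turn treated as positive and then averaged; that averaging convention is an assumption, since the text does not state it.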

4.1. Digit Classification Datasets
4.1.1. Dataset Introduction

Both the USPS dataset and the MNIST dataset contain handwritten digits 0–9; the former is composed of 9298 16 × 16 images, and the latter is composed of 70,000 28 × 28 images. The Street View House Numbers (SVHN) dataset comes from Google. Each picture contains a group of Arabic numerals 0–9; the dataset contains 73257 digits, and each image is 32 × 32 pixels. Figure 3 shows examples of USPS, MNIST, and SVHN. We can see that the distributions of USPS and MNIST are different, but they share the same feature space, whereas SVHN differs from them in both distribution and feature space. We extract 9000 images from MNIST and from SVHN as two domains. Because USPS has only 9298 pictures, we regard the whole dataset as a domain. Due to the limitation of the mobile terminal, 3000 images are extracted from each of MNIST, SVHN, and USPS as a domain.

4.1.2. Experimental Data

In this part, we compare our algorithm MMTLDMC with multisource transfer learning algorithms such as DCTN, M3SDA, and MultiSource TrAdaBoost.

It can be seen from Table 3 that, in the three cross-domain tasks, the MMTLDMC algorithm reaches accuracies of 79.87%, 97.68%, and 95.49% at a pruning rate of 90%, which is higher than the comparison algorithms DCTN, MultiSource TrAdaBoost, and M3SDA. Compared with the results without pruning, in the tasks U, M -> S; U, S -> M; and M, S -> U, the accuracy of our algorithm drops by only 1.36%, 1.45%, and 1.46%, respectively. However, when the pruning rate of the MMTLDMC algorithm is 95%, the accuracy drops sharply. Because our algorithm runs on the mobile terminal, a pruning rate of 30% is not considered for MMTLDMC.

4.2. Image Classification Dataset
4.2.1. Dataset Introduction

The Office-31 dataset is a commonly used standard transfer learning dataset. It contains 4652 sample pictures collected from three different domains named Amazon (A), Webcam (W), and DSLR (D), and these pictures are divided into 31 categories. Amazon's samples are from https://amazon.com, and the samples in Webcam and DSLR were taken with web cameras and digital SLR cameras in different environments. Caltech-256 [40] is a standard database for object recognition, with 30607 images and 256 categories. In these experiments, we use the Office-31+Caltech dataset published by Gong et al. [39], as shown in Figure 4. Specifically, we have four domains: C (Caltech-256), A (Amazon), W (Webcam), and D (DSLR). We select three domains as the source domains and the remaining one as the target domain, that is, (A, C, D -> W), (A, C, W -> D), (A, D, W -> C), and (C, D, W -> A).

4.2.2. Experimental Data

In this section, we compare MMTLDMC with multisource transfer learning algorithms such as DCTN, M3SDA, and MultiSource TrAdaBoost.

It can be seen from Table 4 that, among the four cross-domain tasks, when the pruning rate of the MMTLDMC algorithm is 90%, the accuracy rates are 90.77%, 98.48%, 98.52%, and 94.13%, which are higher than those of the comparison algorithms DCTN and MultiSource TrAdaBoost. At the same time, in the tasks A, W, C -> D and W, D, C -> A, the accuracy drops by only 0.98%. However, when the pruning rate of the MMTLDMC algorithm is 95%, the accuracy drops sharply. Because our algorithm runs on the mobile terminal, a pruning rate of 30% is not considered for MMTLDMC.

4.3. Effect of the Pruning Rate on Computation Time

To study how the pruning rate affects our algorithm, we choose the network dataset DomainNet proposed in [14] (as shown in Figure 5). We randomly sample 20 classes from each domain and use 3000 samples as our training data.

4.3.1. Experimental Data

(1) Effect of the Pruning Rate on Computation Time and Iterations. As can be seen from Figure 6, for the MMTLDMC algorithm on the DomainNet dataset:
(a) The model calculation time gradually decreases as the pruning rate increases.
(b) The accuracy of the model gradually decreases as the pruning rate increases.
(c) When the pruning rate is 94%, the accuracy drops significantly.
(d) When the pruning rate reaches 90%, the accuracy generally drops by only 2–4%.
(e) When the pruning rate reaches 80%, the model calculation time stabilizes.

(2) Influence of Iterations. Figure 7 shows the effect of the number of iterations on the accuracy:
(a) When the number of iterations exceeds 1000, the accuracy of the algorithm tends to be stable.
(b) At the same time, MMTLDMC achieves better results.

5. Conclusion

In this paper, we address the problem of multisource transfer learning for mobile computing and propose a multisource mobile transfer learning algorithm based on dynamic model compression. The method combines the advantages of mobile computing and multisource learning to complete image classification on the mobile terminal. It first performs BN pruning on the pretrained network, then calculates the classification loss and the MMD loss of the multisource transfer model on the mobile side, and calculates the alignment loss between the different classifiers on the server side. Finally, the parameters obtained by minimizing the objective function are synchronized to the clients. The experimental results on SVHN, USPS, MNIST, Office-31, Caltech, and DomainNet show that MMTLDMC outperforms the benchmark algorithms in both classification accuracy and training efficiency. Further research is still needed in the following aspects: optimizing the classification accuracy of the model through data encryption, partial data sharing, and dynamic adjustment of the BN pruning strategy on the server side; and continuing to reduce the number of parameters to achieve faster client computing.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work has been supported by the National Key Research and Development Program of China (2016YFB0801004) and by the Science and Technology Major Special Project of Heilongjiang (CN) (2020ZX14A02).