Abstract
Mobile edge computing (MEC) has been envisaged as one of the most promising technologies in fifth generation (5G) mobile networks. It allows mobile devices to offload their computation-demanding and latency-critical tasks to resource-rich MEC servers. Accordingly, MEC can significantly improve latency performance and reduce energy consumption for mobile devices. Nonetheless, privacy leakage may occur during the task offloading process. Most existing works either ignore this issue or investigate only system-level solutions for MEC; privacy-aware, user-level task offloading optimization has received much less attention. To tackle these challenges, this paper proposes a privacy-preserving and device-managed task offloading scheme for MEC. The scheme achieves near-optimal latency and energy performance while protecting the location privacy and usage pattern privacy of users. First, we formulate the joint optimization problem of task offloading and privacy preservation as a semiparametric contextual multi-armed bandit (MAB) problem, which has a relaxed reward model. Then, we propose a privacy-aware online task offloading (PAOTO) algorithm based on the transformed Thompson sampling (TS) architecture, through which we can (1) achieve the best possible delay and energy consumption performance, (2) preserve privacy, and (3) obtain an online device-managed task offloading policy without requiring any system-level information. Simulation results demonstrate that the proposed scheme outperforms existing methods in terms of minimizing the system cost and preserving the privacy of users.
1. Introduction
In recent years, with the advent of 5G networks and the rapid spread of mobile devices, a myriad of new applications is emerging, such as augmented reality (AR)/virtual reality (VR) [1, 2], online 3D games [3], and connected cars [4]. Specifically, the recent Cisco Annual Internet Report expects that the number of global mobile devices will grow from 8.8 billion in 2018 to 13.1 billion by 2023 and the vast majority of mobile data traffic (99%) will originate from these mobile devices [5]. However, due to their limited computing units and battery energy, mobile devices struggle to cope with such a traffic explosion and cannot meet the stringent requirements of computing-demanding and latency-sensitive applications.
To overcome these limitations, the European Telecommunications Standards Institute (ETSI) proposed the paradigm of mobile edge computing (MEC) [6] as an extension of remote centralized clouds [7]. The key idea behind MEC is to move computing and storage resources from the core network to the radio access network (RAN) in fifth generation (5G) networks [8]. In this computing paradigm, mobile devices offload computation tasks to nearby MEC servers via wireless channels, which can meet the requirements of computing-intensive applications and achieve ultra-short processing latency.
Despite these benefits, MEC still has shortcomings in terms of security and privacy leakage [9]. This paper investigates the location privacy and usage pattern privacy problems [10], which are tied to the MEC task offloading feature. Intuitively, a mobile device seeking optimal offloading performance tends to offload all its tasks to the MEC server. Accordingly, an honest-but-curious MEC server can infer the location and usage pattern of privacy-sensitive users, which may prevent these users from accessing the MEC system if not properly addressed. Although these two privacy issues have been extensively studied in other fields, one challenge still needs to be addressed in MEC systems: how to protect both the location privacy and the usage pattern privacy while minimizing the delay and energy consumption cost.
Most existing task offloading schemes that aim for optimal system performance, such as [11, 12], largely ignore these privacy problems, and current privacy-preserving techniques for cloud computing, such as the works in [13, 14], are not always applicable to the MEC system. Therefore, the more challenging problem is how to prevent unintentional leakage of the user’s privacy while still maintaining optimal delay and energy consumption performance. The most closely related works are probably [10, 15], which studied the optimization of delay and energy consumption cost while considering both location privacy and usage pattern privacy. The former formulates this problem as a constrained Markov decision process (CMDP), and the latter applies a Dyna-Q architecture based on the CMDP to achieve a better privacy-aware offloading policy. However, both are system-level solutions. They rely on the assumption that the wireless channel power gain follows a Markov chain model, which requires some system-side information to be known in advance. Such an assumption does not hold in infrastructure-free scenarios, such as individual combat in military scenarios, forest fire rescue [16], and heterogeneous IoT [17], for which user-level schemes are more suitable.
In order to minimize system cost (e.g., latency and energy consumption) and protect the user’s privacy without requiring any system-level information as prior knowledge, we propose a device-level and privacy-preserving task offloading scheme for the MEC system. This scheme is based on a semiparametric contextual multi-armed bandit (MAB) problem, which can address the tradeoffs inherent in the sequential decision problem and overcome the lack of system-side information. To the best of our knowledge, this is the first user-level scheme proposed to solve the privacy problem of MEC. The scheme preserves the location privacy and usage pattern privacy of users, and an online learning algorithm makes adaptive task offloading decisions in a dynamic network environment. The main contributions of this paper are summarized as follows:
(1) MAB-based problem modelling. We study a joint optimization problem of task offloading and privacy preservation in the MEC system. This problem is then transformed into a semiparametric contextual MAB problem to overcome the challenge of unknown network dynamics; the contextual feature vector describes user-side information and decouples the time dependency.
(2) Privacy-aware optimal offloading decision. We propose a privacy-aware online learning algorithm, called PAOTO (privacy-aware online task offloading), to make device-level task offloading decisions while protecting the user’s location privacy and usage pattern privacy. By using the transformed Thompson sampling (TS) architecture, we can make adaptive task offloading decisions from the user-side perspective.
(3) Extensive simulation-based performance evaluation. We carry out simulations to demonstrate the effectiveness of the proposed algorithm. The results show that the PAOTO algorithm performs close to optimal and far better than the recently proposed Dyna-Q algorithm in [15].
The remaining parts of this paper are organized as follows. In Section 2, we discuss the motivation and related works. We will formally describe the system model and problem formulation in Section 3. Next, we present the algorithm design and simulation evaluation in Sections 4 and 5, respectively. Finally, we draw some conclusions and highlight the direction for future work in Section 6.
2. Motivation and Related Work
2.1. Motivation
Determining privacy-aware and user-level task offloading decisions for mobile devices requires solving two important challenges: (1) how to best prevent leakage of the user’s privacy while still maintaining optimal delay and energy consumption performance and (2) how to design a task offloading policy that determines online, from the user-side perspective, the optimal execution platform (i.e., local processing unit, MEC server, or buffer) for users.
To address the first challenge, we propose a privacy metric to jointly quantify the location and usage pattern privacy and use a semiparametric MAB to incorporate the privacy metric into the performance model. This can strike a balance between the privacy-preserving level and system cost (e.g., processing latency and energy consumption cost). Previous works require system-level information to design an optimal task offloading strategy, which is not applicable to infrastructure-free scenarios (e.g., individual combat in military scenarios, forest fire rescue, and heterogeneous IoT). We address this second challenge by using the contextual feature vector in the contextual MAB model to describe user-side information and applying the Thompson sampling (TS) algorithm to estimate and learn the performance model based on the contextual information.
2.1.1. Characterizing Privacy Metric
MEC has recently grown in popularity, but security and privacy in the MEC system still have shortcomings. On one hand, some security issues such as authentication, private data storage, and intrusion detection have received attention, but these issues are inherited from the conventional cloud computing framework and are less relevant to the key technologies of the MEC system. On the other hand, based on the simulation results and the setting considered in [10], the privacy problems tied to MEC’s unique wireless task offloading technology, namely the user’s location privacy and usage pattern privacy, remain less explored.
According to [10], an offloading pattern can be observed: a mobile device may offload all its tasks to the MEC server when the wireless channel state is good, and it will not offload any tasks otherwise. Accordingly, an honest-but-curious MEC server (which may be controlled by an adversary) can use the offloading pattern and historical statistics to obtain the number of tasks offloaded in each period. Hence, the adversary can infer the wireless channel condition (assumed here to be only good or bad, although this can be extended to the multistate case) and the user’s actual usage pattern. Specifically, the wireless channel condition is highly related to the distance between the user and the MEC server. If a mobile user communicates with multiple MEC servers, these servers may infer the user’s location by monitoring the wireless channel state, so the user’s location privacy is leaked. Moreover, when someone’s office is near the access point (AP), the wireless channel state may always be good, and the device may always offload all its tasks to the MEC server. The adversary can then obtain the total number of tasks offloaded, which is determined by the user’s actual device usage pattern, so the user’s usage pattern privacy may be leaked.
For privacy-sensitive users, it is very important to solve the leakage of location privacy and usage pattern privacy induced by the unique wireless task offloading feature of the MEC system. If these problems are not properly addressed, they may prevent privacy-sensitive users from accessing the MEC system. Although these two privacy problems have already been studied in other systems, protecting the user’s location privacy and usage pattern privacy while minimizing delay and energy consumption cost in the MEC system still poses a critical challenge.
Therefore, to address this challenge, it is desirable to design a metric that jointly quantifies the location and usage pattern privacy. We then formulate the task offloading and privacy preservation problem as a contextual MAB problem with a semiparametric reward model based on processing latency, energy consumption cost, and this privacy metric. The aim is to strike a balance between the privacy-preserving level and system cost. That is, with the problem formulation and proposed algorithm, we can obtain the optimal delay and energy consumption performance while protecting the user’s location and usage pattern privacy, as described in Sections 3 and 4.
2.1.2. User-Level Task Offloading
With mobile data traffic growing explosively, mobile devices with limited resources cannot meet the stringent requirements of computing-demanding and latency-sensitive applications. Therefore, designing a suitable task offloading strategy for the MEC system has attracted tremendous attention in industry and academia. Such a strategy determines the optimal task execution platform for the user: executing in the local processing unit, offloading to the MEC server, or queueing in the buffer.
Many previous works on task offloading (e.g., [10, 15]) generally assume that system-side information is always available. Such an assumption suits infrastructure-assisted edge computing scenarios, where the infrastructure (e.g., an access point or base station) is available for obtaining system-side information in advance [18]. However, system-level solutions do not suit infrastructure-free scenarios, such as individual combat in military scenarios, forest fire rescue, and heterogeneous IoT, because mobile devices in these scenarios operate in a scattered manner and lack system-side network information. In particular, exploring system-level information in advance may incur additional system cost, such as scarce bandwidth usage and additional energy consumption.
In this case, it is desirable to design a user-level task offloading strategy that overcomes the lack of system-side information. We therefore propose an online task offloading scheme from the user perspective. In this scheme, the user-side information is described as a contextual feature vector, and the Thompson sampling (TS) algorithm is applied to estimate and learn the performance model based on the contextual information. The scheme can adaptively decide where to execute each task for the mobile user without any system-level information. Details are given in Section 4.
2.2. Related Work
In recent years, task offloading strategies that minimize total delay and energy consumption cost in MEC systems have attracted significant research effort. For example, Xu et al. proposed an online algorithm based on Lyapunov optimization and Gibbs sampling, which jointly optimized dynamic service caching and task offloading to reduce computation latency while keeping energy consumption low [19]. Wei et al. studied the problem of task offloading and channel resource allocation based on MEC in 5G ultra-dense networks (UDN) [20]. The authors formulated task offloading as an integer nonlinear programming problem and proposed an efficient task offloading and channel resource allocation scheme based on a differential evolution algorithm. Dab et al. proposed a joint radio resource allocation and task assignment strategy based on a Q-learning algorithm to minimize the energy consumption cost under both latency and device computation resource constraints [21]. Li and Cai discussed incentive mechanism design for collaborative task offloading in the MEC network [22]. They proposed an online truthful mechanism integrating computation and communication resource allocation to address the social welfare maximization problem by considering each task’s specific requirements in terms of data size, delay, and preference. However, none of the works mentioned above considered the user’s privacy.
A few works consider both task offloading and privacy preservation. For example, He et al. identified a new privacy vulnerability caused by the wireless offloading feature of MEC-enabled IoT. To address this vulnerability, the authors developed an offloading strategy for MEC-enabled IoT that can learn a good offloading policy while protecting the devices’ location privacy [23]. However, extra prior information was required. In [24], Zhang et al. proposed a strategy that achieves an efficient task scheduling policy at the edge while ensuring privacy. In [25], Zhou proposed a novel context-aware task allocation framework for mobile crowdsensing in the edge computing scenario. The task allocation was performed in both the cloud computing layer and the edge computing layer. In the cloud layer, the authors proposed a privacy-preserving and contextual online learning algorithm to manage the participants’ reputation. However, this scheme was implemented at the system level and required a priori network information.
In addition, He et al. identified location privacy and usage pattern privacy issues induced by the wireless task offloading feature of MEC [10]. To address these issues, the authors proposed a privacy-aware task offloading scheduling algorithm based on a constrained Markov decision process (CMDP) to achieve the best possible system performance while protecting the user’s privacy. Min and Wan proposed a reinforcement learning (RL) based privacy-aware offloading scheme, which enables the IoT device to make task offloading decisions and protect both the user’s location privacy and usage pattern privacy in the MEC system [15]. Nevertheless, both works were implemented at the system level. They need to explore system-level information in advance, which is difficult to obtain and may incur additional system cost, such as scarce bandwidth usage and energy consumption of network devices.
In general, none of the aforementioned works considers both task offloading and privacy protection at the user level. These studies have two main limitations. First, some consider only simple task offloading strategies for minimizing total delay and energy consumption cost in MEC systems and ignore the privacy issues related to the task offloading pattern, which may be very important for privacy-sensitive users. Second, the works that consider both task offloading and privacy preservation were all implemented at the system level. That is, they generally assume that system-side information is always available. This assumption holds only in infrastructure-assisted scenarios, where the infrastructure (e.g., an access point (AP)) is available for obtaining system-side information in advance. In infrastructure-free scenarios (e.g., individual combat in military scenarios, forest fire rescue, and heterogeneous IoT), mobile devices operate in a decentralized manner, and system-side information is difficult to obtain and may incur additional system cost.
To overcome these limitations, we propose a novel privacy-aware task offloading scheme based on an online learning algorithm that requires only device-level information and achieves the best possible system performance while protecting the user’s privacy.
3. System Model and Problem Formulation
3.1. System Model
In this section, we present the task offloading model. As illustrated in Figure 1, we consider a scenario in which the mobile user/device communicates with the MEC server through the access point (e.g., WiFi or 5G base station) via the wireless channel. For ease of exposition, the bandwidth constraint of the wireless channel is not considered in this paper; we leave it to future work. The mobile device has computing-intensive tasks that must be completed as soon as possible. Due to its limited battery energy and computing capabilities, the mobile device can offload some computation tasks to the MEC server, which has powerful computing capabilities. As such, the mobile device has three ways to process these computation tasks: computing in the local processing unit, offloading to the MEC server through the transmission unit, and queuing in the buffer for processing in the next time slot.
Without loss of generality, we assume that task offloading decisions are made in a slotted structure, with the timeline discretized into time slots. In each time slot, the mobile user generates new computation tasks on the mobile device; the number of new tasks is bounded by a maximum and depends on the user’s usage pattern. The number of tasks held in the buffer in each time slot is likewise bounded by the maximum buffer size.
A widely used three-parameter model [26] describes each task by its input data size (bits), computation intensity (CPU cycles/bit), and maximum allowed latency (seconds). The computation demand of each task (CPU cycles) is then the product of its input data size and its computation intensity. In each time slot, all pending tasks (the newly generated tasks together with the tasks in the buffer) are either executed locally, buffered, or offloaded remotely according to the proposed task offloading policy. More specifically, the mobile device explores the optimal offloading policy for each task.
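As a minimal sketch of the three-parameter task model, the snippet below represents a task and derives its computation demand. The class and field names are hypothetical, and the demand formula (data size times computation intensity) is an assumption inferred from the stated units rather than a formula quoted from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """Hypothetical container for the three task parameters."""
    data_size_bits: float   # input data size, in bits
    cycles_per_bit: float   # computation intensity, in CPU cycles per bit
    max_latency_s: float    # maximum allowed latency, in seconds

    @property
    def demand_cycles(self) -> float:
        # Assumed from the units: bits * (cycles / bit) = CPU cycles.
        return self.data_size_bits * self.cycles_per_bit


def pending_tasks(new_tasks, buffered_tasks):
    """Tasks to be decided in a slot: buffered tasks plus newly generated ones."""
    return list(buffered_tasks) + list(new_tasks)
```

For example, a task with 1000 bits of input and an intensity of 50 cycles/bit has a demand of 50 000 CPU cycles under this assumption.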
Based on [10], in order to minimize computing delay and energy consumption, the mobile device tends to offload all its tasks to the MEC server if the wireless channel state is good and to process all its tasks locally if the wireless channel state is bad. Under such circumstances, the user’s location and usage pattern are easily exposed to an attacker. Hence, the task offloading policy proposed in this paper takes privacy preservation into account. In addition, the wireless channel power gain is not assumed to follow a Markov model, which allows the proposed task offloading policy to be executed at the device level. More explicitly, the mobile device can observe only its local information (e.g., the number of tasks and the computation demand of each task), while the system-side information is not observable. Key parameter notations in this paper are listed in Table 1 for ease of reference.
3.2. Problem Formulation
We focus on privacy-aware and user-level task offloading optimization in this paper. In this section, we first formulate the task offloading decision making and then model the system cost (including processing latency and energy consumption cost) and the privacy level as the performance metrics. Finally, the objective function is presented.
3.2.1. Task Offloading Decision Making
To maintain satisfactory quality of service, an available and reliable task offloading policy should be considered. The mobile device can dynamically assign each task to one of three execution platforms: the local processing unit, the MEC server, or the buffer. In each time slot, the mobile device (also called the learner or operator) makes the task offloading decision for each task. We use a binary indicator as the task offloading decision variable: it equals 1 if the task is assigned to the given platform in the given time slot and 0 otherwise. Note that in a given time slot, each task can be assigned to only one execution platform. The decision variable satisfies three constraints, equations (1) to (3).
Equation (1) indicates whether the task is offloaded to a given platform in a given time slot. Equation (2) indicates that only one of the three indicators for a task in a given time slot can be nonzero. Equation (3) indicates that all pending tasks (newly generated and buffered) are assigned to an execution platform in the time slot. Based on these definitions, the system cost model (including processing latency and energy consumption) and the privacy model are described next.
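The three constraints described above can be sketched in LaTeX as follows. The symbols are assumptions chosen for illustration and may differ from the paper's own notation: $x_{i,k}(t)$ is the binary decision for task $i$ and platform $k$ in slot $t$, $\mathcal{K}$ is the set of three platforms (local unit, MEC server, buffer), and $n(t)$ and $b(t)$ are the numbers of newly generated and buffered tasks.

```latex
% Assumed notation; a sketch of the constraints as described in the text.
\begin{align}
  x_{i,k}(t) &\in \{0, 1\},
    && \forall i,\ \forall k \in \mathcal{K}
    && \text{% binary decision, cf. equation (1)} \\
  \sum_{k \in \mathcal{K}} x_{i,k}(t) &= 1,
    && \forall i
    && \text{% one platform per task, cf. equation (2)} \\
  \sum_{i} \sum_{k \in \mathcal{K}} x_{i,k}(t) &= n(t) + b(t)
    && && \text{% every pending task is placed, cf. equation (3)}
\end{align}
```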
3.2.2. System Cost Model
Similar to [27, 28], we consider a system cost that accounts for processing latency and energy consumption cost, which are associated with the task offloading. They depend on both the tasks and the processing platforms where the tasks are computed.
(1) Processing Latency. In this paper, total processing latency consists of three parts: queuing delay in the buffer, computing delay in the local processing unit, and computing delay in the MEC server. For ease of exposition, we assume that the queuing delay in the buffer can be converted to computing delay and that the buffer can be treated as a microprocessor with much lower computing capability than the local processing unit. In our system, each task is assigned to an execution platform by the mobile device in each time slot, and each platform has an available computing capability (i.e., CPU cycles per second) for task processing in that slot. The processing delay of each task is then its computation demand divided by the available computing capability of its assigned platform. Therefore, given the task offloading decision, the total processing latency required to process all pending tasks within a time slot is the sum of these per-task delays.
(2) Energy Consumption. Task offloading consumes the energy of the mobile device, whose battery capacity is rather limited. Minimizing the total energy consumption of the mobile device is therefore one of the objectives of this paper. The energy consumption cost of the mobile device may include CPU cycles, transmission energy, and electric energy, which are associated with tasks executed in the local processing unit, offloaded to the MEC server, and queued in the buffer, respectively. To characterize these costs, we define the energy consumed in a time slot for assigning a given task to a given execution platform. Thus, given the task offloading decision, the overall energy consumption in a time slot is the sum, over all pending tasks, of the energy consumed by each task on its assigned platform.
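The latency and energy totals for one time slot can be illustrated with a short sketch. The platform names, the dictionary layouts, and the per-task delay formula (demand divided by platform capability) are assumptions made for this example; representing the decision as one platform label per task keeps each task on exactly one platform.

```python
PLATFORMS = ("local", "mec", "buffer")  # hypothetical platform labels


def total_latency(demands, decisions, capability):
    """Sum of per-task delays in a slot.

    demands    -- computation demand of each task, in CPU cycles
    decisions  -- platform label chosen for each task (one per task)
    capability -- available CPU cycles per second for each platform
    """
    # Assumed delay model: cycles / (cycles per second) = seconds.
    return sum(d / capability[p] for d, p in zip(demands, decisions))


def total_energy(decisions, energy_cost):
    """Sum of per-task energy in a slot.

    energy_cost[i][p] is the (assumed known) energy for task i on platform p.
    """
    return sum(energy_cost[i][p] for i, p in enumerate(decisions))
```

In practice the capabilities and energy costs are not known to the device in advance, which is why the paper resorts to online learning; the sketch only shows how the totals are aggregated once those values are given.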
3.2.3. Privacy Model
As more and more people enjoy the benefits of MEC, location privacy and usage pattern privacy in MEC have become a major concern. According to the simulation results and the setting considered in [10], an offloading pattern can be observed: the mobile device may offload all its tasks to the MEC server when the wireless channel state is good, and it will not offload any tasks otherwise. For simplicity, this work assumes that the wireless channel has only two states, good and bad; the model can be extended to the multistate case. The wireless channel gain is highly related to the distance between the user and the MEC server. Thus, the honest-but-curious MEC server (which may be controlled by an adversary) can infer not only the wireless channel state but also the distance to the mobile device from the offloading pattern and historical statistics.
Accordingly, when the mobile device communicates with multiple MEC servers, these servers may jointly infer its location. Besides, if a mobile device always maintains a good channel state (e.g., the user’s office is near the base station), it will always offload all its tasks to the MEC server. The total number of tasks is highly related to the user’s usage pattern (e.g., which app the user is running, if the number of tasks generated by that app follows a certain pattern), which may be very important for privacy-sensitive users. Hence, the MEC server may be able to infer personal information about the user by monitoring the total number of offloaded tasks and analyzing the historical statistics.
Hence, from the privacy perspective, we propose a metric that jointly quantifies the location and usage pattern privacy and strikes a balance between the privacy-preserving level and system cost. First, equation (7) defines the total number of tasks offloaded to the MEC server by the end of a time slot.
Then, the privacy metric of can be obtained by where the represents the indicator function that equals 1 if the statement is true and 0 otherwise; indicates the difference between and , and it has ; and are the weighting factors reflecting the importance of the location privacy over the usage pattern privacy in different situations; denotes the metric of usage pattern privacy, which is the number of dummy tasks. The dummy tasks may sacrifice some system performance but will increase the privacy level, and the proposed algorithm will balance them.
The first term of equation (8) represents that if the mobile device offloaded all its tasks to the MEC server (), in order to protect the usage pattern privacy, it will continue to offload dummy tasks to the MEC server to confuse the attacker. As such, the attacker cannot pinpoint the number of tasks actually generated by the user. According to the second term of equation (8), there are two situations correspond to the . In the first situation, denotes that the tasks either queued in the buffer or processed locally otherwise. In order to protect the location privacy, the mobile device needs to offload tasks (which is queuing in the buffer, ) to the MEC server for preventing the attacker from inferring the wireless channel status. In the second situation of , some tasks are offloaded to the MEC server () and the privacy level can be achieved by . It denotes the importance of the location privacy over the usage pattern privacy , which will increase as decreases.
3.2.4. Objective Function
In order to achieve a desirable tradeoff between the system cost (i.e., computing delay and energy consumption) and the user’s privacy level, we introduce three weights that reflect the preferences of the device. These weights also convert the privacy level and system cost into the same dimension. Thus, the objective of this paper is to achieve robust minimization of a weighted sum of the privacy level and system cost for the mobile device. Based on [29], the problem is formulated over a finite time horizon as equation (9).
From the perspective of the mobile device, it is difficult to explore system-wide information (e.g., the wireless channel states and resource availability) in advance, and doing so may incur a very high energy cost. Therefore, it is highly desirable to devise a device-level adaptive privacy-preserving task offloading policy that does not need future system-level information.
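The weighted-sum objective can be illustrated with a small sketch. Two points are assumptions made here and not taken from the paper: the privacy term is treated as a quantity to be minimized alongside latency and energy, and the horizon-level objective is taken as the time average of the per-slot weighted costs. The function and parameter names are hypothetical.

```python
def slot_cost(latency, energy, privacy_term, weights):
    """Weighted sum of the three per-slot terms.

    weights -- (w_latency, w_energy, w_privacy); they also bring the three
               terms onto a common scale.
    """
    w_latency, w_energy, w_privacy = weights
    return w_latency * latency + w_energy * energy + w_privacy * privacy_term


def average_cost(per_slot_terms, weights):
    """Time-averaged weighted cost over a finite horizon (assumed form)."""
    costs = [slot_cost(lat, en, pr, weights) for lat, en, pr in per_slot_terms]
    return sum(costs) / len(costs)
```

Raising the privacy weight makes the policy favour privacy-preserving actions at the expense of latency and energy, which is the tradeoff the weights are meant to expose.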
4. Algorithm Design
In this section, we focus on the privacy-preserving task offloading problem in the MEC-enabled network and propose a device-level privacy-aware online learning scheme that minimizes the objective in equation (9) for the mobile device without knowledge of system-side information.
First, we transform the information-constrained multi-objective optimization problem into a contextual multi-armed bandit (MAB) problem [30] with a semiparametric reward model. Then, we propose a privacy-aware online task offloading (PAOTO) algorithm that accommodates the network dynamics at the device level and learns the optimal offloading policy for the mobile device while maintaining the user’s privacy.
4.1. Problem Transformation
In this work, we focus on the device-level and privacy-aware task offloading problem, which is a typical sequential decision problem. To decouple the time dependency, we formulate this problem as a contextual MAB problem with the relaxed, semiparametric reward model of [30]. It extends the conventional contextual MAB, which has a linear reward model [31]. Both versions use the contextual feature vector to represent user-side information, which overcomes the lack of future system information. We use the semiparametric version because the privacy metric in our model is difficult to express as a linear reward; the semiparametric model relaxes the linearity requirement, as shown in [30].
Accordingly, in order to learn the network dynamics and take privacy protection into consideration, the problem in this paper can be transformed into a semiparametric contextual MAB problem [13], which can address the tradeoffs inherent in the sequential decision problem and has a relaxed, semiparametric reward model. In this model, equation (10), the expected received cost of offloading a task to an execution platform, conditioned on the union of the historical information and the current contexts, is the sum of a nonparametric component and a linear component. The linear component is the inner product of the current contextual feature vector and a fixed but unknown parameter vector. The model also assumes upper bounds on the norms of these quantities.
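For reference, the semiparametric reward model can be sketched in LaTeX in the form commonly used in the semiparametric bandit literature. All symbols are assumptions for illustration: $c_{i,k}(t)$ is the received cost of task $i$ on platform $k$, $\nu(t)$ the nonparametric component, $b_{i,k}(t)$ the contextual feature vector, $\mu$ the fixed but unknown parameter, and $\mathcal{F}_{t-1}$ the history together with the current contexts. The unit bounds shown are a common normalization and may differ from the bounds in the paper.

```latex
% Assumed notation; standard semiparametric contextual bandit form.
\begin{equation*}
  \mathbb{E}\!\left[ c_{i,k}(t) \,\middle|\, \mathcal{F}_{t-1} \right]
    = \nu(t) + b_{i,k}(t)^{\top} \mu ,
  \qquad
  \lVert b_{i,k}(t) \rVert_2 \le 1, \quad
  \lVert \mu \rVert_2 \le 1, \quad
  \lvert \nu(t) \rvert \le 1 .
\end{equation*}
```

Because $\nu(t)$ does not depend on the platform $k$, it shifts the cost of every platform equally and does not change which platform is cheapest for a task.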
When the computation tasks arrive, the mobile device makes an offloading decision for each task. Nonetheless, only the device-side status information is observable, which can be described as a contextual feature vector for arriving tasks. More specifically, denotes the computation demand vector of tasks. The first values of are the computation demands of the individual tasks , and the remaining values are 0; is a transition vector, which denotes the number of tasks in time slot . The first values of are 1, and the remainder are 0. According to the system cost (including processing latency and energy consumption cost) defined in Section 3, we transform them into a feature vector to better learn the network uncertainty and resource availability, which is related to . Besides, the privacy level in this task offloading policy is formulated as the aggregated nonparametric component of the contextual MAB reward model in equation (10), because it cannot be expressed directly as a linear component like the other metrics (such as computing delay and energy consumption). We assume that it can be calculated when all task decisions are completed at the end of .
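The zero-padded layout of the two vectors described above can be illustrated with a minimal sketch. The function name `build_context`, the argument `max_tasks`, and the example demand values are hypothetical; only the padding rule (first entries filled, remainder zero) comes from the text.

```python
from typing import List, Tuple


def build_context(demands: List[float], max_tasks: int) -> Tuple[List[float], List[float]]:
    """Return (demand_vector, transition_vector), both of length max_tasks.

    demands   : computation demand of each task that arrived in this slot
    max_tasks : assumed fixed vector length (maximum number of tasks per slot)
    """
    n = len(demands)
    if n > max_tasks:
        raise ValueError("more tasks than the vectors can hold")
    padding = [0.0] * (max_tasks - n)
    demand_vector = list(demands) + padding   # first n entries: per-task demand
    transition_vector = [1.0] * n + padding   # first n entries: 1, remainder: 0
    return demand_vector, transition_vector


# Three tasks arrive in a slot that can hold at most five (illustrative values).
demand_vec, transition_vec = build_context([2.0e8, 1.5e8, 3.0e8], max_tasks=5)
```

Summing the transition vector recovers the number of tasks in the slot, which is how it encodes that quantity.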
Hereinafter, we define , as the historical observations until , where represents the set of actions for all tasks at time slot and denotes the total received cost at time slot . The can be denoted as the union of historical information and the current contextual feature vector . Given that , we assume that the expectation of the total received cost can be decomposed into a time-invariant linear component (associated with processing delay and energy consumption cost) and a nonparametric component (associated with the privacy-preserving level). Therefore, according to equation (10), we have
The task offloading scheme needs to select an execution platform (an arm in MAB terminology) for every task at time slot . Specifically, we let denote the choice for every task and let the optimal action be based on equation (10). Additionally, it must be noted that the nonparametric component in equation (11) depends on time and historical information, but not on the current action [30]. Hence, the optimal received cost of each task can be obtained by the minimum and we can achieve the optimal offloading decision of each task by . Indeed, the privacy level affects the aggregated received cost of all tasks at the end of time slot , and this aggregated received cost is used to update the contextual feature vector for the next interval .
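Because the nonparametric component does not depend on the current action, it shifts the cost of every execution platform by the same amount and therefore cannot change which platform is cheapest. The sketch below illustrates this under assumed names: `mu` stands for the unknown linear parameter, `nu` for the nonparametric shift, and the three context rows are made-up values for three hypothetical platforms.

```python
from typing import Sequence


def linear_cost(context: Sequence[float], mu: Sequence[float]) -> float:
    """Linear part of the expected cost: inner product of context and parameter."""
    return sum(b * m for b, m in zip(context, mu))


def best_platform(contexts: Sequence[Sequence[float]], mu: Sequence[float], nu: float = 0.0) -> int:
    """Index of the platform with the smallest expected cost.

    nu is added to every platform's cost, mimicking an action-independent
    nonparametric component.
    """
    costs = [linear_cost(b, mu) + nu for b in contexts]
    return min(range(len(costs)), key=costs.__getitem__)


mu = [0.5, 2.0]                                      # hypothetical parameter
contexts = [[1.0, 0.8], [0.4, 0.3], [0.2, 0.9]]      # one row per platform
choice_without_shift = best_platform(contexts, mu, nu=0.0)
choice_with_shift = best_platform(contexts, mu, nu=7.5)
```

Both calls return the same index, which is why the per-task decision can be made from the linear part alone even though the privacy-related term is only known at the end of the slot.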
Beyond that, the regret at time slot is defined as the difference between the average cost of the optimal choices and the universal choices for all tasks and it does not depend on either. Hence, the regret can be expressed as
Moreover, given a finite time horizon , the total regret can be described as
This regret is used to evaluate the effectiveness of task offloading decision making based on the online learning of systemlevel information.
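The two regret definitions can be made concrete with a short sketch. It assumes that the per-slot regret is the average, over the tasks of a slot, of the gap between the expected cost of the chosen platform and that of the cheapest platform; the action-independent term cancels in this gap. The function names and the numbers are illustrative only.

```python
from typing import List, Sequence, Tuple


def slot_regret(expected_costs: Sequence[Sequence[float]], chosen: Sequence[int]) -> float:
    """Average per-task gap between the chosen and the cheapest platform.

    expected_costs[n][i] : expected (linear) cost of task n on platform i
    chosen[n]            : platform index actually selected for task n
    """
    if not expected_costs:
        return 0.0
    gaps = [row[a] - min(row) for row, a in zip(expected_costs, chosen)]
    return sum(gaps) / len(gaps)


def total_regret(slots: List[Tuple[Sequence[Sequence[float]], Sequence[int]]]) -> float:
    """Sum of the per-slot regrets over a finite horizon."""
    return sum(slot_regret(costs, actions) for costs, actions in slots)


# Two slots: the first has two tasks, the second has one.
history = [
    ([[2.0, 1.0, 3.0], [0.5, 0.75, 0.25]], [1, 0]),
    ([[1.0, 4.0]], [1]),
]
regret_first_slot = slot_regret(*history[0])
regret_total = total_regret(history)
```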
4.2. Privacy-Aware Online Task Offloading Algorithm
In order to minimize the total system cost (e.g., delay and energy consumption cost) and protect the user’s privacy without exploring any system-level information, a novel PAOTO algorithm is proposed in this work. In particular, this algorithm keeps the framework of Thompson sampling (TS) with a semiparametric reward model [32]. Its key idea is to estimate and learn the device’s performance by selecting different actions over time based on the contextual information. At the same time, the privacy metric is absorbed into the semiparametric reward model.
In the proposed PAOTO algorithm, the mobile device will learn the network information while executing the task offloading policy. As time goes by, the mobile device learns abundant information and it can estimate how to offload these tasks for achieving the optimal system cost and privacypreserving level. According to the aforementioned MAB transformation of our problem, it is known that the optimal offloading decision and the received cost of each task mainly depend on the current contextual feature vector and the fixed but unknown feature vector . Through the previous trial and error, the underlying relationship between the feature vectors and received cost will be learned by the mobile device. The is related to the privacy metric , and it can be achieved after all tasks are offloaded at the end of time slot . Hence, we let denote the estimate of the feature vector and represent the cumulative contextual vector. The estimate of feature vector and the cumulative contextual vector can be denoted as where ; is a dimensional identity matrix, where .
For ease of exposition, the in can be denoted as and it can be calculated as where is the probability of offloading task to the th execution platform at time .
Besides, we can calculate the covariance as follows:
Accordingly, the mobile device can continuously explore and then gather the relationship between the feature vector of each task and the system cost of the chosen execution platform. Then, it also measures the corresponding privacy level to estimate which execution platform is likely to give the minimum system cost while maintaining a good privacy level.
In this paper, the TS-based online learning algorithm is applied to learn the underlying relation between the feature vector and received cost. Hence, we should construct a distributional likelihood function to sample the estimated cost. Firstly, the standard deviation of the estimated cost can be defined as and the standard deviation of the sampling cost can be denoted as , where is a control parameter.
According to Bayes’ theorem, we have:
Based on the TS algorithm in [31], if the prior for received cost at time slot is given by , it is easy to compute the posterior distribution at time slot , i.e., (details of this computation can be seen in Appendix A.1 of [31]). Hence, at every time step , we can use this Gaussian likelihood function to sample the cost for offloaded task at execution platform in our algorithm. Then, the sampled cost is used to estimate the performance of offloaded task at execution platform , and finally the execution platform that has the minimum is selected.
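The sampling-and-update loop described above can be sketched as follows. This is an illustration in the spirit of Thompson sampling with probability-centered contexts, not a reproduction of Algorithm 1: the class name, the Monte Carlo estimate of the selection probabilities, the factor 2 in the cost accumulator, and the one-decision-per-step simplification are all assumptions, and `v` is a hypothetical name for the control parameter. NumPy is assumed to be available.

```python
import numpy as np


class CenteredThompsonSampler:
    """Thompson sampling over costs with contexts centered by selection probabilities."""

    def __init__(self, dim: int, v: float = 1.0, n_mc: int = 200, seed: int = 0):
        self.B = np.eye(dim)      # cumulative contextual matrix, starts at identity
        self.y = np.zeros(dim)    # cumulative context-weighted cost
        self.v = v                # control parameter scaling the sampling spread
        self.n_mc = n_mc          # Monte Carlo draws used to estimate probabilities
        self.rng = np.random.default_rng(seed)

    def _draw_parameters(self, size: int) -> np.ndarray:
        mu_hat = np.linalg.solve(self.B, self.y)      # current estimate B^{-1} y
        cov = self.v ** 2 * np.linalg.inv(self.B)
        cov = (cov + cov.T) / 2.0                     # guard against round-off asymmetry
        return self.rng.multivariate_normal(mu_hat, cov, size=size)

    def select(self, contexts):
        """Pick the platform with the smallest sampled cost.

        contexts: array-like of shape (num_platforms, dim).
        Returns (arm, probs), where probs estimates each platform's selection probability.
        """
        contexts = np.asarray(contexts, dtype=float)
        draws = self._draw_parameters(self.n_mc + 1)
        winners = (draws @ contexts.T).argmin(axis=1)  # cheapest platform per draw
        arm = int(winners[0])                          # first draw decides the action
        probs = np.bincount(winners[1:], minlength=len(contexts)) / self.n_mc
        return arm, probs

    def update(self, contexts, arm: int, probs, cost: float) -> None:
        """Update B and y with contexts centered at their probability-weighted mean."""
        contexts = np.asarray(contexts, dtype=float)
        centered = contexts - probs @ contexts
        chosen = centered[arm]
        self.B += np.outer(chosen, chosen) + (centered.T * probs) @ centered
        self.y += 2.0 * chosen * cost


# Toy run: three platforms, fixed contexts, and a cost with a time-varying shift.
sampler = CenteredThompsonSampler(dim=2, v=1.0, n_mc=100, seed=1)
toy_contexts = [[1.0, 0.8], [0.4, 0.3], [0.2, 0.9]]
true_mu = np.array([0.5, 2.0])
for t in range(50):
    arm, probs = sampler.select(toy_contexts)
    shift = 3.0 * np.sin(t / 5.0)                      # action-independent component
    observed_cost = float(np.dot(toy_contexts[arm], true_mu) + shift)
    sampler.update(toy_contexts, arm, probs, observed_cost)
```

Centering the contexts before accumulating them is what lets the action-independent shift drop out of the estimate in expectation; a larger `v` widens the sampling distribution and therefore increases exploration.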
Hence, guided by the problem transformation and key vectors mentioned above, we introduce the PAOTO algorithm in Algorithm 1.

Algorithm 1 gives the details of exploring the optimal solution that can make an adaptive task offloading decision and preserve the privacy of users. It estimates the offloading cost of each task based on context information and performance feature vector and selects the best offloading action based on the minimum received cost . At the same time, it calculates the offloading probability to fit the MAB problem with a semiparametric reward model for privacy preservation. At the end, it utilizes total received cost to update cumulative contextual vector and cumulative contextual system cost corresponding to the decisions vector of all tasks at every time slot .
5. Simulation Results
In this section, extensive simulations are conducted to evaluate the performance of the proposed PAOTO algorithm under different scenarios. We build our simulations in Python 3.6 and run them on a Lenovo desktop PC equipped with an Intel(R) Core(TM) i7-4500U CPU @ 1.80 GHz and 12.0 GB of RAM (11.7 GB available). The simulation settings, algorithm benchmarks, and performance evaluation are elaborated below.
5.1. Simulation Settings
In our simulation environment, we consider an MEC system in which the access point is deployed with the MEC server. The computing tasks are randomly generated by the mobile device at every time slot , where the maximum of is in the range from 10 to 60. The maximum buffer capacity of the mobile device is set to 10. The length of each time slot is 1 s. The total computation capacity of the MEC server is uniformly distributed in (10, 15) GHz. To accommodate the dynamics, we assume that the computation capacity of the mobile device is determined randomly from 1 to 3 GHz and the converted computation capacity of the buffer is uniformly distributed in (0.1, 0.15) GHz. Based on [33], the data size of each task is distributed in (300 K, 800 K) bits and the computation intensity is taken randomly within (250, 1000) CPU cycles/bit. Thus, the required CPU cycles of a computing task are obtained as the product of its data size and its computation intensity. Besides, the energy consumption for transmitting one task to the MEC server is uniformly distributed in (0.1, 0.5) J, and the mobile device consumes 0.8 J to 3 J to compute one task locally and 0.5 J to 1 J to buffer one task. The weights of processing delay, energy consumption, and privacy metric, which are , and , respectively, can be set dynamically according to the user’s preferences and the demands of the running application. In our simulation, we set them to 1, 1, and 10, respectively. The control parameter of the PAOTO algorithm is usually set to 1.
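The settings above can be turned into a small sampling sketch. It is not the authors' simulator: the function and key names are hypothetical, "K" is assumed to mean 10^3, the ranges stated without a distribution are assumed uniform, and `weighted_cost` assumes the objective is the weighted sum of delay, energy, and the reciprocal of the privacy metric, as summarized in the paper's conclusions.

```python
import random

K = 1000  # assumption: "K" in the task sizes denotes 10^3


def sample_task(rng: random.Random) -> dict:
    """Draw one task; required CPU cycles = data size (bits) x intensity (cycles/bit)."""
    data_bits = rng.uniform(300 * K, 800 * K)
    intensity = rng.uniform(250, 1000)
    return {"data_bits": data_bits, "intensity": intensity,
            "cpu_cycles": data_bits * intensity}


def sample_platforms(rng: random.Random) -> dict:
    """Draw the capacities (GHz) and per-task energy costs (J) for one time slot."""
    return {
        "mec_ghz": rng.uniform(10, 15),
        "local_ghz": rng.uniform(1, 3),          # assumed uniform
        "buffer_ghz": rng.uniform(0.1, 0.15),
        "tx_energy_j": rng.uniform(0.1, 0.5),
        "local_energy_j": rng.uniform(0.8, 3.0),  # assumed uniform
        "buffer_energy_j": rng.uniform(0.5, 1.0),  # assumed uniform
    }


def weighted_cost(delay: float, energy: float, privacy: float,
                  w_delay: float = 1.0, w_energy: float = 1.0,
                  w_privacy: float = 10.0) -> float:
    """Weighted sum of delay, energy, and the reciprocal of the privacy metric."""
    return w_delay * delay + w_energy * energy + w_privacy / privacy


rng = random.Random(42)
task = sample_task(rng)
platforms = sample_platforms(rng)
example_cost = weighted_cost(delay=0.5, energy=1.0, privacy=4.0)
```

With the default weights of 1, 1, and 10, a privacy metric of 4 contributes 2.5 to the cost, so a higher privacy level lowers the objective.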
5.2. Benchmarks
The simulations are carried out based on the above settings. To better demonstrate the advantages and effectiveness of the proposed algorithm, three benchmarks are implemented for comparison with the PAOTO algorithm:
(1) Random offloading algorithm: the random offloading algorithm is chosen as one of the baselines; it assigns each task to an execution platform at random. This is a simple way for a resource-constrained mobile device to distribute its computing tasks. However, it does not consider privacy preservation or performance optimization. The purpose of this benchmark is to evaluate the necessity of the proposed algorithm.
(2) Dyna-Q algorithm: we implemented the Dyna-Q algorithm as one of the benchmarks in our simulations. The implementation details may differ slightly from those of [15], but the main framework is the same. The Dyna-Q scheme in [15] is a reinforcement learning (RL) based privacy-aware offloading scheme. It is an improvement of the Q-learning method that combines model-independent and model-dependent methods. However, it requires more system-level information (e.g., the assumption of a Markov model) than the proposed algorithm. As the most relevant state-of-the-art scheme, Dyna-Q provides a reliable reference for evaluating our algorithm.
(3) No-privacy scenario: the scenario that does not consider privacy protection is also used as one of our baselines. According to [10], when mobile devices do not consider privacy protection but focus solely on optimizing delay and energy consumption, the optimal latency and energy consumption performance can be obtained. Comparing against this scenario shows how much system performance the proposed algorithm gives up in order to protect the user’s privacy.
As such, we compare the performance of our algorithm against these benchmarks, and the parameter values match those used in previous works.
5.3. Numerical Results
In this section, the numerical results are presented to evaluate the effectiveness of the proposed algorithm. The weighted sum-cost, privacy level, computing delay, and energy consumption cost of the PAOTO algorithm over a period of time are compared with those of the benchmarks.
5.3.1. The First Set of Simulations
In the first set of simulations, we randomly generate some tasks for the mobile device at each time slot , which are the same for all the compared algorithms. The number of newly generated tasks is taken randomly within [5, 30]. The task offloading policy is executed for each task per round. Since the number of tasks in each round is dynamic, the simulation results are averaged per task. The average weighted sum-cost, privacy level, computing delay, and energy consumption cost are reported below, and the results are plotted every 100 time slots.
As shown in Figure 2, we trace the average weighted sum-cost of the PAOTO algorithm, random offloading algorithm, and Dyna-Q algorithm at each time slot . It can be seen that the PAOTO algorithm obtains a lower system cost for each task, with a reduction of about 23.0% compared with the Dyna-Q algorithm and about 50.1% compared with random offloading at the 1000th time slot. However, compared with the scenario in which privacy is not considered, the proposed PAOTO algorithm has a higher system cost. This shows that the PAOTO algorithm trades some cost for privacy protection and therefore obtains a suboptimal solution.
According to Figure 3, the proposed algorithm achieves a better privacy-preserving level than the random offloading and Dyna-Q algorithms. The proposed algorithm improves the privacy level by 4.8% and 19.1% compared with the Dyna-Q scheme and the random offloading algorithm, respectively, at the 2000th time slot. The comparison of computing delay and energy consumption cost, shown in Figures 4 and 5, respectively, also verifies the improvement of the proposed algorithm. For instance, compared with Dyna-Q, the computing delay and the energy consumption cost of the PAOTO algorithm decrease by 22.5% and 25.4%, respectively, at the 1000th time slot. Compared with the scenario in which privacy is not considered (i.e., the optimal solution), the latency and energy consumption performance of the proposed algorithm is slightly worse, because it sacrifices some performance in order to preserve privacy. Besides, the simulation results of the random offloading scheme are very poor, which further indicates that it is important to study task offloading and preserve the privacy of users in the MEC system. From the first set of simulations, it can be observed that the PAOTO algorithm outperforms the random offloading and Dyna-Q benchmarks and obtains suboptimal task offloading performance while protecting the user’s privacy.
5.3.2. The Second Set of Simulations
In the second set of simulations, we investigate the performance of the proposed algorithm with different maximum numbers of input computation tasks , which range from 10 to 60. The data size of each task is uniformly distributed in (300 K, 800 K) bits. The simulation results in the second set are averaged over the first 2000 time slots. As shown in Figures 6–8, the PAOTO algorithm achieves a lower average weighted sum-cost, computing delay, and energy consumption cost than the other two benchmarks. The improvements (the differences between the histogram bars) grow as the increases from 10 to 60. For instance, when the number of computing tasks is 20, compared with Dyna-Q, the average sum-cost, computing delay, and energy consumption cost of the PAOTO algorithm decrease by 28.1%, 44.7%, and 28.6%, respectively, whereas when the number of computing tasks is 60, the reductions are 29.3%, 51.6%, and 37.2%, respectively. The reason is that as the total number of tasks increases, the Dyna-Q algorithm requires more time to learn, and random offloading does not optimize performance at all, whereas the proposed algorithm maintains a stable processing efficiency and thus obtains a lower cost.
Additionally, as shown in Figure 9, as the number of tasks increases, the privacy level of the proposed algorithm increases significantly, while the privacy level of the Dyna-Q algorithm decreases slightly. This is also because the processing efficiency of the Dyna-Q algorithm decreases as the number of tasks increases. Besides, the privacy level of the random offloading algorithm is not affected by the number of tasks. Hence, the simulation results of the second set validate that the PAOTO algorithm has superior and stable system performance and privacy-preserving level as the number of computing-intensive tasks increases.
From the two sets of simulations mentioned above, it can be seen that the PAOTO algorithm meets the objective of this paper, namely, achieving close-to-optimal delay and energy consumption performance for the mobile device while protecting the user’s privacy. It also shows a significant performance improvement over the other two benchmarks.
5.3.3. The Third Set of Simulations
In order to analyze the effect of different key parameters (i.e., and ) on the PAOTO algorithm, the weighted sum-cost is plotted under different values of and the number of dummy tasks . First, the parameter in the PAOTO algorithm is associated with the standard deviation of the sampling, where . Thus, we set the values of as 0.1, 1, 5, 10, 15, and 20. As shown in Figure 10, we can observe that the values of and the average weighted sum-cost of the PAOTO algorithm are positively correlated when , such as the curves , , and . The larger the value of , the worse the convergence of the PAOTO algorithm. However, when , the average cost of the PAOTO algorithm increases as decreases, such as curves and . The reason is the cost trade-off in the theoretical bound, and there are different effects before and after reaching the bound.
Second, we simulate the weighted sum-cost of the proposed algorithm for different numbers of dummy tasks, which are related to the privacy metric. As shown in Figure 11, is set to 1, 5, 10, and 15. We observe that as increases, the weighted sum-cost also increases. However, the larger is, the smaller the increment of the weighted sum-cost becomes. According to equations (8) and (9) in Section 3, is directly proportional to the privacy metric, and the privacy metric is inversely proportional to the weighted sum-cost. When the number of dummy tasks increases, the system cost increases at first. Nevertheless, including the dummy tasks in the privacy metric restricts the increment of the weighted sum-cost. That is, there is a trade-off between the system cost and the privacy metric.
6. Conclusions
In this paper, we investigated joint task offloading and privacy preservation for small-size and low-power mobile devices without any system-level network information in the MEC system. The objective is to minimize a weighted sum of the computing delay, energy consumption cost, and reciprocal of the privacy metric. In particular, the joint optimization problem has been formulated as a contextual MAB problem with a semiparametric reward model to accommodate network dynamics, in which the privacy metric is taken into account. Subsequently, a privacy-aware online task offloading (PAOTO) algorithm is proposed to explore the balance between the optimal system cost and the privacy level. The simulation results show that the proposed algorithm can provide near-optimal solutions in a short computing time. In the future, we will extend our work to scenarios with multiple MEC servers of distinct computing capability and take the bandwidth constraint into account.
Data Availability
The data used to support the findings of this study are included within the article.
Disclosure
A preliminary version of this work was accepted by WASA 2020. We have extended the conference paper significantly, and the journal version differs from the conference version by more than 50%.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This work was supported by the Strategic Priority Research Program of the Chinese Academy of Sciences, Grant no. XDC02040300.