Mobile Information Systems

Volume 2019, Article ID 8172698, 12 pages

https://doi.org/10.1155/2019/8172698

## Markov Approximation for Task Offloading and Computation Scaling in Mobile Edge Computing

^{1}School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China

^{2}Innovation Center & Mobile Internet Development and Research Center, China Academy of Electronics and Information Technology, Beijing 100041, China

^{3}Department of Computer Science and Technology, Huaqiao University, Xiamen 362021, China

Correspondence should be addressed to Weiwei Fang; wwfang@bjtu.edu.cn

Received 6 August 2018; Revised 22 October 2018; Accepted 12 December 2018; Published 23 January 2019

Academic Editor: Laurence T. Yang

Copyright © 2019 Wenchen Zhou et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

Mobile edge computing (MEC) provides cloud-computing services for mobile devices to offload intensive computation tasks to the physically proximal MEC servers. In this paper, we consider a multiserver system where a single mobile device asks for computation offloading to multiple nearby servers. We formulate this offloading problem as the joint optimization of computation task assignment and CPU frequency scaling, in order to minimize a tradeoff between task execution time and mobile energy consumption. The resulting optimization problem is combinatorial in essence, and the optimal solution generally can only be obtained by exhaustive search with extremely high complexity. Leveraging the Markov approximation technique, we propose a light-weight algorithm that can provably converge to a bounded near-optimal solution. The simulation results show that the proposed algorithm is able to generate near-optimal solutions and outperform other benchmark algorithms.

#### 1. Introduction

In recent years, mobile devices (MDs) have become an indispensable tool for communication, information, and entertainment in our daily life. However, finite battery capacities and limited computation resources pose intractable challenges for satisfying user-experience requirements. Computation offloading is recognized as a promising solution to cope with such a problem, by migrating computation tasks from mobile devices via wireless access to more powerful servers [1]. Mobile cloud computing (MCC) has been considered as one of the potential solutions. It is commonly assumed that the implementation of MCC relies on data exchange with a centralized cloud through wide area networks [2]. Nevertheless, MCC imposes huge traffic load on mobile networks and brings high communication latency due to the long distance from MDs to the cloud.

Mobile edge computing (MEC), which deploys MEC servers directly at the base stations using generic-computing platforms, is a newly proposed solution for the above problem. In this paradigm, IT and cloud-computing services are provided in close proximity to mobile devices [3]. By endowing ubiquitous wireless access networks (e.g., macrocell and small-cell base stations) with resource-rich computing infrastructures, MEC is envisioned to provide pervasive and agile computation-augmenting services whenever and wherever they are needed. Since the concept of MEC was proposed by the European Telecommunications Standards Institute (ETSI), it has attracted increasing attention from academic researchers. In particular, one of the key design issues in MEC is resource allocation [3]: should a computation task be processed locally by a MD’s CPU or remotely by a MEC server, and if the latter is chosen, how many resources should be allocated to this task? This question stems from the basic tradeoff between the cost of task offloading and the reduction in task execution time brought by offloading. While the question is conceptually simple, making optimal decisions is challenging since many factors are coupled and the solution space is very large. Various approaches have been proposed to tackle this resource allocation problem in many kinds of scenarios [3, 4]. Some of them focused on single-user cases, while others focused on multiuser cases. However, in these studies, the MD is assumed to be associated with only a single MEC server. With small-cell/femtocell base stations expected to be massively deployed in future networks, a MD can choose to offload its tasks to multiple nearby MEC servers with computational capabilities, rather than to only one MEC server [5].

This paper focuses on system optimization in scenarios where a single MD is capable of scaling its CPU frequency and allocating computation tasks to multiple MEC servers. Specifically, we exploit the diversity in terms of task assignment and CPU frequency to make optimal control decisions so as to minimize the tradeoff between task execution time and mobile energy consumption. We formulate this as an NP-hard combinatorial optimization problem. Inspired by the Markov approximation framework proposed in [6], we have devised an efficient approximation algorithm with near-optimal performance. In summary, the main contributions of this paper are as follows:

(1) We propose to exploit the task assignment decision and the computation scaling capability to jointly optimize the tradeoff between latency and energy in a multiserver MEC system and then formulate this as a nonlinear combinatorial optimization problem.

(2) We devise an approximation algorithm based on the Markov approximation framework [6] to solve the proposed problem efficiently. This algorithm can find a near-optimal solution by implementing a Markov chain over all feasible configurations and performing state transitions [6–8]. Then, we investigate key characteristics of the designed Markov chain and analyze the algorithm in terms of performance optimality, approximation gap, and error robustness.

(3) We conduct simulation experiments to demonstrate the performance of our Markov approximation-based algorithm under various parameter settings. The simulation results show that this algorithm can generate near-optimal solutions and remarkably outperforms other benchmark algorithms.

The rest of this paper is organized as follows. Section 2 reviews related work. Section 3 presents the system model and the problem definition. Then, the proposed Markov approximation-based algorithm is introduced in Section 4. Section 5 demonstrates the simulation results. Finally, Section 6 summarizes the conclusions and outlines future work.

#### 2. Related Work

From the computation perspective, MEC offers a new service environment characterized by proximity, efficiency, low latency, and high availability, making computation offloading a promising paradigm for MDs [3]. To this end, three important issues have to be taken into account, namely, resource allocation, data partition, and optimization objective.

Since offloading introduces additional communication overhead, a key technical challenge is how to allocate computation and communication resources so as to balance the energy-performance tradeoff and meet user-experience demands [4]. Recent years have seen increasing research progress on resource allocation for both single-user [9–11] and multiuser MEC systems [12–14]. Wang et al. [9] investigated computation offloading in MEC by jointly optimizing a MD’s CPU speed, transmit power, and offloading ratio to achieve two different design objectives of the MD, i.e., minimizing energy consumption and minimizing execution time. Liu et al. [10] adopted a Markov decision process approach to solve the power-constrained latency minimization problem in MEC, where a MD schedules its computation tasks based on queue size, execution state, and channel information. You et al. [11] proposed a framework in which a MD can not only process computation tasks at the local CPU or offload them to the MEC server but also harvest energy from the base station by microwave power transfer (MPT). The offloading problem in multiuser cases is more complex than that in single-user cases. You et al. [12] studied the optimal resource allocation for a multiuser MEC system with both time division multiple access (TDMA) and orthogonal frequency division multiple access (OFDMA). Chen et al. [13] proposed a game theoretic scheme for the computation offloading decision making problem in multiuser scenarios and demonstrated that the designed game always admits a Nash equilibrium. Sardellitti et al. [14] designed an iterative algorithm based on successive convex approximation techniques to minimize the overall energy consumption of users under latency constraints in a MIMO multicell system. However, the work introduced above considered only the scenario where a MD is associated with a single edge server. Since future mobile networks will be heterogeneous due to the dense deployment of base stations with different capabilities [3], we can exploit such diversity to provide more offloading options and sufficient resource capacities to MDs for guaranteeing low service latency and satisfactory user experience [15].

Generally, computation offloading can be performed in two fashions, i.e., full offloading and partial offloading [9]. In full offloading, the mobile application has to be executed as a whole either locally at the MD or remotely at the MEC server [14]. Compared with full offloading, partial offloading takes advantage of parallelism between the MD and the MEC server, so it is better able to satisfy stringent latency requirements. However, existing work [9, 16] often assumed arbitrarily fine granularity of data partition, i.e., the offloaded data could be partitioned into parts as small as desired. In practice, a mobile application may contain some indivisible tasks or files that cannot be separated into parts of arbitrary size [5].

Computation offloading affects both mobile energy consumption and task execution time. On the one hand, in situations where the MD with latency-sensitive applications has a stringent requirement on energy consumption, it is essential to apply latency-oriented solutions [9, 10, 17] to utilize limited energy efficiently, so as to shorten the execution time as much as possible. On the other hand, in order to prolong the MD’s lifetime, energy-oriented solutions [9, 11, 12, 14] are proposed to minimize the overall energy consumption of MDs while guaranteeing the latency requirement of mobile applications. As far as we know, only a few studies [5, 13, 15] have focused on optimizing these two metrics simultaneously.

While recognizing the significance of existing studies, our work is different from and complementary to them. We investigate the joint optimization of task offloading and computation scaling in a multiserver MEC system, with the objective of minimizing the tradeoff between latency and energy. Besides, in system modeling, we take into account many practical aspects that were missing from previous work on relevant problems. To the best of our knowledge, the proposed problem has not been explored by prior work.

#### 3. System Model and Problem Formulation

##### 3.1. System Model

Let us consider a MD that has a set of tasks to be processed. It can choose to either process these tasks at the local CPU or offload them to any one among the nearby MEC servers . We assume that the wireless base stations operate on orthogonal wireless channels so that any two of them will not interfere with each other. We leverage the binary variables , , and , to represent the task assignment, i.e.,

A task must be processed locally or remotely, i.e., the aforementioned tasks could be separated into disjoint sets. To ensure this, we impose the following constraint:

A computation task *m* is characterized by a three-tuple of parameters, , where denotes the total number of CPU cycles needed to accomplish the task *m*, while and denote the size of computation input data (in bits) and output data (in bits), respectively. In this work, we assume that the MD can apply the methods such as offline measurements [18] and call-graph analysis [13] to obtain the values of , , and . We now analyze the computation overhead in terms of both execution time and energy consumption for both local and offloading approaches.

(1) *Local computing*: If the MD chooses the local computing approach, it will execute the computation task *m* locally using its own CPU. Let be the computation capability (i.e., CPU cycles per second) of the MD. Given the decision profile and , the execution time of computing a batch of tasks at local CPU is given by

For the computational energy, we have that

where denotes the computational power of the MD. As in [19], *θ* ranges from 2 to 3, while and are parameters depending on chip architecture. Their values can be obtained by the measurement approach in [19]. Using dynamic voltage frequency scaling (DVFS) technology [4, 14, 19], the MD could adaptively adjust to shorten execution time or reduce energy consumption. For example, the Nexus S smartphone has six levels of CPU speeds, each of which matches with some specific voltage [19]. In this work, we assume that takes value in some finite and discrete set , where and are minimum and maximum CPU frequency of the MD, respectively.

(2) *MEC offloading*: If the MD chooses the MEC offloading approach, it will offload the computation task *m* to one of the MEC servers via wireless access. This chosen server will execute task *m* on behalf of the MD. Such computation offloading would incur extra overhead in terms of time and energy for transmitting the computation input and output data. We assume that each MEC server *n* can provide the MD with fixed service rate (i.e., CPU cycles per second) , which is determined according to the MEC computing service contract subscribed by the MD from its mobile operator [13]. The average uplink and downlink data rates, denoted by and , are also assumed to be known by the MD before task processing by applying the methods in [1, 5]. For simplicity, the data transmission and the task processing are assumed to be nonoverlapping and noninterfering with each other, which means the uploading, computing, and downloading steps are carried out sequentially [5]. Given the decision profile , the total execution time of computing a batch of tasks at MEC server *n* can be given as

The MD’s energy consumption on wireless transmission can be calculated as

where and denote the transmitting and receiving power, respectively.
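The cost model described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names (`Task`, `f_local`, `r_up`, `p_tx`, and so on) are hypothetical, and the CPU power model `a * f ** theta + b` is an assumed form that is merely consistent with the description of an exponent *θ* and two chip-dependent parameters; it is not taken from the paper's equations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    cycles: float    # CPU cycles needed to finish the task
    in_bits: float   # input data size in bits
    out_bits: float  # output data size in bits


def local_time(tasks, f_local):
    """Time to run a batch of tasks on the MD at f_local cycles per second."""
    return sum(t.cycles for t in tasks) / f_local


def local_energy(tasks, f_local, a, b, theta):
    """Energy = CPU power * execution time (assumed power model, see lead-in)."""
    power = a * f_local ** theta + b
    return power * local_time(tasks, f_local)


def server_time(tasks, f_server, r_up, r_down):
    """Upload, compute, and download are sequential, so their times add up."""
    return sum(t.in_bits / r_up + t.cycles / f_server + t.out_bits / r_down
               for t in tasks)


def server_energy(tasks, r_up, r_down, p_tx, p_rx):
    """The MD only spends energy on transmitting inputs and receiving outputs."""
    return sum(p_tx * t.in_bits / r_up + p_rx * t.out_bits / r_down
               for t in tasks)


# Illustrative numbers only.
tasks = [Task(2e9, 8e6, 1e6), Task(1e9, 4e6, 2e6)]
print(local_time(tasks, f_local=1e9))
print(server_time(tasks, f_server=4e9, r_up=8e6, r_down=16e6))
```

Keeping the four quantities as separate functions mirrors the text, where local and offloading costs are analyzed separately before being combined.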

##### 3.2. Problem Formulation

In this work, we hope to optimize two metrics, i.e., the tasks’ execution time and the MD’s energy consumption. Because the MD and MEC servers process tasks in parallel, the time metric can be given by

The energy consumption metric can be given by

However, these two objectives are coupled by , , and , so they cannot be optimized independently and simultaneously. To investigate the tradeoff between them, we construct a unified objective function (or system utility) as

where denote the weighting parameters of execution time and energy consumption for the MD. In this way, the latency and energy metrics can be taken into account in decision making at the same time, while and reflect the relative importance between them. If the MD is running some application that is sensitive to latency, it can set and when making decisions. If the MD is at a low battery state and cares more about energy consumption, it can set and when making decisions. Such a weighted sum approach has been extensively used for modeling similar multiobjective optimization problems [2].
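As a minimal sketch of the weighted-sum utility described above, the function below combines the parallel-execution time (a maximum over devices) with the total energy (a sum over devices). The argument names and the weights `w_time` and `w_energy` are hypothetical stand-ins for the paper's symbols.

```python
def system_utility(local_t, server_ts, local_e, server_es, w_time, w_energy):
    """Weighted sum of completion time and MD energy consumption.

    local_t / local_e: time and energy of the locally executed batch.
    server_ts / server_es: per-server times and MD-side transmission energies.
    """
    # Devices work in parallel, so the slowest one determines completion time.
    total_time = max([local_t, *server_ts])
    # Energy is additive across local computation and all transmissions.
    total_energy = local_e + sum(server_es)
    return w_time * total_time + w_energy * total_energy


# Illustrative numbers only: equal weights, then a purely latency-driven setting.
print(system_utility(3.0, [1.5, 2.0], 4.0, [1.0, 0.5], 0.5, 0.5))  # 4.25
print(system_utility(3.0, [1.5, 2.0], 4.0, [1.0, 0.5], 1.0, 0.0))  # 3.0
```

Shifting weight from `w_time` to `w_energy` changes which configuration scores best, and this is how the relative importance of the two metrics enters the decision.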

In conclusion, the optimization problem is formulated as

s.t. constraint (2),

The problem is essentially a mixed-integer nonlinear program, which is known to be NP-hard. Furthermore, this problem is also a combinatorial optimization problem, in which the globally optimal solution consists of decisions for each computation task and the MD’s CPU. Since there is no computationally efficient way to obtain the exact optimal solution, we propose to develop a fast polynomial-time approximation algorithm that solves the problem based on the Markov approximation framework [6].
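To see why exhaustive search scales poorly, the following sketch enumerates every configuration of a toy instance. The instance data, the parameter names, and the power model `a * f ** theta + b` are illustrative assumptions, not values from the paper. With M tasks, N servers, and K candidate frequencies, the loop visits (N + 1)^M · K configurations.

```python
from itertools import product


def utility(assign, freq, tasks, servers, w_t=0.5, w_e=0.5,
            a=1e-27, b=0.1, theta=3.0, p_tx=1.0, p_rx=0.5):
    """tasks: (cycles, in_bits, out_bits); servers: (cycles/s, up b/s, down b/s).

    assign[m] == 0 means task m runs locally; assign[m] == n means server n.
    """
    times = [0.0] * (len(servers) + 1)
    energy = 0.0
    for (cycles, in_bits, out_bits), dev in zip(tasks, assign):
        if dev == 0:
            t = cycles / freq
            times[0] += t
            energy += (a * freq ** theta + b) * t  # assumed CPU power model
        else:
            f_n, r_up, r_down = servers[dev - 1]
            up, down = in_bits / r_up, out_bits / r_down
            times[dev] += up + cycles / f_n + down
            energy += p_tx * up + p_rx * down
    return w_t * max(times) + w_e * energy


def exhaustive_search(tasks, servers, freqs):
    """Evaluate every (assignment, frequency) pair and keep the cheapest."""
    best, evaluated = None, 0
    for assign in product(range(len(servers) + 1), repeat=len(tasks)):
        for freq in freqs:
            evaluated += 1
            u = utility(assign, freq, tasks, servers)
            if best is None or u < best[0]:
                best = (u, assign, freq)
    return best, evaluated


tasks = [(2e9, 8e6, 1e6), (1e9, 4e6, 2e6), (3e9, 2e6, 1e6)]
servers = [(4e9, 8e6, 16e6), (6e9, 4e6, 8e6)]
freqs = [0.5e9, 1.0e9, 1.5e9]
best, evaluated = exhaustive_search(tasks, servers, freqs)
print(evaluated)  # 3 ** 3 * 3 = 81 configurations for this toy instance
```

Each additional task multiplies the count by N + 1, so the search grows exponentially with the number of tasks.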

#### 4. Markov Approximation and Algorithm Design

Markov approximation is a recently proposed technique for solving combinatorial network optimization problems [6]. Generally, this framework consists of two steps: log-sum-exp approximation and the construction of problem-specific Markov chains that yield efficient parallel implementations for solving the problem approximately. The proofs of optimality and convergence for Markov approximation have been presented in [6].

##### 4.1. Log-Sum-Exp Approximation

Let be a feasible solution to problem , where denotes the set of all feasible solutions that satisfy constraint (2). Furthermore, we denote utility as the system’s objective function corresponding to a given configuration *f*, so problem can be represented by . Therefore, the equivalent minimum weight independent set (MWIS) problem of is

where the probability indicates the percentage of time that the system is in configuration *f*. Regarding as the weight of *f*, the problem is to search for a minimum weighted configuration. Following the Markov approximation framework [6], the log-sum-exp approximation of yields

where *β* is a positive constant that affects the approximation performance. Let be the size of set . The approximation accuracy is known as follows:

It is clear that, as *β* → ∞, the approximation gap approaches 0 and thus the approximation becomes exact.
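The effect of *β* can be checked numerically. The sketch below uses the standard log-sum-exp surrogate for a minimum, −(1/β) log Σ exp(−β·U), and verifies that its gap to the true minimum lies between 0 and log(number of configurations)/β. The function name `soft_min` and the utility values are illustrative assumptions.

```python
import math


def soft_min(utilities, beta):
    """Log-sum-exp approximation of min(utilities).

    Equals -(1/beta) * log(sum(exp(-beta * u))), computed after subtracting
    the minimum so that every exponent is <= 0 and cannot overflow.
    """
    m = min(utilities)
    return m - math.log(sum(math.exp(-beta * (u - m)) for u in utilities)) / beta


utilities = [3.0, 1.0, 2.0, 1.5]  # illustrative utilities of four configurations
for beta in (0.1, 1.0, 10.0, 1000.0):
    gap = min(utilities) - soft_min(utilities, beta)
    bound = math.log(len(utilities)) / beta
    print(f"beta={beta:g}  gap={gap:.6f}  bound={bound:.6f}")
```

The approximation never exceeds the true minimum, and the worst-case gap shrinks in proportion to 1/β, which is why a larger *β* gives a tighter approximation.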

According to [6], the log-sum-exp approximation in (13) is equivalent to solving the following problem :

Since is a convex problem, we can solve the Karush–Kuhn–Tucker (KKT) conditions [20] and obtain the optimal solution as
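For readers who want the intermediate steps, the KKT argument can be sketched as follows. The notation is assumed for illustration and follows the usual presentation of Markov approximation: $\mathcal{F}$ is the set of feasible configurations, $U_f$ the utility of configuration $f$, $p_f$ the fraction of time spent in $f$, and $\lambda$ the multiplier of the normalization constraint. The entropy-regularized objective below is the standard form used by the framework and is assumed to match the problem stated above.

```latex
\begin{align*}
&\min_{p \ge 0}\ \sum_{f \in \mathcal{F}} p_f U_f
  + \frac{1}{\beta} \sum_{f \in \mathcal{F}} p_f \log p_f
  \quad \text{s.t.} \quad \sum_{f \in \mathcal{F}} p_f = 1, \\[4pt]
&L(p, \lambda) = \sum_{f \in \mathcal{F}} p_f U_f
  + \frac{1}{\beta} \sum_{f \in \mathcal{F}} p_f \log p_f
  + \lambda \Bigl( \sum_{f \in \mathcal{F}} p_f - 1 \Bigr), \\[4pt]
&\frac{\partial L}{\partial p_f}
  = U_f + \frac{1}{\beta} \bigl( \log p_f + 1 \bigr) + \lambda = 0
  \ \Longrightarrow\
  p_f = \exp\bigl( -\beta U_f - 1 - \beta \lambda \bigr), \\[4pt]
&\sum_{f \in \mathcal{F}} p_f = 1
  \ \Longrightarrow\
  p_f^{*} = \frac{\exp(-\beta U_f)}
                 {\sum_{f' \in \mathcal{F}} \exp(-\beta U_{f'})}, \\[4pt]
&\sum_{f \in \mathcal{F}} p_f^{*} U_f
  + \frac{1}{\beta} \sum_{f \in \mathcal{F}} p_f^{*} \log p_f^{*}
  = -\frac{1}{\beta} \log \sum_{f \in \mathcal{F}} \exp(-\beta U_f).
\end{align*}
```

The last line follows by substituting $\log p_f^{*} = -\beta U_f - \log \sum_{f'} \exp(-\beta U_{f'})$, which confirms that the optimal value coincides with the log-sum-exp approximation. Configurations with lower utility receive exponentially more probability mass, and the distribution concentrates on the minimizers as $\beta$ grows.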

However, it is difficult to solve problem directly because this requires complete information on which is typically unknown due to the large solution space. Nevertheless, if we can sample the configuration space from the distribution in (16), i.e., time-sharing among different configurations *f* according to their portions , we actually solve problem and thus problem approximately [7]. To this end, the key is to design a problem-specific Markov chain, which models feasible configurations as states, achieves stationary distribution , and allows parallel construction among the tasks.

##### 4.2. Markov Chain Design

It has been proved that there exists at least one continuous-time time-reversible ergodic Markov chain with stationary distribution in (16) and [6]. To construct such a time-reversible Markov chain, we let be two states of the Markov chain and use as the transition rate from state *f* to . It suffices to design to ensure the following two conditions: (a) any two states are reachable from each other, and (b) the detailed balance equation is satisfied, i.e., , . The above two sufficient requirements allow a large space to design a Markov chain in terms of the state-space structure and transition-matrix design.

First, the direct transition rate between two states can be set to zero, provided that every state remains reachable from every other state. That is, we only allow direct links between two states that can be reached from each other by performing only one task migration.

Second, for two assignments with direct transitions, e.g., *f* and , we design the transition rate between them as

where is a constant denoting the mean time of state transition [6].
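The detailed balance requirement can be verified numerically for a candidate rate design. The sketch below uses one choice that is common in the Markov approximation literature, a rate proportional to exp(β(U_from − U_to)/2); this is an assumed example and not necessarily the rate adopted in this paper. The names `stationary`, `rate`, and `tau` are hypothetical.

```python
import math


def stationary(utilities, beta):
    """Target distribution: probability proportional to exp(-beta * U)."""
    m = min(utilities)  # shift by the minimum for numerical stability
    weights = [math.exp(-beta * (u - m)) for u in utilities]
    z = sum(weights)
    return [w / z for w in weights]


def rate(u_from, u_to, beta, tau=1.0):
    """Assumed transition rate: moves toward lower utility happen faster."""
    return math.exp(0.5 * beta * (u_from - u_to)) / tau


utilities = [1.0, 2.5, 0.3]  # illustrative utilities of three states
beta = 2.0
p = stationary(utilities, beta)

# Detailed balance: p[i] * q(i -> j) must equal p[j] * q(j -> i) for all pairs.
for i, u_i in enumerate(utilities):
    for j, u_j in enumerate(utilities):
        lhs = p[i] * rate(u_i, u_j, beta)
        rhs = p[j] * rate(u_j, u_i, beta)
        assert math.isclose(lhs, rhs, rel_tol=1e-9)
print("detailed balance holds for all state pairs")
```

Both sides reduce to a term proportional to exp(−β(U_i + U_j)/2), which is symmetric in the two states, so the balance holds for any utilities.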

##### 4.3. Markov Approximation-Based Algorithm

According to Markov approximation [6], in our system, a configuration *f* consists of computation tasks using one of its local configurations. If all tasks run individual continuous-time clocks and wait for performance-dependent amounts of time before switching their local configurations, we can implement the target Markov chain such that transitions only happen between two configurations *f* and when they differ from each other by only one task’s local configuration.

The implementation based on our designed Markov chain is shown in Algorithm 1 and works as follows. The MD initializes a dedicated thread for each of its computation tasks. At first, each thread randomly selects a feasible target device that satisfies constraint (2) for . Besides, the CPU frequency is also randomly picked from the candidate set . In stage 1, each thread draws an exponentially distributed random number with a mean equal to *γ* and counts down according to this number. In stage 2, when the timer of task *m* expires, the dedicated thread first sends RESET signals to the other threads to notify them of the upcoming potential transition and then randomly generates a new configuration on task assignment and frequency scaling. The thread *m* will transit to with the probability or stay at the current configuration *f* with the probability . In stage 3, when the dedicated thread serving a task receives a RESET signal, it terminates its current countdown process and returns to stage 1. Due to the properties of the underlying Markov chain, this algorithm converges in probability to a near-optimal configuration after a sufficient number of time periods.
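The procedure above can be emulated in a single thread. The sketch below is a simplified, discrete-time stand-in and not the paper's Algorithm 1. Since all timers are independent exponentials with the same mean, the task whose timer expires first is uniformly distributed, so the race is replaced by a uniform random pick. The acceptance probability 1 / (1 + exp(β · ΔU)) is an assumed rule that satisfies detailed balance for symmetric proposals; the paper's own probability may differ. All instance data and names are illustrative.

```python
import math
import random

# Hypothetical toy instance; every number below is an illustrative assumption.
CYCLES = [2e9, 1e9, 3e9, 1.5e9]                # CPU cycles per task
IN_BITS = [8e6, 4e6, 2e6, 6e6]                 # input size per task (bits)
OUT_BITS = [1e6, 2e6, 1e6, 1e6]                # output size per task (bits)
SERVERS = [(4e9, 8e6, 16e6), (6e9, 4e6, 8e6)]  # (cycles/s, uplink b/s, downlink b/s)
FREQS = [0.5e9, 1.0e9, 1.5e9]                  # candidate local CPU frequencies (Hz)


def utility(assign, freq, w_t=0.5, w_e=0.5, a=1e-27, b=0.1, theta=3.0,
            p_tx=1.0, p_rx=0.5):
    """Weighted time/energy cost; assign[m] == 0 means task m runs locally."""
    times = [0.0] * (len(SERVERS) + 1)
    energy = 0.0
    for m, dev in enumerate(assign):
        if dev == 0:
            t = CYCLES[m] / freq
            times[0] += t
            energy += (a * freq ** theta + b) * t  # assumed CPU power model
        else:
            f_n, r_up, r_down = SERVERS[dev - 1]
            up, down = IN_BITS[m] / r_up, OUT_BITS[m] / r_down
            times[dev] += up + CYCLES[m] / f_n + down
            energy += p_tx * up + p_rx * down
    return w_t * max(times) + w_e * energy


def markov_search(beta=5.0, steps=2000, seed=0):
    """Random walk over configurations; returns the best one visited."""
    rng = random.Random(seed)
    assign = [rng.randrange(len(SERVERS) + 1) for _ in CYCLES]
    freq = rng.choice(FREQS)
    current = utility(assign, freq)
    best = (current, tuple(assign), freq)
    for _ in range(steps):
        m = rng.randrange(len(CYCLES))          # task whose timer expired
        new_assign = list(assign)
        new_assign[m] = rng.randrange(len(SERVERS) + 1)
        new_freq = rng.choice(FREQS)
        candidate = utility(new_assign, new_freq)
        # Clamp the exponent so math.exp cannot overflow for large beta.
        p_accept = 1.0 / (1.0 + math.exp(min(beta * (candidate - current), 700.0)))
        if rng.random() < p_accept:
            assign, freq, current = new_assign, new_freq, candidate
        if current < best[0]:
            best = (current, tuple(assign), freq)
    return best


best_u, best_assign, best_freq = markov_search()
print(best_assign, best_freq, round(best_u, 4))
```

A larger `beta` makes the walk greedier and a smaller one explores more, which reflects the trade-off between approximation accuracy and convergence speed in the framework. Tracking the best visited configuration is a convenience of this sketch; the analysis in the text concerns the stationary behavior of the chain itself.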