#### Abstract

This article studies a mobile edge computing (MEC) system with one edge node (EN), where multiple unmanned aerial vehicles (UAVs) act as users with computation-intensive tasks. As the users generally have limited computing capability and power supply, the EN can help compute the tasks and, meanwhile, supply power to the users through energy harvesting. We optimize the system by proposing a joint offloading and energy harvesting strategy. Specifically, a deep reinforcement learning (DRL) algorithm is implemented to determine the task offloading, while several analytical solutions are given for allocating the harvested power among multiple users. In particular, criterion I is equal power allocation, criterion II is designed to achieve an equal data rate, and criterion III is based on equal charged energy. Finally, simulation results are presented to verify the joint strategy for the UAV-aided multiuser MEC system with energy harvesting.

#### 1. Introduction

In recent years, wireless communication has attracted considerable research effort from both academia and industry [1, 2], which has inspired many practical applications, such as the Internet of Things (IoT) and video monitoring [3]. A key feature of these applications is the massive amount of computation caused by the large number of access nodes [4]. To handle this computational load, cloud computing has been proposed, which assists task computing through wireless transmission [5, 6]. A major limitation is that the latency and power consumption (PoC) become prohibitively high under poor channel conditions, which severely limits the development and application of cloud computing.

To resolve the above disadvantages of cloud computing, mobile edge computing (MEC) has been proposed to deploy computing resources at the edge nodes (ENs) of the network [7–9]. In this way, users can offload their tasks to a nearby EN through wireless transmission, which reduces the delay and PoC compared with cloud computing. A key design parameter in an MEC system is the offloading ratio [10, 11], which determines the portion of the tasks computed at the EN. The fundamental principle of offloading is to jointly utilize the communication and computing resources by achieving a good trade-off between computing and wireless transmission. Moreover, several advanced wireless techniques have been proposed to reduce the delay and PoC in computing and transmission [12, 13].

Another new technique to assist computing and communication in IoT networks is the deployment of unmanned aerial vehicles (UAVs), which are easy to deploy and offer flexible mobility. Moreover, UAVs are becoming increasingly inexpensive, which has inspired many practical applications [14, 15]. In MEC systems, UAVs can help serve high-priority computing tasks through intelligent path planning and scheduling, thereby exploiting the additional system resources that UAVs bring. The integration of UAVs into MEC systems has attracted much attention from both academia and industry, which motivates this article.

Motivated by the above literature review, this article studies an MEC system with one EN, where multiple UAVs act as users with computation-intensive tasks. Since the UAVs have limited computing capability and power supply, the EN both computes part of their tasks and charges them through energy harvesting. We propose a joint offloading and energy harvesting strategy, in which a DRL algorithm determines the task offloading and analytical solutions allocate the charging power among the users: criterion I allocates equal power, criterion II achieves an equal data rate, and criterion III provides equal charged energy. Simulation results are finally presented to verify the proposed strategy for the UAV-aided multiuser MEC system with energy harvesting.

#### 2. System Model

In this paper, we consider the offloading system model in Figure 1, which consists of an edge node (EN) surrounded by $K$ unmanned aerial vehicles (UAVs), denoted by $u_k$, $k = 1, \dots, K$. (Some literature uses the notation “CAP” instead of “EN”; both refer to the same entity, and the two notations can be used interchangeably.) Specifically, the EN has an energy transmitter and a server that provides computing services. The EN can charge the UAVs, and each UAV is equipped with a limited battery that supports wireless charging. Each UAV $u_k$ has a computing task of size $l_k$. Due to their limited computing capability, the UAVs offload part of their computing tasks to the EN to reduce the computing time. Since the EN ensures that the UAVs are always supplied with energy, the UAVs in this system offload tasks without considering power consumption. We introduce the local computing model and the offloading computing model in the following subsections.

##### 2.1. Local Computing Model

The local computing delay of $u_k$ is [16]

$$T_k^{\mathrm{loc}} = \frac{(1-\rho_k)\, l_k\, \omega}{f_k}, \tag{1}$$

where $l_k$ is the size of the task of $u_k$, $\rho_k \in [0,1]$ is the offloading ratio from $u_k$ to the EN, $\omega$ is the number of CPU cycles required to execute one bit, and $f_k$ is the local computing capability of $u_k$. Because all UAVs compute their tasks in parallel, we use the maximum local computing delay as the local delay of the whole system:

$$T^{\mathrm{loc}} = \max_{1 \le k \le K} T_k^{\mathrm{loc}}. \tag{2}$$
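A minimal Python sketch of the local model, assuming the forms of (1) and (2) in which each UAV computes locally the fraction of its task that it does not offload, and the system local delay is the maximum over UAVs. The helper names and the example values (two tasks of $8 \times 10^6$ bits, 100 CPU cycles per bit, 1 GHz and 2 GHz local CPUs) are hypothetical and are not taken from this paper's simulation setup.

```python
def local_delays(task_bits, rho, cycles_per_bit, f_local):
    """Per-UAV local computing delay as in (1): (1 - rho_k) * l_k * omega / f_k.

    Hypothetical helper: task_bits[k] = l_k (bits), rho[k] = offloading ratio,
    cycles_per_bit = omega, f_local[k] = local CPU frequency (cycles/s).
    """
    return [(1.0 - r) * l * cycles_per_bit / f
            for l, r, f in zip(task_bits, rho, f_local)]


def system_local_delay(task_bits, rho, cycles_per_bit, f_local):
    """System local delay as in (2): UAVs compute in parallel, so the slowest dominates."""
    return max(local_delays(task_bits, rho, cycles_per_bit, f_local))


if __name__ == "__main__":
    # Two hypothetical UAVs: the first keeps 50% of its task, the second keeps 75%.
    print(local_delays([8e6, 8e6], [0.5, 0.25], 100, [1e9, 2e9]))        # [0.4, 0.3]
    print(system_local_delay([8e6, 8e6], [0.5, 0.25], 100, [1e9, 2e9]))  # 0.4
```

Offloading more of a task always lowers its local delay; the cost reappears in the transmission and edge-computing terms of the offloading model.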

##### 2.2. Offloading Computing Model

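In this paper, $u_k$ is charged by the EN, and the energy harvested by $u_k$ during the charging process is

$$E_k = \eta P_k \alpha_k T, \tag{3}$$

where $\eta$ denotes the charging factor, $P_k$ is the charged power that the EN allocates to $u_k$, $\alpha_k$ is the charging time (as a fraction of the time slot), and $T$ denotes the span of each time slot.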

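From (3), since $u_k$ transmits during the remaining $(1-\alpha_k)T$ of the time slot, the transmission power at $u_k$ is

$$p_k = \frac{E_k}{(1-\alpha_k)T} = \frac{\eta P_k \alpha_k}{1-\alpha_k}. \tag{4}$$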

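The transmission rate between $u_k$ and the EN is

$$R_k = W \log_2\left(1 + \frac{p_k |h_k|^2}{\sigma^2}\right), \tag{5}$$

where $W$ is the total bandwidth of the system, $h_k$ is the channel parameter from $u_k$ to the EN, and $\sigma^2$ is the variance of the additive white Gaussian noise (AWGN) at the EN. The transmission delay of $u_k$ is

$$T_k^{\mathrm{tr}} = \frac{\rho_k l_k}{R_k}. \tag{6}$$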
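The charging and uplink chain in (3) to (6) can be sketched in Python as follows. This assumes the harvest-then-transmit reading of (4), in which the energy collected during the charging fraction of the slot is spent uniformly over the rest of the slot; the function names and numbers are illustrative only. The example is chosen so that the signal-to-noise ratio equals 15, which makes the spectral efficiency exactly 4 bit/s/Hz.

```python
import math


def harvested_energy(eta, p_charge, alpha, slot):
    """(3): energy harvested by u_k, E_k = eta * P_k * alpha_k * T."""
    return eta * p_charge * alpha * slot


def transmit_power(energy, alpha, slot):
    """(4), assuming harvest-then-transmit: E_k is spent over the remaining (1 - alpha_k) * T."""
    return energy / ((1.0 - alpha) * slot)


def uplink_rate(bandwidth, p_tx, gain, noise_var):
    """(5): R_k = W * log2(1 + p_k * |h_k|^2 / sigma^2), with gain = |h_k|^2."""
    return bandwidth * math.log2(1.0 + p_tx * gain / noise_var)


def transmission_delay(rho, task_bits, rate):
    """(6): time to send the offloaded part rho_k * l_k at rate R_k."""
    return rho * task_bits / rate


if __name__ == "__main__":
    E = harvested_energy(eta=0.5, p_charge=12.0, alpha=0.5, slot=1.0)  # 3.0 J
    p = transmit_power(E, alpha=0.5, slot=1.0)                         # 6.0 W
    R = uplink_rate(1e6, p, gain=2.5, noise_var=1.0)                   # SNR = 15, so 4e6 bit/s
    print(transmission_delay(0.5, 8e6, R))                             # 1.0
```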

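The computing delay of the offloaded task of $u_k$ at the EN is

$$T_k^{\mathrm{EN}} = \frac{\rho_k l_k \omega}{f_E}, \tag{7}$$

where $f_E$ is the computing capability of the EN. Further, the transmission delay of all UAVs is

$$T^{\mathrm{tr}} = \max_{1 \le k \le K} T_k^{\mathrm{tr}}. \tag{8}$$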

The computing delay of all UAVs at the EN is

$$T^{\mathrm{EN}} = \max_{1 \le k \le K} T_k^{\mathrm{EN}}. \tag{9}$$

From (8) and (9), the offloading delay of the whole system is

$$T^{\mathrm{off}} = T^{\mathrm{tr}} + T^{\mathrm{EN}}. \tag{10}$$

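Therefore, the system target in the considered MEC network is

$$\min_{\{\rho_k\},\,\{P_k\}} \ \max\left(T^{\mathrm{loc}},\, T^{\mathrm{off}}\right) \quad \text{s.t.} \quad 0 \le \rho_k \le 1,\ P_k \ge 0,\ \sum_{k=1}^{K} P_k = P, \tag{11}$$

where $P$ is the total charged power of the EN. In the next section, we describe in detail how we optimize this system target.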
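To make the objective in (11) concrete, the sketch below evaluates the system delay for given offloading ratios and transmit powers, and then minimizes it by exhaustive search over a uniform grid of ratios. This brute-force search is only a stand-in that illustrates the local/offloading trade-off for very small $K$; it is not the DQN scheme of Section 3. It assumes that (8) and (9) take the maximum over UAVs, and all parameter values are hypothetical.

```python
import itertools
import math


def system_delay(rho, task_bits, omega, f_local, f_en, p_tx, gains, bandwidth, noise_var):
    """Objective of (11) for fixed offloading ratios rho and transmit powers p_tx.

    Assumes (8) and (9) take the maximum over UAVs, as the local model (2) does.
    """
    K = len(task_bits)
    t_loc = max((1 - rho[k]) * task_bits[k] * omega / f_local[k] for k in range(K))      # (1)-(2)
    rates = [bandwidth * math.log2(1 + p_tx[k] * gains[k] / noise_var) for k in range(K)]  # (5)
    t_tr = max(rho[k] * task_bits[k] / rates[k] for k in range(K))                       # (6), (8)
    t_en = max(rho[k] * task_bits[k] * omega / f_en for k in range(K))                    # (7), (9)
    return max(t_loc, t_tr + t_en)                                                        # (10)-(11)


def grid_search(steps, **params):
    """Brute-force baseline: try every rho vector on a uniform grid (viable only for small K)."""
    grid = [i / (steps - 1) for i in range(steps)]
    best_delay, best_rho = math.inf, None
    for rho in itertools.product(grid, repeat=len(params["task_bits"])):
        d = system_delay(list(rho), **params)
        if d < best_delay:
            best_delay, best_rho = d, rho
    return best_delay, best_rho


if __name__ == "__main__":
    # One hypothetical UAV: all-local takes 1.0 s; all-offloaded takes 0.5 s (tx) + 0.1 s (EN).
    params = dict(task_bits=[1e6], omega=1000, f_local=[1e9], f_en=1e10,
                  p_tx=[3.0], gains=[1.0], bandwidth=1e6, noise_var=1.0)
    print(grid_search(11, **params))  # best rho on the 0.1-spaced grid is 0.6
```

In the single-UAV example, the local delay is $1-\rho$ and the offloading delay is $0.6\rho$, so the two balance at $\rho = 0.625$; the 0.1-spaced grid therefore selects $\rho = 0.6$ with a delay of 0.4 s. The search space grows as `steps ** K`, which is one reason a learning-based search is attractive for larger systems.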

#### 3. System Optimization

In this section, we present our optimization scheme for the considered system target. Specifically, we first use a deep Q-network (DQN) algorithm to obtain the task offloading strategy, and then we propose three methods to allocate the charged power among the UAVs. The details of the scheme are as follows.

##### 3.1. Task Offloading Scheme

Due to the complexity of the wireless links in the system, it is hard to offload the tasks of the UAVs dynamically with traditional methods. Therefore, we exploit the DQN algorithm to obtain the task offloading strategy. Different from the Q-learning algorithm, DQN uses an experience pool and two neural networks, namely the evaluation network and the target network, to interact with the training environment and break the correlation among the training data. Moreover, we model the considered task offloading problem as a Markov decision process (MDP), which generally consists of the state set $\mathcal{S}$, the action set $\mathcal{A}$, and the reward function $\mathcal{R}$. The training process is as follows. The DQN agent first initializes the system state $s \in \mathcal{S}$ and then selects an action $a \in \mathcal{A}$ under the current state. After the agent executes the selected action, the system state is updated to $s'$, and the agent obtains a feedback $r$ according to the reward function $\mathcal{R}$. The agent then stores the previous state, the selected action, the feedback, and the updated state, i.e., the tuple $(s, a, r, s')$, in the experience pool.

After this process, the agent obtains the state-action value $Q(s, a; \theta)$, where $\theta$ denotes the parameters of the evaluation network. The evaluation network is then trained with the loss function

$$L(\theta) = \mathbb{E}\left[\left(y - Q(s, a; \theta)\right)^2\right], \tag{12}$$

where $y$ is computed by the target network as

$$y = r + \gamma \max_{a' \in \mathcal{A}} Q(s', a'; \theta^-), \tag{13}$$

where $\gamma$ represents the discount factor and $\theta^-$ denotes the parameters of the target network. Note that the evaluation network and the target network have the same structure. However, unlike the target network, the evaluation network is trained in every round, and its update can be written as

$$\theta \leftarrow \theta - \mu \nabla_{\theta} L(\theta), \tag{14}$$

where $\mu$ is the learning rate of the evaluation network.
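The DQN update in (12) to (14) can be illustrated with the following NumPy sketch. To stay short, it replaces both neural networks with a linear function $Q(s, a; \theta) = \theta_a^{\top} s$, uses ε-greedy exploration, and leaves the target-network synchronization schedule to the caller; none of these choices is specified in this paper, and the class and method names are hypothetical. The MEC-specific state, action, and reward design is not modeled here.

```python
from collections import deque

import numpy as np


class LinearDQN:
    """Hypothetical minimal DQN: a linear Q(s, a; theta) = theta[a] @ s replaces the networks."""

    def __init__(self, state_dim, n_actions, gamma=0.9, lr=0.01, pool_size=1000, seed=0):
        self.rng = np.random.default_rng(seed)
        self.theta = self.rng.normal(scale=0.1, size=(n_actions, state_dim))  # evaluation network
        self.theta_target = self.theta.copy()  # target network: same structure, separate parameters
        self.gamma, self.lr = gamma, lr
        self.pool = deque(maxlen=pool_size)  # experience pool of (s, a, r, s') tuples

    def q_values(self, s, target=False):
        w = self.theta_target if target else self.theta
        return w @ np.asarray(s, dtype=float)

    def act(self, s, epsilon=0.1):
        """Epsilon-greedy action selection (an assumed exploration rule)."""
        if self.rng.random() < epsilon:
            return int(self.rng.integers(self.theta.shape[0]))
        return int(np.argmax(self.q_values(s)))

    def store(self, s, a, r, s_next):
        self.pool.append((np.asarray(s, dtype=float), int(a), float(r),
                          np.asarray(s_next, dtype=float)))

    def train_step(self, batch_size=32):
        """One gradient step on the squared TD error (12), with targets from (13)."""
        if len(self.pool) < batch_size:
            return None
        grad = np.zeros_like(self.theta)
        loss = 0.0
        for i in self.rng.choice(len(self.pool), size=batch_size, replace=False):
            s, a, r, s_next = self.pool[int(i)]
            y = r + self.gamma * np.max(self.q_values(s_next, target=True))  # (13)
            td = y - self.q_values(s)[a]
            loss += td ** 2
            grad[a] -= 2.0 * td * s  # gradient of (y - theta[a] @ s)^2 w.r.t. theta[a]
        self.theta -= self.lr * grad / batch_size  # (14), expectation replaced by a batch mean
        return loss / batch_size

    def sync_target(self):
        """Copy the evaluation parameters into the target network (schedule is an assumption)."""
        self.theta_target = self.theta.copy()


if __name__ == "__main__":
    # Toy check (not the MEC environment): one state; action 1 pays 1, action 0 pays 0.
    agent = LinearDQN(state_dim=1, n_actions=2, gamma=0.0, lr=0.1, seed=1)
    for _ in range(50):
        agent.store([1.0], 0, 0.0, [1.0])
        agent.store([1.0], 1, 1.0, [1.0])
    for step in range(500):
        agent.train_step()
        if step % 50 == 0:
            agent.sync_target()
    print(agent.act([1.0], epsilon=0.0))  # 1
```

With $\gamma = 0$ the target reduces to the immediate reward, so the learned Q-values approach 1 and 0 and the greedy action becomes 1. In a real deployment the linear model would be swapped for the evaluation and target neural networks described above.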

##### 3.2. Methods for Charged Power Allocation

In this part, we describe three methods for allocating the charged power from the EN to the UAVs, namely the equal-charge-power, equal-transmission-rate, and equal-charge-energy allocation methods.

(1) Equal-charge-power allocation method

Firstly, we allocate the charge power to the UAVs in a traditional way, so that each $u_k$ obtains the same charge power. We refer to this as the equal-charge-power allocation method, or method 1, which can be written as

$$P_k = \frac{P}{K}, \quad k = 1, \dots, K, \tag{15}$$

where $P_k$ denotes the charge power allocated to $u_k$.

(2) Equal-transmission-rate allocation method

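Secondly, we allocate the charge power so that every $u_k$ obtains the same transmission rate in (5). We refer to this as the equal-transmission-rate allocation method, or method 2, which can be represented as

$$R_1 = R_2 = \cdots = R_K. \tag{16}$$

From (16) and (5), we can obtain

$$W \log_2\left(1 + \frac{p_1 |h_1|^2}{\sigma^2}\right) = \cdots = W \log_2\left(1 + \frac{p_K |h_K|^2}{\sigma^2}\right). \tag{17}$$

By removing the common factor $W$, and since $\log_2(1+x)$ is strictly increasing in $x$ and $\sigma^2$ is common to all terms, we can have

$$p_1 |h_1|^2 = p_2 |h_2|^2 = \cdots = p_K |h_K|^2. \tag{18}$$

Substituting (4) into (18), we can obtain

$$\frac{E_1 |h_1|^2}{(1-\alpha_1)T} = \cdots = \frac{E_K |h_K|^2}{(1-\alpha_K)T}. \tag{19}$$

Moreover, substituting (3) into (19), we can obtain

$$\frac{\eta P_1 \alpha_1 T |h_1|^2}{(1-\alpha_1)T} = \cdots = \frac{\eta P_K \alpha_K T |h_K|^2}{(1-\alpha_K)T}. \tag{20}$$

After removing the common factor $\eta$, we can have

$$\frac{P_1 \alpha_1 T |h_1|^2}{(1-\alpha_1)T} = \cdots = \frac{P_K \alpha_K T |h_K|^2}{(1-\alpha_K)T}. \tag{21}$$

Then, by further cancelling $T$, we can have

$$\frac{P_1 \alpha_1 |h_1|^2}{1-\alpha_1} = \cdots = \frac{P_K \alpha_K |h_K|^2}{1-\alpha_K}. \tag{22}$$

For simplicity, we assume that all UAVs have the same charging time, which can be written as

$$\alpha_1 = \alpha_2 = \cdots = \alpha_K = \alpha. \tag{23}$$

Therefore, from (22) and (23), we can obtain

$$\frac{\alpha}{1-\alpha} P_1 |h_1|^2 = \cdots = \frac{\alpha}{1-\alpha} P_K |h_K|^2. \tag{24}$$

By removing the common factor $\alpha/(1-\alpha)$, we can have

$$P_1 |h_1|^2 = P_2 |h_2|^2 = \cdots = P_K |h_K|^2. \tag{25}$$

From this equation, we have

$$P_k = \frac{P_1 |h_1|^2}{|h_k|^2}, \quad k = 1, \dots, K. \tag{26}$$

Then, summing over $k$, we can further obtain

$$\sum_{k=1}^{K} P_k = P_1 |h_1|^2 \sum_{k=1}^{K} \frac{1}{|h_k|^2}. \tag{27}$$

By using the relationship $\sum_{k=1}^{K} P_k = P$, we can have

$$P_1 |h_1|^2 = \frac{P}{\sum_{j=1}^{K} |h_j|^{-2}}. \tag{28}$$

From (26) and (28), the charge power allocation result of method 2 is

$$P_k = \frac{|h_k|^{-2}}{\sum_{j=1}^{K} |h_j|^{-2}}\, P, \quad k = 1, \dots, K, \tag{29}$$

i.e., each UAV receives charge power inversely proportional to its channel gain.

(3) Equal-charge-energy allocation method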

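Thirdly, we allocate the charge power so that every $u_k$ is charged with the same energy in (3). We refer to this as the equal-charge-energy allocation method, or method 3, which can be represented as

$$E_1 = E_2 = \cdots = E_K. \tag{30}$$

From (3), we can obtain

$$\eta P_1 \alpha_1 T = \eta P_2 \alpha_2 T = \cdots = \eta P_K \alpha_K T. \tag{31}$$

By removing the common factor $\eta$, we can have

$$P_1 \alpha_1 T = P_2 \alpha_2 T = \cdots = P_K \alpha_K T. \tag{32}$$

Then, by removing the common factor $T$, we can have

$$P_1 \alpha_1 = P_2 \alpha_2 = \cdots = P_K \alpha_K. \tag{33}$$

Since we assume that all UAVs have the same charging time $\alpha$, we can further obtain

$$\alpha P_1 = \alpha P_2 = \cdots = \alpha P_K. \tag{34}$$

Then, we can further obtain

$$P_1 = P_2 = \cdots = P_K. \tag{35}$$

By using the relationship $\sum_{k=1}^{K} P_k = P$, we can have

$$K P_1 = P. \tag{36}$$

From this equation, the charge power allocation result of method 3 is

$$P_k = \frac{P}{K}, \quad k = 1, \dots, K, \tag{37}$$

which, under the equal-charging-time assumption, coincides with the allocation of method 1.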
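The three allocation rules can be compared numerically with the sketch below. It assumes the forms of (3) and (4) used in the derivations above (harvest-then-transmit with a common charging time $\alpha$); the channel gains and the charging factor $\eta = 0.8$ are hypothetical, while the 20 W total power matches the simulation setting. Under method 2 the two UAVs obtain equal rates, whereas equal power (methods 1 and 3) favors the UAV with the stronger channel.

```python
import math


def equal_power(total_power, gains):
    """Method 1: every UAV receives P / K."""
    return [total_power / len(gains)] * len(gains)


def equal_rate(total_power, gains):
    """Method 2: P_k proportional to 1 / |h_k|^2, scaled so that sum(P_k) = P."""
    inv = [1.0 / g for g in gains]
    s = sum(inv)
    return [total_power * x / s for x in inv]


def equal_energy(total_power, gains):
    """Method 3: with a common charging time, equal energy reduces to equal power."""
    return equal_power(total_power, gains)


def rates(powers, gains, eta=0.8, alpha=0.5, bandwidth=1e6, noise_var=0.1):
    """Uplink rates from (3)-(5) under the assumed harvest-then-transmit form of (4)."""
    out = []
    for P_k, g in zip(powers, gains):
        p_tx = eta * P_k * alpha / (1.0 - alpha)                        # (3) substituted into (4)
        out.append(bandwidth * math.log2(1.0 + p_tx * g / noise_var))  # (5)
    return out


if __name__ == "__main__":
    gains = [1.0, 0.25]  # hypothetical |h_k|^2 values
    for method in (equal_power, equal_rate, equal_energy):
        P = method(20.0, gains)  # 20 W total, as in the simulation section
        print(method.__name__, P, rates(P, gains))
```

For these gains, method 2 allocates 4 W and 16 W, so that $P_k |h_k|^2$ is the same for both UAVs, exactly as the derivation of method 2 requires.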

In the next section, we present simulations that demonstrate the effectiveness of the proposed task offloading and charged power allocation scheme.

#### 4. Simulation

In this section, we present simulations that demonstrate the proposed task offloading and charged power allocation scheme. Specifically, the channels in the considered MEC network are Gaussian, and the average channel gain of the wireless links from the UAVs to the EN is set to 1. The variance of the AWGN at the EN is set to 0.1. Moreover, the number of UAVs is set to 2, and the task size of each UAV is set to 50 MB. We set the computing capability of the UAVs to cycle/s, while the computing capability of the EN is set to cycle/s. The total wireless bandwidth of the EN is set to 50 MHz, the total charged power of the EN is set to 20 W, and the charging time of each UAV is set to 0.5.

Figure 2 shows the convergence of the proposed strategy with method 1. We can see that the system delay declines rapidly and converges after about 15 epochs; for example, the system delay of method 1 decreases from 35 to less than 5. Similarly, Figures 3 and 4 show the convergence of the proposed strategy with methods 2 and 3, respectively, where the system delay also converges after about 15 epochs and eventually stabilizes below 5. These results demonstrate that the proposed DRL optimization strategy can effectively reduce the system delay and converge to a low system delay.

Figure 5 shows the performance of the proposed strategy with method 1, where the total bandwidth $W$ ranges from 30 to 70 MHz. When the task size of each UAV is 100 MB or 50 MB, the system delay decreases as $W$ increases. This is because a larger total bandwidth speeds up the transmission from the UAVs to the EN and thus effectively reduces the system delay. For example, the system delay at is lower than the delay at . Similarly, Figures 6 and 7 show the performance of the proposed strategy with methods 2 and 3, respectively, when $W$ ranges from 30 to 70 MHz, and the system delay again decreases as the total bandwidth increases. These results demonstrate the effectiveness of the proposed optimization strategy.

Figure 8 shows the performance of the proposed strategy with method 1, where the number of UAVs $K$ ranges from 1 to 5. When the task size of each UAV is 100 MB or 50 MB, the system delay increases with the number of UAVs. This is because more UAVs increase the system burden and the computing delay. For example, the system delay when is lower than the delay when . Similarly, Figures 9 and 10 show the performance of the proposed strategy with methods 2 and 3, respectively, when the number of UAVs ranges from 1 to 5, and the system delay again increases with the number of UAVs. These results demonstrate that the proposed strategy can achieve a low system delay as the number of UAVs varies from 1 to 5.

#### 5. Conclusions

This article studied an MEC system with one EN, where multiple UAVs acted as users with computation-intensive tasks. As the users generally had limited computing capability and power supply, the EN could help compute the tasks and, meanwhile, supply power to the users through energy harvesting. We optimized the system by proposing a joint offloading and energy harvesting strategy. Specifically, a deep reinforcement learning algorithm was implemented to determine the task offloading, while several analytical solutions were given for allocating the harvested power among multiple users. In particular, criterion I was equal power allocation, criterion II was designed to achieve an equal data rate, and criterion III was based on equal charged energy. Finally, simulation results were presented to verify the joint strategy for the UAV-aided multiuser MEC system with energy harvesting.

#### Data Availability

The data are available from the authors upon request by email.

#### Conflicts of Interest

The authors declare that they have no conflicts of interest.

#### Acknowledgments

This work was supported by the Key-Area Research and Development Program of Guangdong Province (No. 2018B010124001).