Abstract

With the dramatic increase in the number of emerging Internet services, the Fog-Radio Access Network (F-RAN) has recently emerged as a promising paradigm to enhance high-load task processing capabilities for mobile devices, such as Internet of things (IoT) devices and mobile terminals. It is therefore a challenge for the F-RAN to reduce the offloading cost by designing an effective offloading strategy and rationally planning limited network resources to improve the quality of experience (QoE). This article investigates the F-RAN with a binary offloading policy and proposes an intelligent algorithm that jointly optimizes the task offloading policy, fog computing resource allocation, and uplink channel resource allocation. To evaluate the offloading strategy intuitively, we design a system utility metric defined as a delay-energy weighted sum. Based on this metric, the joint optimization problem is formulated as a mixed integer nonlinear programming (MINLP) problem. A novel algorithm based on improved double-deep Q networks, named DDQN, is proposed to address this problem. Furthermore, an action space mapping method in the DDQN framework is presented to obtain offloading decisions. Extensive experimental data indicate that the proposed DDQN algorithm can effectively reduce the offloading cost and is adaptable to different offloading scenarios.

1. Introduction

Nowadays, with the rapid development of mobile communication technologies represented by the fifth generation (5G) and the wide application of artificial intelligence, our society has become increasingly intelligent, and the number of resulting Internet of things (IoT) services [1] has increased dramatically. However, several highly anticipated applications including virtual reality (VR), augmented reality (AR), and the Internet of vehicles (IoV) necessitate extremely low latency and energy consumption while being constrained by cost and computational resources. Fog computing, also known as the fog-radio access network (F-RAN) [2] and mobile fog computing (MFC) [3], was established to satisfy the needs of IoT services, fully exploit the benefits of IoT, and overcome the problem of limited computing resources of user equipment (UE). Queuing delays caused by offloading tasks to remote cloud servers [4] through the core network can be reduced by allowing UEs to offload tasks to nearby fog access points (F-APs) for processing. Meanwhile, the addition of fog servers reduces the communications between the base station and the core network significantly [5], thus relieving the load of the backhaul network.

In practice, however, fog servers' computational and network resources are not unlimited, and different resource allocation schemes significantly impact users' quality of experience (QoE) [6]. Hence, it becomes a challenge in the F-RAN to design an effective offloading strategy with proper planning of limited network resources. Several existing studies have proposed offloading methods to address these problems. Goudarzi et al. in [7] proposed a new task placement technique based on the memetic algorithm to maximize the number of tasks computed in parallel on each server. In [8], the concept of distributed decision-making is proposed: the algorithm is distributed to each device, and the offloading decision is generated directly by the local device, which dramatically reduces the complexity of the network. However, as information is not shared among devices, server congestion can easily occur. Lan et al. in [9] divided the offloading time into peak and off-peak periods, and different offloading algorithms are then applied in each case to determine the task offloading decisions.

1.1. Related Work

Most existing works formulate offloading as a constrained convex optimization (CCO) problem with different metrics and constraints, such as service delay, network capacity, backhaul rate, and energy consumption [4]. Wang et al. in [10] jointly optimize the computation offloading decisions, resource allocation, and content caching policies and transform the original problem into a convex optimization problem, which they solve with an alternating direction method of multipliers (ADMM)-based algorithm. Ma et al. proposed a genetic convex optimization algorithm (GCOA) in [11] to satisfy the diverse quality-of-service requirements of different users. In [12], Jiang et al. transformed the offloading problem into a nondeterministic polynomial solution (NDPS) problem with the objective of minimizing the delay.

In [13, 14], the authors provided a novel method to solve the decision and resource allocation problems in the F-RAN. They represented the offloading decisions of tasks as binary variables, and by deriving the total offloading cost expression, the allocation problem can be converted into a mixed integer programming (MIP) problem. In [15], the resource optimization problem is formulated as a quadratically constrained quadratic programming (QCQP) problem, and the optimal offloading decision is obtained by solving it. Tang et al. [16] innovatively defined the offloading optimization problem as a decentralized partially observable Markov decision process (Dec-POMDP), in which each device gives the offloading decision based on its local observation of the environment. Meanwhile, to reduce the computational complexity of the CCO problem, the coordinate descent method [17] and the convex relaxation method [18] have been proposed.

On the other hand, game theory and its variants are also adopted to solve offloading problems [19–22]. In [20], a distributed game method with group perception is studied to ensure the maximum utilization of resources. Jie et al. proposed a Stackelberg-based online task offloading scheme in [21]. Shuchen and Waqas [22] proposed a multiuser partial computation offloading strategy based on game theory. Based on this, the authors in [22, 23] added intelligent gateways with migration functions to the network to relieve server congestion. The abovementioned methods, nonetheless, assume that the transition probabilities of each state and the complete system model can be obtained, an assumption that is too idealized for realistic scenarios.

Furthermore, in the F-RAN, a key research problem is the joint design of computational resource allocation and channel resource allocation [4, 24]. In [25], an iterative algorithm is proposed to solve the problem of jointly allocating computational and radio resources during offloading. In [26], a multistage stochastic planning approach for offloading tasks with high computational overhead is investigated. Cao et al. [20] studied the optimal and suboptimal resource allocation problems in F-RANs based on nonorthogonal multiple access techniques. In [27], the joint computation and communication resource allocation problem for a multiuser, multiserver system is divided into subproblems, which are then solved using matching and sequential convex programming algorithms. In [28], Liu et al. considered a fog network with energy harvesting, where each user gets energy from a hybrid access point (HAP); they aim to maximize the minimum energy balance among all users and jointly optimize the offloading time and fog resource allocation. Similarly, the authors in [29] proposed an energy-efficient computational offload-resource allocation (ECORA) scheme to jointly optimize computational resource allocation and transmission power. Gu et al. [30] combined a reputation mechanism with offloading: the system assigns a reputation value to each device, and when a task is offloaded, the algorithm allocates computational resources to the device based on its reputation value. Along the same lines, the idea of pricing different resources was proposed in [31]. Gai et al. [32] proposed an EFRO model to manage the resources in the F-RAN.

In recent years, with the development of neural networks [33], deep learning has been increasingly applied to offloading computation. For instance, the authors in [34] proposed a joint offloading decision and resource allocation algorithm based on deep reinforcement learning (DRL). The computational offloading strategy for an F-RAN with multiple UEs was studied in [35], where the total utility of UEs was optimized using the DQN algorithm. Based on this, the authors in [35, 36] improved it by considering a computational offloading strategy for device-to-device (D2D) communication between UEs in the F-RAN. The DDQN algorithm has been used in [37] to predict the offloading actions of UEs in semionline distribution tasks, while calculating and updating the total reward after each offloading decision until it reaches its maximum value. In [38], deep reinforcement learning for online offloading (DROO) was proposed to solve the problem of generating decisions quickly under fast-fading channel conditions. In [39], deep Q networks were used to predict, for each device with unknown channel state information, its most suitable offloading pattern. Similarly, LSTM networks and double-deep Q networks were combined in [40] to obtain the offloading decisions of tasks. In [41], Baccarelli et al. applied a network of CDDNs to mobile fog computing to generate offloading policies. However, most of the existing intelligent algorithms are premised on the assumption that a task is an indivisible whole. In real scenarios, many tasks can be split into multiple independent subtasks, a property that cannot be ignored. In addition, the above intelligent algorithms allocate resources equally, which is too idealistic in practice. Hence, a new intelligent algorithm is needed to tackle the problems of offloading policy and resource allocation for detachable task offloading.

1.2. Approach and Contributions

In this article, we propose a novel offloading framework for the F-RAN that exploits the independence of detachable tasks and uses double-deep Q-learning. The F-RAN contains multiple users, multiple fog access points, a remote cloud server, an edge router, and a core network layer. Users can offload their tasks to any server, such as a fog server or the cloud server, to maintain a high QoE while saving battery power.

Edge routers are arranged at the edge of the network and manage all fog computation resources. By collecting information about offloading tasks, communication channel status, and F-AP status, the edge router uses the DDQN algorithm to output offloading decisions for tasks as well as resource allocation decisions, which comprise upload channel resource allocation decisions and fog computing resource allocation decisions. Meanwhile, the proposed DDQN algorithm is trained on the edge server. The core network layer consists of a large number of routers, which are mainly responsible for data routing and forwarding.

The main contributions of this article are as follows:

(1) A novel fog offloading framework is proposed, where both the offloading decision and the resource allocation policy of a task are determined by the edge router. The user uploads the task information to the edge router through the F-AP. Then, the router uses the DDQN algorithm trained by the edge server to give the offloading decision and resource allocation policy for each task based on this information.

(2) In the F-RAN, we model the system utility as a weighted sum of the delay and energy consumption required to compute all tasks. To minimize the system utility, a joint offloading decision and resource allocation problem for the F-RAN is formulated. The problem jointly optimizes the offloading decision, the fog computation resources, and the upload bandwidth allocated by the system to each task.

(3) A double-deep Q-learning-based offloading algorithm for the F-RAN, the DDQN algorithm, is proposed, which consists of a main network and a target network. The DDQN generates the action space from the main network and uses the target network to evaluate the action at the next moment, improving the performance of the main network. Besides, the generated offloading decisions and resource allocation policies are stored in a public experience pool to further train and improve the double-deep Q networks.

(4) Simulation results show that the proposed DDQN algorithm has better convergence and lower average cost than the benchmarks. Meanwhile, it is highly adaptive in multiuser scenarios and under different delay-energy foci.

The rest of this article is organized as follows: Section 2 presents the offloading model, the closed-form expressions for delay and energy, and the formulation of a delay-energy weighted sum minimization problem. The DDQN algorithm is described in Section 3. Section 4 provides the analysis of the simulation results, and the conclusions are given in the last section.

2. System Model

In this article, we consider a Fog-Radio Access Network, as shown in Figure 1, consisting of user equipment (UE), fog access points (F-APs), a remote cloud server, an edge router, and a core network layer. The UEs can be represented by a set . Similarly, a set is used to denote the F-APs. These F-APs can provide computation services for the devices, but they do not have decision-making capability. Furthermore, the F-APs communicate with local devices through wireless links.

Assume that each UE has unrelated tasks to compute, denoted as . At the beginning of time , a UE has only one detachable task request, denoted as , where denotes the size of the offloading data for UE 's -th task, i.e., the workload of the task that needs to be transmitted from the device to the server, and represents the amount of computation required by the local device to process this task (expressed in CPU cycles).

2.1. Communication Model

It is assumed that the environmental state of the F-RAN remains constant within each time slot. The wireless channel gain between the -th F-AP and the -th UE at time is denoted by and follows the free-space path loss model. The wireless channel gain at time can then be represented as follows, where denotes the antenna gain, is the carrier frequency, is the path loss index, and represents the linear distance from the -th UE to the -th F-AP.

Without loss of generality, assume that all F-APs use the same channel. The total uplink channel bandwidth is denoted as , and it can be split into multiple mutually orthogonal subchannels with no mutual interference between them. According to Shannon's formula, the band utilization between the -th UE and the -th F-AP can be represented as follows, where is the transmit power of device and denotes the power of the white Gaussian noise.

Further, is used to represent the proportion of channel resources allocated to the -th UE at time . The transmission delay of uploading the task is as follows:

Meanwhile, uploading also incurs energy consumption on the device. Let denote the energy consumption to upload one unit of data. The energy cost of the local device when offloading can then be expressed as follows:
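Since the original equations in this subsection did not survive extraction, a plausible reconstruction of the communication model is sketched below under standard notation; the symbols $A_d$, $f_c$, $\delta$, $d_{n,m}$, $p_n$, $\sigma^2$, $W$, $\beta_n(t)$, $D_{n,k}$, and $e_u$ are our own labels and may differ from the original manuscript:

$$h_{n,m}(t)=A_d\left(\frac{c}{4\pi f_c\, d_{n,m}}\right)^{\delta},\qquad e_{n,m}(t)=\log_2\!\left(1+\frac{p_n\,h_{n,m}(t)}{\sigma^2}\right),$$

$$T^{\mathrm{up}}_{n,k}(t)=\frac{D_{n,k}}{\beta_n(t)\,W\,e_{n,m}(t)},\qquad E^{\mathrm{up}}_{n,k}(t)=e_u\,D_{n,k},$$

where $c$ is the speed of light, $W$ the total uplink bandwidth, $D_{n,k}$ the size of the offloaded data, $\beta_n(t)$ the allocated bandwidth fraction, and $e_u$ the energy consumed per unit of uploaded data.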

2.2. Transmission Time Allocation Model

Here, a model of transmission time allocation for offloading is considered, as illustrated in Figure 2. When a computation task is generated, the UE first sends the task information as well as the distance between the UE and each F-AP to the edge router via the nearest fog node. The router then uses the trained DDQN algorithm to give the offloading decision and resource allocation policy for each task. The time cost of uploading this information is denoted as .

Once a detachable task is offloaded, the UE first sends the task to the F-AP with the optimal channel state, which forwards it to the edge router. The edge router computes tasks by scheduling the computation resources of the fog servers. After the computation is completed, the results are returned to the device via the backhaul link. Similarly, let denote the transmission time of tasks during offloading and be the backhaul delay of the result.

Generally speaking, the size of the uploaded information and of the backhaul data is much smaller than that of the offloading task [13, 38, 40], and the downlink transmission rate is much faster than the uplink rate [40]; hence, the delays of and can be ignored. Thus, the transmission delay for UE 's -th offloading task can be approximated by .

2.3. Offloading Computation Modes
2.3.1. Local Computing Mode

We use the binary variable to represent the decision of UE ’s -th task on the local side. means the task will be executed locally, while means it will be offloaded to the server. The computational capacity of device is denoted by . As there is no transmission cost for the task in local mode, the delay can be expressed as follows:

Meanwhile, computing tasks locally also consumes energy; the local energy consumption is defined as follows, where indicates the amount of energy consumed by the CPU per cycle.
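A plausible form of the local-mode expressions, using our own symbols $C_{n,k}$ for the required CPU cycles, $f^{\mathrm{loc}}_n$ for the local CPU frequency, and $\varepsilon$ for the energy per cycle (these names are assumptions, not the paper's notation):

$$T^{\mathrm{loc}}_{n,k}=\frac{C_{n,k}}{f^{\mathrm{loc}}_n},\qquad E^{\mathrm{loc}}_{n,k}=\varepsilon\,C_{n,k}.$$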

2.3.2. Fog Computing Mode

Similarly, a binary variable is used to indicate the decision of tasks at the fog side: if it equals one (zero), the task will (will not) be computed in the fog server.

Recall that a detachable task can be divided into multiple mutually independent subtasks for the offloading decisions. Hence, multiple subtasks can be simultaneously assigned to different F-APs for parallel computation, fully utilizing all fog servers. Let the set denote the computation resources of the F-APs managed by the edge router. Thus, the total computation resource F scheduled by the edge router is as follows:

Since the distance from the F-AP to the edge router is short and the transmission rate between them is high, the communication delay between the two can be ignored. Accordingly, the time cost in the fog computing mode is as follows, where is the proportion of fog computation resources allocated to the -th UE by the edge router at time .

Concerning energy consumption, only the cost on the local device side is considered, while the server-side cost is ignored. Thus, the energy consumption of tasks in the fog computing mode is as follows:
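A plausible reconstruction of the fog-mode expressions, reusing the upload terms above and writing $f_m$ for the computation resource of the $m$-th F-AP and $\alpha_n(t)$ for the allocated fog-resource fraction (our symbols, stated as assumptions):

$$F=\sum_{m=1}^{M}f_m,\qquad T^{\mathrm{fog}}_{n,k}(t)=T^{\mathrm{up}}_{n,k}(t)+\frac{C_{n,k}}{\alpha_n(t)\,F},\qquad E^{\mathrm{fog}}_{n,k}(t)=E^{\mathrm{up}}_{n,k}(t).$$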

2.3.3. Cloud Computing Mode

Similarly, a binary variable is adopted to represent the decision of tasks in the cloud. When it is set, the task will be executed in the cloud; when the task is processed on other servers, it is zero.

Without loss of generality, assume that the cloud server has nearly unlimited computation resources and can process multiple tasks in parallel. As the cloud server is located at the top of the network, which is far away from the local side, the delay of cloud computing is mainly affected by the propagation delay. The propagation delay is generally fixed and can be expressed as a constant. Accordingly, the total delay of the cloud mode is shown as follows:

and the energy consumption in this mode is as follows:
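Under the same assumed notation, with $T^{\mathrm{prop}}$ denoting the constant propagation delay to the remote cloud, the cloud-mode expressions plausibly take the form:

$$T^{\mathrm{cloud}}_{n,k}(t)=T^{\mathrm{up}}_{n,k}(t)+T^{\mathrm{prop}},\qquad E^{\mathrm{cloud}}_{n,k}(t)=E^{\mathrm{up}}_{n,k}(t).$$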

2.4. Problem Formulation

According to the abovementioned offloading models, the expressions for the total delay and energy consumption of offloading can be concluded, respectively, as follows:

To minimize the delay and energy consumption for all UEs, we introduce a cost function defined as the weighted sum of delay and energy as follows, where , , , and , and is a weighting factor representing the relative importance of delay with respect to energy.
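Consistent with this description, the cost function plausibly has the following form (symbols as assumed above, with $\mu$ as the delay-energy weight):

$$C(t)=\sum_{n=1}^{N}\sum_{k=1}^{K}\Big[\mu\,T_{n,k}(t)+(1-\mu)\,E_{n,k}(t)\Big],\qquad \mu\in[0,1],$$

where $T_{n,k}(t)$ and $E_{n,k}(t)$ take the local, fog, or cloud expressions according to the selected offloading mode.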

For each input and , we are interested in minimizing the cost function to obtain the desired suboptimal decision () as follows:

Equation (14b) restricts the task to be computed in only one of the local, fog, or cloud modes at time . Equation (14c) represents that the sum of the allocated upload channel resources cannot be more than the total channel resources. Equation (14d) means that the sum of allocated fog computation resources cannot exceed the total in the F-RAN.
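Based on the textual description of constraints (14b)-(14d), the joint optimization problem can plausibly be written as follows (again with our assumed symbols):

$$\begin{aligned}
\min_{\mathbf{x},\,\boldsymbol{\alpha},\,\boldsymbol{\beta}}\quad & C(t)\\
\text{s.t.}\quad & x^{\mathrm{loc}}_{n,k}+x^{\mathrm{fog}}_{n,k}+x^{\mathrm{cloud}}_{n,k}=1,\quad x^{(\cdot)}_{n,k}\in\{0,1\}, &&\text{(14b)}\\
& \textstyle\sum_{n}\beta_n(t)\le 1, &&\text{(14c)}\\
& \textstyle\sum_{n}\alpha_n(t)\le 1. &&\text{(14d)}
\end{aligned}$$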

Specifically, equation (14) contains binary variables, continuous variables, and nonlinear terms formed by products of unknown variables. Thus, the cost minimization problem is a mixed integer nonlinear programming (MINLP) problem, which is nonconvex and difficult to solve.

In the next section, we transform this problem into a tractable form and propose a deep Q-learning-based algorithm to solve it. Additionally, the meaning of the symbols used in Section 2 is summarized in Table 1.

3. Offloading Solution

In this section, to address the MINLP problem, a model-free offloading algorithm based on a double-deep Q network (DDQN) is proposed, which enables offloading decision-making and resource allocation in the F-RAN.

It is assumed that at the beginning of time , the edge router collects the environmental information of the devices to obtain the state of the F-RAN, as follows, where is the remaining fog computation resource, is the remaining channel bandwidth in the F-RAN, and denotes the number of unprocessed tasks for UE .

After that, the DDQN algorithm generates numerous possible actions based on . Once an action is implemented, the environment feeds back a reward based on the current state and the action taken. According to equation (13), the reward for time is defined as follows:
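A plausible form of this reward, chosen so that maximizing the cumulative reward is equivalent to minimizing the offloading cost (our notation):

$$r_t=-\sum_{n=1}^{N}\Big[\mu\,T_{n}(t)+(1-\mu)\,E_{n}(t)\Big].$$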

3.1. Design of Mapping-Based Action Vector

In equation (15), there are three variables that determine the loss function, namely, the offloading decision , the allocated fog computation resource , and the allocated uplink channel resource . Hence, at time , the edge router outputs the following decisions:

(1) The offloading location of UE 's -th task: . represents that UE 's -th task will be processed locally, indicates that this task will be offloaded to the fog server, and means it will be processed in the cloud server.

(2) The channel resources assigned to UE : . As Q-learning does not apply to continuous action spaces, the resources cannot be allocated as a continuous percentage, e.g., as the exact proportions of channel and fog computation resources. Hence, to improve the sample quality and speed up convergence, this article uses the average allocation as the benchmark: the channel resources allocated to a UE range from 0 to 4 times the average channel share, in steps of 0.2.

(3) The fog computation resources allocated to UE : , given in the same manner.

Furthermore, we refer to , , and as the offloading subactions, denoted as . represents the decision action of the UE at time . The action space contains the set of all decision actions that may be output. Since the numbers of possible values of the three subactions are {3, 21, 21}, the size of the total action space is 3 × 21 × 21 = 1323.

For the DDQN algorithm, we use a single-agent approach to output actions. First, the edge router collects the current state . Then, the DDQN algorithm outputs the action with the maximum -value in the action space . Finally, the DDQN algorithm maps the output action into the corresponding offloading subactions to obtain an executable decision. Hence, we construct a one-to-one mapping relationship with an iterative approach, as shown in Algorithm 1 (a sketch of this mapping follows the listing).

(1)Initialization: Let ;
(2)for to do
(3)  for  = 1 to do
(4)   for  = 1 to do
(5)    Mapping [] = [ , ];
(6)    ;
(7)   end for
(8)  end for
(9)end for
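For illustration, a minimal Python sketch of such a one-to-one mapping is given below, assuming the subaction sizes {3, 21, 21} from above; all names and the flat-index layout are our own and are not taken from the paper.

```python
# Hypothetical sketch of the action-index mapping (names and layout are illustrative).
NUM_MODES = 3     # local, fog, cloud
NUM_LEVELS = 21   # 0 to 4 times the average share, in steps of 0.2

def index_to_subactions(a):
    """Map a flat action index in [0, 3*21*21) to (mode, channel_share, compute_share)."""
    mode = a // (NUM_LEVELS * NUM_LEVELS)
    rest = a % (NUM_LEVELS * NUM_LEVELS)
    chan_level, comp_level = divmod(rest, NUM_LEVELS)
    # Each level is a multiple of the average per-UE resource share.
    return mode, 0.2 * chan_level, 0.2 * comp_level

# The mapping is one-to-one over the 1323-action space.
assert len({index_to_subactions(a) for a in range(NUM_MODES * NUM_LEVELS ** 2)}) == 1323
```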
3.2. DDQN Algorithm

The structure of the proposed DDQN algorithm is shown in Figure 3. It consists of two networks, namely, the main neural network and the target neural network. The DDQN uses the main network to generate the action with the largest Q-value. The target network is used to update the main network and to evaluate the action at the next moment, confirming whether the generated action is the suboptimal solution or not.

3.2.1. Main Neural Network

As a first step, we construct a main neural network with policy as the decision criterion, whose network parameter is . Assume that if the DDQN algorithm generates a decision action at time with the policy , denoted as , it will keep adopting the same policy to generate subsequent decision actions. We define the expected value of the reward obtained by the algorithm after completing a trajectory with policy as the Q-value of the algorithm. The Q-value is given as follows, where is the parameter of the main network at time .
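A plausible form of this Q-value is the standard expected discounted return (the discount factor $\gamma$ is our assumption; the paper's exact definition may differ):

$$Q^{\pi}(s_t,a_t;\theta_t)=\mathbb{E}_{\pi}\!\left[\sum_{i=0}^{\infty}\gamma^{i}\,r_{t+i}\,\middle|\,s_t,a_t\right],\qquad \gamma\in(0,1].$$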

Once the Q-values are calculated, the main network records the Q-values corresponding to all selectable actions. Then, the main network selects the action corresponding to the maximum Q-value as the current output action.

However, if the action selection were based only on equation (18), the DDQN algorithm would always follow the same policy during the iterative computation and keep outputting the same decision action, and the policy π could not be efficiently updated by the main network.

To avoid always following the same strategy and outputting the same actions, we introduce an ε-greedy method to extend the exploration of actions as follows, where represents the probability of adopting the current action selection method. Equation (19) states that the main network selects the action with the highest Q-value as with probability , or a random action with probability .
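Written in the usual convention (assigning $\epsilon$ to the exploration branch is the standard choice and is our assumption):

$$a_t=\begin{cases}\arg\max_{a\in\mathcal{A}}Q(s_t,a;\theta_t), & \text{with probability } 1-\epsilon,\\ \text{a random action drawn from } \mathcal{A}, & \text{with probability } \epsilon.\end{cases}$$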

3.2.2. Target Neural Network

In the deep Q algorithm, if only the main network evaluates the Q-value, overestimation will occur after several iterations. Hence, a target neural network is added, which has exactly the same structure as the main network but different parameters. The action space is generated by the main network, while the target network is responsible for correcting the main network and selecting the least costly action for output.

Specifically, if the main network determines a new decision action at the -th time, it first passes the action to the target network. Then, the target network evaluates the Q-value of that action according to a specific prediction function, and this Q-value is substituted into the loss function to determine whether the main network should be updated. The prediction function is shown as follows, where is the Q-value calculated by the target network for the current action, is the parameter of the target network, and denotes the action of the main network, selected according to the ε-greedy method under the parameter .
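This corresponds to the standard double-DQN target, which plausibly reads (our notation, with $\theta^{-}_t$ the target-network parameter and $\gamma$ an assumed discount factor):

$$y_t=r_t+\gamma\,Q\!\Big(s_{t+1},\ \arg\max_{a\in\mathcal{A}}Q(s_{t+1},a;\theta_t);\ \theta^{-}_t\Big).$$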

Separately, in this article, the mean squared deviation function is used as the loss function to update the parameters of the main network. The loss function is as follows, where is the Q-value output by the target network under the old parameter .
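In our assumed notation, this loss plausibly takes the form:

$$L(\theta_t)=\mathbb{E}\Big[\big(y_t-Q(s_t,a_t;\theta_t)\big)^{2}\Big].$$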

After generating all possible actions, the target network selects the action with the lowest cost for output. Furthermore, the action output by the target network is recorded as the suboptimal decision in the state at time .

3.2.3. Network Improvement and Training

To update the network parameters, this article uses the experience replay method. The suboptimal decision obtained in equation (22) is used to update the offloading policy of the main network. Specifically, after the target network outputs each suboptimal decision, the DDQN algorithm stores a set of samples, denoted as , in a finite storage space. This storage space is called the public experience pool, from which the main network randomly draws samples to update its parameters . Meanwhile, if the public experience pool is full, the oldest sample is replaced with the new one.

As an example, suppose that, while the DDQN algorithm interacts with the environment, one trajectory generates 100 sets of transition samples, and assume that the size of the public experience pool is 500; the pool is then filled after 5 complete trajectories. When the pool is filled, the algorithm randomly draws a batch of samples from it and gives them to the network for learning. Through learning, the parameters of the main network are updated, which improves its policy . The updated policy then continues to interact with the environment and generate new samples. Accordingly, the DDQN replaces the oldest samples with new ones and repeats this step over and over again.
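A minimal Python sketch of such a public experience pool is given below; the class and method names, and the default sizes, are illustrative assumptions rather than the paper's implementation.

```python
import random
from collections import deque

class ReplayPool:
    """Finite FIFO experience pool: the oldest samples are dropped when it is full."""
    def __init__(self, capacity=500):
        self.buffer = deque(maxlen=capacity)

    def store(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size=32):
        # Randomly draw a batch once enough samples have been collected.
        return random.sample(self.buffer, batch_size)
```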

In the next section, the performance and accuracy of the proposed DDQN algorithm are evaluated based on numerous simulations. Furthermore, the pseudocode of the DDQN is shown in Algorithm 2.

Input: Number of UEs , size of tasks and distance between UE and F-AP ;
Output: Suboptimal decision ;
(1)Initialization: Initialize the parameter of the main network with random weight and the parameter of the target network with random weight and empty the public experience pool;
(2) Set training interval ;
(3)for to do
(4)  Reset starting environment information ;
(5)  for to do
(6)   Reset remaining channel resources and remaining fog computation resources ;
(7)   for to do
(8)    The main network generates action with the -greedy method according to ;
(9)    Map to subactions and implement them in the environment;
(10)    Obtain status and reward based on the environmental change ();
(11)    Mark and store it in the public experience pool;
(12)    if (the public experience pool is full) then
(13)     The target network calculates ;
(14)     Update the main network parameter based on and replace the oldest data with the new one;
(15)    end if
(16)    while (episode mod ) do
(17)     Assign the main network parameters to the target network, i.e., ;
(18)    end while
(19)   end for
(20)  end for
(21)end for
(22) The target network picks the action with the lowest cost as the suboptimal decision : .
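To make the update step of Algorithm 2 concrete, the following is a minimal PyTorch sketch of one DDQN training step consistent with the double-Q target and mean squared loss above; the class and function names, the discount factor, and any hyperparameters beyond the 4 × 80 ReLU layers and learning rate stated in Section 4 are our own assumptions.

```python
import torch
import torch.nn as nn

class QNet(nn.Module):
    """Fully connected Q network: 4 hidden layers of 80 ReLU units (as in Section 4)."""
    def __init__(self, state_dim, action_dim, hidden=80, layers=4):
        super().__init__()
        dims = [state_dim] + [hidden] * layers
        body = []
        for i in range(layers):
            body += [nn.Linear(dims[i], dims[i + 1]), nn.ReLU()]
        self.net = nn.Sequential(*body, nn.Linear(hidden, action_dim))

    def forward(self, s):
        return self.net(s)

def ddqn_update(main, target, batch, optimizer, gamma=0.9):
    """One training step: the main net selects the next action, the target net evaluates it."""
    s, a, r, s_next = batch                                # tensors drawn from the experience pool
    with torch.no_grad():
        a_star = main(s_next).argmax(dim=1)                # action selection by the main network
        q_next = target(s_next).gather(1, a_star.unsqueeze(1)).squeeze(1)
        y = r + gamma * q_next                             # double-DQN target
    q = main(s).gather(1, a.unsqueeze(1)).squeeze(1)       # Q-value of the action actually taken
    loss = nn.functional.mse_loss(q, y)                    # mean squared deviation loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```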

4. Simulation and Evaluation

In the simulation, we construct an F-RAN with 30 UEs and 5 F-APs, where each UE has 100 unrelated and detachable tasks to compute. The range of values for the uploading data is [150 , 1024 ], and the amount of computation needed to process a task locally is in [100 MHz, 500 MHz]. Moreover, all local devices are assumed to have the same processing power of 1 GHz. Besides, the distance from a UE to an F-AP is uniformly distributed within [20 m, 200 m]. The other parameters are set in Table 2.

We adopt a deep Q network model with a fully connected architecture of 4 hidden layers, where each layer contains 80 neurons and uses ReLU as the activation function. The learning rate is 0.001, and the size of the public experience pool is 512. During each training step, the algorithm randomly draws 32 samples. Besides, we consider 500 episodes, i.e., , where each episode has 1000 samples, i.e., .

4.1. Convergence and Performance Evaluation

The convergence of the DDQN algorithm was evaluated under different algorithm settings. The simulation results are shown in Figure 4. In the subplots, the x-axis represents each episode, and the y-axis shows the average cost of offloading for each episode.

The convergence performance of the DDQN algorithm under different public experience pool sizes is shown in Figure 4(b), where the size of the pool is denoted as memory. Lacking sufficient samples, the converged cost of the algorithm is rather high when the memory is small (e.g., 256). As the memory gradually increases (from 512 to 4096), the average cost of offloading is kept low. However, a larger memory corresponds to a slower convergence speed. Hence, in the subsequent simulations, we adopt a memory size of 512.

In Figure 4(c), we investigate the convergence performance under different batch sizes, i.e., the number of samples drawn in each training round. As the batch size increases from 4 to 32, the algorithm converges significantly faster. As it further increases from 32 to 128, the convergence speed and cost do not improve significantly, while a larger batch size requires more training time. Thus, we choose a batch size that reduces the training time per round without noticeably degrading the performance of the DDQN algorithm, namely, batch size = 32.

Figure 4(d) shows the convergence performance under different loss functions, including gradient descent (GD), mean square deviation (MSD), and adaptive moment estimation (Adam). As shown in Figure 4(d), the performance of the GD function is poor, so it may not be suitable for the DDQN algorithm. The MSD and Adam functions lead to similar convergence speed and cost. From the above simulation results in Figure 4, our proposed DDQN algorithm exhibits stable convergence performance under different parameter settings.

Figure 5 shows the impact of the number of UEs on the offloading strategy of the DDQN. When the number of UEs is small, the tasks are mainly offloaded to the fog server. As the number increases, the percentage of tasks handled by the fog server gradually decreases while the percentage executed locally increases. Meanwhile, the cloud is only involved in a small amount of computation in this process. Under the conditions of Table 2 (the main influencing parameter is the computational volume of tasks), the cost of computing locally is lower than that of offloading to the cloud server regardless of the number of UEs (as shown in Figure 6). Hence, when fog computation resources are insufficient, the overloaded tasks will be processed locally rather than offloaded to the cloud.

In Figure 7, we compare the impact of different computational volumes on the offloading strategy of the DDQN algorithm. As the computation volume increases, the proportion of tasks computed locally decreases dramatically, while the proportion offloaded to the cloud increases rapidly. When the computational volume is small, the DDQN algorithm mainly allocates tasks to be computed locally or in the fog. As the computational volume grows, the fog and local computation resources become insufficient to support the current demand; thus, the DDQN algorithm offloads more tasks to the cloud server, where computation resources are abundant. This illustrates that the proposed algorithm can be applied in scenarios with different computational requirements.

Figure 8 studies the effect of different weight values μ on the offloading strategy. When , we only care about the offloading energy consumption. In this situation, most tasks are offloaded to the fog or cloud server, whose energy costs are lower. When the weight is increased to 0.1, we can observe that the proportion handled by the cloud server decreases significantly, while the proportion on the fog side increases. With the introduction of the time utility, for the current computation volumes (as shown in Table 2), the task takes much less time to compute in the fog than the propagation delay to the cloud. Moreover, the time cost of local computing is smaller than that of the fog under the current settings, but the local energy consumption is much higher than that of the fog. That is why the processing percentage of the fog server decreases as increases while that of local computing rises. Besides, Figure 8 shows that the DDQN algorithm can be well applied to scenarios with different foci.

In Figure 9, we further study the average computation time of the DDQN algorithm under different numbers of UEs. For the DDQN employed with different numbers of UEs, the time cost per offloading task is almost the same, staying at about 0.27 s. Since the number of UEs does not affect the time cost of the algorithm, it can be applied in massive-user offloading scenarios, such as unmanned factories.

4.2. System Utility Comparison

Regarding the practical performance of the system, our DDQN algorithm is compared with six representative benchmarks as follows:

(i) The coordinate descent (CD) algorithm [13] iteratively swaps the offloading patterns of the UEs, resulting in the minimal delay and energy cost in each round. The iterations stop when swapping the offloading modes no longer improves the system performance. Moreover, the CD algorithm is proven to achieve near-optimal decisions at different .

(ii) The Joint Computation offloading, Data compression, Energy harvesting, and Application scenarios (JCDEA) algorithm [18] is a comprehensive offloading algorithm that solves the joint computation offloading, data compression, energy harvesting, and application scenario optimization problems in the F-RAN. The JCDEA algorithm obtains the optimal offloading decision and resource allocation policy by transforming the search for an offloading policy into solving for the minimum cost among local, fog, and cloud computing. When introducing this benchmark, we assume that the data compression ratio is 1, i.e., the task is not compressed, and that the energy harvesting efficiency is 0, i.e., the local device does not collect energy from the outside.

(iii) The greedy algorithm assigns all tasks with priority to a specific F-AP and invokes all of the computation resources of the current fog node. If the fog node has reached its maximum processing capacity, the priority is randomly assigned to the next empty F-AP. If all the fog computation resources are overloaded, the task will be processed locally or in the cloud, depending on which mode has the lower average cost.

(iv) All-local computing: all tasks are processed on the local devices.

(v) All-fog computing: tasks are randomly offloaded to the fog servers in the F-RAN for processing.

(vi) All-cloud computing: the remote cloud server processes all users' tasks.

As shown in Figure 6, while the number of UEs changes, the average cost of all-local computing remains stable. However, for the other benchmarks, the average cost of offloading increases as the number of users grows. As the number increases, competition for uplink channel bandwidth arises, which decreases the upload rate allocated to each task and thus increases the time cost of offloading. Moreover, when the limited fog computation resources are insufficient to support numerous tasks, additional queuing delays are incurred, which further increases the offloading delay. Hence, algorithms with a single offloading mode are not suitable for multiuser offloading scenarios.

For the CD algorithm, the change in the number of UEs has little influence on the average cost, and the cost in the multi-UE state () is even lower than in the few-UE state (). Thus, the CD algorithm is more suitable for multiuser scenarios. For the greedy algorithm, we conclude that as the number of UEs increases, it is no longer able to arrive at the optimal offloading decision. The JCDEA algorithm outperforms the other benchmark algorithms, but its average cost is always higher than that of the DDQN algorithm. In summary, the DDQN algorithm offers a lower offloading cost and better performance compared with the benchmark algorithms.

Figure 10 investigates the impact of different computational volumes on the average cost. For the cloud server with abundant computing resources, the main time cost is determined by the propagation delay, which does not vary with the computation volume. This is why the average cost of all algorithms, except the all-cloud computing mode, rises almost linearly with the amount of computation. The performance of the greedy algorithm is comparatively weak, especially when the fog servers are overloaded. The average costs of the DDQN, CD, and JCDEA algorithms increase relatively slowly as the amount of computation grows, eventually converging to the cost of all-cloud computing. However, the proposed DDQN algorithm consistently maintains a lower average cost than the other benchmark algorithms, demonstrating that task offloading with our DDQN algorithm achieves better performance across the different offloading scenarios.

5. Conclusions

In this work, a novel model-free offloading algorithm, the DDQN, is proposed for the offloading of detachable tasks in the F-RAN. Based on the double-deep Q network, it jointly determines the offloading decision of each task, the uplink channel bandwidth, and the fog computation resources so as to minimize the offloading cost. By introducing binary variables into the offloading strategy, we transform offloading into a problem of finding binary offloading decisions together with resource allocation. We then design a delay-energy weighted sum metric to evaluate the offloading strategy and convert the above problem into a mixed integer nonlinear programming (MINLP) problem, i.e., obtaining reasonable and efficient offloading decisions and resource allocation strategies that minimize the offloading cost. Since the MINLP problem is intractable to solve in a general way, the DDQN algorithm is proposed to generate decisions, and we innovatively combine the action space mapping method with deep reinforcement learning. Numerical simulations illustrate that the DDQN algorithm can significantly reduce the offloading cost of task execution compared to the benchmark algorithms.

Finally, it is hoped that the proposed DDQN offloading framework can be extended to future F-RAN applications, such as smart IoT and driverless cars, to optimize real-time offloading in various multiuser scenarios.

Data Availability

The underlying data supporting the results of this study can be found at the official website of Beijing Natural Science Foundation.

Conflicts of Interest

The authors declare that they have no conflicts of interest regarding the publication of this article.

Acknowledgments

This work was supported by Beijing Natural Science Foundation, Haidian Original Innovation Joint Fund Project (No. L182039).