Abstract

The increase in the number of services in the power distribution grid leads to a massive increase in task data. Power distribution internet of things (PDIoT) is the specific application of the internet of things (IoT) in the power distribution grid. By deploying a large number of PDIoT devices, voltage, active power, reactive power, and harmonic parameters are collected to support distribution grid services such as fault identification and status detection. PDIoT therefore utilizes massive devices to collect tasks and offload them to edge servers through the 5G network for real-time data processing. However, dynamically selecting edge servers and channels to meet the energy-efficient and low-latency task offloading requirements of PDIoT devices still faces several technical challenges, such as the coupling of task offloading decisions among devices, unobtainable global state information, and the interrelation of quality of service (QoS) metrics such as energy efficiency and delay. To this end, we first construct a joint optimization problem to maximize the weighted difference between the energy efficiency and delay of devices in PDIoT. Then, the joint optimization problem is decomposed into a large-timescale server selection problem and a small-timescale channel selection problem. Next, we propose an ML-based two-stage task offloading algorithm, where the large-timescale problem is solved by two-side matching in the first stage, and the small-timescale problem is solved by adaptive ε-greedy learning in the second stage. Finally, simulation results show that, compared with the task offloading delay-first matching algorithm and the matching theory-based task offloading strategy, the proposed algorithm performs better in terms of energy efficiency and delay.

1. Introduction

With the rapid development of new power systems, the number of services in the power distribution grid has gradually increased, resulting in a massive increase in task data. Power distribution internet of things (PDIoT) is the specific application of the internet of things (IoT) in the power distribution grid. By deploying a large number of PDIoT devices, voltage, active power, reactive power, and harmonic parameters are collected to support power distribution grid services such as fault location and status detection [1]. For example, for the status estimation service, task data such as real-time power and voltage information are collected to supplement the load curve data, load forecast data, and meter reading data. By offloading the task data to a server for processing, the real-time load of the power distribution grid can be obtained. However, in traditional cloud computing, the long data transmission distance to the cloud server results in high delay, large energy consumption, and severe congestion [2, 3]. Edge computing integrated with 5G provides a new paradigm for real-time computing services, where PDIoT devices offload data to nearby edge servers to reduce delay and energy consumption [4–6].

Task offloading is a key enabler of efficient edge computing for PDIoT. On the one hand, the devices need to select the optimal one among the deployed edge servers. On the other hand, due to the spectrum shortage, the devices need to dynamically select channels according to the available spectrum resources. Meanwhile, PDIoT services have strict requirements on energy efficiency and delay [7]. For instance, the delay requirement of control services is at the millisecond level, while monitoring devices, which collect and transmit massive data, need to improve energy efficiency under limited battery capacity [8]. Therefore, how to achieve energy-efficient and low-latency task offloading by jointly optimizing server and channel selection remains an open issue. This joint optimization problem faces the following challenges.

First, the task offloading decisions among massive devices are coupled with each other. Meanwhile, server selection needs to be optimized according to the change of server computing resources, and channel selection needs to be optimized according to the change of channel state. Since server computing resources and channel states change on different timescales, task offloading needs to be optimized on different timescales. Particularly, the large-timescale server selection is optimized in the first stage, while the small-timescale channel selection is optimized in the second stage. Therefore, a two-stage task offloading problem is constructed. Second, wireless channels in PDIoT suffer from electromagnetic interference and experience fading caused by multipath propagation, so it is not feasible to obtain global state information (GSI) for task offloading. Finally, the optimization of energy efficiency, transmission delay, and processing delay is tightly coupled, leading to a more complex optimization problem.

Task offloading has gained considerable attention from both academia and industry. In [9], Mitsis et al. proposed a data offloading framework for UAV-assisted multiaccess edge computing systems based on resource pricing and user risk perception, introducing a usage-based pricing mechanism for users to utilize the computing power of the MEC server. In [10], Chen et al. proposed an alternating minimization algorithm to achieve energy-optimal fog computing offloading by jointly optimizing the offloading ratio, transmission power, local CPU computation speed, and transmission time. In [11], Maray et al. surveyed the latest research on task offloading from the aspects of offloading mechanism, offloading granularity, and offloading technology, discussing various task offloading mechanisms and optimization methods in different environments. In [12], Mustafa et al. divided computation offloading into four categories, i.e., static, dynamic, full, and partial offloading, and compared the existing research from seven aspects, i.e., contribution, computation offloading, energy/battery lifetime, resource/task scheduling, cooperation, user fairness, and transmission/computation latency. In [13], Wu et al. proposed an energy-efficient dynamic task offloading (EEDTO) algorithm to control the computation and communication costs for different types of applications and dynamic changes in the wireless environment. However, the above works neglect the coupling of task offloading decisions among massive devices and cannot resolve access conflicts among devices. The task offloading problem can be constructed as a two-side matching problem to obtain stable task offloading strategies and cope with access conflicts among devices.

Matching theory provides an effective approach to solving the two-side matching problem by defining the preferences of matching subjects to address access conflicts among devices, and it has been widely used in task offloading problems [14, 15]. In [16], Shi et al. proposed a two-side matching-based server selection algorithm to maximize the efficiency of device-to-device content sharing. In [17], Zhou et al. proposed a task offloading algorithm based on vehicle-device matching to maximize the utility function of the base station (BS). In [18], Abedin et al. proposed a two-side matching game to solve the server selection problem, aiming to maximize the efficiency of user resource allocation. In [19], Wang et al. considered the impact of channel selection on task offloading and proposed a matching-based channel selection algorithm to minimize the total energy consumption. The above works used matching theory to resolve task offloading conflicts among devices, but they rely on perfect GSI such as server states and channel states, and therefore cannot be applied to scenarios where the global information changes rapidly and is unknown. Moreover, the above works do not consider the multitimescale joint optimization of server selection and channel selection, in which the establishment of the preference list is influenced by the optimization results of the other dimension.

To solve the task offloading problem under incomplete GSI, machine learning (ML) has been applied to intelligent task offloading decision making. ML includes deep learning (DL), reinforcement learning (RL), deep reinforcement learning (DRL), and support vector machines (SVM). In [20], Jehangiri et al. proposed a mobility prediction and offloading framework that offloads computationally intensive tasks to predicted user locations using artificial neural networks. In [21], Zhou et al. proposed an SVM-based task offloading strategy, which minimizes energy consumption by optimizing clock frequency control, transmission power allocation, as well as offloading and receive power allocation strategies in edge computing scenarios. In [22], Wu et al. proposed a distributed DL-driven task offloading (DDTO) algorithm to jointly optimize the system utility and bandwidth allocation. In [23], Qu et al. proposed a deep metareinforcement learning-based offloading (DMRO) algorithm to address the limited computing resources of IoT devices and improve task processing efficiency. In [24], Chen et al. proposed a cloud-edge collaborative mobile computing offloading (DRL-CCMCO) algorithm based on DRL to solve the joint optimization problem of execution delay and energy consumption. The above works adopted ML algorithms to optimize task offloading decisions and further improve task offloading performance. However, these algorithms have high computation complexity and demand strong CPU performance, while PDIoT devices with limited CPUs impose lightweight requirements on the algorithm; they are therefore not suitable for the scenario considered in this paper. RL, as an important branch of ML, has low computation complexity, which can meet the needs of lightweight task offloading [25]. The task offloading problem can be regarded as a multi-armed bandit (MAB) problem and solved by RL [26]. The ε-greedy learning algorithm is a low-complexity RL algorithm that can balance the tradeoff between exploration and exploitation through the adjustment of the exploration factor ε. In [27], Li et al. proposed an interference-aware RL algorithm to solve the joint problem of multichannel selection and data scheduling. In [28], Talekar and Terdal proposed a solution for optimal channel selection and routing and applied RL to select the best channel for routing. However, these works do not take into account the coupling between server selection and channel selection and cannot dynamically adjust the learning parameters according to the dynamic and complex communication environment to improve learning performance.

Motivated by the aforementioned challenges, we first construct a two-stage task offloading problem, including the server selection in the first stage and the channel selection in the second stage, which operate on different timescales. The objective is to maximize the weighted difference between energy efficiency and delay through the joint optimization of server selection and channel selection, considering the influence of electromagnetic interference and stringent quality of service (QoS) constraints. Then, we propose an ML-based two-stage task offloading algorithm. Specifically, a two-side matching-based server selection algorithm is proposed to obtain a large-timescale device-edge stable matching. For the channel selection in the second stage, we propose an adaptive ε-greedy learning algorithm to dynamically learn optimal channel selection strategies. The main contributions of this paper are summarized as follows.
(i) Energy-Efficient and Low-Latency Task Offloading. Since the optimization of energy efficiency and delay is coupled, we construct the weighted difference between energy efficiency and delay to achieve the joint optimization of different QoS metrics.
(ii) Two-Stage Task Offloading. We propose a two-side matching-based server selection algorithm and an adaptive ε-greedy learning algorithm to optimize large-timescale server selection in the first stage and small-timescale channel selection in the second stage under incomplete GSI.
(iii) Extensive Performance Evaluation. Compared with two advanced algorithms, simulation results demonstrate that the proposed algorithm achieves superior performance in energy efficiency, transmission delay, and processing delay.

The rest of the paper is organized as follows. Section 2 presents the system model. Section 3 describes the ML-based two-stage task offloading algorithm for PDIoT. Simulation results are presented in Section 4 to verify the effectiveness of the proposed algorithm. Section 5 concludes the paper.

2. System Model

The considered task offloading scenario of PDIoT is shown in Figure 1, which consists of multiple BSs and PDIoT devices. Each BS is equipped with an edge server to provide overlapping communication coverage and computing resources for devices. Each device needs to offload its task data to an edge server through a BS for processing, aiming to reduce delay and improve energy efficiency. There are $I$ PDIoT devices, $J$ edge servers, and $K$ channels, and the corresponding sets are $\mathcal{D} = \{d_1, \ldots, d_I\}$, $\mathcal{E} = \{e_1, \ldots, e_J\}$, and $\mathcal{C} = \{c_1, \ldots, c_K\}$, respectively.

We adopt a two-timescale model with periods and slots [29]. The large timescale is the period, and the small timescale is the slot. There are $T$ equal periods, and the set of periods is $\mathcal{T} = \{1, 2, \ldots, T\}$. Each period contains $S$ slots, and the set of slots in the $t$-th period is $\mathcal{S}_t = \{(t-1)S + 1, \ldots, tS\}$. The total number of slots is $N = TS$, and the set of slots is $\mathcal{N} = \{1, 2, \ldots, N\}$. Task offloading includes two stages, i.e., large-timescale server selection and small-timescale channel selection. The server selection variable of device $d_i$ towards server $e_j$ in the $t$-th period is defined as $x_{i,j}^{t} \in \{0, 1\}$: $x_{i,j}^{t} = 1$ indicates that $d_i$ selects $e_j$, and $x_{i,j}^{t} = 0$ otherwise. Define the quota of $e_j$ as $q_j$, which represents the maximum number of devices that $e_j$ can serve in each period. The channel selection variable is defined as $y_{i,k}^{n} \in \{0, 1\}$: $y_{i,k}^{n} = 1$ indicates that $d_i$ selects channel $c_k$ for data transmission in the $n$-th slot, and $y_{i,k}^{n} = 0$ otherwise. An example of two-stage task offloading is shown in Figure 1: a device selects an edge server to offload data in the first stage and selects a channel for data transmission in the second stage. The main notation used in this paper is given in Table 1.
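To make the two-timescale structure concrete, the following Python sketch shows how the two decision stages nest: server selection is refreshed once per period, while channel selection is refreshed in every slot. All names and values here are illustrative placeholders, not quantities from the paper.

```python
# Minimal sketch of the two-timescale structure: the first-stage decision is
# updated once per period, the second-stage decision once per slot.
# T, S, and both selection rules are toy assumptions for illustration.
T, S = 3, 4  # number of periods and slots per period (example values)

def select_server(period: int) -> int:
    """Placeholder first-stage decision (stands in for two-side matching)."""
    return period % 2  # toy rule: alternate between two servers

def select_channel(slot: int) -> int:
    """Placeholder second-stage decision (stands in for epsilon-greedy learning)."""
    return slot % 3  # toy rule: cycle over three channels

for t in range(T):                 # large timescale: periods
    server = select_server(t)      # fixed for the whole period
    for s in range(S):             # small timescale: slots within period t
        n = t * S + s              # global slot index
        channel = select_channel(n)
        print(f"period {t}, slot {n}: server {server}, channel {channel}")
```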

2.1. Transmission Model

Based on orthogonal frequency division multiplexing (OFDM), each device selects an orthogonal channel to offload tasks in each slot. The data transmission rate from $d_i$ to $e_j$ on channel $c_k$ in the $n$-th slot is given by
$$r_{i,j,k}^{n} = B \log_2 \left( 1 + \gamma_{i,j,k}^{n} \right),$$
where $B$ is the channel bandwidth and $\gamma_{i,j,k}^{n}$ is the signal-to-interference-plus-noise ratio (SINR), which is given by
$$\gamma_{i,j,k}^{n} = \frac{P_i g_{i,j,k}^{n}}{\sigma^2 + I_{E}},$$
where $P_i$ and $\sigma^2$ represent the transmission power and noise power, respectively, $g_{i,j,k}^{n}$ is the channel gain between $d_i$ and $e_j$ on $c_k$ in the $n$-th slot, and $I_{E}$ is the electromagnetic interference power.
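As a quick illustration of the transmission model, the snippet below evaluates the SINR and the resulting Shannon rate under the stated assumptions (orthogonal channels, additive electromagnetic interference). All numeric values are illustrative, not simulation parameters from Table 2.

```python
import math

def sinr(p_tx: float, gain: float, noise: float, emi: float) -> float:
    """SINR: received signal power over noise plus electromagnetic interference."""
    return (p_tx * gain) / (noise + emi)

def rate(bandwidth_hz: float, gamma: float) -> float:
    """Data transmission rate r = B * log2(1 + SINR), in bit/s."""
    return bandwidth_hz * math.log2(1.0 + gamma)

gamma = sinr(p_tx=0.1, gain=1e-7, noise=1e-10, emi=5e-11)  # example values
print(f"rate: {rate(1e6, gamma) / 1e6:.2f} Mbit/s on a 1 MHz channel")
```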

2.2. Delay Model

Denoting the total computing resources of $e_j$ as $F_j$ and assuming that $e_j$ allocates its resources evenly among its matched devices, the computing resources allocated by $e_j$ to $d_i$ in the $n$-th slot are given by
$$f_{i,j}^{n} = \frac{F_j}{\sum_{d_{i'} \in \mathcal{D}} x_{i',j}^{t}}.$$

The size of the offloaded data from $d_i$ in the $n$-th slot is denoted as $D_i^{n}$. Then, the transmission delay of offloading data to $e_j$ on $c_k$ and the processing delay required by $e_j$ to process the offloaded data of $d_i$ in the $n$-th slot are given by
$$T_{i,j,k}^{\mathrm{tra},n} = \frac{D_i^{n}}{r_{i,j,k}^{n}}, \qquad T_{i,j}^{\mathrm{pro},n} = \frac{\kappa_j D_i^{n}}{f_{i,j}^{n}},$$
where $\kappa_j$ (cycle/bit) is the number of CPU cycles required by $e_j$ to process one bit of data.

The result feedback delay is negligible compared with the transmission delay and processing delay [30]. Therefore, the total delay is the sum of transmission delay and processing delay, which is given by
$$T_{i,j,k}^{n} = T_{i,j,k}^{\mathrm{tra},n} + T_{i,j}^{\mathrm{pro},n}.$$
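The delay model above reduces to two divisions, as the short sketch below shows; the feedback delay is omitted in line with the text. Parameter values are illustrative assumptions.

```python
def transmission_delay(data_bits: float, rate_bps: float) -> float:
    """Time to offload the task data over the selected channel."""
    return data_bits / rate_bps

def processing_delay(data_bits: float, cycles_per_bit: float, cpu_hz: float) -> float:
    """Time for the edge server's allocated CPU resources to process the data."""
    return data_bits * cycles_per_bit / cpu_hz

data = 2e6  # 2 Mbit of offloaded task data (example value)
total = transmission_delay(data, 5e6) + processing_delay(data, 500.0, 2e9)
print(f"total delay: {total:.3f} s")  # transmission + processing
```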

2.3. Energy Efficiency Model

The transmission energy consumption of $d_i$ in the $n$-th slot is given by
$$E_{i,j,k}^{\mathrm{tra},n} = P_i T_{i,j,k}^{\mathrm{tra},n}.$$

The operation energy consumption of $d_i$ is given by
$$E_{i}^{\mathrm{op},n} = p_i^{\mathrm{cir}} T_{i,j,k}^{\mathrm{tra},n},$$
where $p_i^{\mathrm{cir}}$ is the circuit power of device operation.

The total energy consumption is the sum of the transmission energy consumption and operation energy consumption, which is given by
$$E_{i,j,k}^{n} = E_{i,j,k}^{\mathrm{tra},n} + E_{i}^{\mathrm{op},n}.$$

Based on [31], we define energy efficiency as the amount of data transmitted per unit of energy and per unit of bandwidth (bit/(J·Hz)). Therefore, the energy efficiency of $d_i$ offloading data to $e_j$ on $c_k$ in the $n$-th slot is given by
$$\eta_{i,j,k}^{n} = \frac{D_i^{n}}{E_{i,j,k}^{n} B}.$$
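Putting the energy model together, the helper below computes the energy efficiency in bit/(J·Hz) from the definitions above. It assumes, as in the operation-energy expression, that the circuit power is drawn over the transmission duration; all numeric inputs are illustrative.

```python
def energy_efficiency(data_bits: float, rate_bps: float, p_tx: float,
                      p_circuit: float, bandwidth_hz: float) -> float:
    """Bits delivered per joule per hertz: D / ((E_tra + E_op) * B)."""
    tx_time = data_bits / rate_bps                 # transmission duration (s)
    e_tra = p_tx * tx_time                         # transmission energy (J)
    e_op = p_circuit * tx_time                     # operation (circuit) energy (J)
    return data_bits / ((e_tra + e_op) * bandwidth_hz)

print(energy_efficiency(2e6, 5e6, 0.1, 0.05, 1e6))  # bit/(J·Hz), example values
```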

2.4. Problem Formulation

In this paper, we aim to address the energy-efficient and low-latency task offloading problem in PDIoT. The objective is to maximize the weighted difference between energy efficiency and delay through the joint optimization of large-timescale server selection in the first stage and small-timescale channel selection in the second stage. The two-stage task offloading problem is formulated as
$$\mathbf{P1:} \max_{\{x_{i,j}^{t}\}, \{y_{i,k}^{n}\}} \sum_{n \in \mathcal{N}} \sum_{d_i \in \mathcal{D}} \left[ \omega \eta_{i,j,k}^{n} - (1 - \omega) T_{i,j,k}^{n} \right]$$
$$\text{s.t. } \mathrm{C1}: \sum_{e_j \in \mathcal{E}} x_{i,j}^{t} = 1, \; \sum_{d_i \in \mathcal{D}} x_{i,j}^{t} \le q_j, \; x_{i,j}^{t} \in \{0, 1\},$$
$$\mathrm{C2}: \sum_{c_k \in \mathcal{C}} y_{i,k}^{n} = 1, \; y_{i,k}^{n} \in \{0, 1\},$$
$$\mathrm{C3}: \gamma_{i,j,k}^{n} \ge \gamma_{\mathrm{th}},$$
where $\omega \in [0, 1]$ is used to achieve the tradeoff between energy efficiency and delay. Specifically, when $\omega$ is large, the influence of energy efficiency is dominant in the optimization objective, and the proposed algorithm tends to maximize energy efficiency; when $\omega$ is small, the influence of delay is dominant, and the proposed algorithm tends to minimize delay. C1 represents the server selection constraints, C2 represents the channel selection constraints, and C3 denotes the task offloading reliability constraint in terms of SINR, where $\gamma_{\mathrm{th}}$ is the SINR threshold.
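For clarity, the objective of P1 can be evaluated per device and slot as in the sketch below. The convex-combination weighting is an assumption used throughout this illustration; the paper specifies only that a single weight trades energy efficiency off against delay.

```python
def weighted_difference(energy_eff: float, total_delay: float, omega: float) -> float:
    """Objective term: larger omega favors energy efficiency, smaller favors delay."""
    return omega * energy_eff - (1.0 - omega) * total_delay

print(weighted_difference(energy_eff=1.2, total_delay=0.4, omega=0.7))
```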

3. ML-Based Two-Stage Task Offloading Optimization for PDIoT

In this section, we introduce the problem transformation and the proposed ML-based two-stage task offloading optimization algorithm for PDIoT.

3.1. Problem Transformation

We transform the first-stage large-timescale server selection subproblem of P1 into a many-to-one matching problem between devices and servers. Then, we propose a stable server selection algorithm based on two-side matching with quota to solve it. Next, the second-stage small-timescale channel selection subproblem of P1 is solved by the proposed adaptive ε-greedy learning algorithm.

3.2. First-Stage Server Selection Based on Two-Side Matching with Quota

Based on the many-to-one two-side matching with quota [32–34], each device and each server need to obtain preference values towards each other. Then, based on the obtained two-side preference values, the first-stage server selection problem is solved by many-to-one two-side matching with quota to maximize the weighted difference between energy efficiency and delay.

Definition 1. A matching with quota is defined as a mapping $\mu$ on $\mathcal{D} \cup \mathcal{E}$. When $x_{i,j}^{t} = 1$ in the $t$-th period, $d_i$ and $e_j$ establish a matching relationship, i.e., $\mu(d_i) = e_j$ and $d_i \in \mu(e_j)$. Particularly, $|\mu(e_j)| \le q_j$, where $|\mu(e_j)|$ is the size of $\mu(e_j)$.

3.2.1. Preference List Construction

Based on P1, both devices and servers establish their matching preference lists. The preference value of $d_i$ for server $e_j$ is defined as the weighted energy efficiency, and the preference value of $e_j$ for $d_i$ is defined as the negative of the total delay, which are given by
$$V_{i \rightarrow j}^{t} = \omega \tilde{\eta}_{i,j}^{t}, \qquad (11)$$
$$V_{j \rightarrow i}^{t} = -\tilde{T}_{i,j}^{t}, \qquad (12)$$
where $\tilde{\eta}_{i,j}^{t}$ and $\tilde{T}_{i,j}^{t}$ are the empirical statistical estimates of energy efficiency and total delay in the $t$-th period, i.e.,
$$\tilde{\eta}_{i,j}^{t} = \frac{1}{|\mathcal{S}_{t-1}|} \sum_{n \in \mathcal{S}_{t-1}} \eta_{i,j,k}^{n}, \qquad \tilde{T}_{i,j}^{t} = \frac{1}{|\mathcal{S}_{t-1}|} \sum_{n \in \mathcal{S}_{t-1}} T_{i,j,k}^{n}. \qquad (13)$$

Based on (11) and (12), $d_i$ and $e_j$ calculate $V_{i \rightarrow j}^{t}$ and $V_{j \rightarrow i}^{t}$ and establish their preference lists $PL_i$ and $PL_j$ by sorting the preference values in descending order.
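A compact way to build both preference lists is to sort the empirical estimates directly, as sketched below; the estimate matrices are hypothetical inputs standing in for (13).

```python
import numpy as np

# eta_est[i][j]: device i's empirical energy-efficiency estimate towards server j
eta_est = np.array([[1.2, 0.8, 1.5],
                    [0.9, 1.1, 0.7]])
# delay_est[j][i]: server j's empirical total-delay estimate for device i
delay_est = np.array([[3.0, 5.0],
                      [4.0, 2.0],
                      [6.0, 1.0]])

# Device-side lists: servers sorted by descending preference value (11).
device_pref = [list(np.argsort(-eta_est[i])) for i in range(eta_est.shape[0])]
# Server-side lists: descending negative delay (12) is ascending delay.
server_pref = [list(np.argsort(delay_est[j])) for j in range(delay_est.shape[0])]
print(device_pref)  # e.g., device 0 prefers server 2, then 0, then 1
print(server_pref)
```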

3.2.2. Implementation of Two-Side Matching with Quota

The implementation of two-side matching with quota consists of three steps, which are introduced as follows.

Step 1. Initialize the set of server selection strategies, the set of unmatched devices, and the set of unmatched servers as $\mathcal{X} = \emptyset$, $\mathcal{D}^{\mathrm{un}} = \mathcal{D}$, and $\mathcal{E}^{\mathrm{un}} = \mathcal{E}$, respectively.

Step 2. $\forall d_i \in \mathcal{D}^{\mathrm{un}}$ and $\forall e_j \in \mathcal{E}^{\mathrm{un}}$, calculate the preference values according to (11) and (12) to obtain the preference lists $PL_i$ and $PL_j$.

Step 3. First, each unmatched device $d_i$ proposes to its most preferred server based on $PL_i$. Afterward, each server $e_j$ counts its temporary matches and new proposals. If their sum is less than $q_j$, $e_j$ establishes temporary matches with all the devices that have proposed to it, and the matched devices are temporarily removed from $\mathcal{D}^{\mathrm{un}}$. Otherwise, based on $PL_j$, $e_j$ establishes temporary matches with only the top $q_j$ devices that have proposed to it; the rejected devices are added back into $\mathcal{D}^{\mathrm{un}}$ and remove $e_j$ from their preference lists, while the matched devices are removed from $\mathcal{D}^{\mathrm{un}}$. If the number of matches for $e_j$ is equal to $q_j$, remove $e_j$ from $\mathcal{E}^{\mathrm{un}}$. Finally, return to Step 2, and the unmatched devices make new proposals based on the updated preference lists.

The matching iteration ends when every $d_i \in \mathcal{D}$ either establishes a match with a server or has an empty preference list $PL_i = \emptyset$.
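The three steps above follow the deferred-acceptance pattern; a minimal runnable sketch is given below, with hypothetical preference lists. Each rejected device automatically re-proposes to its next choice, which is what yields a stable matching under the quotas.

```python
def match_with_quota(device_pref, server_pref, quota):
    """Deferred acceptance: devices propose, servers keep their best proposers."""
    matches = {j: [] for j in range(len(server_pref))}  # server -> matched devices
    unmatched = list(range(len(device_pref)))
    remaining = [list(p) for p in device_pref]          # servers not yet proposed to
    while unmatched:
        i = unmatched.pop(0)
        if not remaining[i]:
            continue                                    # exhausted list: stays unmatched
        j = remaining[i].pop(0)                         # propose to most preferred server
        matches[j].append(i)
        if len(matches[j]) > quota[j]:                  # over quota: keep top proposers
            ranked = sorted(matches[j], key=server_pref[j].index)
            matches[j], rejected = ranked[:quota[j]], ranked[quota[j]:]
            unmatched.extend(rejected)                  # rejected devices re-propose later
    return matches

device_pref = [[0, 1], [0, 1], [1, 0]]  # devices' server rankings (best first)
server_pref = [[1, 0, 2], [2, 0, 1]]    # servers' device rankings (best first)
print(match_with_quota(device_pref, server_pref, quota=[1, 2]))  # {0: [1], 1: [2, 0]}
```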

3.3. Second-Stage Channel Selection Based on Adaptive ε-Greedy Learning

The second-stage channel selection optimization problem is transformed into an MAB problem, which is addressed by the proposed adaptive ε-greedy learning algorithm. The MAB problem is mainly composed of the decision maker, the arms, and the reward, which are introduced as follows:
(i) Decision Maker. The decision maker generates selection decisions and constantly updates them by learning the reward from historical feedback. We define the PDIoT devices as decision makers.
(ii) Arm. Based on the server selection, the device needs to select a channel for data transmission. The channels are defined as arms, and the set of arms is $\mathcal{C}$.
(iii) Reward. Define $R_{i,k}^{n}$ as the reward of $d_i$ selecting $c_k$ in the $n$-th slot, which is given by
$$R_{i,k}^{n} = \omega \eta_{i,j,k}^{n} - (1 - \omega) T_{i,j,k}^{n}. \qquad (14)$$

The traditional ε-greedy algorithm uses a linear method to adjust the exploration factor ε, which has a certain degree of blindness [35, 36]. In order to improve the exploration efficiency, we propose an adaptive ε-greedy learning algorithm, which uses the average cumulative reward to dynamically adjust the exploration factor and balance exploration and exploitation.

Define $\bar{R}_{i,k}^{n}$ as the historical average reward of selecting $c_k$, which is given by
$$\bar{R}_{i,k}^{n} = \frac{1}{N_{i,k}^{n}} \sum_{n'=1}^{n} y_{i,k}^{n'} R_{i,k}^{n'}, \qquad (15)$$
where $N_{i,k}^{n}$ represents the total number of times that $d_i$ selects $c_k$ until the $n$-th slot. The average cumulative reward under the previous $n-1$ task offloading strategies is calculated as
$$\bar{R}_{i}^{n} = \frac{1}{n-1} \sum_{n'=1}^{n-1} R_{i}^{n'}, \qquad (16)$$
where $R_{i}^{n'}$ is the reward obtained by $d_i$ in the $n'$-th slot.

Then, the adaptive exploration factor $\varepsilon^{n}$ is updated in (17) as a decreasing logarithmic function of the average cumulative reward $\bar{R}_{i}^{n}$, where $b$ represents the base of the logarithmic function and $b > 1$: the larger the average cumulative reward, the smaller the exploration factor, so that the device explores less once well-performing channels have been learned.

The action decision of the adaptive ε-greedy learning algorithm is given by
$$a_{i}^{n} = \begin{cases} \arg\max_{c_k \in \mathcal{C}} \bar{R}_{i,k}^{n}, & \rho \ge \varepsilon^{n}, \\ \text{a random channel } c_k \in \mathcal{C}, & \rho < \varepsilon^{n}, \end{cases} \qquad (18)$$
where $\rho \in [0, 1]$ is a random number. When $\rho \ge \varepsilon^{n}$, the device selects the channel with the largest historical average reward. When $\rho < \varepsilon^{n}$, the device randomly selects a channel. Specifically, a fixed $\varepsilon^{n}$ is equivalent to the traditional ε-greedy algorithm.
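The following sketch runs the adaptive ε-greedy loop for a single device over synthetic rewards. The ε decay used here is one plausible instantiation of "explore less as the average cumulative reward grows"; it is not the paper's exact logarithmic update (17).

```python
import math
import random

K = 3                         # number of channels (arms)
avg_r = [0.0] * K             # historical average reward per channel, cf. (15)
counts = [0] * K              # times each channel has been selected
cum_reward, epsilon = 0.0, 1.0

for n in range(1, 501):
    if random.random() < epsilon:
        k = random.randrange(K)                        # explore: random channel
    else:
        k = max(range(K), key=lambda a: avg_r[a])      # exploit: best average reward
    r = random.gauss([0.2, 0.5, 0.8][k], 0.1)          # synthetic reward, cf. (14)
    counts[k] += 1
    avg_r[k] += (r - avg_r[k]) / counts[k]             # incremental historical average
    cum_reward += r
    avg_cum = cum_reward / n                           # average cumulative reward, cf. (16)
    epsilon = min(1.0, 1.0 / (1.0 + avg_cum * math.log(1 + n)))  # assumed decay, cf. (17)

print(f"learned best channel: {avg_r.index(max(avg_r))}, final epsilon: {epsilon:.3f}")
```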

1: Input: $\mathcal{D}$, $\mathcal{E}$, $\mathcal{C}$.
2: Output: $\{x_{i,j}^{t}\}$ and $\{y_{i,k}^{n}\}$.
3: Phase 1. Initialization
4:   Initialize $\varepsilon^{1}$, $\bar{R}_{i,k}^{0}$, and $N_{i,k}^{0}$.
5: For $t = 1, 2, \ldots, T$ do
6:   Phase 2. Large-Timescale First-Stage Server Selection
7:   Step 1:
8:   Initialize $\mathcal{X} = \emptyset$, $\mathcal{D}^{\mathrm{un}} = \mathcal{D}$, and $\mathcal{E}^{\mathrm{un}} = \mathcal{E}$.
9:   Step 2:
10:  $\forall d_i \in \mathcal{D}^{\mathrm{un}}$ and $e_j \in \mathcal{E}^{\mathrm{un}}$, calculate the preference values $V_{i \rightarrow j}^{t}$ and $V_{j \rightarrow i}^{t}$ based on (11) and (12) and establish the preference lists $PL_i$ and $PL_j$.
11:  Step 3:
12:  While $\mathcal{D}^{\mathrm{un}} \neq \emptyset$ and $\mathcal{E}^{\mathrm{un}} \neq \emptyset$ do
13:    $d_i$ proposes to its most preferred server based on $PL_i$.
14:    For $e_j \in \mathcal{E}^{\mathrm{un}}$ do
15:      If the sum of temporary matches and new proposals for $e_j$ is less than the quota $q_j$ then
16:        Temporarily match $e_j$ with the proposing devices, update $\mathcal{X}$, and remove the matched devices from $\mathcal{D}^{\mathrm{un}}$.
17:      Else
18:        Temporarily match $e_j$ with its $q_j$ most preferred devices and update $\mathcal{X}$. Remove matched devices from $\mathcal{D}^{\mathrm{un}}$ and add unmatched devices into $\mathcal{D}^{\mathrm{un}}$. Unmatched devices remove $e_j$ from their preference lists.
19:      End if
20:      If the sum of matches for $e_j$ is equal to $q_j$ then
21:        Remove $e_j$ from $\mathcal{E}^{\mathrm{un}}$.
22:      End if
23:    End for
24:  End while
25:  For $n \in \mathcal{S}_t$ do
26:    Phase 3. Small-Timescale Second-Stage Channel Selection
27:    $d_i$ makes the action decision based on (18).
28:    $d_i$ calculates $R_{i,k}^{n}$ and $\bar{R}_{i,k}^{n}$ based on (14) and (15).
29:    Update $\bar{R}_{i}^{n}$ and $\varepsilon^{n+1}$ based on (16) and (17).
30:  End for
31: End for

Algorithm 1: ML-based two-stage task offloading optimization algorithm.
3.4. Implementation of Two-Stage Task Offloading

The proposed ML-based two-stage task offloading optimization algorithm consists of three phases, which are summarized in Algorithm 1.

Phase 1. Initialization. Initialize the exploration factor $\varepsilon^{1}$ and the reward records $\bar{R}_{i,k}^{0}$ and $N_{i,k}^{0}$.

Phase 2. Large-Timescale First-Stage Server Selection. At the beginning of each period, each device and each server construct their preference lists $PL_i$ and $PL_j$, respectively. Then, the two-side matching process with quota described in Section 3.2 is performed to obtain the server selection strategy $\mathcal{X}$.

Phase 3. Small-Timescale Second-Stage Channel Selection. In each slot, each device $d_i$ makes the action decision based on (18) and selects the corresponding channel. Then, $d_i$ calculates $R_{i,k}^{n}$ and $\bar{R}_{i,k}^{n}$ based on (14) and (15). Finally, $\bar{R}_{i}^{n}$ and $\varepsilon^{n+1}$ are updated based on (16) and (17).

The algorithm ends when the last slot $n = N$ is reached.

3.5. Computation Complexity

For the first-stage server selection, the computation complexity is $O(IJ)$, while the computation complexity of the exhaustive-based server selection algorithm is $O(J^{I})$. When $I$ and $J$ are large enough, the computation complexity of the proposed algorithm is much lower than that of the exhaustive-based server selection algorithm. For the second-stage channel selection, the computation complexity is $O(K)$ per slot. In some ML-based channel selection algorithms, such as DRL-based and DL-based channel selection algorithms, the computation complexity of training the deep neural networks is $O(N_{e} \frac{S_{d}}{S_{b}} C_{0})$. Here, $N_{e}$ is the number of training epochs, $S_{d}$ and $S_{b}$ are, respectively, the dataset size and batch size, and $C_{0}$ is the computation complexity of each training step. $C_{0}$ is related to many free variables, such as the computation complexity of each layer, the size of the convolution kernels, the number of input and output channels, and the spatial dimensions of the input and output feature maps. Therefore, the computation complexity of these DRL- and DL-based channel selection algorithms is much higher than that of the proposed algorithm.

4. Simulation Results

We consider a transmission line monitoring scenario in PDIoT, which includes multiple PDIoT devices and edge servers colocated with BSs; the devices are randomly distributed along the transmission line. The simulation parameters, including the numbers of devices, servers, and channels, are summarized in Table 2 [37–39]. Two algorithms are utilized for comparison. The first one is the matching based on Kuhn-Munkres (MKM) algorithm [40], which aims to minimize task offloading delay. The second one is the matching-based task offloading strategy (MBTO) [41], which aims to maximize energy efficiency through server selection optimization in each period. Neither MKM nor MBTO can achieve small-timescale channel selection optimization.

Figure 2 shows the weighted difference between energy efficiency and delay versus time slots. Compared with MKM and MBTO, the proposed algorithm improves the weighted difference by 39.74% and 9.96%, respectively. The reason is that the proposed algorithm can learn the channel and server selection strategies that maximize the weighted difference based on dynamic network states.

Figures 3 and 4 show the total delay and energy efficiency versus time slots, respectively. Compared with MKM, the proposed algorithm improves energy efficiency at the cost of a slight increase in delay. Compared with MBTO, the proposed algorithm reduces delay at the cost of slightly lower energy efficiency. The reason is that MKM and MBTO each optimize only one aspect, delay or energy efficiency, whereas the proposed algorithm makes a well-balanced tradeoff between energy efficiency and delay by dynamically adjusting the server and channel selection strategies.

Figure 5 shows the impact of the quota $q_j$ on transmission delay, processing delay, and total delay. The transmission delay decreases with $q_j$ while the processing delay increases with $q_j$. The reason is that as $q_j$ increases, more devices can access nearby BSs with better transmission performance, thus reducing the transmission delay. However, the server allocates fewer computing resources to each device, thus increasing the processing delay. The total delay increases with $q_j$ since the processing delay has a greater impact on the total delay.

Figure 6 shows the impact of the weight $\omega$ on total delay and energy efficiency. With the increase of $\omega$, the energy efficiency increases obviously and finally levels off. Meanwhile, the total delay first decreases and then increases gradually. The reason is that as $\omega$ increases, the proposed algorithm puts more emphasis on energy efficiency improvement. Devices are inclined to select the channel that achieves a higher transmission rate, thereby reducing the transmission delay and the total delay at first. However, a further increase of $\omega$ forces devices to select nearby edge servers with less computing resources to improve energy efficiency, which then increases the processing delay and the total delay. Therefore, the proposed algorithm can balance the tradeoff between energy efficiency and delay by adjusting the weight $\omega$.

Figure 7 compares the performance of the proposed algorithm and the nonadaptive ε-greedy algorithm. The proposed algorithm outperforms the nonadaptive ε-greedy algorithm in terms of the weighted difference. The reason is that the proposed algorithm adjusts ε based on the average cumulative reward to balance the tradeoff between exploration and exploitation. On the contrary, the nonadaptive ε-greedy algorithm with a fixed ε cannot adaptively trade off exploration and exploitation based on the current reward, thereby resulting in poor learning performance and a lower weighted difference.

Table 3 shows the computation complexity of different algorithms. Due to the additional channel selection optimization, the computation complexity of the proposed algorithm is higher than those of MKM and MBTO, but the proposed algorithm improves the weighted difference between energy efficiency and delay by 39.74% and 9.96% compared with MKM and MBTO, respectively. Meanwhile, the computation complexities of the DRL-based task offloading algorithm (DRLTO), the federated learning-based task offloading algorithm (FLTO), and the meta learning-based task offloading algorithm (MLTO) are much higher than that of the proposed algorithm, which makes them unsuitable for PDIoT devices with limited computing resources.

5. Conclusion

In this paper, the energy-efficient and low-latency task offloading problem in PDIoT was investigated. An ML-based two-stage task offloading optimization algorithm was proposed to maximize the weighted difference between energy efficiency and delay through the joint optimization of large-timescale server selection and small-timescale channel selection. Simulation results show that the proposed algorithm improves the weighted difference by 39.74% and 9.96% compared with MKM and MBTO, respectively. In the future, the joint optimization of power control and computing resource allocation for multi-QoS guaranteed task offloading in PDIoT will be studied.

Data Availability

No data was used for this article.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the Key Science and Technology Program of Haikou City under grant number 2020-009, the Key Research and Development Project of Hainan Province under grant number ZDYF2021SHFZ243, the National Natural Science Foundation of China under grant number 62062030, and the Scientific Research Fund Project of Hainan University under grant numbers KYQD(ZR)-21007 and KYQD(ZR)-21008.