Abstract
The increase in the number of services in the power distribution grid leads to a massive increase in task data. Power distribution internet of things (PDIoT) is the specific application of the internet of things (IoT) in the power distribution grid. By deploying a large number of PDIoT devices, the voltage, active power, reactive power, and harmonic parameters are collected to support distribution grid services such as fault identification and status detection. Therefore, PDIoT utilizes massive devices to collect and offload tasks to edge servers through the 5G network for real-time data processing. However, dynamically selecting edge servers and channels to meet the energy-efficient and low-latency task offloading requirements of PDIoT devices still faces several technical challenges, such as the coupling of task offloading decisions among devices, unobtainable global state information, and the interrelation of various quality of service (QoS) metrics such as energy efficiency and delay. To this end, we first construct a joint optimization problem to maximize the weighted difference between energy efficiency and delay of devices in PDIoT. Then, the joint optimization problem is decomposed into a large-timescale server selection problem and a small-timescale channel selection problem. Next, we propose an ML-based two-stage task offloading algorithm, where the large-timescale problem is solved by two-sided matching in the first stage, and the small-timescale problem is solved by adaptive ε-greedy learning in the second stage. Finally, simulation results show that, compared with the task offloading delay-first matching algorithm and the matching theory-based task offloading strategy, the proposed algorithm achieves superior performance in terms of energy efficiency and delay.
1. Introduction
With the rapid development of new power systems, the number of services in the power distribution grid has gradually increased, resulting in a massive increase in task data. Power distribution internet of things (PDIoT) is the specific application of the internet of things (IoT) in the power distribution grid. By deploying a large number of PDIoT devices, the voltage, active power, reactive power, and harmonic parameters are collected to support power distribution grid services such as fault location and status detection [1]. For example, for the status estimation service, task data such as real-time power and voltage information are collected to supplement the load curve data, load forecast data, and meter reading data. By offloading the task data to the server for processing, the real-time load of the power distribution grid can be obtained. However, traditional cloud computing, with a long data transmission distance to the cloud server, results in high delay, large energy consumption, and severe congestion [2, 3]. Edge computing integrated with 5G provides a new paradigm for real-time computing services, where PDIoT devices offload data to nearby edge servers to reduce delay and energy consumption [4–6].
Task offloading is a key enabler of efficient edge computing for PDIoT. On the one hand, devices need to select the optimal server among the deployed edge servers. On the other hand, due to the spectrum shortage, devices need to dynamically select channels according to the available spectrum resources. Meanwhile, PDIoT services have strict requirements on energy efficiency and delay performance [7]. For instance, the delay requirement of control services is at the millisecond level, while monitoring devices collect and transmit massive data and must improve energy efficiency under limited battery capacity [8]. Therefore, how to achieve energy-efficient and low-latency task offloading by jointly optimizing server and channel selection remains an open issue. The joint optimization problem faces the following challenges.
First, the task offloading decisions among massive devices are coupled with each other. Meanwhile, server selection needs to be optimized according to the change of server computing resources, and channel selection needs to be optimized according to the change of channel state. Since server computing resources and channel states change on different timescales, task offloading needs to be optimized on different timescales. Particularly, the large-timescale server selection is optimized in the first stage, while the small-timescale channel selection is optimized in the second stage; therefore, a two-stage task offloading problem is constructed. Second, the wireless channels in PDIoT suffer from electromagnetic interference and face channel fading caused by multipath transmission, so it is not feasible to obtain global state information (GSI) for task offloading. Finally, the optimization of energy efficiency, transmission delay, and processing delay is tightly coupled, leading to a more complex optimization problem.
Task offloading has gained considerable attention from both academia and industry. In [9], Mitsis et al. proposed a data offloading framework for UAV-assisted multi-access edge computing systems based on resource pricing and user risk perception. A usage-based pricing mechanism was introduced to utilize the computing power of the MEC server. In [10], Chen et al. proposed an alternating minimization algorithm to achieve energy-optimal fog computing offloading by jointly optimizing the offloading ratio, transmission power, local CPU computation speed, and transmission time. In [11], Maray et al. surveyed the latest research on task offloading from the aspects of offloading mechanism, offloading granularity, and offloading technology. Various task offloading mechanisms and optimization methods in different environments were discussed. In [12], Mustafa et al. divided computation offloading into four categories, i.e., static, dynamic, full, and partial offloading, and compared the existing research from seven aspects, i.e., contribution, computation offloading, energy/battery lifetime, resource/task scheduling, cooperation, user fairness, and transmission/computation latency. In [13], Wu et al. proposed an energy-efficient dynamic task offloading (EEDTO) algorithm to control the computation and communication costs for different types of applications and dynamic changes in the wireless environment. However, the above works neglected the coupling of task offloading decisions among massive devices and cannot resolve access conflicts among devices. The task offloading problem can be constructed as a two-sided matching problem to obtain stable task offloading strategies and cope with the access conflicts among devices.
Matching theory provides an effective approach to the two-sided matching problem by defining the preferences of matching subjects to address access conflicts among devices, and it has been widely used in solving task offloading problems [14, 15]. In [16], Shi et al. proposed a two-sided matching-based server selection algorithm to maximize the efficiency of device-to-device content sharing. In [17], Zhou et al. proposed a task offloading algorithm based on vehicle-device matching to maximize the utility function of the base station (BS). In [18], Abedin et al. proposed a two-sided matching game to solve the server selection problem, aiming to maximize the efficiency of user resource allocation. In [19], Wang et al. considered the impact of channel selection on task offloading and proposed a matching-based channel selection algorithm to minimize the total energy consumption. The above works used matching theory to resolve task offloading conflicts among devices, but they rely on perfect GSI such as server states and channel states, and thus cannot be applied to scenarios where the global information changes rapidly and is unknown. Moreover, the above works do not consider the multi-timescale joint optimization of server selection and channel selection, where the establishment of the preference list is influenced by the optimization results of other dimensions.
To solve the task offloading problem under incomplete GSI, machine learning (ML) has been applied to intelligent task offloading decision-making. ML includes deep learning (DL), reinforcement learning (RL), deep reinforcement learning (DRL), and support vector machines (SVM). In [20], Jehangiri et al. proposed a mobility prediction and offloading framework that offloads computationally intensive tasks to predicted user locations using artificial neural networks. In [21], Zhou et al. proposed a task offloading strategy based on SVM, which minimizes energy consumption by optimizing clock frequency control, transmission power allocation, as well as offloading and receive power allocation strategies in edge computing scenarios. In [22], Wu et al. proposed a distributed DL-driven task offloading (DDTO) algorithm to jointly optimize the system utility and bandwidth allocation. In [23], Qu et al. proposed a deep meta-reinforcement learning-based offloading (DMRO) algorithm to address the limited computing resources of IoT devices and improve task processing efficiency. In [24], Chen et al. proposed a cloud-edge collaborative mobile computing offloading algorithm based on DRL (DRLCCMCO) to solve the joint optimization problem of execution delay and energy consumption. The above works adopted ML algorithms to optimize task offloading decisions and further improve task offloading performance. However, these algorithms have high computation complexity and high requirements for CPU performance, while PDIoT devices with limited CPUs put forward lightweight requirements for the algorithm. Therefore, the above algorithms are not suitable for the scenario considered in this paper. RL, as an important branch of ML, has low computation complexity, which can meet the needs of lightweight task offloading [25]. The task offloading problem can be regarded as a multi-armed bandit (MAB) problem and solved by RL [26].
The ε-greedy learning algorithm is a low-complexity RL algorithm that can balance the tradeoff between exploration and exploitation through the adjustment of the exploration factor ε. In [27], Li et al. proposed an interference-aware RL algorithm to solve the joint problem of multichannel selection and data scheduling. In [28], Talekar and Terdal proposed a solution for optimal channel selection and routing and applied RL to select the best channel for routing. However, these works do not take into account the coupling between server selection and channel selection and cannot dynamically adjust the learning parameters according to the dynamic and complex communication environment to improve learning performance.
Motivated by the aforementioned challenges, we first construct a two-stage task offloading problem, including the server selection in the first stage and the channel selection in the second stage, which operate on different timescales. The objective is to maximize the weighted difference between energy efficiency and delay through joint optimization of server selection and channel selection, considering the influence of electromagnetic interference and stringent quality of service (QoS) constraints. Then, we propose an ML-based two-stage task offloading algorithm. Specifically, a two-sided matching-based server selection algorithm is proposed to obtain a large-timescale device-edge stable matching. For the channel selection in the second stage, we propose an adaptive ε-greedy learning algorithm to dynamically learn optimal channel selection strategies. The main contributions of this paper are summarized as follows. (i) Energy-Efficient and Low-Latency Task Offloading. Since the optimization of energy efficiency and delay is coupled, we construct the weighted difference between energy efficiency and delay to achieve the joint optimization of different QoS metrics. (ii) Two-Stage Task Offloading. We propose a two-sided matching-based server selection algorithm and an adaptive ε-greedy learning algorithm to optimize large-timescale server selection in the first stage and small-timescale channel selection in the second stage under incomplete GSI. (iii) Extensive Performance Evaluation. Compared with two state-of-the-art algorithms, simulation results demonstrate that the proposed algorithm achieves superior performance in terms of energy efficiency, transmission delay, and processing delay.
The rest of the paper is organized as follows. Section 2 describes the system model. Section 3 presents the ML-based two-stage task offloading algorithm for PDIoT. The simulation results are shown in Section 4 to verify the effectiveness of the proposed algorithm. Section 5 concludes the paper.
2. System Model
The considered task offloading scenario of PDIoT is shown in Figure 1, which consists of multiple BSs and PDIoT devices. Each BS is equipped with an edge server to provide overlapping communication coverage and computing resources for devices. Each device needs to offload its task data to an edge server through a BS for processing, aiming to reduce delay and improve energy efficiency. There are $N$ PDIoT devices, $M$ edge servers, and $K$ channels, whose sets are denoted as $\mathcal{N}$, $\mathcal{M}$, and $\mathcal{K}$, respectively.
We adopt a two-timescale model with periods and slots [29]. The large timescale is the period, and the small timescale is the slot. There are $T$ equal periods, which are the large timescales, and the set is $\mathcal{T}=\{1,\dots,T\}$. Each large timescale contains $T_0$ small timescales, i.e., slots, and the set of slots in the $\tau$-th period is $\mathcal{T}_\tau$. The total number of slots is $S=TT_0$, and the set is $\mathcal{S}=\{1,\dots,S\}$. Task offloading includes two stages, i.e., large-timescale server selection and small-timescale channel selection. The server selection variable of device $n$ towards server $m$ in the $\tau$-th period is defined as $x_{n,m}(\tau)\in\{0,1\}$, where $x_{n,m}(\tau)=1$ indicates that $n$ selects $m$, and $x_{n,m}(\tau)=0$ otherwise. Define the quota of server $m$ as $q_m$, which represents the maximum number of devices that can be served by $m$ in each period. The channel selection variable is defined as $y_{n,k}(t)\in\{0,1\}$, where $y_{n,k}(t)=1$ indicates that device $n$ selects channel $k$ for data transmission in the $t$-th slot, and $y_{n,k}(t)=0$ otherwise. An example of two-stage task offloading is shown in Figure 1, where a device selects an edge server to offload data in the first stage and selects a channel for data transmission in the second stage. The main notation used in this paper is given in Table 1.
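The two-timescale bookkeeping above reduces to simple index arithmetic; a minimal sketch (the 0-based indexing and function names are ours, not from the paper):

```python
def period_of_slot(t: int, slots_per_period: int) -> int:
    """Map a 0-based global slot index to its 0-based period index."""
    return t // slots_per_period

def slots_in_period(tau: int, slots_per_period: int) -> range:
    """Global slot indices belonging to the tau-th period (0-based)."""
    return range(tau * slots_per_period, (tau + 1) * slots_per_period)
```

With, say, 3 slots per period, period 1 covers global slots 3, 4, and 5.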
2.1. Transmission Model
Based on orthogonal frequency division multiplexing (OFDM), each device selects an orthogonal channel to offload tasks in each slot. The data transmission rate from device $n$ to server $m$ on channel $k$ in the $t$-th slot is given by
$$r_{n,m,k}(t)=B\log_2\left(1+\gamma_{n,m,k}(t)\right),$$
where $B$ is the channel bandwidth and $\gamma_{n,m,k}(t)$ is the signal-to-interference-plus-noise ratio (SINR), which is given by
$$\gamma_{n,m,k}(t)=\frac{P_n\,g_{n,m,k}(t)}{\sigma^2+I_{\mathrm{EM}}},$$
where $P_n$ and $\sigma^2$ represent the transmission power and noise power, $g_{n,m,k}(t)$ is the channel gain between $n$ and $m$ on channel $k$ in the $t$-th slot, and $I_{\mathrm{EM}}$ is the electromagnetic interference power.
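As a concrete illustration of the transmission model, the SINR and the Shannon-type rate can be computed as follows (a sketch with our own variable names; all quantities are assumed to be in linear scale, not dB):

```python
import math

def sinr(p_tx: float, gain: float, noise: float, interference: float) -> float:
    """Signal-to-interference-plus-noise ratio: received power over
    noise-plus-interference power (linear scale)."""
    return p_tx * gain / (noise + interference)

def transmission_rate(bandwidth_hz: float, gamma: float) -> float:
    """Shannon-type rate B * log2(1 + SINR), in bit/s."""
    return bandwidth_hz * math.log2(1.0 + gamma)
```

For example, with 1 MHz of bandwidth and an SINR of 3 (linear), the rate is 2 Mbit/s.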
2.2. Delay Model
Denoting the total computing resources of server $m$ as $F_m$, the computing resources allocated by $m$ to device $n$ in the $t$-th slot are given by
$$f_{n,m}(t)=\frac{F_m}{\sum_{n'\in\mathcal{N}}x_{n',m}(\tau)},$$
i.e., the computing resources of $m$ are shared equally among its matched devices.
The size of the offloaded data from device $n$ in the $t$-th slot is denoted as $D_n(t)$. Then, the transmission delay of offloading data to $m$ on channel $k$, and the processing delay required by $m$ to process the offloaded data of $n$ in the $t$-th slot, are given by
$$T^{\mathrm{tx}}_{n,m,k}(t)=\frac{D_n(t)}{r_{n,m,k}(t)},\qquad T^{\mathrm{proc}}_{n,m}(t)=\frac{c_m D_n(t)}{f_{n,m}(t)},$$
where $c_m$ (cycle/bit) is the number of CPU cycles required by $m$ to process one bit of data.
The result feedback delay is negligible compared with the transmission delay and processing delay [30]. Therefore, the total delay is the sum of transmission delay and processing delay, which is given by
$$T_{n,m,k}(t)=T^{\mathrm{tx}}_{n,m,k}(t)+T^{\mathrm{proc}}_{n,m}(t).$$
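The delay model above (total delay as transmission delay plus processing delay, with the feedback delay neglected) can be sketched as:

```python
def total_delay(data_bits: float, rate_bps: float,
                cycles_per_bit: float, cpu_hz: float) -> float:
    """Total delay in seconds: time to upload the data at the given rate
    plus time for the server to process it at the given CPU speed."""
    t_tx = data_bits / rate_bps                  # transmission delay
    t_proc = data_bits * cycles_per_bit / cpu_hz  # processing delay
    return t_tx + t_proc
```

For instance, 1 Mbit uploaded at 1 Mbit/s and processed at 100 cycle/bit on a 100 MHz allocation takes 1 s + 1 s = 2 s.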
2.3. Energy Efficiency Model
The transmission energy consumption of device $n$ in the $t$-th slot is given by
$$E^{\mathrm{tx}}_{n}(t)=P_n\,T^{\mathrm{tx}}_{n,m,k}(t).$$
The operation energy consumption of $n$ is given by
$$E^{\mathrm{op}}_{n}(t)=P^{\mathrm{c}}_{n}\,T_{n,m,k}(t),$$
where $P^{\mathrm{c}}_{n}$ is the circuit power of device operation.
The total energy consumption is the sum of transmission energy consumption and operation energy consumption, which is given by
$$E_{n}(t)=E^{\mathrm{tx}}_{n}(t)+E^{\mathrm{op}}_{n}(t).$$
Based on [31], we define energy efficiency as the amount of data transmitted per unit of energy and per unit of bandwidth (bit/(J·Hz)). Therefore, the energy efficiency of offloading data to $m$ on channel $k$ in the $t$-th slot is given by
$$\eta_{n,m,k}(t)=\frac{D_n(t)}{E_{n}(t)\,B}.$$
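The energy efficiency metric (bits delivered per joule per hertz) can be sketched as follows. This is illustrative only: the operation-energy term here, circuit power times a task duration, is an assumed form, since the exact expression is not recoverable from the text.

```python
def energy_efficiency(data_bits: float, rate_bps: float, p_tx: float,
                      p_circuit: float, op_duration_s: float,
                      bandwidth_hz: float) -> float:
    """Energy efficiency in bit/(J*Hz): data size divided by total energy
    (transmission + operation) and by channel bandwidth."""
    e_tx = p_tx * data_bits / rate_bps   # transmission energy (J)
    e_op = p_circuit * op_duration_s     # operation energy (J), assumed form
    return data_bits / ((e_tx + e_op) * bandwidth_hz)
```

With 1 Mbit sent at 1 Mbit/s and 1 W transmit power (1 J) plus 1 J of circuit energy over 1 MHz, the efficiency is 0.5 bit/(J·Hz).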
2.4. Problem Formulation
In this paper, we aim to address the energy-efficient and low-latency task offloading problem in PDIoT. The objective is to maximize the weighted difference between energy efficiency and delay through joint optimization of large-timescale server selection in the first stage and small-timescale channel selection in the second stage. The two-stage task offloading problem is formulated as
$$\max_{\{x_{n,m}(\tau)\},\,\{y_{n,k}(t)\}}\;\sum_{t=1}^{S}\sum_{n\in\mathcal{N}}\sum_{m\in\mathcal{M}}\sum_{k\in\mathcal{K}}x_{n,m}(\tau)\,y_{n,k}(t)\left(V\eta_{n,m,k}(t)-T_{n,m,k}(t)\right),$$
where $V$ is used to achieve the tradeoff between energy efficiency and delay. Specifically, when $V$ is large, the influence of energy efficiency is dominant in the optimization objective, and the proposed algorithm tends to maximize energy efficiency. When $V$ is small, the influence of delay is dominant in the optimization objective, and the proposed algorithm tends to minimize delay. The first constraint represents the server selection constraints, i.e., each device selects one server per period and each server serves at most $q_m$ devices. The second constraint represents the channel selection constraints, i.e., each device selects one channel per slot. The third constraint denotes the task offloading reliability constraint in terms of SINR, i.e., $\gamma_{n,m,k}(t)\geq\gamma_{\mathrm{th}}$, where $\gamma_{\mathrm{th}}$ is the SINR threshold.
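The per-decision objective with the SINR reliability constraint can be evaluated as follows (an illustrative helper; treating an infeasible decision as negatively infinite utility is our choice, not stated in the paper):

```python
def offloading_utility(energy_eff: float, delay_s: float,
                       sinr: float, sinr_threshold: float, v: float) -> float:
    """Weighted difference V*EE - delay for one offloading decision;
    returns -inf when the SINR reliability constraint is violated."""
    if sinr < sinr_threshold:
        return float("-inf")   # infeasible: fails the reliability constraint
    return v * energy_eff - delay_s
```

A larger weight v emphasizes energy efficiency; a smaller one emphasizes delay, mirroring the role of V in the formulation.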
3. ML-Based Two-Stage Task Offloading Optimization for PDIoT
In this section, we introduce the problem transformation and the proposed ML-based two-stage task offloading optimization algorithm for PDIoT.
3.1. Problem Transformation
We transform the first-stage large-timescale server selection problem into a many-to-one matching problem between devices and servers. Then, we propose a stable server selection algorithm based on two-sided matching with quota to solve it. Next, the second-stage small-timescale channel selection problem is solved by the proposed adaptive ε-greedy learning algorithm.
3.2. First-Stage Server Selection Based on Two-Sided Matching with Quota
Based on the many-to-one two-sided matching with quota [32–34], each device and server need to obtain their preference values towards each other. Then, based on the obtained two-sided preference values, the first-stage server selection problem is solved according to the many-to-one two-sided matching with quota to maximize the weighted difference between energy efficiency and delay.
Definition 1. A matching with quota is defined as a mapping $\Phi:\mathcal{N}\rightarrow\mathcal{M}$. When $x_{n,m}(\tau)=1$ in the $\tau$-th period, $n$ and $m$ establish a matching relationship, i.e., $\Phi(n)=m$. Particularly, $|\Phi^{-1}(m)|\leq q_m$, where $|\Phi^{-1}(m)|$ is the size of the set of devices matched to $m$.
3.2.1. Preference List Construction
Based on the formulated problem, both devices and servers establish their matching preference lists. The preference value of device $n$ for server $m$ is defined as the weighted energy efficiency, and the preference value of $m$ for $n$ is defined as the negative of the total delay, which are given by
$$U_{n}(m)=V\bar{\eta}_{n,m}(\tau),\qquad U_{m}(n)=-\bar{T}_{n,m}(\tau),$$
where $\bar{\eta}_{n,m}(\tau)$ and $\bar{T}_{n,m}(\tau)$ are the empirical statistical estimates of energy efficiency and total delay in the $\tau$-th period, i.e., the averages of $\eta_{n,m,k}(t)$ and $T_{n,m,k}(t)$ over the slots of previous periods.
Based on (11) and (12), devices and servers calculate their preference values and establish their preference lists by sorting the preference values in descending order.
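Building a preference list from empirical estimates amounts to maintaining running means of the observed metrics and sorting candidates by score; a minimal sketch (class and function names are ours):

```python
class RunningMean:
    """Empirical average used as the preference estimate over past slots."""
    def __init__(self):
        self.total = 0.0
        self.count = 0

    def update(self, x: float) -> None:
        self.total += x
        self.count += 1

    def value(self) -> float:
        return self.total / self.count if self.count else 0.0

def preference_list(scores: dict) -> list:
    """Candidate ids sorted by preference value, most preferred first."""
    return sorted(scores, key=scores.get, reverse=True)
```

For example, a device scoring servers {"m1": 0.2, "m2": 0.9, "m3": 0.5} would propose to m2 first.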
3.2.2. Implementation of Two-Sided Matching with Quota
The implementation of the two-sided matching with quota consists of three steps, which are introduced as follows.
Step 1. Initialize the server selection strategy set, the set of unmatched devices, and the set of unmatched servers.
Step 2. Each unmatched device and each unmatched server calculate the preference values according to (11) and (12) to obtain their preference lists.
Step 3. First, each unmatched device proposes to its most preferred server based on its preference list. Afterward, each proposed server $m$ counts its temporary matches and new proposals. If the sum of temporary matches and new proposals does not exceed the quota $q_m$, $m$ establishes temporary matches with all the devices that have proposed to it, and the matched devices are temporarily removed from the unmatched set. Otherwise, based on its preference list, $m$ establishes temporary matches with only the top-$q_m$ devices that have proposed to it. Next, the rejected devices are added back into the unmatched set, and $m$ is removed from their preference lists, while the matched devices are removed from the unmatched set. If the number of matches for $m$ reaches $q_m$, $m$ is removed from the set of unmatched servers. Finally, return to Step 2, and the unmatched devices make new proposals based on the updated preference lists.
The matching iteration ends when every device either establishes a match with a server or has an empty preference list.
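Steps 1–3 form a device-proposing deferred-acceptance procedure with quotas; a compact sketch (assuming complete preference lists on both sides, with our own data layout):

```python
def match_with_quota(dev_prefs: dict, srv_prefs: dict, quota: dict) -> dict:
    """Device-proposing deferred acceptance with server quotas.

    dev_prefs: {device: [servers, most preferred first]}
    srv_prefs: {server: [devices, most preferred first]}
    quota:     {server: max devices it can serve per period}
    Returns {server: [matched devices, in the server's preference order]}.
    """
    # Rank tables let servers compare proposers in O(1).
    rank = {s: {d: i for i, d in enumerate(p)} for s, p in srv_prefs.items()}
    tentative = {s: [] for s in srv_prefs}
    next_idx = {d: 0 for d in dev_prefs}   # next server each device proposes to
    free = list(dev_prefs)
    while free:
        d = free.pop()
        if next_idx[d] >= len(dev_prefs[d]):
            continue  # preference list exhausted; device stays unmatched
        s = dev_prefs[d][next_idx[d]]
        next_idx[d] += 1
        tentative[s].append(d)
        tentative[s].sort(key=lambda x: rank[s][x])
        # Reject the least-preferred proposers beyond the quota.
        while len(tentative[s]) > quota[s]:
            free.append(tentative[s].pop())
    return tentative
```

With quota 1 at the popular server, the lower-ranked proposers are bumped and re-propose to their next choice, which yields the stable outcome.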
3.3. Second-Stage Channel Selection Based on Adaptive ε-Greedy Learning
The second-stage channel selection optimization problem is transformed into an MAB problem, which is addressed by the proposed adaptive ε-greedy learning algorithm. The MAB problem is mainly composed of the decision maker, arms, and rewards, which are introduced as follows: (i) Decision Maker. The decision maker generates selection decisions and constantly updates them by learning the rewards from historical feedback. We define PDIoT devices as decision makers. (ii) Arm. Based on the server selection, each device needs to select a channel for data transmission. The set of arms is the set of channels $\mathcal{K}$. (iii) Reward. Define $R_{n,k}(t)$ as the reward of device $n$ selecting channel $k$ in the $t$-th slot, which is given by the weighted difference between energy efficiency and delay, i.e., $R_{n,k}(t)=V\eta_{n,m,k}(t)-T_{n,m,k}(t)$.
The traditional ε-greedy algorithm uses a linear method to adjust the exploration factor ε, which has a certain degree of blindness [35, 36]. To improve the exploration efficiency, we propose an adaptive ε-greedy learning algorithm, which uses the average cumulative reward to dynamically adjust the adaptive exploration factor and balance exploration and exploitation.
Define $\bar{R}_{n,k}(t)$ as the historical average reward of selecting channel $k$, which is given by
$$\bar{R}_{n,k}(t)=\frac{1}{N_{n,k}(t)}\sum_{t'\leq t}y_{n,k}(t')R_{n,k}(t'),$$
where $N_{n,k}(t)$ represents the total number of times that device $n$ has selected channel $k$ until the $t$-th slot. The average cumulative reward under the previous task offloading strategies is calculated as
$$\tilde{R}_{n}(t)=\frac{1}{t}\sum_{t'=1}^{t}\sum_{k\in\mathcal{K}}y_{n,k}(t')R_{n,k}(t').$$
Then, the adaptive exploration factor $\varepsilon(t)$ is updated as a logarithmically decreasing function of the average cumulative reward $\tilde{R}_{n}(t)$, where $b>1$ represents the base of the logarithmic function and $\varepsilon(t)\in(0,1]$.
The action decision of the adaptive ε-greedy learning algorithm is given by
$$a_{n}(t)=\begin{cases}\arg\max_{k\in\mathcal{K}}\bar{R}_{n,k}(t), & \rho\geq\varepsilon(t),\\ \text{a randomly selected }k\in\mathcal{K}, & \rho<\varepsilon(t),\end{cases}$$
where $\rho\in[0,1]$ is a random number. When $\rho\geq\varepsilon(t)$, the device selects the channel with the largest historical average reward. When $\rho<\varepsilon(t)$, the device randomly selects a channel. Specifically, $\varepsilon(t)=1$ is equivalent to purely random channel selection.
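The adaptive ε-greedy channel selector can be sketched as follows. This is an illustrative implementation: the exact ε update is an assumption modelled on the logarithmic, reward-driven form described above, and the class/attribute names are ours.

```python
import math
import random

class AdaptiveEpsilonGreedy:
    """Per-device adaptive epsilon-greedy bandit over channels."""

    def __init__(self, n_channels: int, eps0: float = 1.0):
        self.counts = [0] * n_channels     # times each channel was selected
        self.means = [0.0] * n_channels    # historical average reward per channel
        self.eps = eps0                    # exploration factor
        self.cum_reward = 0.0
        self.steps = 0

    def select(self) -> int:
        if random.random() < self.eps:
            return random.randrange(len(self.means))  # explore
        # exploit: channel with the largest historical average reward
        return max(range(len(self.means)), key=lambda k: self.means[k])

    def update(self, k: int, reward: float) -> None:
        self.counts[k] += 1
        self.means[k] += (reward - self.means[k]) / self.counts[k]  # running mean
        self.steps += 1
        self.cum_reward += reward
        avg = self.cum_reward / self.steps
        # Assumed adaptive rule: epsilon shrinks as the average cumulative
        # reward grows, shifting from exploration to exploitation.
        self.eps = 1.0 / (1.0 + math.log(1.0 + max(avg, 0.0)))
```

A fixed-ε variant simply skips the last line of `update`, which is the nonadaptive baseline compared against in Section 4.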

3.4. Implementation of Two-Stage Task Offloading
The proposed ML-based two-stage task offloading optimization algorithm consists of three phases, which are summarized in Algorithm 1.
Phase 1. Initialization. Initialize the matching sets and the learning parameters.
Phase 2. Large-Timescale First-Stage Server Selection. At the beginning of each period, each device and server construct their preference lists. Then, the two-sided matching process with quota is performed based on Section 3.2 to obtain the server selection result.
Phase 3. Small-Timescale Second-Stage Channel Selection. In each slot, each device makes its action decision based on (18) and selects the corresponding channel. Then, it calculates the reward and the historical average reward based on (14) and (15). Finally, it updates the average cumulative reward and the adaptive exploration factor based on (16) and (17).
The algorithm terminates when all slots have been processed.
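An end-to-end toy run of the two-stage schedule might look like this. Everything here is illustrative: the per-period assignment is a greedy, quota-respecting placeholder standing in for the matching stage, and channel rewards are synthetic draws standing in for the weighted energy-efficiency/delay metric.

```python
import random

def run_two_stage(n_devices: int, n_servers: int, n_channels: int, quota: int,
                  periods: int, slots_per_period: int,
                  eps: float = 0.1, seed: int = 0):
    """Toy driver: stage 1 re-assigns devices to servers each period,
    stage 2 picks a channel per device per slot with epsilon-greedy."""
    rng = random.Random(seed)
    chan_mean = [rng.random() for _ in range(n_channels)]  # hidden channel quality
    means = [[0.0] * n_channels for _ in range(n_devices)]
    counts = [[0] * n_channels for _ in range(n_devices)]
    total = 0.0
    assignment = {}
    for _ in range(periods):
        # Stage 1 (placeholder for matching): fill servers up to their quota.
        assignment = {d: (d // quota) % n_servers for d in range(n_devices)}
        for _ in range(slots_per_period):
            for d in range(n_devices):
                # Stage 2: epsilon-greedy channel selection.
                if rng.random() < eps:
                    k = rng.randrange(n_channels)
                else:
                    k = max(range(n_channels), key=lambda c: means[d][c])
                reward = chan_mean[k] + rng.gauss(0.0, 0.05)  # synthetic reward
                counts[d][k] += 1
                means[d][k] += (reward - means[d][k]) / counts[d][k]
                total += reward
    return assignment, total
```

The returned assignment shows each device's server for the last period; the total accumulates the per-slot rewards over the whole run.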
3.5. Computation Complexity
For the first-stage server selection, the computation complexity of the proposed two-sided matching algorithm is polynomial in the numbers of devices and servers, while that of the exhaustive-based server selection algorithm grows exponentially with the number of devices. When the numbers of devices and servers are large enough, the computation complexity of the proposed algorithm is much lower than that of the exhaustive-based server selection algorithm. For the second-stage channel selection, the computation complexity is linear in the number of channels per slot. In ML-based channel selection algorithms such as the DRL-based and DL-based channel selection algorithms, the computation complexity of training deep neural networks is determined by the number of training epochs, the dataset size, the batch size, and the per-epoch computation complexity. The per-epoch complexity is related to many free variables, such as the computation complexity of each layer, the size of the convolution kernels, the number of input and output channels, and the spatial dimensions of the input and output feature maps. Therefore, the computation complexity of these DRL- and DL-based channel selection algorithms is much higher than that of the proposed algorithm.
4. Simulation Results
We consider a transmission line monitoring scenario in PDIoT, which includes multiple PDIoT devices and edge servers colocated with BSs, with several orthogonal channels available. The devices are randomly distributed along the transmission line. Simulation parameters are summarized in Table 2 [37–39]. Two algorithms are utilized for comparison. The first one is the matching based on Kuhn-Munkres (MKM) [40], which aims to minimize task offloading delay. The second one is the matching-based task offloading strategy (MBTO) [41], which aims to maximize energy efficiency through server selection optimization in each period. Neither MKM nor MBTO can achieve small-timescale channel selection optimization.
Figure 2 shows the weighted difference between energy efficiency and delay versus time slots. Compared with MKM and MBTO, the proposed algorithm improves the weighted difference by 39.74% and 9.96%, respectively. The reason is that the proposed algorithm can learn the optimal channel and server selection strategies to maximize the weighted difference based on dynamic network states.
Figures 3 and 4 show the total delay and energy efficiency versus time slots, respectively. Compared with MKM, the proposed algorithm improves energy efficiency at the cost of increased delay. Compared with MBTO, the proposed algorithm reduces delay at the cost of decreased energy efficiency. The reason is that MKM and MBTO each optimize only one aspect, delay or energy efficiency, whereas the proposed algorithm can make a well-balanced tradeoff between energy efficiency and delay by dynamically adjusting server and channel selection strategies.
Figure 5 shows the impact of the quota $q_m$ on transmission delay, processing delay, and total delay. The transmission delay decreases with $q_m$, while the processing delay increases with $q_m$. The reason is that as $q_m$ increases, more devices can access nearby BSs with better transmission performance, thus reducing the transmission delay. However, the server then allocates fewer computing resources to each device, thus increasing the processing delay. The total delay increases with $q_m$ since the processing delay has a greater impact on the total delay.
Figure 6 shows the impact of the weight $V$ on total delay and energy efficiency. As $V$ increases, the energy efficiency increases obviously and finally converges. Meanwhile, the total delay decreases first and then increases gradually. The reason is that as $V$ increases, the proposed algorithm puts more emphasis on energy efficiency improvement. Devices are inclined to select the channel that achieves a higher transmission rate, thereby reducing the transmission delay and total delay at first. However, a further increasing $V$ forces devices to select nearby edge servers with fewer computing resources to improve energy efficiency, which then increases the processing delay and total delay. Therefore, the proposed algorithm can balance the tradeoff between energy efficiency and delay by adjusting the weight $V$.
Figure 7 compares the performance of the proposed algorithm and the nonadaptive ε-greedy algorithm. The proposed algorithm outperforms the nonadaptive ε-greedy algorithm in terms of the weighted difference. The reason is that the proposed algorithm can adjust ε based on the average cumulative reward to balance the tradeoff between exploration and exploitation. On the contrary, the nonadaptive ε-greedy algorithm with a fixed ε cannot adaptively trade off exploration and exploitation based on the current reward, thereby resulting in poor learning performance and a lower weighted difference.
Table 3 shows the computation complexity of different algorithms. Due to the consideration of channel selection optimization, the computation complexity of the proposed algorithm is higher than those of MKM and MBTO, but the proposed algorithm improves the weighted difference between energy efficiency and delay by 39.74% and 9.96% compared with MKM and MBTO, respectively. Given the limited computing resources of PDIoT devices, it is also notable that the computation complexities of the DRL-based task offloading algorithm (DRL-TO), the federated learning-based task offloading algorithm (FL-TO), and the meta learning-based task offloading algorithm (ML-TO) are much higher than that of the proposed algorithm.
5. Conclusion
In this paper, the energy-efficient and low-latency task offloading problem in PDIoT was investigated. An ML-based two-stage task offloading optimization algorithm was proposed to maximize the weighted difference between energy efficiency and delay through joint optimization of large-timescale server selection and small-timescale channel selection. Simulation results show that the proposed algorithm improves the weighted difference by 39.74% and 9.96% compared with MKM and MBTO, respectively. In the future, the joint optimization of power control and computing resource allocation for multi-QoS guaranteed task offloading in PDIoT will be studied.
Data Availability
No data was used for this article.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This work was supported by the Key Science and Technology Program of Haikou City under grant number 2020009, the Key Research and Development Project of Hainan Province under grant number ZDYF2021SHFZ243, the National Natural Science Foundation of China under grant number 62062030, and the Scientific Research Fund Project of Hainan University under grant numbers KYQD (ZR)21007 and KYQD (ZR)21008.