Special Issue: Sensor Physical Interpretation, Signal and Artificial Intelligence Processing
Research Article | Open Access
Ying Luo, Min Zeng, Hong Jiang, Bin Han, "Energy-Efficient Time-Domain Equilibrium Scheduling and Optimization Scheme for Energy Harvesting-Powered D2D Communication", Journal of Sensors, vol. 2020, Article ID 8839681, 16 pages, 2020. https://doi.org/10.1155/2020/8839681
Energy-Efficient Time-Domain Equilibrium Scheduling and Optimization Scheme for Energy Harvesting-Powered D2D Communication
Energy Harvesting- (EH-) powered Device-to-Device (D2D) Communication underlaying Cellular Network (EH-DCCN) has been deemed one of the basic building blocks of the Internet of Things due to its green energy efficiency and short-range communication. However, available energy will be one of the biggest obstacles when implementing EH-DCCN, owing to the immaturity of EH technology and the volatility of environmental energy resources. To improve energy utilization, this study investigates an efficient scheduling and power allocation scheme that balances the transmission load in the time domain. Accordingly, a short-term Sum Energy Efficiency (stSEE) maximization problem for EH-powered D2D communication is modelled, while ensuring a fundamental transmission rate requirement of cellular users. The resulting optimization problem is a nonconvex mixed integer nonlinear programming problem. We therefore propose a two-layer convex approximation iteration algorithm that can obtain a feasible quasioptimal solution for the stSEE problem. In addition, a two-step heuristic algorithm that operates slot by slot is developed to acquire a suboptimal solution without requiring statistical knowledge of the channel and energy arrival processes. Simulation results indicate that the short-term scheduling strategy achieves better energy efficiency and transmission rate than the conventional real-time scheduling scheme. Besides, the maximum number of scheduled EH-D2D pairs underlaying one cellular user under different EH efficiencies is analysed, which provides a theoretical reference for the deployment of future EH-DCCN.
The dawn of the Internet of Things (IoT), deployed for industrial production, smart housing, and environmental monitoring, has given birth to billions of mobile wireless devices. Ericsson predicts that there will be about 15 billion mobile devices in 2021, and most of them will be low-power devices with short-range communication. The ever-increasing proliferation of wireless devices, together with an exponential rise in users' data demand, is already creating an urgent need for wireless cellular networks to adopt new technologies that can attain the desired transmission rates while achieving green communications [3, 4]. Recent emphasis on green communications has generated great interest in energy harvesting- (EH-) powered wireless networks [5–7]. The EH technique harvests energy from environmental energy resources to prolong the lifetime of wireless communication devices with low power consumption and limited battery life. On the other hand, Device-to-Device (D2D) communication has been viewed as a promising paradigm that can offload traffic from cellular networks: devices in close proximity communicate directly with each other by multiplexing the spectrum resources assigned to cellular users (CUs) [8, 9]. Accordingly, EH-powered D2D Communication underlaying Cellular Network (EH-DCCN), which combines short-range transmission with green communication, is an attractive way to maintain good transmission service quality with green energy resources. Nevertheless, this sustainable energy supply is intermittent, an issue that does not exist in conventional D2D communication systems with fixed energy sources. Consequently, how to efficiently utilize and manage unreliable energy so as to satisfy different transmission demands is the most urgent challenge in achieving the green D2D communication paradigm.
1.1. Related Works
Many studies have addressed the challenges caused by the uncertain energy supply in the EH-DCCN from different aspects, such as access control and resource management.
In terms of access control, Darak et al. design an online learning handover scheme between D2D mode and Radio Frequency EH (RFEH) mode based on subband statistics in D2D-RFEH communication. For cognitive and EH-based D2D transmission, Sakr and Hossain propose two spectrum access policies for the cellular network, namely, random and prioritized access policies, to evaluate the transmission and outage probability for D2D and cellular users. Another study develops a D2D communication provided by EH Heterogeneous cellular Network (D2D-EHHN), where User Equipment Relays (UERs) harvest energy from an access point to support D2D communication. That study derives the proper distribution of RFEH-powered UERs and proposes an efficient UER selection method.
To adapt to the uncertainty in energy supply, researchers have also devoted their efforts to the design of efficient resource management schemes in terms of power allocation, spectrum matching, and time sharing for EH-DCCN. Tutuncuoglu and Yener study the power allocation policies for EH-supported transmitters by optimizing the sum rate, in a setting similar to the one-to-one spectrum sharing model in D2D communication underlaying cellular networks. Yet, the Quality of Service (QoS) requirement, which mainly refers to the transmission rate demand, is not involved in the proposed policy. Similarly, the sum rate maximization problem for D2D communication under a downlink resource multiplex system in the presence of multiple CUs and EH-powered D2D links has also been studied. To ensure energy-efficient spectrum resource assignment, Ding et al. investigate the energy cost minimization problem. Considering power allocation and time-sharing spectrum occupation management, Hadzi-Velkov et al. maximize the overall cellular network transmission rate based on the statistical average of the harvested energy.
The above works have addressed many challenges caused by the unstable and unreliable power supply of EH from different aspects. Nevertheless, they are mainly based on a one-to-one spectrum sharing mode, where one CU's radio resource is multiplexed by one D2D pair. In this sharing mode, the CU's spectrum will be vacant whenever the available energy of the EH-powered D2D pair (EH-DP) cannot meet the energy consumption requirement. Hence, to meet the demand for high spectrum efficiency, a one-to-multiple sharing mode, in which one CU and multiple EH-DPs share one radio spectrum, is needed. In this way, the cellular spectrum gap caused by the energy deficiency of a single EH-DP can be filled. In practice, multiple D2D pairs can be allowed to share the same resource with the cellular users as long as the interference of D2D communications is not harmful to the cellular links [18–20]. However, under the one-to-multiple sharing mode, because EH efficiency, environmental energy resources, and channel status all vary, the transmission requests of the EH-DPs, which depend on available energy, may be unbalanced across time slots. Since the traffic is delay-tolerant, balancing the transmission requests among time slots can efficiently decrease the mutual interference among users and thereby improve energy efficiency. The following section takes two EH-powered D2D pairs underlaying a cellular network as a simple example to illustrate our main motivations.
1.2. Motivations
Before describing our motivations, we state some important assumptions. We assume that only the D2D user (DU) has EH capability. Meanwhile, for simplicity, suppose that the traffic pattern of each user is full buffer and that all users operate in a synchronous, time-slotted fashion.
When multiple EH-DPs multiplex the same spectrum resource, their transmission requests may be unbalanced across time slots. As illustrated by Figure 1, two EH-DPs are permitted to share a radio spectrum with one CU. As Figure 1(a) shows, the available energy of the users differs because of their different channel interference conditions and energy conversion efficiencies. When the available energy of an EH-DP reaches its transmission power threshold, the EH-DP launches a transmission request. Thus, when the two EH-DPs initiate transmission requests in the same time slot, such as time slot 1, the conventional real-time transmission strategy lets them transmit simultaneously, with powers allocated under the interference constraint. However, they may then be unable to multiplex the spectrum resource in the next time slot (e.g., time slot 2) because of their energy supply or energy consumption. In the short-term time-domain equilibrium strategy, by contrast, as depicted in Figure 1(b), either of the two EH-powered D2D pairs can be assigned to multiplex the spectrum resource of the CU in time slot 2 instead. As a result, the interference between the two D2D links in time slot 1 is eliminated. Hence, to avoid unnecessary consumption of the harvested energy, the interference among users must be appropriately managed in one-to-multiple sharing scenarios.
As mentioned above, the key concern of this study is how to balance the transmission loads among different time slots in an EH-DCCN with the one-to-multiple sharing mode, by fully considering the available energy and channel status, so as to improve the performance of the EH-DCCN. Here, channel status mainly refers to the mutual interference conditions among users (including the CU and EH-DPs). As far as we know, the short-term time-domain energy-efficient equilibrium scheme considered here is the first attempt to do so in EH-DCCN.
1.3. Contributions and Organizations
As previously described, this study focuses on designing an energy-efficient transmission scheduling and power allocation scheme so as to increase the performance of the EH-DCCN with the one-to-multiple radio resource sharing mode. Our main contributions are threefold:
(i) Firstly, this study optimizes a short-term Sum Energy Efficiency (stSEE) problem for EH-powered D2D communication to realize the energy-efficient scheduling scheme. The available energy and transmission rate constraints of both CUs and EH-DPs are also considered in the optimization problem.
(ii) Subsequently, a two-layer convex approximation iteration algorithm (CAIA), which consists of an outer-layer iteration algorithm (OLIA) and an inner-layer convex approximation (ILCA) algorithm, is proposed to obtain a feasible quasioptimal solution for the modelled stSEE maximization problem, which is a nonconvex mixed integer nonlinear programming (MINLP) problem.
(iii) Thirdly, a two-step heuristic algorithm, the time-division scheduling scheme (TDSS), is developed to acquire a suboptimal solution without requiring statistical knowledge of the channel and EH processes. Remarkably, TDSS not only acquires a suboptimal solution for the stSEE problem but also has a lower computational complexity.
The rest of this study is organized as follows. In Section 2, we describe the system model in detail and formulate the stSEE maximization problem. The two feasible algorithms, CAIA and TDSS, are elaborated in Sections 3 and 4, respectively. The numerical simulation results and the computational complexity of the proposed algorithms are presented and analysed in Section 5. In Section 6, we conclude this study.
2. System Description and Problem Formulation
This section introduces the system model and formulates the considered resource scheduling problem. To facilitate understanding, some important notations used in this study are listed in Table 1.
2.1. System Model
In what follows, we assume that spectrum matching has already finished. That is, multiple EH-DPs have already been allocated to one dedicated CU under some particular optimization criterion, e.g., EE maximization. Thus, as shown in Figure 2, a typical single-cell network consisting of a Base Station (BS) and several EH-DP/CU groups is considered. Suppose that the system utilizes a certain number of orthogonal spectra; the users are then divided into the same number of EH-DP/CU groups. Namely, the communication links in the same group transmit on the same spectrum, and the communication links in different groups use orthogonal spectra. Each group contains one CU and a set of EH-DPs, each of which is a pair of D2D users, and the EH-DPs of a group share the uplink transmission link of that group's CU to transmit. According to the energy assumption in Section 1.2, the transmitter of each EH-DP is supplied by the EH technique and has a battery to store harvested energy. Meanwhile, the available power of the receiver of each EH-DP is deemed unlimited due to the low-power property of the decoding process.
Generally speaking, as illustrated by EH-DP/CU group 2 in Figure 2, each EH-DP transmission simultaneously causes interference to the receivers of the cellular link and of the other EH-DP links. Likewise, the cellular transmission generates interference to the EH-DP receivers. Assume that the entire system operates on a slot-by-slot basis. Accordingly, in any time slot, the instantaneous transmission rates of the cellular and EH-DP links are given by (1), in which a binary indicator equals 1 if the corresponding D2D pair is chosen to transmit in that time slot and 0 otherwise. The remaining quantities in (1) are the transmission powers of the CU and the EH-DPs in that time slot, the channel gain between each pair of nodes, and the noise power, which equals the product of the noise power density and the uplink channel bandwidth of each group.
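As a rough illustration of the rate model just described, the following sketch evaluates the two rates for one group in one time slot. It assumes the usual Shannon form, bandwidth times log2(1 + SINR), with interference summed over the EH-DPs whose indicator is 1. The function name `group_rates`, its argument names, and the argument layout are hypothetical and are not taken from the paper.

```python
import math

def group_rates(x, p_c, p_d, g_cb, g_db, g_cd, g_dd, n0, bandwidth):
    """Rates (bit/s) of the CU link and each EH-DP link in one slot.

    Assumed notation (illustrative only):
    x[i]       : 1 if EH-DP i transmits in this slot, else 0
    p_c, p_d   : CU power and list of EH-DP powers (W)
    g_cb       : gain CU -> BS;  g_db[i]: gain EH-DP i transmitter -> BS
    g_cd[i]    : gain CU -> receiver of EH-DP i
    g_dd[j][i] : gain transmitter of EH-DP j -> receiver of EH-DP i
    """
    noise = n0 * bandwidth  # noise power = noise density * bandwidth
    n = len(x)
    # The CU uplink sees interference from every active EH-DP transmitter.
    i_c = sum(x[i] * p_d[i] * g_db[i] for i in range(n))
    r_c = bandwidth * math.log2(1 + p_c * g_cb / (noise + i_c))
    r_d = []
    for i in range(n):
        # EH-DP i sees the CU plus every other active EH-DP.
        i_d = p_c * g_cd[i] + sum(
            x[j] * p_d[j] * g_dd[j][i] for j in range(n) if j != i
        )
        sinr = x[i] * p_d[i] * g_dd[i][i] / (noise + i_d)
        r_d.append(bandwidth * math.log2(1 + sinr))
    return r_c, r_d
```

Under this assumed form, switching a second EH-DP from 0 to 1 lowers both the CU rate and the first EH-DP's rate, which is the mutual interference that the time-domain scheduling tries to avoid.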
2.2. Energy Model
As demonstrated in Figure 3, in each time slot, the transmitter of each EH-DP harvests energy from the environmental energy resources, stores the energy in a battery, and uses the available energy to transmit. We study the case in which the energy arrival process of each EH-DP is i.i.d. For each EH-DP, the energy harvested per time slot forms a sequence over the time slots that obeys the i.i.d. Bernoulli process in (2): in each slot, a fixed amount of energy arrives with a given probability, and no energy arrives otherwise.
The harvested amount and the arrival probability are together called the EH efficiency of the EH-DP. Notably, energy and power are equivalent in this study because the time slot has unit length.
In Figure 3, the harvested energy is added to the battery in each time slot, and the energy consumed for data transmission is drawn from it. The energy currently stored in the battery of an EH-DP is its available energy. Thus, the cumulative power constraint (3) requires that the total energy an EH-DP has consumed up to any time slot not exceed the total energy it has harvested up to that slot.
Suppose that the harvested energy can be stored in the battery without any loss and is used only for communication purposes. Meanwhile, the battery capacity is large enough to hold every quantum of harvested energy. This assumption is especially valid for the current state of technology, in which batteries have very large capacities compared to the energy harvesting efficiency. Furthermore, assume that all state information, including Channel State Information (CSI) and Energy State Information (ESI), can be obtained by the BS, so that the BS can control transmission scheduling and power allocation [24, 25].
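The energy model can be illustrated with a short simulation. This is a minimal sketch, assuming that energy harvested in a slot is already usable in that same slot (harvest, store, then use) and that the device simply transmits whenever the battery covers a fixed demand. The function `simulate_battery` and its parameters are hypothetical names introduced only for this example.

```python
import random

def simulate_battery(num_slots, unit_energy, arrival_prob, demand, seed=0):
    """Track one EH-DP battery under i.i.d. Bernoulli energy arrivals.

    Each slot, unit_energy arrives with probability arrival_prob.
    The device spends `demand` only if the battery can cover it, so the
    energy consumed never exceeds the energy harvested so far.
    """
    rng = random.Random(seed)
    battery, harvested, consumed = 0.0, 0.0, 0.0
    history = []
    for _ in range(num_slots):
        arrival = unit_energy if rng.random() < arrival_prob else 0.0
        battery += arrival   # store first (lossless, no capacity limit)
        harvested += arrival
        spend = demand if battery >= demand else 0.0
        battery -= spend     # then use
        consumed += spend
        history.append((arrival, spend, battery))
    return history, harvested, consumed
```

For instance, `simulate_battery(1000, 1.0, 0.3, 2.0)` models a device that harvests one unit of energy with probability 0.3 per slot and needs two units to transmit, so its transmission requests arrive in irregular bursts.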
2.3. Mathematical Model
The short-term Sum Energy Efficiency (stSEE) optimization problem for EH-powered D2D communication is formulated as (4). Its decision variables are the set of binary indicators specifying whether each EH-DP is allowed to transmit in each time slot, together with the sets of transmission powers of the CUs and EH-DPs. To avoid serious mutual interference, as represented by (4a) and (4d), the maximal transmission power in each time slot is limited at the CU and EH-DP side, respectively. In this study, multiple EH-DPs can share the CU's uplink channel resource to transmit, so the model must guarantee the minimum QoS of the CU; accordingly, (4b) defines a threshold on the transmission rate of the CU. Similarly, as (4e) shows, the EH-DPs chosen to transmit in a time slot must meet a minimum transmission rate requirement. Finally, (4c) denotes the available energy constraint of each EH-DP.
2.4. Problem Decoupling
The maximization problem can be decoupled into one subproblem per EH-DP/CU group because the groups use orthogonal spectra. Hence, for any EH-DP/CU group, we have the optimization problem (5), whose variables are the restrictions of the indicator set and of the two power sets to that group.
3. Two-Layer Convex Approximation Iteration Algorithm (CAIA)
As described in (5), the power variables are real-valued, whereas the indicator variables are binary-valued. Furthermore, the objective function (5) and constraints (4b) and (4e) depend on the transmission rates of the cellular and EH-DP links, which are nonconvex (a simple proof of their nonconvexity can be found in Appendix A). So, (5) is a nonconvex MINLP problem, which is NP-hard. An intuitive argument for NP-hardness is that MINLP includes the ILP problem (formula (5) reduces to an ILP problem when the power allocation variables are fixed), which has been proved to be NP-hard [26, 27]. Based on the above discussion, we design a two-layer convex approximation iteration algorithm (CAIA), which contains an outer-layer iteration algorithm (OLIA) and an inner-layer convex approximation (ILCA) algorithm, to obtain a feasible quasioptimal solution. The OLIA first equivalently transforms the fractional programming problem. Secondly, the ILCA is implemented to approximately convert the nonconvex MINLP optimization into a convex one.
3.1. Outer-Layer Iteration Algorithm (OLIA)
First of all, the objective of (5) is a nonlinear fractional program, which can be transformed into an equivalent parametric (subtractive-form) program by the Dinkelbach method. For ease of description, consider the feasible solution set of problem (5) and the maximum stSEE of EH-DP communication in an EH-DP/CU group. The maximum stSEE is then defined by (6) as the maximum of the objective of (5) over this feasible set.
Accordingly, we can present the following theorem.
Theorem 1. The maximum stSEE can be achieved if and only if condition (7) holds, that is, if and only if the maximum over the feasible set of the total transmission rate minus the maximum stSEE times the total power consumption equals zero.
Proof. The proof is similar to the proof in .
Hence, formula (7) can be addressed by an iterative process, which is demonstrated by Algorithm 1. The algorithm maintains the iteration index, the instantaneous EE of the EH-DP/CU group in the current iteration, and a convergence threshold.
Although the problem solved in each iteration of Algorithm 1 is equivalent to problem (5) by Dinkelbach's theorem, it is still a nonconvex MINLP formulation. Hence, to handle this situation, we propose an inner-layer convex approximation algorithm to convert it into a convex one.
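The outer-layer iteration follows the generic Dinkelbach pattern, which can be sketched as below. This is only an illustration of the iteration structure: the inner maximization, which in this study is handled by ILCA, is replaced here by a brute-force search over a finite candidate list, and the function name `dinkelbach` and its arguments are assumptions made for the example.

```python
def dinkelbach(numer, denom, candidates, eps=1e-9, max_iter=100):
    """Maximize numer(x) / denom(x) over a finite candidate list.

    Assumes denom(x) > 0 for every candidate. Returns the best
    candidate found and the corresponding ratio estimate q.
    """
    q = 0.0
    best = candidates[0]
    for _ in range(max_iter):
        # Inner problem: maximize the subtractive form numer - q * denom.
        best = max(candidates, key=lambda x: numer(x) - q * denom(x))
        gap = numer(best) - q * denom(best)
        if gap < eps:
            # The optimality condition holds to within eps: stop.
            break
        # Otherwise update the ratio estimate and iterate again.
        q = numer(best) / denom(best)
    return best, q
```

As a toy usage, with `numer = lambda p: math.log2(1 + 50 * p)` (after `import math`) as a single-link rate and `denom = lambda p: p + 0.1` as transmit plus circuit power, calling `dinkelbach(numer, denom, [k / 1000 for k in range(1, 1001)])` returns the power on the grid with the highest energy efficiency. These numbers are illustrative and unrelated to the simulation parameters of this study.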
3.2. Inner-Layer Convex Approximation (ILCA)
ILCA performs the following three steps to convert the nonconvex MINLP problem in each iteration of OLIA (Algorithm 1) into a convex one. In the first step, the binary indicator is relaxed into the continuous interval [0, 1], and a variable substitution is applied. Under this substitution, the rate functions of the cellular and EH-DP links are equivalently rewritten as (9) and (10), respectively, in terms of the substituted variables.
With this substitution, the problem in Algorithm 1 can be solved equivalently by finding solutions in terms of the substituted variables.
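For the second step, the logarithmic rate functions are replaced by the lower-bound approximation in inequality (11). This bound is proven to be tight and has low complexity in a high-SINR regime, where its approximation coefficients reduce to constants. To obtain the tightened lower bound in general, we need an iteration algorithm (such as steps 2 to 6 on p. 3751 of the cited reference) that updates the coefficients until the approximation converges.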
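A widely used bound of this kind is the logarithmic approximation ln(1 + z) ≥ α ln z + β, which is tight at a chosen point z0 when α = z0 / (1 + z0) and β = ln(1 + z0) − α ln z0. The sketch below assumes that inequality (11) takes this form; the function names are hypothetical and serve only to check the bound numerically.

```python
import math

def bound_coeffs(z0):
    """Coefficients that make alpha*ln(z) + beta touch ln(1+z) at z = z0 > 0."""
    alpha = z0 / (1.0 + z0)
    beta = math.log(1.0 + z0) - alpha * math.log(z0)
    return alpha, beta

def lower_bound(z, alpha, beta):
    """Value of the approximation alpha*ln(z) + beta at SINR z > 0."""
    return alpha * math.log(z) + beta
```

Under this assumed form, the coefficients approach α = 1 and β = 0 as z0 grows, which matches the high-SINR behaviour mentioned above; the iterative tightening then amounts to recomputing `bound_coeffs` at the SINR obtained in the previous iteration.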
For the third step, we apply an equivalent change of variables. Consequently, following the above three steps, the problem in Algorithm 1 can be approximately transformed into the convex optimization formulation (12), where the approximated rate functions of the cellular and EH-DP links obtained after the second and third steps are given by (13) and (14), respectively, and their approximation coefficients are updated in the same way. According to the convexity of log-sum-exp [29, 30], problem (12) can easily be proved to be convex. As a result, it can be solved efficiently with any standard convex optimization algorithm. Once the solution of problem (12) is obtained, the variables of the original problem (5) are recovered by inverting the variable substitutions.
Even though CAIA can obtain a quasioptimal solution for the original problem, two key obstacles hinder its practical implementation. Firstly, its iteration complexity is hard to accommodate in the LTE (Long-Term Evolution) system, which requires a scheduling period on the order of milliseconds. Secondly, the overall CSI and ESI over a period of time are hard to obtain in practice. Thus, we propose a heuristic algorithm, named the time-division scheduling scheme (TDSS), to obtain a suboptimal solution with low computational complexity.
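The role of log-sum-exp can be checked numerically. The sketch below assumes that the change of variables is the common exponential one, with each power written as p = exp(s), so that the approximated rate of a link becomes α(s_i + ln g_ii − ln(noise + Σ_j g_ji exp(s_j))) + β, a linear term minus a log-sum-exp term. All names are illustrative assumptions rather than the paper's notation.

```python
import math
import random

def approx_rate(s, i, gains, noise, alpha, beta):
    """alpha*ln(SINR_i) + beta for link i, with powers p_j = exp(s_j)."""
    interference = noise + sum(
        gains[j][i] * math.exp(s[j]) for j in range(len(s)) if j != i
    )
    log_sinr = s[i] + math.log(gains[i][i]) - math.log(interference)
    return alpha * log_sinr + beta

def midpoint_concave(f, dim, trials=200, seed=0):
    """Randomized check of f((a+b)/2) >= (f(a)+f(b))/2 on sampled points."""
    rng = random.Random(seed)
    for _ in range(trials):
        a = [rng.uniform(-5.0, 2.0) for _ in range(dim)]
        b = [rng.uniform(-5.0, 2.0) for _ in range(dim)]
        m = [(u + v) / 2 for u, v in zip(a, b)]
        if f(m) < (f(a) + f(b)) / 2 - 1e-9:
            return False
    return True
```

A randomized midpoint test of this kind is not a proof, but it gives a quick sanity check that the transformed rate is concave in the new variables, which is what makes the approximated problem tractable for standard convex solvers.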
4. The Time-Division Scheduling Scheme (TDSS)
The complexity of CAIA and the difficulty of obtaining the overall ESI and CSI prompt us to design a heuristic algorithm to solve the stSEE problem. Although the ESI of all D2D pairs and the CSI of all involved communication links over a period of time are hard to obtain, the near-term CSI and ESI (such as those of the next time slot) can be acquired through prediction algorithms. For example, the behavior of some environmental sources (e.g., solar and wind) can be predicted through their expected availability at a given time within some error margin. Similarly, channel prediction is feasible and accurate if predictions are made much more frequently than the channel changes. Note that the design of the prediction algorithms is beyond the scope of this study.
With the CSI and ESI of the latest time slot, we can simply balance the transmission requirements between two adjacent time slots and hence increase the network performance. Accordingly, a heuristic algorithm called TDSS is proposed. TDSS decouples the stSEE problem into two steps: the D2D Pairs Choosing Strategy (DPCS) and the power allocation strategy (PAS). As the pseudocode of Algorithm 2 shows, DPCS first determines which EH-DPs of each EH-DP/CU group multiplex the channel resource in each time slot. After that, PAS allocates the corresponding power for the CU and the chosen EH-DPs. In other words, in each time slot, the two-step scheme first determines the indicator variables of the EH-DPs in the group and then allocates the transmission powers of the CU and the chosen EH-DPs. The following two subsections describe the detailed steps of DPCS and PAS.
4.1. D2D Pairs Choosing Strategy (DPCS)
The DPCS is the first procedure in TDSS and aims to choose the proper EH-DPs to multiplex the channel resource of the CU in each time slot for the purpose of load balance. Load balance here means scheduling the transmission requirements between two adjacent time slots so as to decrease the interference between EH-DPs.
The design of DPCS is inspired by a basic characteristic of the optimization problem, which comes from the transmission rate constraints of the CU and EH-DPs. It means that the EH-DPs chosen in each time slot must hold at least a basic available energy threshold. If not, the EH-DP cannot be a candidate to multiplex the channel resource. This threshold is expressed by Corollary 4 and is derived in Appendix B.
Corollary 4. To ensure that the PAS has a nonempty feasible solution set, the available power in the battery of a chosen EH-DP must be no less than the minimum value given by (15) and (16).
Proof. The proof of this corollary is provided in Appendix B.
Therefore, based on the minimum energy threshold and the latest predicted ESI, the proposed DPCS determines the candidate EH-DPs for each EH-DP/CU group in three key steps, as illustrated in Figure 4 and Algorithm 3.
Step 1. Using the current available energy and the minimum energy consumption threshold, we pick out the candidate EH-DPs of the group in the current time slot. As Figure 4 illustrates, the EH-DPs in red are the candidates that meet this basic energy demand.
Step 2. If the candidate EH-DPs multiplex the channel resource to transmit in the current time slot, each consumes at least its minimum energy threshold. According to the power update rule, the candidate EH-DPs of the next time slot can then be picked out in the same way from the predicted available energy and the minimum energy threshold.
Step 3. The number of candidate D2D pairs is balanced between the two time slots in order to decrease the interference among D2D pairs. For example, in Figure 4, the current and next time slots have 3 and 1 EH-DP candidates, respectively, so the number of transmitting D2D pairs in each of the two adjacent time slots can be evenly set to 2. In other words, the load balance procedure is executed if the number of candidate D2D pairs in the current time slot exceeds that in the next time slot by 2 or more. Notably, because of the store-and-use characteristic of energy harvesting, transmissions can only be deferred, so we can only handle the case where the service requirements in the current time slot are larger than those in the next. After that, in the current time slot, DPCS chooses the balanced number of D2D candidates to multiplex the CU channel according to the principles of lower interference and larger transmission rate.
After the DPCS, a subset of the EH-DPs of each group has been selected to multiplex the channel resource of the group's CU. The indicator of every selected EH-DP is set to 1, and the indicator of every other EH-DP in the group is set to 0.
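The three steps of DPCS can be sketched as follows for a single group and a single pair of adjacent time slots. This is a hedged illustration rather than Algorithm 3 itself: the per-pair `score` stands in for the "lower interference and larger transmission rate" criterion, rounding up when the candidate total is odd is an assumption, and all names are hypothetical.

```python
import math

def dpcs_select(energy, threshold, predicted_harvest, score):
    """Return the indices of the EH-DPs scheduled in the current slot."""
    n = len(energy)
    # Step 1: candidates whose stored energy meets the minimum threshold now.
    now = [i for i in range(n) if energy[i] >= threshold[i]]
    # Step 2: predicted next-slot energy, assuming every current candidate
    # transmits and spends at least its threshold.
    next_energy = [
        energy[i] - (threshold[i] if i in now else 0.0) + predicted_harvest[i]
        for i in range(n)
    ]
    nxt = [i for i in range(n) if next_energy[i] >= threshold[i]]
    # Step 3: balance only when the current slot is busier by at least 2.
    if len(now) - len(nxt) >= 2:
        keep = math.ceil((len(now) + len(nxt)) / 2)
        # Keep the best-scoring candidates; the rest are deferred.
        now = sorted(now, key=lambda i: score[i], reverse=True)[:keep]
    return sorted(now)
```

With three candidates now and one predicted for the next slot, as in the Figure 4 example, the sketch keeps two pairs in the current slot and defers the lowest-scoring one, which retains its energy and can transmit in the next slot.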
4.2. Power Allocation Strategy (PAS)
After the binary indicator variables are determined, the optimal power should be allocated to each transmission node (the CU and the chosen EH-DPs) by maximizing the EE of the D2D communication while guaranteeing the CU's transmission service quality. Thus, the optimization problem becomes the EE maximization problem (17), in which the transmission rates of the CU and the chosen EH-DPs are obtained from (9) and (10). Notably, only the EH-DPs selected by DPCS are involved at this stage. Constraint (17c) is the available battery energy of each chosen D2D pair. Constraints (17d) and (17e) are the maximum transmission power and minimum transmission rate of the chosen EH-DPs, respectively. The corresponding constraints for the CU are (17a) and (17b).
Problem (17) is a fractional nonconvex optimization problem because the numerator of its objective function and the constraints (17b) and (17e) are nonconvex. Thus, it is difficult to solve directly. However, we can use the same convex approximation approach as in CAIA to obtain a tight lower-bound approximation of the numerator in (17). Specifically, inequality (11) and the variable substitution convert (17) into the lower-bound problem (18), whose objective numerator and denominator are given by (19) and (20), respectively, and whose approximated rate functions are given by (13) and (14). Problem (18) is still a fractional optimization problem. Since log-sum-exp is convex, the numerator (19) is jointly concave in the transformed power variables. Besides, the denominator (20) can easily be proved to be jointly convex in the same variables. Likewise, the constraints (18b) and (18e) are convex. Thus, (18) is a typical fractional optimization problem and can be solved by the Dinkelbach algorithm. The main power allocation flow of PAS is summarized in Algorithm 4.
5. Simulation Results
In this section, we verify the effectiveness of the proposed algorithms and study the impact of EH efficiency factors on system performance. To this end, we present numerical results that evaluate the proposed algorithm (CAIA) and the suboptimal heuristic algorithm (TDSS) in terms of the average (avg.) achievable EE and transmission rate of the EH-DPs. Furthermore, we compare the proposed algorithms with the real-time transmission strategy (RTS), the Exhaustive Searching Scheme (ESS), and the Q-Learning Approach (QLA).
In each time slot, once an EH-DP has enough energy to satisfy its transmission power demand, the RTS scheme lets it transmit directly by executing the power allocation algorithm for the CU and the chosen EH-DPs. ESS enumerates all possible solutions over the short-term time horizon and thus attains an optimal solution. QLA, a well-known reinforcement learning method, is widely used to optimize long-term or short-term utilities [36, 37]. To better assess the effectiveness of the proposed algorithms, we implement QLA in a centralized manner.
5.1. Simulation Setup
The performance of the compared methods and the proposed algorithms is evaluated via simulations. The considered cellular network, with a radius of 800 meters, is demonstrated in Figure 1. The central controller, the BS, is able to acquire the positions of all users and is always located at the centre of the cellular area. Suppose that there are several EH-DP/CU groups. In each group, a number of EH-DPs, randomly selected from 2 to 8, can multiplex the uplink channel radio resource of the group's CU. All users are randomly deployed in the cellular zone. Meanwhile, the distance between the transmitter and receiver of each EH-DP is randomly selected within [20, 50] meters. To avoid serious mutual interference, a minimum distance of 200 meters is kept between the CU and the EH-DPs [38, 39]. Similarly, the distance among the EH-DPs in each group must be larger than 100 meters. The energy arrival process of every EH-DP is assumed to be an i.i.d. Bernoulli process, which conforms to formula (2). The other network parameters used in this study are listed in Table 2.
We repeat each simulation scenario 100 times for each energy arrival probability (for example, an arrival probability of 0.3 for all EH-DPs) and average the results.
5.2. Complexity Comparison
Computational complexity is an important aspect in assessing the effectiveness of the above algorithms. First of all, two points should be mentioned:
(i) The procedures of CAIA, TDSS, and RTS include convex optimization of nonlinear programs. So, we use the ε-approximate solution to measure the computational complexity. That is, the computational complexity of CAIA, TDSS, and RTS is the number of iterations needed for the solution to reach the convergence condition, as in Algorithms 1 and 4.
(ii) We calculate and express the worst-case computational complexity of all algorithms for a fair comparison.
It is hard to give a thorough and exact complexity analysis of convex nonlinear programming problems. Generally speaking, however, the complexity is related to the space required to store the input data and to the running time of the algorithm until a solution is found. Besides, Vidal et al. showed that the complexity of computing an ε-approximate solution of a continuous convex problem is a function of the number of variables, the number of constraints, and the constraint bound. Moreover, the complexity of QLA is related to the size of the state-action space. The complexity of the above-mentioned algorithms is summarized in Table 3.