Abstract

A new method for renewable energy cooperation among small base stations (SBSs) is proposed to maximize the energy efficiency of an ultradense network (UDN). In the UDN, each SBS is equipped with an energy harvesting (EH) unit, and the energy arrival times are modeled as a Poisson counting process. First, SBSs with large traffic demands are selected as the clustering centers, and all SBSs are then clustered using a dynamic k-means algorithm. Second, SBSs coordinate their renewable energy within each formed cluster. The process of energy cooperation among SBSs is modeled as a Markov decision process, and a Q-learning algorithm is used to optimize it. The algorithm defines four actions and their corresponding reward functions. Q-learning explores the actions as much as possible and predicts better actions by calculating rewards, and an ε-greedy policy is used to ensure that the algorithm converges. Finally, simulation results show that the new method reduces the data dimension and improves the calculation speed, which in turn improves the utilization of renewable energy and the performance of the UDN. Through online optimization, the proposed method can significantly improve the energy utilization rate and the data transmission rate.

1. Introduction

With the rapid development of mobile networks, wireless data traffic has increased exponentially. More and more small base stations (SBSs) are needed to satisfy the traffic demand, which results in the ultradense network (UDN). Meanwhile, energy harvesting (EH) has become a very promising technology because it is flexible to deploy and relies on renewable energy [1]. Combining SBSs with EH makes it possible to harvest energy and thereby extend the network running time. However, EH technology faces some difficulties [2, 3], such as the randomness and instability of renewable energy and the limited energy storage of EH devices.

Energy cooperation in EH wireless networks has recently been studied extensively. Much research addresses maximizing network throughput, energy cooperation, traffic management, and so on. Energy cooperation schemes among different BSs are considered in [4–6], where the energy efficiency (EE) is improved by adopting evolutionary algorithms. A jointly optimal policy that maximizes the sum-throughput is proposed in [7–9], where both energy cooperation and traffic management are considered. In [10], the capacity region coincides with that of a traditional K-user Gaussian MAC, and users can perform energy cooperation. Cooperation between primary and secondary users at the information and energy levels is considered in [11]. These studies mainly focus on energy cooperation between cells or users in cellular networks or hybrid power supply systems, with the aim of improving EE and traffic management. As the number of SBSs grows in future communication networks, it becomes necessary to divide SBSs into small groups for optimal energy cooperation.

UDN has a much higher deployment density than current mobile networks, which greatly improves the network throughput. Many studies on spectrum sensing, spectrum efficiency, and intercell interference have been published [12–15]. EE and resource management in UDN are studied in [16–18], and a cluster-based EE resource allocation scheme is proposed in [19]. Taking advantage of the density of SBSs to enhance energy cooperation will promote the network performance. As the density of SBSs increases, it becomes more efficient and practical to use clustering and online learning methods to achieve greater network throughput.

There is also research on energy supply, especially EH technology. A system consisting of two cooperative microgrids is considered in [20], where the two microgrids exchange renewable energy through a transmission line. Two energy sources for supplying the energy required for system operation are proposed in [21–23], namely, an energy harvester and a constant energy source driven by a nonrenewable resource. In [24], an energy harvesting circuit is equipped at the energy harvesting receiver and introduces various nonlinearities into the wireless power transfer. A dense small cell network consisting of a set of small cells and a set of users is considered in [25]; in that network, every user selects an SBS by itself, and multiple users can be served by a single SBS. For simplicity, we only consider renewable energy cooperation among clustered SBSs in UDN.

Considering that clustering supports efficient energy cooperation and that reinforcement learning can achieve greater EE, we combine EH technology with clustered energy cooperation in UDN to address the EE problem. A renewable energy cooperation management algorithm is proposed based on clustered SBSs in UDN. First, according to geographical location and traffic load, a sampling technique is used to determine the centers of the data division, and all SBSs are then clustered using a dynamic k-means algorithm. Second, within each formed cluster, SBSs coordinate their renewable energy. The process of energy cooperation among SBSs is modeled as a Markov decision process (MDP). A Q-learning algorithm is adopted to optimize energy cooperation, in which an ε-greedy policy ensures that the algorithm converges. Each SBS has four actions and their corresponding immediate reward functions. The reward function represents the energy value after taking the corresponding action. Q-learning explores the actions as much as possible, learns to predict the relationship between action and reward, and then predicts better actions by calculating rewards. Finally, simulation results show that the new method reduces the data dimension and improves the calculation speed, which in turn improves the utilization of renewable energy and the performance of the UDN. Through online optimization, the proposed method can significantly improve the energy utilization rate and the data transmission rate.

This paper is organized as follows: Section 2 presents the system model according to MDP. In Section 3, we optimize the problem of energy cooperation among SBSs in UDN using clustering technology and reinforcement learning. Simulation results are given in Section 4. Section 5 concludes the paper.

2. System Model

In practice, the EH model depends on its specific implementation. Both solar panels and wind turbine generators can generate renewable energy, but their EH characteristics differ. The energy arrival times in the energy harvester can be modeled as a Poisson counting process [21]; a sinusoidal curve can also be selected [22]. To provide a general model for EH communication systems and to isolate the considered problem from specific implementation assumptions, we model EH as a stochastic process. In this paper, the SBSs in UDN are randomly deployed [26, 27], and each SBS is equipped with an EH unit and a limited-capacity battery. Assume a system with a limited number of time-slots (TS), in which the renewable energy and the required data arrive at the beginning of each TS. The channel state information is , which is kept constant within the same TS. The processing of energy/data packets can be considered a first-order discrete Markov model [28]. In each TS, the amount of data processed by the SBS is , and the minimum energy required for data transmission is . The energy arrival times in the EH unit are modeled as a Poisson counting process with . The harvested energy in each TS is .
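As a rough illustration of the Poisson arrival model, the sketch below draws the number of energy arrivals in each TS and converts it to harvested energy. The function names, the arrival rate of 1.5 per TS, and the unit energy of 1.0 are hypothetical placeholders, not values from this paper.

```python
import math
import random


def poisson_sample(rate, rng):
    """Draw one Poisson(rate) variate with Knuth's multiplication method."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def harvest_trace(num_slots, rate, unit_energy, seed=0):
    """Energy harvested in each TS: (number of arrivals) x (one energy unit)."""
    rng = random.Random(seed)
    return [poisson_sample(rate, rng) * unit_energy for _ in range(num_slots)]


# Hypothetical parameters chosen only for illustration.
trace = harvest_trace(num_slots=1000, rate=1.5, unit_energy=1.0)
mean_energy = sum(trace) / len(trace)
```

Over many time-slots the average harvested energy approaches the arrival rate times the unit energy, which is a quick sanity check for any chosen rate.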

In UDN, for a certain SBS , the system parameters are given in System Parameters in UDN section.

Energy charged to the battery is , ; and energy discharged from the battery is , . At any time, SBS will charge/discharge energy to/from battery. There is at most one of and that is strictly positive, that is, .
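The rule that an SBS never charges and discharges in the same TS can be sketched as a single battery update. This is a minimal sketch under assumptions: the function name, the default efficiency of 0.9, and the clipping of surplus energy at the capacity are illustrative choices, not the paper's exact constraints.

```python
def battery_step(level, charge, discharge, capacity, efficiency=0.9):
    """Return the battery level after one TS.

    charge and discharge are non-negative energies; at most one may be positive.
    efficiency (hypothetical default) scales the energy actually stored.
    """
    if charge < 0 or discharge < 0:
        raise ValueError("charge and discharge must be non-negative")
    if charge > 0 and discharge > 0:
        raise ValueError("cannot charge and discharge in the same TS")
    if discharge > level:
        raise ValueError("cannot discharge more than the stored energy")
    # Stored energy is clipped at the battery capacity; the surplus is lost.
    return min(capacity, level + efficiency * charge - discharge)
```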

The SBS operation in TS satisfies the following constraints ( represents battery charging efficiency):

We consider one hot region in UDN, as shown in Figure 1. The colored SBSs are the cluster centers, which form groups with the uncolored SBSs inside their circles. The other SBSs join the nearest group. The SBSs in one cluster cooperate on energy. For simplicity, we only consider the single-user scenario. For the requested data rate of the user, we ignore the fluctuation of data services and assume the full-traffic case: the user always has data to receive, and the SBSs operate at full load.

To simplify the system model, we give one SBS energy harvesting model in UDN, which is shown in Figure 2.

To find the required energy to reliably transmit a data packet over the channel we consider Shannon’s capacity formula for Gaussian channels.
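One way to read this step is to invert Shannon's formula for a Gaussian channel, C = B log2(1 + P g / (N0 B)), to obtain the transmit power needed to deliver a packet within the sending time, and then round the resulting energy up to a whole number of energy units. The sketch below does this under stated assumptions: the function name, the parameter names, and all numeric values in the example call are hypothetical.

```python
import math


def required_energy_units(bits, duration_s, bandwidth_hz, channel_gain,
                          noise_density, energy_unit_j):
    """Smallest number of energy units that carries `bits` in `duration_s`."""
    rate = bits / duration_s                          # required rate in bit/s
    snr_needed = 2.0 ** (rate / bandwidth_hz) - 1.0   # inverted Shannon formula
    power = snr_needed * noise_density * bandwidth_hz / channel_gain
    energy = power * duration_s                       # joules spent while sending
    return math.ceil(energy / energy_unit_j)          # integer multiple of the unit


# Hypothetical numbers, picked so the arithmetic is easy to follow by hand.
units = required_energy_units(bits=1000, duration_s=1.0, bandwidth_hz=1000.0,
                              channel_gain=1000.0, noise_density=1.0,
                              energy_unit_j=0.4)
```

Because the required SNR grows exponentially with the rate, doubling the packet size more than doubles the energy in this model.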

can be approximately calculated bywhere is the transmit power and is the noise power density.

is an integer multiple of the energy unit [22] and can be calculated by

In this paper, we define that EE in terms of bits/Hz/Joule is where is the sum power consumption of SBSs in one cluster, denotes the channel-gain-to-noise ratio, denotes the addition power dissipation due to SBS’s circuitry, is the static power dissipation, and is the power amplifier efficiency.
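The exact EE expression is not reproduced in this text, so the sketch below uses a common single-link form as an assumption: spectral efficiency divided by total consumed power, where the transmit power is scaled by the power amplifier efficiency and the circuit and static dissipation are added. All names are hypothetical, and the formula should be checked against the paper's own definition before use.

```python
import math


def energy_efficiency(tx_power, gain_to_noise, circuit_power,
                      static_power, amp_efficiency):
    """Assumed EE model in bits/Hz/Joule for one link."""
    spectral_eff = math.log2(1.0 + gain_to_noise * tx_power)
    total_power = tx_power / amp_efficiency + circuit_power + static_power
    return spectral_eff / total_power


# Hypothetical values: log2(1 + 3) = 2 bits/Hz over 1/0.5 + 1 + 1 = 4 units of power.
ee = energy_efficiency(tx_power=1.0, gain_to_noise=3.0, circuit_power=1.0,
                       static_power=1.0, amp_efficiency=0.5)
```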

A multiuser scenario can be derived similarly. At the beginning of each , each SBS schedules its users, such that a single user is allocated to each subcarrier. Let denote that subcarrier is assigned to user , and , otherwise. Accordingly, EE can be expressed aswhere is the user number assigned for one SBS, is the subcarrier number, and denotes power allocation.where is the channel gain, is the noise power spectral density of additive white Gaussian noise, and is the bandwidth.

3. Energy Cooperation Optimization Algorithm Based on SBS Clustering and Learning Strategy

This section optimizes the problem of energy cooperation among SBSs in UDN using clustering technology and reinforcement learning.

3.1. Dynamic k-Means Clustering Method for SBSs in UDN

In this paper, a cluster-based approach for maximizing EE in UDN is proposed. Depending on how they are realized, common clustering methods are divided into hierarchical clustering and partition clustering; there are also clustering methods for large-scale data sets. One of the most classic clustering methods is k-means, which uses cluster centers to represent whole groups. The cluster centers need to be updated repeatedly before the final result is determined [29, 30]. All data must be redivided before the cluster centers are updated, so k-means cannot handle very large-scale data, and its execution time grows as the number of data points increases.

Since a UDN contains more SBSs than a traditional network, running the clustering algorithm on the whole data set is very time-consuming. To remedy this problem, a sampling strategy is adopted, which greatly saves storage space and reduces the amount of computation. Sampling selects some samples from the original data, and the distribution of the original data set is estimated from the clustering result of these samples. In this paper, the sampling technique is used to determine the centers of the data division. If the distance between SBSs is too long, energy cooperation may cause power loss and transmission delay, so we cluster the SBSs in UDN according to distance and traffic. The sampled SBSs are selected in descending order of SBS traffic, where SBS traffic refers to the average number of data packets sent in the past week or month. Each SBS is viewed as a data point in a two-dimensional space, and the distances between the sampled SBSs are saved [31, 32].

Assume that there are SBSs in UDN. The distances between all SBSs are saved in the matrix . The dynamic k-means algorithm is described as follows:
(1) Select sampling SBSs (according to SBS traffic from large to small) and save their distances in the matrix ; each column represents the distances between one SBS and other sampling SBSs, , where represents the distances between sampling SBSs.
(2) Find the minimum distance between sampling SBSs, .
(3) Calculate the average value of each column , where represents each column vector in the distance matrix which has a “0” element, which is the distance between each point and its own.
(4) Calculate the average value of all , .
(5) Calculate high density radius ; is added to make big enough so as to ensure that most high density points are correctly labeled.
(6) Calculate cluster radius according to and select the two furthest points from sampling SBSs as the initial cluster centers and mark to ensure that the centers come from different clusters.
(7) Divide the data near the centers into two clusters according to , find the next farthest point according to the centers, mark , and divide the data again until all the data are clustered completely.
(8) Cluster the remaining SBSs into the nearest center point.
(9) Calculate , and . If the distance between the cluster centers is smaller than and the distance between the boundary points is smaller than , then combine the two clusters and label .
(10) Give the final clustering results and the value.
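The steps above can be sketched in simplified form. This is not the authors' exact procedure: the radius formulas are not reproduced in the text, so the cluster radius is taken as a hypothetical `radius_scale` times the mean sampled distance, and the merging step (9) is omitted. The function name and the example coordinates and traffic values are illustrative assumptions.

```python
import math


def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])


def dynamic_kmeans_sketch(positions, traffic, sample_rate=0.5, radius_scale=1.0):
    """positions: list of (x, y); traffic: list of loads in the same order.
    Returns (center_indices, labels), where labels[i] indexes center_indices."""
    n = len(positions)
    n_sample = max(2, int(n * sample_rate))
    # Step 1: sample the SBSs with the largest traffic.
    sample = sorted(range(n), key=lambda i: traffic[i], reverse=True)[:n_sample]
    # Steps 3-4: mean of the per-SBS average distances among the samples.
    col_means = [sum(dist(positions[i], positions[j]) for j in sample) / n_sample
                 for i in sample]
    mean_dist = sum(col_means) / n_sample
    # Steps 5-6: placeholder radius (assumption, see the lead-in).
    radius = radius_scale * mean_dist
    # Step 6: the two sampled SBSs that are furthest apart seed the centers.
    pairs = [(i, j) for a, i in enumerate(sample) for j in sample[a + 1:]]
    first, second = max(pairs, key=lambda p: dist(positions[p[0]], positions[p[1]]))
    centers = [first, second]

    def gap(i):
        # Distance from SBS i to its nearest current center.
        return min(dist(positions[i], positions[c]) for c in centers)

    # Step 7: add the sample furthest from all centers while it lies outside
    # the cluster radius of every existing center.
    while True:
        candidate = max(sample, key=gap)
        if gap(candidate) <= radius:
            break
        centers.append(candidate)
    # Step 8: every SBS joins its nearest center.
    labels = [min(range(len(centers)),
                  key=lambda k: dist(positions[i], positions[centers[k]]))
              for i in range(n)]
    return centers, labels


# Hypothetical layout: three well-separated groups of four SBSs each.
positions = [(0, 0), (1, 0), (0, 1), (1, 1),
             (20, 0), (21, 0), (20, 1), (21, 1),
             (10, 20), (11, 20), (10, 21), (11, 21)]
traffic = [9, 8, 1, 1, 9, 8, 1, 1, 9, 8, 1, 1]
centers, labels = dynamic_kmeans_sketch(positions, traffic)
```

In this sketch the number of clusters is not an input: it falls out of how many sampled SBSs lie further than the radius from every existing center, which mirrors the idea that k is found dynamically.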

Determining the value of k is a very difficult problem. If the value of k is not reasonable, it is likely to lead to large clustering errors. In the proposed method, the final number of clusters is obtained through a series of transformations based on the distribution of the data and the distances between data points. The value of k is thus obtained through a dynamic process, and no initial empirical value is needed. The dynamic division is closer to actual demand: it removes the need for manual input and improves the automatic clustering ability.

EE is considered one of the main benefits of a clustering architecture. When the network is dense, clustering also improves the stability of cooperative relationships. Clusters are formed to maximize their time availability and hence, in this paper, the availability of energy cooperation. The clustering centers are all SBSs with large traffic, which benefits energy cooperation among the SBSs with small traffic. This also ensures the supply of renewable energy and improves the data transmission rate of the network.

3.2. Q-Learning Approach for Energy Cooperation

Due to the instability of the renewable energy and the arbitrary distribution of SBSs, it is necessary to improve EE through energy cooperation. In this paper, we propose a renewable energy cooperation scheme among different SBSs, in which one SBS can collect/share energy from/to another SBS. The energy cooperation efficiency will be improved when the number of the cooperation SBSs is large.

Consider energy cooperation among SBSs in UDN as the finite states and discrete time MDP. , where is the finite environment state space; is the finite system action space; and , respectively, represent the state transition probability and the immediate reward of transferring the state from to by taking action . The probability and the immediate reward depend only on the current state and the selected action and are irrelevant to the past states and actions.

In the proposed model, the system state of th SBS in TS is , and the action set is . At the beginning of each TS, the SBS has four actions for each data packet, which are shown in Table 1.

Assume that the optimal state value function and the optimal action value function are and , which satisfy the Bellman optimality equation:

As a result, the optimal policy can be obtained:

The goal of MDP is to find the system’s optimal policy , which can be obtained from the optimal value function [33]. The Q-learning algorithm is adopted, and its iterative formula is as follows: where is a state-action pair in TS , is the learning factor, and is the discount factor.
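For reference, the standard textbook form of the Q-learning update is given below. The symbols are assumptions chosen for illustration, with \alpha as the learning factor, \gamma as the discount factor, and r_{t+1} as the immediate reward; the paper's own notation may differ.

```latex
Q(s_t, a_t) \leftarrow Q(s_t, a_t)
  + \alpha \left[ r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \right]
```

The bracketed term is the temporal-difference error: the gap between the current estimate and the one-step lookahead target.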

Q-learning is iterative and uses the reward of a state-action pair as the evaluation function. First, the Q value is initialized; then the action in the current state is determined according to the ε-greedy policy, the knowledge and experience of the training samples are obtained, and the Q value is modified. When the agent reaches the target state, the algorithm terminates one iteration loop. The algorithm then starts again from the initial state until learning ends.

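The Q-learning algorithm is applied to renewable energy cooperation within one SBS cluster in UDN, and its process is as follows:
(1) Initialize the Q values arbitrarily, and set the algorithm parameters to their given values.
(2) Repeat:
Set the initial state.
Repeat:
① Choose an action according to the ε-greedy policy, then obtain the immediate reward and the next state.
② Update the Q value of the current state-action pair using the iterative formula.
③ Move to the next state.
Until the state is a termination state.
Until all Q values have converged.
(3) Output the final policy.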
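The loop above can be sketched as tabular Q-learning on a toy problem. Everything about the environment is a hypothetical stand-in: the four actions and reward functions of this paper are not modeled. Here the state is a battery level, action 0 stores one energy unit, and action 1 spends one unit to send a packet. Fixed-length episodes replace the termination state, and the parameter values are illustrative.

```python
import random


def step(level, action, capacity):
    """Toy battery model (hypothetical): action 0 stores one energy unit,
    action 1 spends one unit to send a packet (reward 1, or -1 if empty)."""
    if action == 0:
        return min(capacity, level + 1), 0.0
    if level == 0:
        return level, -1.0
    return level - 1, 1.0


def q_learning(capacity=3, episodes=2000, steps=20,
               alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    actions = (0, 1)
    # (1) Initialize the Q table to zero for every state-action pair.
    q = [[0.0 for _ in actions] for _ in range(capacity + 1)]
    for _ in range(episodes):
        # (2) Each episode starts from a random battery level.
        state = rng.randrange(capacity + 1)
        for _ in range(steps):
            # Step 1 of the inner loop: epsilon-greedy action choice.
            if rng.random() < epsilon:
                action = rng.choice(actions)
            else:
                action = max(actions, key=lambda a: q[state][a])
            next_state, reward = step(state, action, capacity)
            # Step 2: move the estimate toward the one-step lookahead target.
            target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            # Step 3: continue from the next state.
            state = next_state
    # (3) The final policy is greedy with respect to the learned Q table.
    policy = [max(actions, key=lambda a: q[s][a]) for s in range(len(q))]
    return q, policy


q_table, policy = q_learning()
```

In this toy setting the learned policy stores energy when the battery is empty and sends whenever energy is available, which is the behavior one would expect when sending is rewarded and failed sends are penalized.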

In ①, has four choices and the corresponding shown in Table 2 ( represents storage efficiency of the battery, and represents resistance loss).

In fact, the four actions can be combined:

Their values are listed in Table 3.

The reward function is the sum of the energy for sending data packets and the energy for cooperation. In each TS, it encourages sending data and carrying out energy cooperation.

In the Q-learning algorithm, the action with the highest Q value in a particular state should be taken at each step. An agent that rigidly follows this rule might underperform, since the same decision is investigated over and over again. Before a state-action pair can be exploited, it first needs to be explored [34]. In this paper, we use the ε-greedy policy, which is commonly used for exploring the state-action space. It enforces sporadic jumps to suboptimal states for exploration purposes, and also to detect changes in the environmental conditions. Whenever a decision is to be made, an action is picked at random with probability ε; otherwise, the action with the highest Q value is chosen.

Q-learning converges to the optimal function. While , converges to with probability 1 [35], and the convergence rate is related to many factors. The convergence rate increases with the value of and the number of learning iterations and decreases with the number of , , and [36]. Action selection follows the ε-greedy policy with probability at each . The exploration probability is , and the exploitation probability is .

4. Numerical Simulations

According to the traffic amount of SBS from large to small, their relative positions in UDN are marked as Table 4. All SBSs satisfy the constraints .

20 SBSs are, respectively, , , which are denoted by th data point. 20 SBSs are clustered according to dynamic k-means clustering method mentioned in Section 3.1. The sampling rate is 50%; that is, the (1–10) data points are selected. As shown in Figure 3, the cluster radius is , the number of cluster is , and the cluster centers are , , and , which are denoted as . The three final clusters are , , and .

Through dynamic k-means clustering, SBSs with large traffic can be selected as the cluster centers without knowing the value of k in advance. The cluster centers have relatively larger energy demands, which can effectively improve the energy utilization rate. In addition, Q-learning in this paper only considers energy cooperation within the same cluster, which mitigates the curse of dimensionality caused by too many states.

In the numerical analysis, we take one class as an example. All parameters are based on an IEEE802.15.4e [36] communication system. The system parameters are listed in Table 5. Each time-slot is 10 ms, in which 5 ms is used to send data [30] and 5 ms to zero signal level. The channel state at TS is , . The channel state transition probability function is characterized by . The transmit power is . We consider Shannon’s capacity formula for Gaussian channels. The battery capacity is 5, that is, . Each basic energy unit is [37, 38]. The possible data packet sizes are 300 bits or 600 bits.

As shown in Figure 4, the Q-learning approach for energy cooperation in one class converges. The horizontal axis is the number of iterations (the time of one iteration is  ms, and the same holds for the other figures), and the vertical axis is the difference between two adjacent sampled value functions. The curve is a 5-degree polynomial fit, and the sampling interval is (the same as in Figures 4–10). The change in the value function demonstrates the algorithm’s convergence. When the iteration number () reaches about , the function value is basically unchanged.

Figure 5 shows that the learning factor in the Q-learning algorithm influences the number of iterations needed for the packet transmission rate to stabilize. When and , the black diamond line is basically unchanged; when and , the blue triangle line becomes stable; when and , the red star line remains stable. We can conclude that the Q-learning algorithm stabilizes in fewer iterations as increases.

As shown in Figure 6, the ε-greedy policy can ensure the convergence of the Q-learning algorithm. Different values of ε influence the final packet transmission rate. When , the packet transmission rate reaches about as increases; when , it reaches about ; when , it reaches about . We can conclude that the Q-learning algorithm learns the optimal policy with increasing accuracy as ε decreases.

Figure 7 shows the effect of the maximum battery size on the expected data transmission rate. We can conclude that, for the proposed algorithm, the expected data transmission rate increases with . In our model, the system state of th SBS in TS is , in which all the parameter values are finite and discrete. If has a relatively larger value, then has more choices, which leads to increased computation and the curse of dimensionality in learning. In this paper we make to simplify the algorithm, but we have executed extensive numerical simulations with different parameter settings and observed similar results.

Figure 8 displays the data transmission rate for different values. We see that the expected data transmission rate increases with . It means that the more stable the channel state is, the higher the data transmission rate is. In class , when , the data transmission rate is about  kbit/s; when , the data transmission rate is about  kbit/s. As increases, EH process becomes less random, and the proposed algorithm can better estimate its future states and adapt to it.

Figure 9 shows the relationship between the data transmission rate and the number of iterations. Energy cooperation improves energy utilization; that is, the data transmission rate can be higher with energy cooperation. The black diamond line, which represents the rate with energy cooperation, reaches about  kbit/s, and the green triangle line, which represents the rate without energy cooperation, is below  kbit/s. The transmission rate with energy cooperation is about higher than without it. The same holds for the other clusters in UDN. The proposed method can significantly improve the data transmission rate.

EE (the energy efficiency) is defined as the ratio of the network throughput to the power consumption per unit area. The EE metric is a performance indicator that measures the benefit-cost ratio by comparing the achievable rate with the energy cost. Figure 10 illustrates the expected average EE of the proposed approach against the number of learning iterations, together with the performance of the other approaches. It can be observed that the average EE with energy cooperation is higher than that without energy cooperation. The black curve is closer to that of the Offline-LP algorithm. The proposed method can significantly improve the energy utilization rate.

5. Conclusion

This paper presents a renewable energy cooperation management algorithm based on clustering and a learning strategy in UDN. First, according to geographical location and traffic load, SBSs are clustered using a dynamic k-means algorithm, in which sampling is used to improve the computation speed and the clustering effect. Second, within each formed cluster, SBSs coordinate their renewable energy. The process of energy cooperation is modeled as an MDP. A Q-learning algorithm is adopted to optimize energy cooperation; it considers four immediate reward functions, and its convergence is ensured by an ε-greedy policy. Third, simulation results show that the new method improves the utilization of renewable energy and the data transmission rate. Future research directions include energy cooperation between clusters, the combination of renewable energy, the smart grid, and so on.

System Parameters in UDN

TS:Time-slot
:Channel state information
:Energy harvested
:Energy for transmitting data
:Data amount to be transmitted
:The maximum capacity of the battery.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The work in this paper is supported by “Beijing Natural Science Foundation (Grant no. 4164101),” “National Natural Science Foundation of China (Grants no. 61501185, no. 61377088),” “Hebei Province Natural Science Foundation (Grant no. F2016502062),” and “the Fundamental Research Funds for the Central Universities (2015MS125, 2016MS97).”