Mobile Information Systems

Volume 2017 (2017), Article ID 7965767, 11 pages

https://doi.org/10.1155/2017/7965767

## D2D-Enabled Small Cell Network Control Scheme Based on the Dynamic Stackelberg Game

Department of Computer Science, Sogang University, 35 Baekbeom-ro (Sinsu-dong), Mapo-gu, Seoul 121-742, Republic of Korea

Correspondence should be addressed to Sungwook Kim; swkim01@sogang.ac.kr

Received 5 July 2017; Accepted 27 September 2017; Published 6 December 2017

Academic Editor: Habib M. Fardoun

Copyright © 2017 Sungwook Kim. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

For current and future cellular networks, a small cell structure with licensed and unlicensed bandwidth, caching content provisioning, and device-to-device (D2D) communications is seen as a necessary architecture. Recently, a series of control methods have been developed to address a myriad of challenges in next-generation small cell networks. In this study, we focus on the design of a novel D2D-enabled small cell network control scheme that allows caching and unlicensed D2D communications. Motivated by game theory and learning algorithms, the proposed scheme adaptively selects caching contents and splits the available bandwidth for licensed and unlicensed communications. Under dynamically changing network environments, we capture the dynamics of the network system and design a new dynamic Stackelberg game model. Based on a hierarchical and feedback-based control manner, small base stations and users can be leaders or followers dynamically while improving 5G network performance. Simulations and performance analysis verify the efficiency of the proposed scheme, showing that our approach can outperform existing schemes by about 5% to 15% in terms of bandwidth utilization, cache hit ratio, and system throughput.

#### 1. Introduction

Today, we are witnessing a tremendous increase in mobile data traffic at a rapid pace owing to the increasing number of users and the explosive growth of mobile multimedia services. Ubiquitous devices such as smart phones have fueled the demand for data intensive applications including live video streaming, real-time social networking, and mobile gaming. With the explosive growth of multimedia data traffic, the wireless cellular networks need more bandwidth to boost their system capacity. However, the wireless bandwidth is an extremely valuable and scarce resource, and it is almost exhausted. Furthermore, classical ways of improving cellular network capacity have suffered from physical and economical limitations. Therefore, current research on 5G networks is geared towards developing intelligent ways of data dissemination by deviating from the traditional network architecture [1–3].

Small cells have been widely viewed as a key enabling technology for 5G mobile wireless networks. By densely deploying low-power, low-cost small cell base stations (SBSs), a network system can improve local coverage, bandwidth efficiency, network throughput, and energy efficiency [1, 2, 4]. However, small cell networks cannot solve the problem of backhaul congestion. With the rapid increase of multimedia data traffic, the backhaul traffic overhead becomes more and more serious and jeopardizes QoS satisfaction throughout the entire network [4]. To solve this problem, a caching mechanism in SBSs is an attractive approach to improve the transmission rate while reducing the backhaul load. In cache-based operation, each SBS is equipped with a local cache and serves user requests using its cached contents. If the users’ requested contents already exist in the caches of SBSs, the SBSs can directly transmit the contents to users without using the backhaul. In this way, the caching mechanism offloads backhaul traffic while keeping the access delay low [4–6].

In next-generation cellular networks, device-to-device (D2D) communication has recently attracted a substantial amount of attention from both industry and academia. The D2D technique enables mobile users to communicate directly when those users are in range for direct communications [2]. Initially, the D2D technique was proposed to enhance the performance of multihop systems. In a 5G cellular network, D2D communication is one of the key technologies for achieving very high data rates by offloading part of the cellular traffic onto D2D communications. This approach can reduce the backhaul load without the cost of additional network infrastructure. While extensive research targets the many challenges of 5G networks, D2D communications still face a myriad of technical challenges [7, 8].

To provide D2D communication services, the bandwidth of an SBS should be divided into two subbands, called the licensed and unlicensed bands. The unlicensed band can be leased to unlicensed users to perform D2D communication. The licensed users of the SBS in the licensed band are protected while the unlicensed users seek chances to transmit with a limited amount of power. However, due to the temporal fluctuations of service requests, the assumption of a fixed assignment for licensed and unlicensed bands may not be practical. To achieve a globally desirable 5G network performance, the bandwidth in each SBS should be adaptively split [9, 10].

To design a novel 5G network control scheme, we need a new control paradigm. Nowadays, the interaction between rational agents who have conflicting objectives is often characterized using game theory. Game theory is the study of strategic interactions between multiple intelligent rational decision makers trying to maximize the expected value of their own payoffs. In particular, game theory has been successfully applied to wireless communications for solving competition problems on network resources [11]. In the situation of D2D-enabled SBS operation, SBSs and users are rational individuals. Motivated by these facts of the 5G network system, we have adopted a game theoretic approach to develop practical cache placement and bandwidth splitting algorithms. In this way, we are able to ease the heavy computational burden of theoretically optimal centralized solutions.

However, game theory also has its own shortcomings. First, classical game theory has mostly been developed from a perfectly rational perspective. This rationality of the player requires complete information in real-world operations; in reality, this assumption rarely holds. Second, most game theoretic models seek one-sided or unilateral stability in a static setting. Therefore, they cannot capture how players adapt their strategies and reach an effective solution over time. Last, but not least, game theoretic methods require arduous effort to solve multiple high-order polynomial equations. In practice, solving them in real time is complex and difficult [11].

In this study, we devise a new game model, called the dynamic Stackelberg game, to adapt effectively to the D2D-enabled small cell network situation. Specifically, the hierarchical relationship between SBSs and users best suits the Stackelberg game model. In a classical Stackelberg game, one player acts as a leader and the rest as followers, and the main goal is to find an optimal strategy for the leader, assuming that the followers react rationally by optimizing their objective functions given the leader’s actions [11, 12]. However, in our dynamic Stackelberg game, game players can be leaders or followers dynamically. Based on their current roles, players make control decisions logically in order to pursue their own interests while learning the current system conditions. For the cache placement algorithm, each SBS is a single follower, and its corresponding users are multiple leaders. For the bandwidth splitting algorithm, each SBS is a single leader, and its corresponding users are multiple followers. Under dynamically changing 5G network environments, this dynamic and flexible approach can obtain the finest solution.

To maintain a well-balanced network performance, we use learning and bargaining algorithms in a distributed manner. By taking into account local, global, and social learning and a simple bargaining process, individual SBSs make intelligent decisions to effectively address the caching and splitting issues. Different from existing work, we focus on design principles such as feasibility and self-adaptability to provide a desirable solution. Therefore, the major novelty of our scheme is its effectiveness under 5G network dynamics. Although several D2D-enabled small cell network control schemes have been proposed, there has been very little research integrating game theory and learning algorithms.

##### 1.1. Contribution

Our study generalizes the cache placement and bandwidth splitting algorithms in the following aspects. To model the interaction between each SBS and users, we design a new dynamic Stackelberg game. As a follower, the SBS monitors his multiple leaders, that is, its corresponding users to decide cache contents. As a leader, the SBS splits his bandwidth for licensed and unlicensed services. Employing learning and bargaining approaches, control decisions in each algorithm are made in an adaptive online manner. Finally, a fair-balanced solution can be obtained under diversified D2D-enabled small cell network situations. In summary, the contributions of this paper are as follows.

*(i) Dynamic Stackelberg Game Model*. Motivated by hierarchical and feedback-dependent situations, we introduce a new game model that captures a variety of D2D-enabled network characteristics. This approach is generic and applicable to various small cell network scenarios.

*(ii) Cache Placement Algorithm*. We design a multiple-leaders single-follower Stackelberg game to decide the SBS’s cache contents. By considering users’ external social influences, a single-follower SBS focuses on how to utilize social ties of leaders for efficient multifile dissemination. Most of the current studies ignore the social relations among mobile users.

*(iii) Bandwidth Splitting Algorithm*. We design a single-leader multiple-followers Stackelberg game to adaptively split the SBS’s bandwidth. According to an interactive learning and bargaining process, we model the responsive tradeoff between licensed and unlicensed communications.

*(iv) Implementation Practicality*. As game players, SBSs and users learn how to modify their prior knowledge and select their strategies with bounded rationality. This approach requires a lower control and computational overhead. It is practical and suitable for real-world network operations.

*(v) Solution Concept*. The main idea of our dynamic Stackelberg game lies in its responsiveness to the reciprocal combination of optimality and practicality. Instead of analyzing the equilibrium of our dynamic Stackelberg game, the main goal of this study is to investigate the potential benefit gained from the practical cooperation of different control methods. The concept of our solution is to approximate the finest solution using local, global, and social learning and simple bargaining approaches.

*(vi) Conclusions*. A numerical study shows that our dynamic Stackelberg game approach can increase the bandwidth utilization, cache hit ratio, and system throughput by 5% to 10% under different service request rates, compared with the existing SADC [13], HSAC [14], and SDWC [15] schemes.

##### 1.2. Organization

The remainder of this article is organized as follows. In the next section, we review some related D2D-enabled small cell network control schemes and their problems. Section 3 presents our Stackelberg game model and the proposed cache placement and bandwidth splitting algorithms in detail. In particular, this section provides fresh insights into the benefits and design of game-based learning and bargaining approaches. For convenience, the main steps of the proposed scheme are then listed. In Section 4, we analyze the performance of the proposed scheme and present numerical results in comparison with some existing methods. Finally, Section 5 concludes the paper and discusses the remaining open challenges in this area along with possible solutions.

#### 2. Related Work

There has been considerable research into the design of D2D-enabled small cell network control schemes. Ma et al. proposed a contract-based cooperative spectrum sharing mechanism to exploit transmission opportunities for the D2D links while maximizing the profit of cellular links [16]. At first, they designed a cooperative relaying algorithm that employed superposition coding at both the cellular transmitters and D2D transmitters. This algorithm can maximize the data rate of the D2D links without deteriorating the performance of the cellular links. Secondly, they modeled the spectrum trading process and derived the optimal power-payment contracts for the cellular links. Finally, some numerical results on the performance of the proposed cooperative relaying scheme and optimal contracts were provided [16].

Zhao et al. considered the complex social connections in the social domain and introduced social relationships in the continuum space into the resource allocation problem for D2D communications [17]. In order to evaluate the joint optimization performance of social and physical domains qualitatively, they investigated users’ payoffs and defined the utility of each D2D communication user. To maximize the social group utility of each D2D user, a social group utility maximization game was formulated, and the Nash equilibrium of proposed game was theoretically investigated. Finally, they demonstrated numerical results, which increased the utility of overall social groups [17].

In [18], a new game theoretic approach was employed to analyze the interactions and correlations among user equipment. Then, an iterative power allocation mechanism was developed to establish mutual preferences based on nonlinear fractional programming. To match D2D communication pairs with cellular user equipment, the well-known Gale–Shapley algorithm was adopted; it can obtain a stable and weak Pareto optimal solution. Also, the proposed matching algorithm was extended to address scalability issues encountered in large-scale networks. One main focus of this study was how to establish mutual preferences from the perspective of energy efficiency. The existence and uniqueness of the Nash equilibrium were analyzed theoretically via mathematical proofs [18].

In [19], the authors analyzed the impact of mobile social networks on the performance of edge caching in fog radio access networks. Based on a Markov chain, they analyzed edge caching among edge nodes. The scheme proposed in [19] computed the expected bandwidth consumption of radio access networks and fronthaul with edge caching, as well as the corresponding content diffusion ratio in complicated scenarios in socially aware fog radio access networks. In [20], the authors proposed a hierarchical game framework to investigate a distributed solution to the resource allocation problems in D2D-enabled small cell networks. This hierarchical game consisted of two subgames: an overlapping coalition formation game and a Stackelberg game. The subband allocation problem was modeled as an overlapping coalition formation game, in which the D2D links in the same subbands acted cooperatively to maximize the payoff sum. The interference control problem was modeled as a Stackelberg game, in which the base station acted as the leader to make decisions and the D2D links acted as followers to play the best response after the leader’s move [20].

Ma et al. [13] developed a new *Socially Aware Distributed Caching* (SADC) scheme based on a decentralized learning automaton to optimize the cache placement operation in D2D-enabled cellular networks. The SADC scheme was a new and practical feedback scheme that took into account three key factors: (i) file request probability, (ii) physical distance between D2D transmitters and receivers, and (iii) social influence. Furthermore, the SADC scheme not only considered the file request probability and the closeness of devices as measured by their distance but also took into account the social relationship between D2D communication users. Finally, they characterized the mutual impact between the contents cached in different D2D users [13].

Zhi et al. [14] designed a novel *Hierarchically Social Aware Caching* (HSAC) scheme to make mobile nodes cache for others. To address the incentivizing data cache issue, this scheme adopted a hierarchical social aware incentivized caching method based on both physical and social relationships. Then, an incentive method was proposed to ensure the maximization of benefits based on the selfish nature of nodes. In particular, this approach considered the social ties and physical distance as the factors for the cost. Finally, the authors showed the existence of a Nash equilibrium in this HSAC scheme and demonstrated that it can significantly reduce the total cost of mobile nodes [14].

Liu et al. [15] formulated a new *Stackelberg based D2D Wireless Caching* (SDWC) scheme to solve the conflict of interests in D2D-enabled wireless caching networks. Based on the Stackelberg game, the system model was characterized by a hierarchical structure, where the base stations optimized their strategies based on the prices, and then the optimal price and optimal power were derived in closed form. The optimal price was associated with the channel gain; a high channel gain led to a high price. Finally, the tradeoffs between power and prices were presented in the simulation results [15].

Some earlier studies [13–20] have attracted considerable attention while introducing unique challenges in handling the caching and D2D-enabled small cell network control problems. In this paper, we demonstrate that our proposed scheme significantly outperforms these existing SADC [13], HSAC [14], and SDWC [15] schemes.

#### 3. The Proposed D2D-Enabled Network Control Algorithms

In this section, we provide a brief introduction to our new game model, which forms the theoretical basis of the proposed D2D-enabled small cell network control scheme. By adopting a dynamic Stackelberg game-based approach, we design cache placement and bandwidth splitting protocols that adapt to the dynamically changing 5G network environments.

##### 3.1. Dynamic Stackelberg Game Model

During the D2D-enabled small cell network operation, SBSs and user equipment (UE) make control decisions individually while considering their mutual relationship. This situation is well-suited for study using game theory. In this paper, we develop a new dynamic Stackelberg game model for each SBS and its corresponding UE. This game procedure consists of two phases. At the cache placement phase, each individual SBS observes the file request frequency of its corresponding UE and places files in its limited cache. In this case, we can assume that users are multiple leaders, and the SBS is a single follower, which keeps track of the availability of the cached content. Therefore, a multiple-leaders single-follower Stackelberg game is an appropriate model.

At the bandwidth splitting phase, each SBS investigates the underlaid D2D communications and splits the bandwidth for licensed and unlicensed communication services to improve communication capacity. In this case, a traditional single-leader multiple-follower Stackelberg game model is suitable; the SBS is the leader and the UE are followers. In our single-leader multiple-follower Stackelberg game model, different UE are in different situations. There are two types of UE in the SBS coverage area, that is, S-UE and D-UE. S-UE connect directly to the SBS using licensed bandwidth, and D-UE communicate with each other using unlicensed bandwidth without traversing the SBS. As game players, SBSs and UE select their strategies to maximize their payoffs based on the interactions of a feedback mechanism. At each time period of gameplay, the elements of our dynamic Stackelberg game model are defined as follows:

(i) The game players are the set of SBSs and the set of UE.

(ii) The set of UE is divided into two subsets: the subset of S-UE and the subset of D-UE.

(iii) Each SBS has a bandwidth capacity, which is divided into licensed and unlicensed bandwidth bands.

(iv) Each SBS has two strategy sets: a set of cache placement strategies, which decide the cache contents, and a set of bandwidth splitting strategies for licensed and unlicensed communications.

(v) Each bandwidth splitting strategy is a splitting ratio for unlicensed D2D communications. If the SBS selects a given ratio, the corresponding share of its bandwidth is assigned to D2D communications and the remaining bandwidth is assigned to cellular communications.

(vi) Each UE has a strategy set with which it decides its type, that is, S-UE or D-UE, for its communications.

(vii) Each bandwidth splitting strategy of an SBS has a learning value, which is used to estimate the probability distribution for the next bandwidth splitting strategy selection.

(viii) Each SBS and each UE receives a payoff during the D2D-enabled small cell network operation.

(ix) Time is represented by a sequence of time steps with imperfect information for the dynamic Stackelberg game process.

Our dynamic Stackelberg game is a special case of the traditional Stackelberg game. To solve the joint problem of cache placement and bandwidth splitting, it is naturally designed as a two-stage game. Whether leaders or followers, all individual game players select their strategies independently and selfishly to maximize their payoffs. At the end of each game iteration, players examine their payoffs and dynamically adapt their decisions in an entirely distributed fashion. During the step-by-step iteration, this feedback process is repeated until the best solution has been found.
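For orientation, the classical single-leader single-follower Stackelberg game that this model builds on can be solved over finite strategy sets by backward induction: for each leader action, the follower's best response is computed first, and the leader then keeps the action with the highest resulting payoff. The following is a minimal sketch of that textbook procedure only; the function name `solve_stackelberg` and the payoff callbacks are hypothetical and are not the payoff functions of the proposed scheme, which uses learning and bargaining rather than an exhaustive search.

```python
from typing import Callable, Sequence, Tuple


def solve_stackelberg(
    leader_strategies: Sequence[float],
    follower_strategies: Sequence[float],
    leader_payoff: Callable[[float, float], float],
    follower_payoff: Callable[[float, float], float],
) -> Tuple[float, float, float]:
    """Backward induction for a finite leader-follower game (illustrative)."""
    best = None
    for a in leader_strategies:
        # The follower observes the leader's action and best-responds to it.
        b = max(follower_strategies, key=lambda s: follower_payoff(a, s))
        value = leader_payoff(a, b)
        # The leader keeps the action with the highest anticipated payoff.
        if best is None or value > best[2]:
            best = (a, b, value)
    if best is None:
        raise ValueError("leader_strategies must not be empty")
    return best
```

The exhaustive loop is only workable for small strategy sets, which is one reason a learning-based selection can be attractive in larger settings.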

##### 3.2. Cache Placement Algorithm in Dynamic Stackelberg Game

Caching technology can cache popular contents to serve UE locally and effectively. Otherwise, UE should download these files via the backhaul. Therefore, using the caching technique, backhaul overhead and access delay can be reduced while improving system performance. However, it is impossible to cache all the files due to the limited cache capacity in each SBS. Therefore, popular contents are carefully cached to achieve an effective content distribution. To collaboratively select the proper cache contents, we model the interaction between each SBS and its corresponding UE as a new dynamic Stackelberg game.

For the cache placement algorithm, we consider a commercial small cell caching system consisting of SBSs and a number of UE. By adopting the multiple-leaders single-follower Stackelberg game model, a cooperative SBS caching algorithm is developed. Commonly, a practical caching mechanism is coupled with the file placement. In our small cell network architecture, we assume that a multimedia file set consists of the popular files among the total multimedia files, and that any file in this set can be cached in each SBS. The popularity distribution over this file set is represented by a vector that describes how frequently each file is requested by users. Generally, this vector can be modeled by a Zipf distribution, which is a discrete probability distribution commonly used in the modeling of rare events [21].
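As an illustration of the Zipf popularity model mentioned above, the request probability of the file with popularity rank k among N files is proportional to 1/k^s for a skew exponent s. The sketch below computes such a popularity vector; the function name `zipf_popularity` and the parameter names are assumptions made for this example, and the skew value used by the paper is not stated in this section.

```python
from typing import List


def zipf_popularity(num_files: int, skew: float) -> List[float]:
    """Return Zipf request probabilities for files ranked 1..num_files."""
    if num_files <= 0:
        raise ValueError("num_files must be positive")
    # Unnormalized weight of the file with popularity rank k is 1 / k**skew.
    weights = [1.0 / (rank ** skew) for rank in range(1, num_files + 1)]
    total = sum(weights)
    # Normalize so that the probabilities sum to one.
    return [w / total for w in weights]
```

A larger skew concentrates requests on the few most popular files, while a skew of zero gives a uniform distribution.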

In this study, we consider the social relations of UE and the interactions among SBSs to adaptively obtain the values in the popularity vector. In fact, social characteristics such as the external influence of users’ relationships and ties have played a crucial role in information propagation over the Internet and will continue to shape the way information is accessed [22]. To exploit the correlation between users’ social relations, centrality can be used to identify the most influential users, who may act as a conduit for information diffusion [23, 24]. In this paper, centrality is weighted heavily when estimating the popularity values. At each time step, the value of each file in each SBS is defined by (1) in terms of the set of UE that request the file in that SBS, the number of first-degree social friends of each such UE, and the maximum number of first-degree social friends among the UE in that SBS. In (1), a function represents the skewness of the distribution in the SBS; a higher outcome corresponds to a higher file reuse. To adaptively obtain this outcome, we concentrate on the notion of global aware networking, which has attracted significant attention from the social and behavioral communities. By considering neighbor SBSs’ file request situations, each SBS learns the global trends of the UE’s propensity. Finally, this function is given in terms of an auxiliary function that returns the value of the file in the SBS at a given time; if the file was not present at that time, the auxiliary function returns zero. A control parameter sets the weighted average between local and global caching information, the function is bounded above and below by an upper and a lower bound, and a median function returns the median value over all SBSs. Generally, the most popular files account for the majority of download requests. In the proposed algorithm, the social and global properties of the small cell network are used to design a cache placement protocol based on these functions. Finally, the files with higher values from (1) and (3) are cached in each SBS.
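The combination of local and global caching information described above can be sketched as follows. This is an illustrative reading under stated assumptions, not a reproduction of (1) to (3): it assumes that the global information is the median of the neighboring SBSs' values, that the local and global values are blended by a single weight, that the result is clamped to fixed bounds, and that the cache is filled with the highest-valued files. The names `blended_value` and `select_cache` and all parameters are hypothetical.

```python
from statistics import median
from typing import Dict, List, Sequence


def blended_value(
    local: float,
    neighbor_values: Sequence[float],
    weight: float,
    lower: float = 0.0,
    upper: float = 1.0,
) -> float:
    """Blend a local file value with the median of neighboring SBSs' values."""
    # Assumption: with no neighbor information, fall back to the local value.
    global_value = median(neighbor_values) if neighbor_values else local
    mixed = weight * local + (1.0 - weight) * global_value
    # Clamp to the assumed lower and upper bounds.
    return min(upper, max(lower, mixed))


def select_cache(values: Dict[str, float], capacity: int) -> List[str]:
    """Pick the `capacity` files with the highest values (ties broken by name)."""
    ranked = sorted(values, key=lambda name: (-values[name], name))
    return ranked[:max(capacity, 0)]
```

Using the median rather than the mean makes the global term less sensitive to a single neighboring SBS with unusual request patterns.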

##### 3.3. Bandwidth Splitting Algorithm in Dynamic Stackelberg Game

Recently, traffic offloading technology has been introduced to improve the system capacity significantly. It can reduce the amount of data being carried on the cellular bands, freeing bandwidth for other types of UE. However, due to the constraints of the limited bandwidth, bandwidth splitting needs to be carefully studied. In this study, we consider a scenario in which the bandwidth is licensed to the SBSs, and they are willing to lease a part of the assigned bandwidth to the unlicensed UE for D2D communication. In different aspects of system performance, the unlicensed bandwidth can provide excellent capacity and coverage. Therefore, the bandwidth splitting technique is critically important for maximizing the total system capacity and the QoS satisfaction of UE [9, 25].

For D2D-enabled small cell networks, we face a two-tiered network structure where licensed bandwidth is allocated for cellular communications and unlicensed bandwidth is allocated for D2D communications. To support this mechanism, each SBS splits its total bandwidth between the two kinds of communication paradigms [9, 25]. To cope with the design challenges of bandwidth splitting, we design a single-leader multiple-followers Stackelberg game model. In this model, traffic offloading through opportunistic communications exploits D2D communications in the unlicensed bandwidth bands. However, the bandwidth splitting problem for D2D communications is generally NP-hard. For this reason, our single-leader multiple-followers Stackelberg game model is reformulated based on a reinforcement learning algorithm with low computational complexity. This is a suitable approach for fine-tuning the system performance.

Under the coexistence of cellular and D2D communications, each UE should consider using either the unlicensed bandwidth for D2D communications or the licensed bandwidth for cellular communications. From the standpoint of UE, two utility functions are defined, one for cellular and one for D2D communications. Both are formulated by considering the tradeoff between throughput and transmit power. For each UE, these utility functions depend on the bandwidth channels assigned to its cellular and D2D communications, on the gap between uncoded M-QAM and the capacity, minus the coding gain, on the UE’s signal-to-interference-plus-noise ratio (SINR) under the power vector of all UE, and on the UE’s power levels for cellular and D2D communications. The goal of each UE is to maximize its own payoff by selecting a strategy from its strategy set; that is, it decides its type, S-UE or D-UE. From the standpoint of an SBS, the utility function is designed based on the total system throughput, which is obtained from the sum of cellular and D2D communications in its coverage area, that is, over the sets of S-UE and D-UE in that area. To maximize its utility, each SBS selects its strategy, which decides the amounts of licensed and unlicensed bandwidth. To decide the bandwidth splitting policy effectively, we develop a new learning algorithm. In the proposed algorithm, learning is divided into two categories: local learning and global learning. Local learning refers to temporal learning within an SBS, and global learning refers to spatial learning through neighboring SBSs. The main novelty of the proposed bandwidth splitting algorithm is its joint design of local and global learning approaches.
To specify the global relationship of SBSs, an affinity indicator between each pair of SBSs is defined, as in [26], from the current traffic amount, the mean, and the standard deviation of the traffic history of each SBS, together with the expectation operator. To learn the local traffic situation of each individual SBS, the current value is periodically monitored. Based on the global and local information, we can estimate the learning values of each SBS’s bandwidth splitting strategies. If a strategy is selected by an SBS at a given time step, the SBS updates that strategy’s learning value for the next time step using a learning rate that models how the learning values are updated. In (9), two terms represent the local and global learning values, respectively, and a control factor sets the weighted average between the two learning approaches.
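The learning-value update described above can be sketched as an exponential moving average whose target is a weighted mix of a local and a global learning value. This form is an assumption chosen to match the verbal description (a learning rate plus a control factor for the weighted average); it is not the paper's exact update rule, and the name `update_learning_value` and its parameters are hypothetical.

```python
def update_learning_value(
    old_value: float,
    local_value: float,
    global_value: float,
    learning_rate: float,
    mix: float,
) -> float:
    """Move a strategy's learning value toward a local/global weighted target."""
    if not 0.0 <= learning_rate <= 1.0:
        raise ValueError("learning_rate must lie in [0, 1]")
    if not 0.0 <= mix <= 1.0:
        raise ValueError("mix must lie in [0, 1]")
    # Control factor `mix` weights local against global learning (assumed form).
    target = mix * local_value + (1.0 - mix) * global_value
    # The learning rate sets how strongly the new observation replaces the old value.
    return (1.0 - learning_rate) * old_value + learning_rate * target
```

With a small learning rate the value changes slowly and smooths out traffic fluctuations; with a rate of one the previous value is discarded entirely.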

Based on the learning values, a strategy selection distribution is defined for each SBS. During the dynamic bandwidth splitting process, this distribution gives the probability of each of the SBS’s strategies being selected at a given time, and it is sequentially modified over time. According to (11), the SBS stochastically selects a strategy using its strategy selection distribution. Based on the selected strategy, the SBS then attempts to adjust the system performance. By using a simple bargaining process, the SBS carefully deliberates on its final decision. In particular, the outcome of the selected strategy is taken as the status quo point of this bargaining process. At each time step, this point is the vector of cellular and D2D communication payoffs achieved by the SBS with the selected strategy. Starting from the strategy obtained by the learning process, the SBS finally selects a strategy to obtain the desirable best solution. At each game round, the bargaining process takes place sequentially. Through this sequential bargaining process, each SBS can improve on an unexpected result. In fact, the basic concept of the bargaining solution has become an interesting research topic due to its many appealing properties. However, the traditional bargaining approach is equivalent to a random optimization method in a huge space, which hardly converges [10]. Therefore, it is impractical to implement for real-world network operations. In this study, we effectively implement the bargaining model by adopting the learning process. It is a promising approach for practical network operations and attains better performance under diverse system environments.
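The two-step decision described above, a stochastic selection followed by a bargaining refinement, can be sketched as follows. Two assumptions are made and should be read as such: the selection probabilities are taken to be proportional to the non-negative learning values, and the bargaining step is modeled in the style of a Nash bargaining product, maximizing the product of payoff gains over the status quo point. Neither is confirmed as the paper's exact formulation, and the function names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def selection_distribution(learning_values: Sequence[float]) -> List[float]:
    """Turn non-negative learning values into selection probabilities."""
    if any(v < 0 for v in learning_values):
        raise ValueError("learning values must be non-negative")
    total = sum(learning_values)
    if total == 0:
        # No information yet: select uniformly at random.
        return [1.0 / len(learning_values)] * len(learning_values)
    return [v / total for v in learning_values]


def sample_strategy(probabilities: Sequence[float], rng: random.Random) -> int:
    """Draw a strategy index according to the selection distribution."""
    return rng.choices(range(len(probabilities)), weights=probabilities, k=1)[0]


def bargain(payoffs: Sequence[Tuple[float, float]], status_quo: int) -> int:
    """Refine the sampled strategy with a Nash-style product of payoff gains.

    payoffs[i] is the (cellular, D2D) payoff pair of strategy i.
    """
    d_cell, d_d2d = payoffs[status_quo]
    best_index, best_product = status_quo, 0.0
    for i, (cell, d2d) in enumerate(payoffs):
        # Only strategies that leave neither service worse off are candidates.
        if cell >= d_cell and d2d >= d_d2d:
            product = (cell - d_cell) * (d2d - d_d2d)
            if product > best_product:
                best_index, best_product = i, product
    return best_index
```

If no strategy improves both payoffs over the status quo point, the sketch keeps the strategy that the learning step selected.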

##### 3.4. Main Steps of Proposed D2D-Enabled Small Cell Control Scheme

In recent years, special focus has been put on D2D-enabled cellular network systems with caching SBSs to maximize the total system capacity while ensuring QoS. This approach is expected to play a crucial role in 5G network-controlled decentralized communications. However, designing a proper combination of cache placement and bandwidth splitting algorithms is a particularly challenging problem. In this paper, we propose a new dynamic Stackelberg game model, which is implemented as a distributed and dynamic repeated game in which SBSs and UE can be leaders or followers dynamically. In the proposed scheme, individual game players can learn the current network situation locally and globally and determine their best strategies to maximize their payoffs through a step-by-step interactive game process.

Generally, well-known solution concepts of game theory are presented in closed-form expressions under complete information. However, they cannot capture the adaptation of 5G network operations over time. From the viewpoint of practical operations, our learning-based solution concept is suitable for dynamic and unknown D2D-enabled cellular network environments. In addition, we can transfer the computational burden from a central system to individual SBSs in a distributed online fashion, which is practical for real-world decision making. The main steps of the proposed scheme are described as follows.

*Step 1.* Control parameters are determined by the simulation scenario (see Table 1).