#### Abstract

This paper introduces a hierarchical Wireless Random Access scheme based on power control, where intelligence is split among the mobile users in order to drive the outcome of the system towards an efficient point. The hierarchical game is obtained by introducing a special user who plays the role of altruistic leader, whereas the other users assume the role of followers. We define the power control scheme in such a way that the leader first chooses the lowest power to transmit its packets among the available levels, whereas the followers retransmit by randomly choosing a power level from the higher distinct power levels. Using a 3D Markovian model, we compute the steady state of the system and derive the average system throughput and expected packet transmission delay. Our numerical results show that the proposed scheme considerably improves the global performance of the system, avoiding the well-known throughput collapse at high loads that characterizes most random channel access mechanisms.

#### 1. Introduction

In wireless communication, multiple users often share a common channel and contend for it. To resolve the contention problem, many different medium access control (MAC) protocols have been designed assuming that the users comply with the protocol. Unfortunately, a self-interested and strategic user might manipulate the protocol in order to obtain a larger share of the channel resource at the expense of the other users. Using game-theoretic approaches, numerous studies in the literature have shown that the presence of self-interested users usually leads to a suboptimal use of the channel resource and to the degradation of the performance of MAC protocols [1–4], where a prisoner’s dilemma phenomenon arises in a noncooperative game. For example, for the IEEE 802.11 distributed MAC protocol, DCF, and its enhanced version, EDCF, the authors of [4] have shown that competition among selfish users can lead to an inefficient use of the shared channel in Nash equilibria. Similarly, a prisoner’s dilemma phenomenon arises in a noncooperative game for slotted Aloha protocols [1–3, 5].

With the ever-increasing demand for wireless communication resources, the design of cooperative strategies will be a must for the sustainability and offerings of wireless services. In this paper, we introduce a special user called the leader as an additional user that accesses the channel with a certain probability, whereas the other users are considered to be followers. We formulate the interaction between the followers and the single leader as a Stackelberg game and propose a solution concept based on “Stackelberg equilibrium.” The main role of the leader is to sustain cooperation in terms of transmission probabilities of concurrent users (followers). The game is played sequentially such that the followers play knowing the decision of the leader. Moreover, the leader chooses its strategy knowing that the followers will play their best response to it. Thus the leader chooses its strategy and communicates its decision to all the other users (followers) via a mediator. Given that the followers play after the leader, they take into account the strategy played by the leader to decide their own strategies. Thus the leader is different from the followers in that its transmission probability is based on the transmission probabilities of the followers. This feature may be effectively exploited by the leader, enabling it to act as police; that is, if the followers access the channel excessively, the leader can intervene and punish them through its own transmission probability. It may in fact choose to transmit with a probability close to one to take over the channel, thus reducing the success rates of all the other users. Such a strategy has been studied in [6], where the followers are penalized once the leader decides to take over the channel by transmitting with a probability close to one.

In this paper, to avoid the network collapse and to take advantage of such a hierarchy, we develop a hierarchical scheme based on power diversity. We introduce a prioritization between the leader and the followers in terms of transmit power. In particular, we assume that the leader first chooses the lowest power among the available levels, whereas the followers retransmit at random power levels picked from the higher distinct power levels. The use of multiple power levels results in a general capture effect, allowing a packet to be decoded whenever its signal to interference plus noise ratio is higher than a given threshold.

Here the leader is impulsively altruistic; that is, it punishes the followers’ decisions at its own cost, which is a fundamental mechanism enabling cooperation between users (followers) and improving the global network throughput. In our setting the role of the base station is to measure the aggregate success rate over the network; see Figure 1. This information is used to decide whether the followers should keep their current retransmission strategy or change it according to the congestion level of the network.

#### 2. State of the Art

The Stackelberg game provides a general framework to analyse and design hierarchical interaction among selfish users [7]. This kind of game arises naturally in some contexts of practical interest. For example, hierarchy is naturally present in advanced wireless communication systems where primary (licensed) users and secondary (unlicensed) users are able to sense their environment using cognitive radio capabilities [8]. Such an approach has been used recently in the performance evaluation of communication systems [9]. Specifically, applying this kind of mechanism in a power control game has proved much more effective at improving the performance of MAC protocols than the use of games based on Nash equilibrium [10]. A closely related incentive scheme has been proposed in [11] and has been applied to MAC protocols [12]. The authors in [13] proposed a Stackelberg game in which the leader chooses the optimal strategy before followers make their own decisions. In the proposed game, the leader sets an intervention rule first and then implements its intervention after users choose their strategies. Alternatively, the proposed Stackelberg game can be considered as a generalized Stackelberg game in which there are multiple leaders (users) and a single follower. In [14] the authors proposed an efficient power control algorithm for Wireless Mesh Networks based on a Stackelberg game in order to optimize the aggregate throughput and energy efficiency. The work in [15] studied the selfish behavior of slotted Aloha users using the Stackelberg game concept. The authors proposed a model and evaluated the throughput that can be achieved in a system of nodes using generalized Aloha protocols where the nodes transmit data using a two-state decision system, under a pricing mechanism with different budget constraints.
The authors in [16] analyzed a collision channel access game and proposed a methodology that transforms the noncooperative game into a Stackelberg game; this allowed them to overcome the deficiency of the Nash equilibria of the original game. The authors assign to an additional user, called the “manager” or leader, an administrative role based on the intervention function. This function is simply the effective transmission rate of the manager. It is used to regulate the access probabilities of competing users and make them transmit at a target transmission rate vector defined by the manager. The work [6] studied the slotted Aloha protocol as a Stackelberg game, in which the feasible outcome can be achieved at Stackelberg equilibrium with independent strategy. The authors introduced a virtual controller whose role is played by the base station itself. Under the proposed game, the Stackelberg solution leads to a network collapse, in particular for average and relatively high loads.

Recently, the research community has come to believe that the performance of future networks will be enhanced by the use of a cross-layer design involving the physical, MAC, network, and potentially higher layer protocols. Indeed, a cross-layer approach transports feedback dynamically across the layer boundaries to compensate the QoS. In [17, 18], power diversity is studied with a general capture model based on the experienced signal to interference plus noise ratio, in both cooperative and noncooperative frameworks. In [17] it was shown that the system capacity is enhanced from 0.37 to 0.60 as the number of available power levels increases. In [19, 20], the authors introduced a random power level scheme for slotted Aloha. These new algorithms allowed them to improve the network throughput and reduce the packet delays. However, the capture model considered in [19] is not realistic; the authors assumed therein that when a unique mobile chooses the highest power, compared to other mobiles, its (re)transmission is successful independently of the power levels used by the other users. This assumption may not always hold. Indeed, the aggregate signal of other mobiles may jam the signal of the tagged mobile, that is, of the one using the highest power level. A general capture model has been analyzed and considered in [21], where a mobile station transmits successfully if its instantaneous SINR is higher than the target threshold. In addition to the power diversity, the authors have associated a cost with each transmission attempt. The results show that such pricing could be used to enforce an equilibrium whose throughput corresponds to the team optimal solution [17].

In this paper we develop an enhanced hierarchical scheme of the Wireless Random Access (WRA) algorithm, where power diversity and a general capture effect are considered. In this scheme an additional user called “leader” is placed in the network. A hierarchical scheme proves to be more robust than pricing since, under the former, users cannot avoid the intervention of the leader while, under the latter, they may be able to find a way to avoid being charged. The implementation of the hierarchical scheme simply requires the inclusion of a device capable of monitoring the network. Such functionality can simply be provided by the base station itself.

The rest of the paper is organized as follows. We first describe the design principles of multiple power levels as related to its use in conjunction with the WRA mechanism in Section 3. In this way, we set the basis to understand the existing interaction between the physical and the MAC layers of the protocol stack following a cross-layer approach. In Section 4, we present an overview of the noncooperative game using the Nash solution concept. Next we provide the Stackelberg formulation in Section 5. In particular we introduce the equilibrium analysis principles and the performance metrics of interest used in the evaluation of the mechanisms under study. Section 6 provides a numerical evaluation of the proposed mechanism. Finally, our concluding remarks are drawn in Section 7.

#### 3. Cross-Layer Design

We consider a wireless multiple access system composed of one central receiver (base station BS) and geographically dispersed mobile stations communicating with the BS. The users use a common wireless access to send data to the base station. Time is divided into multiple equal and synchronized slots. Transmission feedback (success or collision) is received at the end of the current slot. As mentioned before, the use of multiple power levels, also referred to as power diversity, is considered in conjunction with the Wireless Random Access mechanism. Throughout this paper, we refer to the resulting mechanism as the Multiple Power Level Wireless Random Access mechanism or simply MPL-WRA. Power diversity gives rise to a capture effect. Due to this effect, a receiver may be able to decode a message even in the presence of a collision. The unsuccessful concurrent messages are lost and treated as interference.

##### 3.1. Multiple Power Levels

In this MAC/physical cross-layer design, the leader contending for a message transmission chooses the lowest power level among available levels , whereas the followers retransmit at random power levels picked from the higher distinct power levels. The power level selection follows the probability vector , where th entry is the probability of choosing the power level . We consider a general capture model where a message transmitted by a user is received successfully if and only if its signal to interference plus noise ratio (SINR) exceeds some given threshold . Let be the variance of the thermal noise and denote by the vector of selected power levels at the beginning of the current slot. Note that the components of are selected from the vector . The received power at the BS can be related to the transmitted power by the propagation relation , where is the channel gain experienced by the base station when receiving a message transmitted by user . Note that does not depend on the value of using power level . Thus, the instantaneous SINR of user transmitting at power level experienced by the receiver is

We denote by the probability of a successful transmission when attempt transmission. is calculated using the following expression:

whereas , , and . is Kronecker’s delta function and is the Heaviside function (unit step function) and are given by the following expressions:

Computing the success probability is a challenging task. The difficulty of formula (2) is to consider one single transmitting MS at the highest power level and list all the cases where the remaining mobile stations transmit at lower power levels. This corresponds exactly to the set of partitions (a partition of a positive integer is a way of writing as a sum of positive integers) of the positive integer considering all possible permutations. Generating all the partitions of an integer has been widely studied in the literature and several algorithms have been proposed; for example, see [22].
Such algorithms are computationally expensive and may take a long time to compute the set of all partitions and their permutations. Fortunately, in our model, the success probability depends on none of the following: the instantaneous state of the system, the arrival probability, and the retransmission probability. Hence, the success probability vector , , can be computed once and reused to derive the transition matrix.
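The capture rule described above can be illustrated with a minimal sketch. It is not the partition-based formula (2) of the paper: it enumerates every ordered assignment of power levels by brute force, which is only practical for a handful of transmitters. The function names, the common channel gain `gain`, the noise variance `noise`, and the example values are assumptions made for illustration, and all transmitters are treated alike (the special role of the leader is ignored).

```python
from itertools import product


def sinr(powers, i, gain=1.0, noise=1e-3):
    """SINR of transmitter i when all transmitters share the same channel gain (assumption)."""
    interference = gain * (sum(powers) - powers[i])
    return gain * powers[i] / (noise + interference)


def capture_probability(k, levels, probs, threshold, gain=1.0, noise=1e-3):
    """Probability that the strongest of k simultaneous packets is decoded.

    Each transmitter independently picks levels[j] with probability probs[j].
    With a common gain, the packet sent at the highest power has the highest SINR.
    """
    if k == 0:
        return 0.0
    total = 0.0
    # Brute force: visit every ordered choice of power levels (len(levels) ** k cases).
    for choice in product(range(len(levels)), repeat=k):
        weight = 1.0
        for idx in choice:
            weight *= probs[idx]
        powers = [levels[idx] for idx in choice]
        strongest = max(range(k), key=lambda i: powers[i])
        if sinr(powers, strongest, gain, noise) >= threshold:
            total += weight
    return total


# Hypothetical example: two power levels, two simultaneous transmitters.
p_capture = capture_probability(2, [1.0, 10.0], [0.5, 0.5], threshold=2.0)
```

In this example a packet is captured only when the two transmitters pick different levels, so `p_capture` is 0.5. As noted in the text, such values do not depend on the state of the system, so they can be tabulated once and reused.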

##### 3.2. MAC Protocol

We consider a Wireless Random Access mechanism in which each user has a buffer that stores exactly one packet; thus no new packet is generated by user until the current message is successfully transmitted or dropped. A user can be in one of two distinguished states: “I” (idle) or “T” (transmitting). At the beginning of each slot, a user in state “I” has no packet to transmit and generates a new packet with probability . Users in state “I” that generate a new message switch to state “T” at the beginning of the next slot. In state “T,” the tagged user attempts to transmit with probability until its message is successfully transmitted. If two or more users in state “T” attempt to access the channel simultaneously, then the messages collide. If the messages cannot be properly decoded, the corresponding mobile users immediately return to state “T.” All corrupted messages get backlogged and are retransmitted after some random time. In contrast, if exactly one mobile station attempts a transmission, or if the SINR of the received signal is higher than a certain threshold, then the transmission is successful and the corresponding mobile user moves from state “T” to state “I.”
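The two-state behaviour described above can be simulated with a short sketch. It assumes a single power level (no capture, so a slot succeeds only when exactly one user attempts) and identical users; the names `m`, `p`, and `q` for the number of users, the arrival probability, and the transmission probability are chosen here for illustration.

```python
import random


def simulate_wra(m, p, q, slots, seed=0):
    """Simulate bufferless random access without capture; return successes per slot."""
    rng = random.Random(seed)
    in_state_t = [False] * m  # True: state "T" (has a packet), False: state "I"
    successes = 0
    for _ in range(slots):
        # Users in state "T" attempt a transmission with probability q.
        attempts = [u for u in range(m) if in_state_t[u] and rng.random() < q]
        # Users in state "I" generate a packet with probability p; they only
        # switch to state "T" at the beginning of the next slot.
        arrivals = [u for u in range(m) if not in_state_t[u] and rng.random() < p]
        if len(attempts) == 1:
            in_state_t[attempts[0]] = False  # success: back to state "I"
            successes += 1
        for u in arrivals:
            in_state_t[u] = True
    return successes / slots


# Hypothetical parameters, for illustration only.
throughput = simulate_wra(m=10, p=0.3, q=0.2, slots=10_000)
```

With two users that always generate and always transmit (`p = q = 1`), every slot after the first is a collision and the simulated throughput is zero, which is the collapse the capture effect is meant to mitigate.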

*Remark 1.* We consider users without buffer; that is, users do not generate a new packet until the current one is successfully transmitted. Indeed, quite frequently one uses the WRA protocol for sporadic transmissions of signaling packets, such as packets for making a reservation for a dedicated channel for other transmissions; see the description of the SPADE on-demand transmission protocol for satellite communications in [23]. In the context of signaling, it is natural to assume that a source does not start generating a new signaling packet (e.g., a new reservation) as long as the current signaling packet is not transmitted. In that case, the process of attempts to retransmit a new packet from a source after the previous packet has been successfully transmitted coincides with our no-buffer model.

Next we present an overview of the noncooperative game in which each user seeks to maximize its own throughput.

#### 4. Overview on the Noncooperative Game

In the noncooperative game, the payoff of user depends not only on its own decision but also on the actions of other (adversarial) mobile users. A typical conflict situation is then induced, and game theory tools seem to be suitable to analyze such scenarios. The primary focus of this section is to build a noncooperative framework for WRA, in which each user optimizes its own payoff. We start by presenting the Markovian model for WRA mechanism and then analyze the noncooperative game.

The elements of our contention game are as follows:

(i) We consider a set of bufferless mobile users with cardinality . Each user is labeled by an integer from to .

(ii) Messages arrive from higher layers of each mobile station following a Bernoulli process with parameter (nonsaturated regime).

(iii) The state of the system is given by the number of mobile stations in state “T,” that is, backlogged messages and messages that will be transmitted for the first time.

(iv) Mobile user transmits its messages with probability in every slot.

(v) Each mobile station has two actions (transmit or not) and its retransmission probability is considered to be its strategy.

(vi) For simplicity, we first assume that transmissions are cost-free.

(vii) The objective function of each user is denoted by . It can be either individual throughput or alternatively minus expected delay at user .

Let be the policy vector of retransmission probabilities for all users. Under rationality, each user seeks to maximize its own utility function.

We use as the state of the system the number of users in state “T” among the set (denoted by and takes value in ) and the number of messages (either backlogged or first time transmission) of tagged user (denoted by and takes value in ) at the beginning of a slot. For any choice of values , the state process is a Markov chain that contains a single ergodic subchain (and possibly transient states as well). Indeed, it is easy to check that, conditioning on the actual state of the system, the future and the past are mutually independent (Markov property). We denote by the probability that the system jumps from state to state . Let be the corresponding vector of the steady state probabilities where its th entry denotes the probability that the state of the system is . The only point where the Markov chain does not have a single stationary distribution is at . The outcome of any instance of the game is a Nash equilibrium (if it exists).

*Definition 2.* The strategy profile is a Nash equilibrium if no mobile user can improve its utility by unilateral deviation; namely,

Although the adoption of a different retransmission probability for each user provides a degree of freedom that could be beneficial for the system performance, we restrict ourselves to finding a symmetric policy where all mobile stations are payoff-balanced. Assume that there are actual backlogged users/messages among the set , and all use the same value as retransmission probability. Let be the probability that out of the () backlogged messages are retransmitted in the current slot. Then

Similarly, let denote the probability that unbacklogged mobile stations among the set generate new messages in the current slot. Thus
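Since users retransmit and generate packets independently, both probabilities introduced above follow a binomial law. The sketch below makes this explicit; the function and argument names are assumptions for illustration, and the numbers of backlogged and idle users are passed in directly so that no assumption is made about how the tagged user is counted.

```python
from math import comb


def prob_retransmissions(i, n_backlogged, q):
    """Probability that exactly i of n_backlogged users retransmit, each with probability q."""
    return comb(n_backlogged, i) * q**i * (1 - q) ** (n_backlogged - i)


def prob_arrivals(i, n_idle, p):
    """Probability that exactly i of n_idle users generate a new message, each with probability p."""
    return comb(n_idle, i) * p**i * (1 - p) ** (n_idle - i)


# Hypothetical example: 4 backlogged users retransmitting with probability 0.3.
distribution = [prob_retransmissions(i, 4, 0.3) for i in range(5)]
```

The entries of `distribution` sum to one, which is a convenient sanity check when these terms are assembled into the transition matrix.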

##### 4.1. Equilibrium Analysis

Define to be a retransmission policy where each mobile user retransmits at any slot with probability for all and where tagged user retransmits with probability . We restrict our study to a symmetric policy where the actual transmission probability is the same for all users but user , . We then identify, with some abuse of notation, the retransmission probabilities of all users by and the retransmission probability of user by .

Define the set as the set of best response strategies of tagged user ; it can be written as where denotes the policy where all mobile users in retransmit with probability and the maximization is taken with respect to . Therefore, the strategy profile is a symmetric Nash equilibrium if and only if
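The fixed-point condition above (a probability is a symmetric equilibrium when it belongs to its own best-response set) can be checked numerically on a grid. The sketch below is a minimal illustration: `utility(q_own, q_others)` stands for the tagged user's payoff, and the toy one-shot utility used here is an assumption that replaces the Markov-chain throughput of the paper.

```python
def symmetric_nash_candidates(utility, grid, tol=1e-9):
    """Grid points q that are a best response to all other users playing q."""
    candidates = []
    for q in grid:
        best = max(utility(deviation, q) for deviation in grid)
        if utility(q, q) >= best - tol:
            candidates.append(q)
    return candidates


def toy_utility(q_own, q_others, others=2):
    """Toy one-shot payoff (assumption): the user transmits and no other user does."""
    return q_own * (1 - q_others) ** others


grid = [k / 20 for k in range(21)]
equilibria = symmetric_nash_candidates(toy_utility, grid)
print(equilibria)  # prints [1.0]
```

For this toy payoff the only symmetric equilibrium is to always transmit, which yields zero payoff for everyone; it is a simple instance of the inefficiency that motivates the hierarchical scheme.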

##### 4.2. Steady State and Performance Metrics

Denote by the steady state of the Markov chain where is the number of backlogged messages among the mobile stations and is the binary number of the backlogged messages of tagged user .

We define the average throughput of the tagged MS as the sample average number of messages that are successfully transmitted by the user. Using the rate balance equation at steady state, we can easily derive the expression of the throughput of tagged user as follows:

As mentioned in [1, 19], the Nash equilibrium in such a game may be inefficient and may penalize the network performance. Indeed, the mobile stations become more and more aggressive as the arrival probability increases, which results in a dramatic decrease in the aggregate average network throughput. Moreover, the equilibrium retransmission probability quickly goes to 1 when the number of mobile users increases. We note that a similar aggressive behavior at equilibrium has been observed in [24] in the context of flow control by several competing mobile users that share a common drop tail buffer. However, in that context, the most aggressive behavior (of transmission at maximum rate) is the “equilibrium” solution for *any arrival rate*, and not just at high rates as in our case. We may thus wonder why retransmission probabilities of 1 are not an equilibrium in our game (in the case of light traffic). An intuitive explanation could be that if a mobile station deviates and retransmits with probability 1 (while others continue to retransmit with the equilibrium probability ), the congestion level in the system (i.e., the number of backlogged messages) increases. This situation induces more retransmissions from other mobile stations, which then results in more collisions of messages from the deviating MS and a degradation of its own payoff.

#### 5. A Stackelberg Formulation

We introduce a user called leader, who will play the role of the altruistic user [25]. In a wireless infrastructure setup such as the one under study, the role of this user can be played by the base station itself. The main role of the leader is to sustain a partial cooperation in terms of retransmission probabilities of the followers. The leader fixes the optimal strategy and lets the followers optimize their own utility according to the leader’s strategy. The leader is cognizant of the noncooperative behavior of the other users and optimizes its utility function based on this information. This property enables the leader to predict the best response of the noncooperative followers to any strategy it chooses and hence to determine a strategy that would pilot them to an operating point that optimizes the network performance.

The ultimate goal of the leader is to control the retransmission probabilities of the followers and find the distributed choice of retransmission probabilities. The leader controls the retransmission probabilities of the concurrent followers over the shared channel. In particular, it predicts the strategy played by the followers and makes all users converge to the Stackelberg equilibrium.

##### 5.1. Stackelberg Equilibrium

Let and denote the arrival probability and the retransmission probability, respectively, of the leader. The parameter will represent the intervention level of the leader. Therefore the base station computes the optimal parameter . Then the variation of results in the reaction of the followers by choosing the retransmission probabilities and decreases the system congestion. In order to provide an incentive to choose a lower transmission probability, the leader needs to vary its intervention level depending on the transmission probabilities of the followers. Let be the response strategy of the followers when the leader’s strategy is . We define the utility function of the leader as the aggregate throughput over the network:

The leader decision problem is:

Define to be a retransmission policy where each follower with retransmits at any slot with probability for all and where tagged follower retransmits with probability and when the maximization is taken with respect to .

The follower decision problem is:

The definition of Stackelberg equilibrium in our proposed game can be simply stated as follows.

*Definition 3.* A policy vector is a Stackelberg equilibrium if and only if (1) , (2) .

##### 5.2. Performance Metrics

Next we give the performance metrics based on a 3D Markov chain whose state has three components , where is the number of backlogged packets among the set , is the number of backlogged packets of the th user (either 0 or 1), and is the number of backlogged packets of the leader. Denote by the probability that the system jumps from state to state . The transition probability diagram is depicted in Figure 2. For the sake of clarity, we do not show the transition probabilities in the figure. The detailed transition probabilities of the Markov chain of both schemes WRA and MPL-WRA are given in Appendices A and B.
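To build a transition matrix for such a three-component state, it helps to map each state to a row and column index first. The sketch below is one possible way to do so; the function name and the assumption that the first component ranges from 0 to the number of other followers (and that the leader holds at most one packet) are introduced here for illustration.

```python
from itertools import product


def build_state_index(num_other_followers):
    """Enumerate states (n, a, b) and map each one to a matrix index.

    n: backlogged packets among the other followers (0..num_other_followers),
    a: backlog of the tagged follower (0 or 1),
    b: backlog of the leader (0 or 1, an assumption of this sketch).
    """
    states = list(product(range(num_other_followers + 1), (0, 1), (0, 1)))
    index = {state: k for k, state in enumerate(states)}
    return states, index


# Hypothetical example: three other followers give 4 * 2 * 2 = 16 states.
states, index = build_state_index(3)
```

The transition matrix can then be stored as a square array of size `len(states)`, with `index[(n, a, b)]` giving the position of each state.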

In particular, we are interested in deriving the aggregate throughput and the expected delay of the transmitted packets. We first discuss the procedure to obtain the steady state probabilities. Then, we derive the expressions of the performance metrics of interest from the steady state equations. Denote by the steady state of the Markov chain. The steady state of the Markovian process is given by the following system:

where is the transition matrix of the Markov chain and .

Using a stationary iterative method, such as the Jacobi method [26], we compute the stationary distribution:

subject to
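As an illustration of computing a stationary distribution iteratively, the sketch below uses plain power iteration rather than the Jacobi method cited above; it is a simpler substitute that converges for chains with a single aperiodic recurrent class. The function name and the two-state example matrix are assumptions.

```python
def stationary_distribution(P, tol=1e-12, max_iter=100_000):
    """Power iteration: repeat pi <- pi P until the update is smaller than tol.

    P is a row-stochastic matrix given as a list of rows.
    """
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(max_iter):
        nxt = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        total = sum(nxt)
        nxt = [x / total for x in nxt]  # keep the entries summing to one
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    raise RuntimeError("power iteration did not converge")


# Hypothetical two-state chain; its stationary distribution is (5/6, 1/6).
pi = stationary_distribution([[0.9, 0.1], [0.5, 0.5]])
```

The normalization step plays the role of the constraint that the steady state probabilities sum to one.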

The average throughput of the tagged user is given as a function of the steady state of the Markov chain as follows:

In order to determine the expected delay for the tagged user , we use Little’s formula [27]. We give the average number of backlogged packets at user as follows:

Finally the expected delay of source is given by

Regarding the existence of a solution, the only point where the associated Markov chain does not have a single stationary distribution is when the retransmissions vector is such that . The steady state probabilities are continuous over and . Since this is not a closed domain, a solution need not exist. However, as we are restricted to the closed domain , where 0, an optimal solution indeed exists. Therefore for any , there exists some which is . is said to be -optimal for the throughput maximization if it satisfies for all . A similar definition holds for any objective function (e.g., delay minimization).
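The delay computation via Little’s formula can be sketched as follows, assuming the steady state is stored as a dictionary keyed by state tuples whose second component is the backlog of the tagged user. The function name and the example values are hypothetical, and any extra slot counted for the first transmission attempt is left out of this sketch.

```python
def tagged_user_delay(steady_state, throughput):
    """Little's formula: expected delay = average backlog / throughput.

    steady_state maps a state tuple to its probability; state[1] is the
    backlog (0 or 1) of the tagged user.
    """
    if throughput <= 0:
        raise ValueError("throughput must be positive")
    avg_backlog = sum(prob * state[1] for state, prob in steady_state.items())
    return avg_backlog / throughput


# Hypothetical steady state over states (n, a, b) and a hypothetical throughput.
example_state = {(0, 0, 0): 0.5, (1, 1, 0): 0.3, (2, 1, 1): 0.2}
delay = tagged_user_delay(example_state, throughput=0.25)
```

Here the tagged user is backlogged half of the time, so the average backlog is 0.5 and the delay is 2 slots.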

#### 6. Performance Evaluation

We now compare the performance of the three schemes, Nash WRA, hierarchical WRA, and hierarchical MPL-WRA, in terms of the average throughput, retransmission probabilities (equilibrium policy), and expected packet delay. A symmetric Stackelberg equilibrium is computed in three steps as follows:

(1) The leader selects a retransmission profile.

(2) The base station communicates the transmission policy of the leader to the followers. The best response of the followers, which results naturally in a symmetric Nash equilibrium, is then computed.

(3) The leader receives the decisions made by the followers and verifies that they do correspond to a symmetric Nash equilibrium. If they do not, the followers update their strategies by unilaterally deviating until they are absorbed by a symmetric equilibrium. The vector of leader and followers strategies forms the symmetric Stackelberg equilibrium.
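The three steps above can be sketched as a grid search. This is a simplified illustration, not the procedure used to produce the results: the followers’ update dynamics of step (3) are replaced by an exhaustive check of the equilibrium condition on a grid, and when several symmetric equilibria exist the one most favourable to the leader is kept. Both utility callables and all names are assumptions; in the setting of the paper they would be evaluated from the steady state of the Markov chain.

```python
def symmetric_nash(follower_utility, leader_action, grid, tol=1e-9):
    """Grid points q that are best responses to themselves, given the leader's action."""
    found = []
    for q in grid:
        best = max(follower_utility(dev, q, leader_action) for dev in grid)
        if follower_utility(q, q, leader_action) >= best - tol:
            found.append(q)
    return found


def stackelberg(leader_utility, follower_utility, leader_grid, follower_grid):
    """Return (leader action, follower probability) maximizing the leader's utility."""
    best = None
    for action in leader_grid:  # step 1: the leader selects a candidate strategy
        # step 2: followers' symmetric equilibrium given the announced strategy
        equilibria = symmetric_nash(follower_utility, action, follower_grid)
        for q in equilibria:  # step 3: keep only verified symmetric equilibria
            value = leader_utility(action, q)
            if best is None or value > best[0]:
                best = (value, action, q)
    if best is None:
        raise ValueError("no symmetric equilibrium found on the grid")
    return best[1], best[2]
```

A finer grid gives a more accurate equilibrium at a higher cost, since every leader action triggers a full equilibrium search for the followers.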

*Remark 4.* Studying an asymmetric network numerically requires one to consider all possible combinations of the network parameters. Since the degree of freedom (the parameters to choose) is usually very large in asymmetric networks, such a numerical study is generally not carried out.

##### 6.1. Numerical Results

We present here numerical results with users. Each user maximizes its own throughput whereas the leader is purely altruistic and maximizes the global throughput. We consider the throughput as the utility function. Similar trends are obtained when minimizing expected delay. We set , and consider four selectable power levels mWatts according to the distribution .

##### 6.2. Hierarchical Equilibrium versus Nash Equilibrium

We plot in Figures 3(a) and 3(b) the global throughput and the expected delay under the intervention of the leader. We note that both in terms of aggregate throughput and the expected delay and taking the protocol WRA as reference, the hierarchical scheme without power control outperforms the Nash WRA for low and average loads. We remark that the expected delay of the hierarchical scheme is efficiently reduced for low and medium loads. However, in both cases, the expected delay increases exponentially as the network load increases due to an increase in the number of channel collisions.

Figure 3: (a) global throughput and (b) expected delay under the intervention of the leader.

We also notice that the followers are less aggressive at low and medium loads, as shown in Figure 4. In the case of heavy loads, as the channel reaches saturation, the followers become very aggressive and attempt to transmit with a probability close to one. As a result, the leader reacts by setting its retransmission probability to 0.9. As expected, this results in the network collapse; the throughput drastically drops as the arrival rate increases (throughput is almost 0 at ). This result shows that the Stackelberg equilibrium per se is unable to prevent the network collapse.

##### 6.3. Hierarchical MPL-WRA Scheme versus Hierarchical WRA Scheme

Next we compare the MPL-WRA and WRA schemes. Figure 5 shows that the MPL-WRA scheme considerably outperforms the WRA scheme in terms of aggregate throughput and expected delay for all loads, due to the randomization and priority in terms of transmit power. Indeed, under the proposed scheme the average throughput increases till achieving at and then decreases till achieving at . An important result is that the expected delay is reduced compared to the hierarchical scheme with a single power level, where an exponential increase in average delay is observed due to poor collision resolution.

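Figure 5: aggregate throughput and expected delay of the hierarchical MPL-WRA and hierarchical WRA schemes (panels (a) and (b)).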

We depict in Figure 6 the equilibrium retransmission probabilities. We note that the power diversity solution breaks the impact of the selfish behavior of the followers, as they become less aggressive compared to the hierarchical scheme. Indeed, the follower that transmits using the highest power level plays the role of “dominant station” during that slot. Since the power level is chosen randomly, this role is fairly shared by all the followers in the subsequent slots. An important result is that, due to the randomization of the power choice and the general capture, the leader predicts the Nash equilibrium of the followers given its strategy and chooses the best response to it by reducing its retransmission probability (it transmits with probability 0.1). One can explain this result by the fact that the leader needs to vary its transmission probability depending on the transmission probabilities of the followers, to provide an incentive and avoid the network collapse. This has a positive impact on the system performance and explains the improvement of the global throughput and the effective limitation of the expected delay.

#### 7. Conclusion

We have analyzed a hierarchical slotted random access mechanism with a single power level and with multiple power levels based on a Stackelberg game. The main idea is to introduce a special user named leader whose role can be played by the base station. Its mission is to control the transmission activities of the other users (followers). We have first shown that at low and mild loads the performance, both in terms of global throughput and expected delay, is improved compared to a system without hierarchy since the followers become less aggressive. According to the congestion level of the channel, the leader intervenes and punishes the followers’ strategy by setting its transmission probability to 1. In this way, the followers get an incentive to reduce their retransmission probabilities. Later, in order to keep the load of the system reasonable and therefore take advantage of the hierarchy, we have developed a new hierarchical scheme by integrating a power control scheme. Under the power control scheme and for all workloads, the followers are less aggressive, so the capture effect results in successful transmissions. We have shown that the use of the power diversity solution helps the leader to regulate its transmission probability according to the congestion level of the system and provides an incentive to the followers to choose low retransmission probabilities. This result has a positive impact on the system performance; indeed, an efficient delay limitation is obtained compared to the hierarchical scheme with a single power level, and the well-known throughput collapse is avoided, in particular at high loads.

Our approach of using such hierarchy among users to improve network performance can be applied to other scenarios in wireless communications. Potential applications of the idea include sustaining cooperation in multihop networks and limiting the attack of adversary users. The altruistic leader may be designed to serve as a coordination device in addition to providing selfish users with incentives to cooperate. Finally, designing a protocol that enables users to play the role of the leader in a distributed manner will be critical to ensure that our approach can be adopted in completely decentralized communication scenarios, where no hierarchy is present.

#### Appendices

#### A. Transition Probabilities of Hierarchical WRA

For , for ,for ,for ,for ,for ,for ,for ,for ,

If , for , for , for , for , for , for , for , for , for , for , for , for , for ,

#### B. Transition Probabilities of Hierarchical MPL-WRA

For , for ,for ,for ,for ,for ,for ,for ,for ,If , for , for , for , for , for , for , for , for , for , for , for , for , for , for , for ,

#### Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

#### Acknowledgments

This work was supported by European Commission FEDER funds and the Spanish Ministries, MEC, and MICINN, under Grants CSD2006-00046 and TIN2009-14475-C04.