Abstract
Game theory has been a tool of choice for modeling dynamic interactions between autonomous systems. Cognitive radio ad hoc networks (CRAHNs), which consist of autonomous wireless nodes, are a natural fit for game theory-based modeling. The game theory-based model is particularly suitable for “collaborative spectrum sensing,” where each cognitive radio senses the spectrum and shares the results with other nodes such that the targeted sensing accuracy is achieved. Spectrum sensing in CRAHNs, especially when used in emergency scenarios such as disaster management and military applications, needs to be not only accurate and resource efficient, but also adaptive to the changing number of users as well as signal-to-noise ratios. In addition, the spectrum sensing mechanism must also be proactive, fair, and tolerant to security attacks. Existing work in collaborative spectrum sensing has mostly been confined to resource efficiency in static systems using request-based reactive sensing, resulting in high latencies. In this paper, evolutionary game theory (EGT) is used to model the behavior of emergency CRAHNs, providing an efficient model for collaborative spectrum sensing. The resulting implementation model is adaptive to changes in its environment, such as the signal-to-noise ratio and the number of users in the network. The analytical and simulation models presented validate the system design and the desired performance.
1. Introduction
As wireless nodes become more autonomous and the network architecture more decentralized, as in the case of ad hoc networks, game theory has become a powerful tool to understand the results of repeated interactions that may occur in such networks [1]. Evolutionary game theory (EGT) is a branch of noncooperative game theory based on the principle of “Survival of the Fittest” and has been applied to model the evolution of stable solutions [2].
Cognitive radio ad hoc networks (CRAHNs) consist of cognitive radios (CRs) connected in an ad hoc manner. CRs need to have the capability to access any spectral band based on availability (dynamic spectrum access (DSA)) in order to share the spectrum with the primary or licensed users (PUs). Spectrum sensing is an essential feature required to implement DSA [3]. In CRAHNs, spectrum sensing may be performed in a collaborative manner to improve reliability under hidden node and fading conditions. However, collaborative spectrum sensing consumes additional resources (such as energy from the battery-powered radios and bandwidth) to sense the spectrum and communicate the sensing information to other users. The process of spectrum sensing also affects the “Quality of Service” of the underlying applications due to the latencies associated with spectrum sensing, allocation, and handover [4]. When CRAHNs are used in emergency networks such as military and disaster management applications, it is especially important that the results from the spectrum sensing mechanism are accurate, since missing the presence of the legacy PU would cause interference to the PU and to the CR (also referred to as secondary user, SU) itself, resulting in communication failure. Accuracy also covers the probability of false alarm, which represents a missed transmission opportunity that may be crucial in emergency situations. Since CRAHNs are dynamic, that is, the number of users as well as environmental conditions such as signal-to-noise ratios (SNRs) may change frequently, the collaborative spectrum sensing mechanism must be able to adapt to such variations. Equally important for the sustenance of the network are the requirements of resource efficiency and fairness in energy consumption in the battery-operated handheld devices. In addition, the protocols for emergency networks need to be proactive to avoid latencies in packet delivery [5].
CRAHNs are vulnerable to data falsification attacks and need to be tolerant to such attacks [6]. In a data falsification attack, some SUs send false local spectrum sensing results to the fusion center, causing the fusion center to make a wrong spectrum sensing decision.
In this paper, a collaborative spectrum sensing mechanism is presented that meets the various performance requirements of emergency CRAHNs outlined above. Development of this model involves visualizing the network system at three levels of abstraction, that is, policies, behavior, and implementation as suggested in [5]. The first level of abstraction, based on policies, defines the hierarchy and authentication issues among the users which are typical of the emergency situations. The second level of abstraction based on the behavior of networks defines “what the system needs to do?”. Finally the third level or the implementation involves “How the solution is realized?”. Figure 1 illustrates the various levels of abstraction involved in the system design process. In this work, the behavioral model (second level) has been developed based on evolutionary game theory, where CRs are visualized as autonomous agents [7].
This paper is organized as follows. Section 2 outlines the current state of the art in the area of collaborative spectrum sensing in CRAHNs. Section 3 describes the emergency CRAHN system model and the network utility function defined in this work. Section 4 presents the core of this work, which involves modeling the behavior of the SUs in an evolutionary game framework. A reward system is proposed that will allow the network to evolve to a stable state. Section 5 presents the application of the model to an adaptive spectrum sensing scheme for emergency CRAHNs. Conclusions are presented in Section 6.
2. Related Work
Related work has been presented in two subsections, Collaborative Spectrum Sensing and Game Theory.
2.1. Collaborative Spectrum Sensing
In fading channel conditions, the local spectral sensing decisions may be less reliable due to the time-variant nature of the channels as well as hidden node conditions. However, combining information from multiple sources provides spatial diversity, resulting in more reliable global information on the spectral behavior of the PU. This combining of information is called data fusion, which may use “hard decisions” or “soft decisions” from individual SUs. Log-likelihood ratio test (LLRT) based data fusion has been shown to be optimal provided that the reliability information for measurements from each CR, that is, the probability of detection and probability of false alarm, is available. Collaborative sensing, where different CRs in the network cooperate with each other and share their spectrum sensing results, has been presented in detail in [8, 9]. It has been shown that the targeted error bound can be met without requiring all the CRs in the network to sense the spectrum all the time. The number of CRs that should be sensing and sharing the information at a time will depend upon various factors such as network size and average SNR conditions.
The fundamental components of collaborative spectrum sensing include local sensing technique employed by the CRs, data fusion technique used at the central coordinator, and control channel used for communication as well as reporting and choice of collaborators (i.e., who should sense). A collaborative spectrum sensing system is vulnerable to attacks in which malicious CRs report false detection results. Techniques to improve the security of collaborative sensing have been investigated [10], where the suspicion level of SUs is based on their past reports. Trust values and consistency values are calculated to eliminate the malicious users’ influence on the PU detection results.
Much of the work in the literature in this area did not consider dynamic SNR conditions or changing network size, which are typical of an ad hoc network. In addition, these sensing mechanisms use request-based reactive sensing, which has higher latencies compared to proactive sensing.
2.2. Game Theory
For over two decades, game theory has been applied to networking problems including routing, pricing, flow control, and quality of service, to name a few [11, 12]. Game theory is a mathematical analysis technique applicable to scenarios involving intelligent players competing for limited resources. Principles of game theory have been applied in wireless networking at various layers [5]. Most of the work on game theory for CRs has been focused towards interference management, frequency allocation, and MAC scheduling [11, 12].
In a spectrum sensing game, the players involved are the SUs capable of detecting white spaces. The electromagnetic spectrum is the limited resource that the players of the game, that is, SUs, are competing for. Each SU would like to have more information about the spectral occupancy and to be able to gather this from the information broadcast by its peers and by self-sensing. The SU must expend a considerable amount of energy sensing the spectrum and broadcasting the information, while listening to information broadcast by others over the common channel is relatively free. Thus, for each player in this game (SU), the need to contribute towards the global decision process conflicts with its own desire to save energy. Using information from others is a much more attractive option for the players, making it a classic freeloading game [10]. Each SU would prefer using information shared by others since it comes at no extra cost. So long as the number of nonsensing users is small, it will not affect the overall performance of the system, but it will affect individual SU performance. The unfair load distribution may lead to poor spectral knowledge and failure of the group goal of maintaining the spectrum occupancy information accuracy with minimal energy consumption. To control the behavior, the rules of the system can be designed such that freeloading is avoided [10]. From the spectral sensing point of view, the sense or sleep dilemma has been fitted into an evolutionary game framework [13, 14].
Evolutionary game theory originates from the biological model of survival of the fittest and helps designers to model the behavior of autonomous agents trying to follow the strategy which has the maximum payoff. In an evolutionary game, actions of the SUs are based on the belief factor when the game is played repeatedly. The strategies may change between generations, and this change is based on the comparison between the payoffs for the group following a certain strategy and the average system payoff. Any strategy that results in higher than average system payoff will be followed by the majority of the population, ultimately becoming the winning strategy. This behavior is modeled by “Replicator Dynamics” [15]. In [13, 14], the behavior dynamics of SUs is modeled and explored with the goal of throughput maximization using a reactive spectrum sensing scheme with “OR”-rule-based data fusion, meant for civilian networks. It is assumed that sensing activity is limited to a few subbands and data communication takes place over other subbands. In other words, sensing and communication can occur concurrently over different frequency bands. In comparison with [13, 14], our focus is on fairness with respect to energy consumption, while maintaining stability in a scenario where the entire spectrum, that is, all the subbands, is sensed by the SUs in the allocated quiet period. As per the standard [4], these quiet periods are a must and are synchronized such that all SUs observe silence during this time. As a result, when the SU is not sensing, it is not transmitting data; instead, it is sleeping or saving energy. We discuss a proactive spectrum sensing mechanism using LLRT-based data fusion [16, 17] for emergency networks. Fairness with respect to energy consumption among the CRs is also maintained.
It is intuitive that whenever the incentives to contribute do not occur naturally like in a public good game [11], artificial incentives must be offered in the form of award for good behavior or penalty for misbehavior. In this paper, we show that without such an incentive, the spectrum sensing game will reduce to a public good game, where the stable solution is for all SUs to sleep. It is possible to reach an evolutionary stable solution that achieves the network objective and is fair to all SUs by incorporating an award system. The amount of award required depends upon the energy expended by the SUs for sensing/broadcasting and the constitution of the population at the time. For a large enough reward, a stable solution can be reached.
3. System Model
The system under consideration consists of cognitive radios connected in an ad hoc manner as shown in Figure 2. Each radio is assumed to be sensing a common spectral band that is split into sub-bands, each centered at its own frequency. The SUs are assumed to be closely clustered; as a result, the distance between the SUs is much smaller than the distance from a typical PU [13, 14]. PU signal detection at different SUs is influenced by independently fading Rayleigh channels. Hence, the effective SNR at the SUs is assumed to be exponentially distributed around a given average SNR. The number of SUs in the network, or size of the network, is dynamic, typical of an ad hoc network. The PU spectral usage pattern is represented by an on/off Markov model, with on/off periods of a known distribution [18]. The local sensing results are sent via an error-free common control channel [19]. To facilitate spectrum sensing, the group periodically observes a so-called quiet period [20]. During this period, all SUs must refrain from communication over all the frequency bands of interest. The only options available to the SUs during the quiet time are to sense the spectrum or to sleep, that is, conserve energy. Table 1 depicts the terminology used in this paper.
Due to their varying physical locations, each SU provides a unique local “snapshot” of the spectral occupancy, which is used to make a combined (fused) global decision about spectral occupation. The success of any spectral sensing technique is measured in terms of the probability of detection ($P_d$) and the probability of false alarm ($P_f$). For the rest of the discussion, these probabilities are tracked both for the local decisions made by the $i$th SU and for the global decision, where the global values depend on the number of inputs fused to reach the global decision.
One of the SUs (by taking turns or the first one in the network) assumes the status of spectrum coordinator, which performs the following functions.
(a) Fuse the spectral decision data received over the common control channel from all the sensing SUs in the network to obtain the global decision about spectral occupancy. Techniques like “AND”, “OR”, and “Majority Logic” may be used for data fusion [8]. Optimal data fusion using the log-likelihood ratio test (LLRT) [16, 17] is used in this work due to its superior performance and ability to thwart malicious attacks.
(b) Monitor the probability of detection ($P_d$) and probability of false alarm ($P_f$) for each SU. The $P_d$ and $P_f$ of each SU are estimated by the spectrum coordinator using the counting rule by comparing the local sensing results with the global or fused results [21].
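The LLRT fusion named in function (a) can be illustrated with a minimal sketch. It assumes independent hard (0/1) local decisions and uses the textbook log-likelihood ratio sum; the exact rule in [16, 17] may differ. The function name `llrt_fuse` and the default threshold of 0 (equal priors and equal error costs) are hypothetical choices made for illustration only.

```python
import math

def llrt_fuse(decisions, p_d, p_f, threshold=0.0):
    """Fuse hard local decisions with a log-likelihood ratio test.

    decisions: 0/1 local decisions (1 = PU declared present)
    p_d, p_f:  per-SU detection and false-alarm probabilities, each in (0, 1)
    threshold: assumed decision threshold on the summed log-likelihood ratio
    Returns (global_decision, summed_llr).
    """
    llr = 0.0
    for u, pd, pf in zip(decisions, p_d, p_f):
        if u == 1:
            # A "present" report counts more when the SU rarely false-alarms.
            llr += math.log(pd / pf)
        else:
            # An "absent" report counts more when the SU rarely misses.
            llr += math.log((1.0 - pd) / (1.0 - pf))
    return (1 if llr > threshold else 0), llr

# One reliable SU reporting "present" outweighs two unreliable "absent" reports.
decision, score = llrt_fuse([1, 0, 0], [0.99, 0.55, 0.55], [0.01, 0.45, 0.45])
```

Because each report is weighted by the reliability of its sender, an SU with poor estimated $P_d$ and $P_f$ contributes little to the sum, which is one way to read the robustness against falsified reports mentioned above.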
It needs to be noted that the choice of the coordinator could be policy based or made as in [22], where a cabinet of leaders is used to ease the overhead of choosing a leader. This does not impact the system model. To achieve a targeted accuracy for the network, it is sufficient if some $K$ out of $N$ users sense the spectrum in each quiet period. This value of $K$ depends upon the local SNR conditions and the system requirements for accuracy [23, 24]. The remaining $N - K$ SUs can sleep during this quiet period and conserve energy. The global probability of detection and probability of false alarm using decisions from $K$ SUs are given in (1) and (2) [8]. Since each SU is represented by the average $P_d$ and $P_f$ values, a majority logic rule over the $K$ decisions is used for fusion, where the vote threshold depends on the average $P_d$ and $P_f$. The Bayesian risk function defines the risk associated with making a wrong decision [17] in terms of the cost associated with false alarm and the cost associated with missed detection. As the number of sensing users $K$ increases, the risk is expected to decrease, approaching zero. Cumulative information gain is defined as .
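As a hedged illustration of the majority logic fusion described above, the sketch below computes global detection and false-alarm probabilities when every sensing SU is represented by the same average local probabilities and decisions are independent. This is the standard binomial-tail form of a vote-threshold rule and is not copied from (1) and (2); the names `vote_tail_prob`, `global_probs`, and `min_votes` are hypothetical.

```python
from math import comb

def vote_tail_prob(p, num_sensing, min_votes):
    """Probability that at least `min_votes` of `num_sensing` independent SUs
    report "present", when each does so with probability p."""
    return sum(
        comb(num_sensing, j) * p**j * (1.0 - p) ** (num_sensing - j)
        for j in range(min_votes, num_sensing + 1)
    )

def global_probs(avg_pd, avg_pf, num_sensing, min_votes):
    """Global (fused) detection and false-alarm probabilities under the
    assumed vote-threshold rule with identical, independent SUs."""
    return (
        vote_tail_prob(avg_pd, num_sensing, min_votes),
        vote_tail_prob(avg_pf, num_sensing, min_votes),
    )

# Five sensing SUs, at least three votes needed to declare the PU present.
q_d, q_f = global_probs(0.8, 0.1, 5, 3)
```

Setting `min_votes` to 1 reduces the rule to “OR” fusion and setting it to `num_sensing` reduces it to “AND” fusion, which is why a single threshold parameter covers the fusion techniques listed earlier.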
Goal of the network as a group is to obtain accurate information with the least number of SUs sensing. It is also assumed that only the SU that have sensed the spectrum in a particular time epoch will broadcast the sensing information in that time epoch. This ensures the minimal utilization of energy for sensing and minimal bandwidth overhead for broadcasting. The smaller the number of sensing users, the higher the resource efficiency at the cost of low information gain and vice versa. At this point, we define a function called network utility that underlines how efficiently the network obtains information about spectral occupancy as Network utility function with active (sensing users) is defined as the weighted average of information gain and the resource efficiency (either energy consumption or bandwidth overhead) of the network. It can be seen that is a convex function of with a peak at beyond which any gain in information is offset by energy spent. Figure 3 shows the behavior of and for different values of and average SNR. Location of the maxima of the network utility function depends upon the weighting coefficient , average SNR conditions, and the network size ().
The situation of $K$ users sensing at a time in a network of size $N$ can be represented as each SU sensing with probability $K/N$, defined as the sensing probability ($P_s$). Figure 4 shows the behavior of $P_s$ as a function of the network size $N$, the weighting coefficient, and the average $P_d$ and $P_f$. With increasing $N$, the sensing probability needed to achieve the maximum utility is lower, as the sensing effort required from each SU reduces. Increasing $P_d$ (for constant $P_f$) or decreasing $P_f$ (for constant $P_d$) is equivalent to an improvement in average SNR, reducing the sensing effort required from each SU for a targeted sensing performance.
Equation (1) can be rewritten as which essentially means that the combined is a function of local conditions and number of SUs sensing. For a targeted accuracy , the number active SUs is a function of and average is associated with the network as For a known , , and , it can be estimated that
Knowing the value of $K$, an energy-efficient solution may be implemented where only $K$ out of $N$ users sense the spectrum in the quiet time, allowing the rest to sleep and conserve energy. Which of the $N$ users must sense is an important question and has been dealt with in the literature [25]. Since the SNR conditions, as well as network size, are time varying, one of the solutions is to program each SU to sense with probability $K/N$. On average, $K$ SUs will sense in a given time epoch, satisfying the requirements about spectral information and energy consumption. The probability of sensing $P_s$ will be a function of the network size ($N$), the weighting coefficient, and the average SNR (through the average $P_d$ and $P_f$).
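The claim that independent probabilistic sensing yields the required number of sensing SUs on average can be checked with a small Monte Carlo sketch. The function names, the epoch count, and the example sizes (30 users, 6 required sensors) are illustrative assumptions, not values taken from the paper's simulations.

```python
import random

def count_sensing(n_users, p_sense, rng):
    """Number of SUs that decide to sense in one quiet period."""
    return sum(1 for _ in range(n_users) if rng.random() < p_sense)

def average_sensing(n_users, k_required, epochs=10_000, seed=0):
    """Average number of sensing SUs per epoch when every SU senses
    independently with probability k_required / n_users."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    p_sense = k_required / n_users
    total = sum(count_sensing(n_users, p_sense, rng) for _ in range(epochs))
    return total / epochs

# With 30 users and 6 required sensors, the long-run average is close to 6.
avg = average_sensing(n_users=30, k_required=6)
```

In any single epoch the count fluctuates around the required value, so the targeted accuracy is met on average rather than in every quiet period.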
Knowledge of these parameters will allow the SUs to program their such that the network utility is maximized. It should be noted that since each SU is choosing to sense or not, locally, the solution maintains fairness, that is, on average each SU will expend energy equally. However, in mobile ad hoc networks, both the size of network and SNR conditions are time variant. Each SU must be able to adjust its sensing profile locally such that the network utility is maximized. The local utility of an individual SU is defined as where is the probability of sensing for the th user. The conflict mentioned earlier is clearly shown in this equation. Reducing local sensing profile () will increase the resource efficiency (second term in (9)), but with every reduction in local , the global information will suffer, effectively reducing and local utility.
So long as an average of $K$ out of $N$ users sense the spectrum, network utility will be maximized, which may come at the cost of some SUs sensing more often than others. Such a solution is not fair; as a result, it is not stable and is not advisable in emergency CRAHNs. We model this scenario using game theory, where each SU is treated as a player that makes a rational decision (in this case, whether to sense or not in a given quiet time period [20]) based on its local utility function. The goal of a rational player is to maximize its own utility. We use this model to derive an adaptive system that will allow a network to reach the minimal probability of sensing, such that the resulting network has maximal network utility and is also fair to all its users. An evolutionary game model is used to model the CRAHN, analyze the performance of the network, and find a strategy that gives a stable solution of maximum utility.
4. Evolutionary Game Model
An evolutionary game consists of a large number of players playing a given game repeatedly among them. Actions of the players are based on their beliefs. Inbuilt learning process allows them to update their beliefs based on experience. In the spectrum sensing game, it is assumed that each SU is trying to achieve the highest payoff for itself and will change its strategy based on the belief as to which strategy will lead to higher payoff [2]. Discussion follows the cycle of autonomous choice shown in Figure 5, where a one shot game is first analyzed, followed by iterative game and then the evolutionary model.
4.1. One Shot Game
One shot simultaneous game is defined as , where is the set of players, is the action, space and is the vector of utility functions of each player. Payoff matrix for one shot game between two players is depicted in Figure 6. The utility function values , , and in the action space are The spectral information gained with two active SUs is expected to be greater than or at best equal to .
Case 1 (). The convergence is towards quadrants 2 and 3. If both players are in “sense” state (quadrant 1), both players have motivation to move towards “sleep” state. After playing the game once, action profile of players and would be {sense, sleep} or {sleep, sense}.
Case 2 (). The convergence is towards quadrant 1 where both players would sense.
Case 3 (). The game converges to quadrant 4 where the best strategy for each SU is not to sense at all.
Arrows in Figure 6 indicate the direction of preference of choices leading to the Nash equilibrium (NE) for Case 1. Quadrants 2 and 3 constitute the NE, as neither player could get a better payoff by changing its strategy unilaterally. This is also Pareto optimal, as neither player can make a unilateral change to get a better or equal utility without hurting the other player’s utility. This game is symmetric, non-zero-sum, and simultaneous, and its solution lies in mixed strategies. Figure 7 shows the Pareto boundary coinciding with the NE.
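The three convergence cases can be reproduced with a small sketch that enumerates the pure-strategy Nash equilibria of a two-player sense/sleep game. The payoff model here is an assumption made for illustration: each SU receives the information gain produced by the number of sensing SUs and pays a cost only if it senses. The parameter names `info_one`, `info_two`, and `cost`, and the numeric values, are hypothetical and do not come from the payoff matrix in Figure 6.

```python
from itertools import product

ACTIONS = ("sense", "sleep")

def payoff(own, other, info_one, info_two, cost):
    """Assumed payoff of one SU: shared information gain minus own sensing cost."""
    sensing = (own == "sense") + (other == "sense")  # number of sensing SUs
    info = (0.0, info_one, info_two)[sensing]
    return info - (cost if own == "sense" else 0.0)

def pure_nash_equilibria(info_one, info_two, cost):
    """Action pairs from which neither SU can gain by deviating alone."""
    found = []
    for a, b in product(ACTIONS, repeat=2):
        best_a = max(payoff(x, b, info_one, info_two, cost) for x in ACTIONS)
        best_b = max(payoff(y, a, info_one, info_two, cost) for y in ACTIONS)
        if (payoff(a, b, info_one, info_two, cost) == best_a
                and payoff(b, a, info_one, info_two, cost) == best_b):
            found.append((a, b))
    return found

# Moderate cost: exactly one SU senses (quadrants 2 and 3).
print(pure_nash_equilibria(info_one=0.6, info_two=0.8, cost=0.3))
# Low cost: both sense (quadrant 1). High cost: both sleep (quadrant 4).
print(pure_nash_equilibria(0.6, 0.8, 0.1), pure_nash_equilibria(0.6, 0.8, 0.9))
```

Under these assumed payoffs, the outcome is governed by how the sensing cost compares with the marginal information gained from an additional sensing SU.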
4.2. Iterative Game
In an iterative game, the action-payoff cycle is repeated several times (Figure 5). As seen earlier, it is possible to achieve equilibrium when the game is played once, that is, one SU senses while the other sleeps, resulting in a low payoff for the sensing SU. However, when the game is played repeatedly, the strategy adopted by the players will change depending upon their past experience, past payoff, and expected duration of the game.
Tit-for-tat, tit-for-two-tats, and grim trigger strategies have been shown to achieve a stable solution [26]. In these strategies, a rogue SU (an SU that refuses to contribute to the sensing effort) will be penalized by others for not cooperating, that is, if an SU refuses to sense, others in the group will not sense either. This would result in complete collapse of the network, which is a trivially stable solution. An analogy can be drawn to the “public good” game, where everyone benefits from others’ contributions. Although the solution strategy where a few players contribute for the public good is Pareto optimal, it is not fair to the contributing players. As a result, this solution is not stable. It has been proposed that a stable strategy in a public good game can only be achieved by using a sufficiently large reward or significant penalty [27, 28]. Provided the reward is significant, all players could be motivated into cooperation and, effectively, into stability.
4.3. Evolutionary Game
The evolutionary game is defined as follows: , where is the strategy space over action set . The number of the SUs following strategies is where . The population profile is defined as , where .
For theoretical analysis, we shall first consider only 3 strategies which can be extended further.
Strategy “1” corresponds to al_sleep: always sleep, $P_s = 0$.
Strategy “2” is KbyN: sense the spectrum with probability $K/N$. This sensing probability would maximize the network utility provided that all SUs follow it.
Strategy “3” corresponds to al_sense: always sense the spectrum, $P_s = 1$.
Average utilities of groups using the corresponding strategies of are given by It can be seen that for , average utility of the group following strategy 1 will be higher than the other two groups as
In an evolutionary game, actions of the SUs are based on the belief factor when the game is played repeatedly. The strategies may change between generations based on the comparison between the payoffs (utilities) for the group following a certain strategy and average system payoff. Any strategy that results in higher than average system payoff (network utility) will be followed by the majority of the population, ultimately becoming the winning strategy. This behavior is modeled by “Replicator Dynamics” [26, 27]. The network utility or payoff involving populations following different strategies is defined as Substituting (11) in (13)
For strategy 1 to be the winning strategy, the condition must hold good, that is, Since , , , and are all positive, the condition above is always satisfied and all the SUs will converge to strategy 1. This would be a trivially stable strategy. For network viability, it is necessary that strategy 2 or strategy should be the winning strategy. For strategy 2 to win, the following conditions must be satisfied:
However, with the current definition of utilities, convergence to strategy 2 is not possible, since SUs seem to gain much more by using the knowledge from others rather than sensing themselves. Drawing an analogy with “Public Good games” [28], the best network utility may be achieved by providing the SUs with an incentive or a penalty. Such incentivization has been discussed in depth in [29]. To encourage contribution from all the SUs, we introduce a “Reward factor” as a reward metric. The value of the reward awarded to the $i$th SU depends upon the sensing effort of that SU as compared to the required effort that will achieve the targeted performance.
Consider the following: where is the sensing probability of the $i$th SU. The reward saturates to 1 once the sensing probability reaches the desired value. The rewards for the SUs following strategies al_sleep, KbyN, and al_sense are listed below: (Strategy 1) for al_sleep, (Strategy 2) for KbyN, (Strategy 3) for al_sense.
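One reward shape consistent with the description above (growing with sensing effort and saturating at 1 once the desired sensing probability is reached) is a clipped linear ramp. This is only an assumed form for illustration; the paper's exact reward expression is not reproduced here, and the names `reward_factor`, `p_sense`, and `p_required` are hypothetical.

```python
def reward_factor(p_sense, p_required):
    """Assumed reward: proportional to sensing effort, capped at 1.

    p_sense:    sensing probability actually used by the SU, in [0, 1]
    p_required: sensing probability needed for the targeted performance, in (0, 1]
    """
    if not 0.0 < p_required <= 1.0:
        raise ValueError("p_required must lie in (0, 1]")
    if not 0.0 <= p_sense <= 1.0:
        raise ValueError("p_sense must lie in [0, 1]")
    return min(p_sense / p_required, 1.0)

# Rewards of the three strategies when the required probability is 0.2.
rewards = {
    "al_sleep": reward_factor(0.0, 0.2),
    "KbyN": reward_factor(0.2, 0.2),
    "al_sense": reward_factor(1.0, 0.2),
}
```

Under this assumed shape, sensing more often than required earns no extra reward while still costing energy, which is what would favor the KbyN strategy over always sensing.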
After inclusion of the reward factor, the utilities can be rewritten as, For strategy to win, it must be the strategy with the highest payoff compared to other strategies, or
The condition will hold true so long as the information gain is greater than 0, which is always true for as Since both and are both bounded by 1, the above condition is always satisfied when and . Thus, strategy 2 has higher payoff than strategy 1 and 3 and thus will be the winning strategy with inclusion of the reward factor.
4.4. Replicator Dynamics and Evolutionary Stable Strategy
A system is considered to be evolutionary stable when change in the populations following different strategies goes to zero. Let represent the change in the population following strategy . According to replicator dynamics [24, 25], this change will be directly proportional to difference between group payoff and network payoff as
Since the number of SUs in the network remains constant, the population with a winning strategy must increase at the cost of population following other strategies as
A fixed point of the replicator dynamics is where the relation is satisfied for all [24, 25].
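The replicator dynamics can be sketched in discrete time using the standard textbook update, where the share of each strategy changes in proportion to its current share and to the gap between its payoff and the population-average payoff. The payoffs below are fixed, made-up numbers for strategies 1 to 3 (in the paper the utilities depend on the population profile), and the step size is an arbitrary assumption.

```python
def replicator_step(shares, payoffs, step=0.5):
    """One discrete replicator update.

    shares:  population fractions per strategy (must sum to 1)
    payoffs: payoff of each strategy group (assumed constant in this sketch)
    """
    average = sum(x * u for x, u in zip(shares, payoffs))
    # Strategies paying above average grow; the rest shrink. The changes sum
    # to zero, so the total population is conserved.
    return [x + step * x * (u - average) for x, u in zip(shares, payoffs)]

def evolve(shares, payoffs, generations=200, step=0.5):
    """Apply the replicator update repeatedly and return the final shares."""
    for _ in range(generations):
        shares = replicator_step(shares, payoffs, step)
    return shares

# Hypothetical payoffs in which strategy 2 pays best once rewards are included.
final = evolve([0.4, 0.2, 0.4], [0.2, 0.9, 0.6])
```

A population consisting entirely of one strategy is a fixed point of this update, since that strategy's payoff then equals the average payoff and every change is zero.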
It can be shown that inclusion of reward factor allows strategy 2 to be the winning strategy as well as the stable strategy. A fixed point of the replicator dynamics (or any dynamical system) is said to be asymptotically stable if any small deviation from that state is eliminated by the dynamics as . Let be the stable point, being perturbed by a disturbance [26]. The new population density is given by Substituting (27) in (25) for , we have With the assumption that ; is negligible for . Substituting (20), (30) can be solved as Thus, asymptotic stability with , is ensured when the condition below is satisfied, which is always true since , and are all bounded between 0 and 1 as
The strategy set can be extended where different groups of SUs can sense with different probabilities varying from 0 to 1, that is, over strategy space . The network converges to strategy because of the presence of reward factor introduced in the utility functions. If there is no group following the strategy , the game will converge to a strategy that is close to the optimum strategy. Implementation of the reward system requires a central entity that can monitor the broadcasting activity by each SU and allocate rewards based on their sensing probability computed over several iterations. With various SUs in the network sensing with different probabilities, the incentive factor will ensure that the system will evolve such that all SUs will follow a strategy that has the highest network utility.
4.5. Validation of the Evolutionary Game Model
To evaluate the performance of the proposed analytical model, a simulation model has been built using the NetLogo software [31]. One hundred iterations are taken to represent one generation, and the strategies followed by the players are assumed to be unchanged over a generation. The number of SUs in the network has been set to 30, and the initial population distribution of the three groups has been assigned randomly. The process of evolution of the population can be seen in the plots shown in Figure 8. The first graph shows the change of population, or (25), over generations. The third graph shows the number of SUs following each strategy over several generations, or the dynamic demographics of the game. As the game evolves, the population following the KbyN strategy (strategy 2) increases at the cost of the other two groups.
Fairness of a system is judged based on the difference between the average energy spent by individual SUs in the network. In a fair system, the average energy spent will be similar, that is, the variance between the energy spent by individual SUs will be small. The sample variance is computed using the average energy spent by each individual SU and the mean of the average energy spent by all the SUs in the group. High variance indicates an unfair situation and vice versa. The second graph shows the variance over generations. It can be seen that the variance reduces, going down to zero as the game stabilizes and the system evolves towards a stable, fair solution. One essential condition for convergence to the KbyN strategy (strategy 2) is that there should be at least one SU following the strategy. If no subgroup is following the KbyN strategy, the system will evolve towards the subgroup whose strategy is closest to KbyN, resulting in a suboptimal solution. However, in all cases the system is fair, stable, and evolves to a nontrivial solution.
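The fairness measure described above can be sketched as the sample variance of per-SU average energy. The sketch assumes, purely for illustration, that each sensing event costs one unit of energy and that sleeping costs nothing; `energy_variance` and its parameters are hypothetical names.

```python
from statistics import variance

def energy_variance(sense_counts, epochs, energy_per_sense=1.0):
    """Sample variance of the average energy spent per epoch by each SU.

    sense_counts:     number of epochs in which each SU sensed
    epochs:           total number of quiet periods observed
    energy_per_sense: assumed energy cost of one sensing/broadcast event
    """
    avg_energy = [energy_per_sense * count / epochs for count in sense_counts]
    return variance(avg_energy)  # uses the (n - 1) sample-variance denominator

# Two SUs always sense while two always sleep: unfair, high variance.
unfair = energy_variance([100, 100, 0, 0], epochs=100)
# Every SU senses half the time: fair, zero variance.
fair = energy_variance([50, 50, 50, 50], epochs=100)
```

Both example populations produce the same average number of sensing SUs per epoch, so the variance isolates how evenly the sensing load is shared.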
5. Application of the Evolutionary Game Model to Adaptive Spectrum Sensing Mechanism in Emergency CRAHN
The game model described in Sections 3 and 4 can be applied to an adaptive collaborative spectrum sensing scheme in emergency CRAHNs [23]. In this scheme, each SU can sense with an initial sensing probability from 0 to 1. For viability of the network, it is desirable that all SUs in the network sense with probability , but the SUs do not know the values of or due to dynamic operating conditions. A central entity (shown in Figure 9) monitors the spectrum sensing activity of each SU, keeps track of network size (), and periodically estimates the optimum sensing frequency. It provides this information to the SUs indirectly through the reward factor (), allocating rewards based on each SU's individual contribution and the desired contribution.
With various SUs in the network sensing with different probabilities, the incentive factor helps individual SUs adapt their sensing strategy so that all SUs follow a sensing schedule that is (or is closest to) the optimum sensing schedule, resulting in the highest network utility. Each SU has the flexibility to adjust its sensing schedule at the rate it desires. Any change in network size or SNR conditions that calls for a corresponding change in the sensing schedule is communicated to the SU through the reward factor.
Figure 10 shows the functional diagram of the proposed system. Functions such as data fusion, estimation of the parameters for each SU , keeping track of network size (), and estimation of are performed at the spectrum coordinator (SC). Estimation of the probabilities is done over a period of time, assumed to be time epochs. The SC computes the reward for each SU at the end of every time epochs.
This reward factor is sent in real time to each SU. The SU uses this information to adapt its sensing probability such that its reward is maximized. The update equations for when and for are as shown below. Each SU can adjust the rate at which its sensing probability is adapted through the factors and as
The pseudocode for the algorithm is shown below. Table 2 lists the parameters used for simulations. The SC receives decisions from all the SUs in the network and generates the global decision from them. It also updates the probabilistic parameters for each SU . After averaging over iterations, it determines the reward for each SU based on its sensing frequency and the required sensing frequency. The SC also keeps track of the number of SUs in the network (from the proactive routing table) and adjusts the optimum sensing frequency if the network size changes. An increase in the network size results in a reduction in and vice versa. If the average and change significantly, it is assumed that an SNR change event has occurred. This triggers learning at the SC, as indicated by resetting the flag Ps_opt_Learning_flag (see Algorithm 1).
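As a rough illustration of the SC-side bookkeeping, the following Python sketch estimates each SU's sensing frequency over a sliding window of epochs and derives a reward from it. The class name, the `ps_opt` parameter, and in particular the reward rule (one minus the absolute gap between observed and required sensing frequency) are assumptions made for illustration; they are not the reward function or the listing used in this paper, and data fusion and the learning of the optimum are left out.

```python
from collections import deque


class SpectrumCoordinatorSketch:
    """Illustrative SC bookkeeping; the reward rule is an assumption."""

    def __init__(self, window, ps_opt):
        self.window = window    # number of epochs to average over
        self.ps_opt = ps_opt    # assumed current optimum sensing probability
        self.history = {}       # su_id -> recent 0/1 "did sense" flags

    def record_epoch(self, sensed_by_su):
        """Store one epoch; sensed_by_su maps su_id -> True/False."""
        for su_id, sensed in sensed_by_su.items():
            flags = self.history.setdefault(su_id, deque(maxlen=self.window))
            flags.append(int(sensed))

    def sensing_frequency(self, su_id):
        """Fraction of recorded epochs in which this SU sensed."""
        flags = self.history[su_id]
        return sum(flags) / len(flags)

    def rewards(self):
        """Hypothetical reward: highest (1.0) when an SU senses at ps_opt."""
        return {
            su_id: 1.0 - abs(self.ps_opt - self.sensing_frequency(su_id))
            for su_id in self.history
        }


# Usage with made-up activity for two SUs over four epochs.
sc = SpectrumCoordinatorSketch(window=4, ps_opt=0.5)
for epoch in (
    {"su1": True, "su2": False},
    {"su1": True, "su2": False},
    {"su1": False, "su2": False},
    {"su1": False, "su2": True},
):
    sc.record_epoch(epoch)
print(sc.rewards())
```

Because each SU receives only the scalar reward, it needs no knowledge of the network size or the group SNR, which matches the division of roles described above.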

The code at each SU is shown below. Here and the group SNR need not be known. The is updated based on the reward Fs. It should be noted that this reward is a function of both and SNR and reflects any changes in their values (see Algorithm 2).
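The SU-side adaptation can be sketched as follows. This is a minimal illustration that assumes a simple hill-climbing rule driven only by the scalar reward: the SU keeps moving its sensing probability in the same direction while the reward improves and reverses direction when it drops. The class, the `step_up` and `step_down` rate parameters, and the `hypothetical_reward` function are assumptions for illustration; this is not the set of update equations used in this paper.

```python
class SecondaryUserSketch:
    """Illustrative SU that adapts its sensing probability from rewards only."""

    def __init__(self, p_sense, step_up, step_down):
        self.p_sense = p_sense      # current sensing probability in [0, 1]
        self.step_up = step_up      # locally chosen rate for increases
        self.step_down = step_down  # locally chosen rate for decreases
        self.direction = 1          # +1: increase p_sense, -1: decrease
        self.last_reward = None

    def update(self, reward):
        """Adjust p_sense given the latest reward and return the new value."""
        # Reverse direction if the previous move made the reward worse.
        if self.last_reward is not None and reward < self.last_reward:
            self.direction = -self.direction
        step = self.step_up if self.direction > 0 else self.step_down
        # Clamp so that p_sense stays a valid probability.
        self.p_sense = min(1.0, max(0.0, self.p_sense + self.direction * step))
        self.last_reward = reward
        return self.p_sense


def hypothetical_reward(p_sense, p_target):
    """Stand-in for the SC feedback; peaks when p_sense equals p_target."""
    return 1.0 - abs(p_target - p_sense)


# Usage: the SU never sees p_target, only the reward computed from it.
su = SecondaryUserSketch(p_sense=0.1, step_up=0.05, step_down=0.05)
for _ in range(50):
    su.update(hypothetical_reward(su.p_sense, p_target=0.4))
print(round(su.p_sense, 2))
```

With this rule the sensing probability settles into a small oscillation around the rewarded value, with an amplitude set by the step sizes. Smaller steps give a tighter but slower adaptation, which mirrors the locally adjustable rate factors mentioned in the text.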

Figure 11 shows the behavior of the simulated CRAHN under changing environmental conditions. The plots show the variance of the energy spent by each SU and the sensing performance of the system in terms of over time. The group SNR has been set to −2.25 dB and the initial network size is 30. At the 50th time unit the SNR of the system is changed to −2.25 dB from −1.75 dB, and at the 150th time unit the number of SUs in the system () is changed to (). It can be seen that any change in the environment results in a temporary increase in variance and a reduction of , but the system adapts with the help of the learning that takes place at the SC and the reward factor that is fed back to the SUs. As a result, a low energy variance as well as the desired is achieved. The last plot shows the changes in the sensing probability at any th SU. A reduction in the SNR results in a slow increase of , and similarly an increase in results in a gradual reduction of . The rate of change of is governed by and , which can be adjusted locally.
6. Conclusions
In this paper, we propose a spectrum sensing system that can meet the requirements of emergency CRAHNs as listed in Table 3. The adaptation algorithm implemented at each SU adapts its local sensing schedule such that the network utility is continually maximized even under changing network size and SNR. As the network size and/or SNR conditions change, the optimum number of SUs that must sense in a given quiet period changes; this number is learned periodically at the SC. The SC communicates this information via a reward factor, which allows each SU to adjust its sensing schedule.
The LLRT-based fusion implemented at the SC is inherently resilient to Byzantine attacks [30] and provides protection against malicious users. Since the sensing is done proactively and periodically in the quiet periods [20], the latency of the system is reduced compared to request-based reactive sensing.
The concept of the reward coefficient is derived from the analysis performed using evolutionary game theory and replicator dynamics. Comparison with the public goods game showed that unless a reward or penalty is given, the stable solution is one where no SU senses and the network cannot function. The stability achieved using a proportional reward ensures high spectral accuracy while minimizing the energy consumed by the group. The presented evolutionary framework helps a CRAHN adapt itself to an optimal sensing schedule, that is, to decide adaptively whether or not to sense in a time epoch, without having any information about the optimal schedule.
Acknowledgments
This paper was supported by the CEEMS Laboratory at the International Institute of Information Technology Bangalore, funded by the Government of Karnataka. The authors wish to thank Dr. Srinath Srinivasa, IIIT Bangalore, for various discussions on multiagent systems.