#### Abstract

Security is a critical issue in cognitive radio (CR) relay networks. Most previous work concentrates on maximizing secrecy capacity (SC) as the criterion for meeting the security requirements of CR relay networks, while energy consumption, a key concern for “green” radio communication, is largely ignored. This paper proposes a relay selection scheme that jointly considers best relay selection and dynamic power allocation in order to maximize SC and minimize energy consumption. Moreover, we consider finite-state Markov channels and residual relay energy in the relay selection and power allocation process. Specifically, the proposed relay selection and power allocation scheme is formulated as a restless bandit problem, which is solved by the primal-dual index heuristic algorithm. Additionally, the obtained optimal relay selection policy has an indexability property that dramatically reduces the computational complexity. Numerical results show that the proposed scheme achieves higher SC and lower energy consumption than the existing schemes.

#### 1. Introduction

Cognitive radio (CR) is a promising technology to improve the utilization efficiency of the wireless spectrum resources [1]. In CR networks, the secondary users (SUs) are allowed to transmit concurrently on the same spectrum bands with the licensed primary users (PUs), as long as the resulting interference power at the PUs’ receivers is kept below the interference temperature limit. Such an operation mode is known as spectrum underlay [2]. In the underlay paradigm, the performance of the SUs degrades significantly in fading environments due to the constraints on their transmission power. One of the efficient ways to enhance the performance of SUs is to use cooperative relaying, which is capable of mitigating wireless channel fading [3], saving transmission power [4, 5], and increasing capacity [6–8] through multipath propagation offered by cooperative nodes.

The security concerns in CR relay networks have been attracting continuously growing attention [9]. Due to the open nature of the wireless transmission medium, CR relay networks are particularly susceptible to eavesdropping [10]. Traditionally, cryptographic techniques have been employed to protect communication confidentiality against eavesdropping attacks; however, they increase the computational and communication overheads and introduce additional system complexity for secret key distribution and management.

As an alternative, physical layer security has emerged as a new secure communication method to defend against eavesdroppers by exploiting the physical characteristics of wireless channels. This work was initiated by Wyner in [11], in which the notion of secrecy capacity is developed from an information-theoretic perspective and shown to be the difference in capacities between the main channel (i.e., the channel from the transmitter to the legitimate receiver) and the wiretap channel (i.e., the channel from the transmitter to the eavesdropper). It was proved in [12, 13] that if the wiretap channel is stronger than the main channel, the eavesdropper will succeed in intercepting the source information. Recent work seeks to overcome this limitation by taking advantage of multiple-antenna [14–17] and cooperative relay [18–21] techniques. For instance, Pei et al. [14, 15] addressed the secrecy capacity optimization problem in multiple-input single-output (MISO) CR networks. Kwon et al. [16] explored MISO CR systems where the SUs secure the PUs in return for permission to use the spectrum. Zhang et al. [17] proposed efficient algorithms to solve the secrecy capacity maximization problem in multiple-input multiple-output (MIMO) CR networks. Apart from this, Zou et al. [18] proposed a user scheduling scheme to achieve multiuser diversity for improving the security level of cognitive transmissions. Sakran et al. [19] proposed a relay selection scheme in CR networks that selects a trusted decode and forward relay to assist SUs and maximize the secrecy capacity subject to the interference power constraints at the PUs. The power allocation strategies for relays were introduced in [20] with the goal of maximizing the total secrecy capacity in CR networks. The authors in [21] studied a relay precoding scheme to improve the secrecy capacity of SUs in CR systems.

Notice that the aforementioned work [14–21] on CR networks addressed the issue of secrecy capacity maximization but did not take into account the energy consumption. In wireless networks, most wireless devices are powered by batteries with limited energy. The network lifetime is an important factor to characterize the performance of such networks. In order to prolong the network lifetime, the battery energy should be consumed efficiently. In CR relay networks, the improvement of energy consumption can be realized by reducing the transmission power and balancing energy consumption among relays. However, the reduction of transmission power leads to degradation of the secrecy capacity. Therefore, the secrecy capacity and the energy consumption should be jointly considered for efficient implementation of CR relay networks. In addition, most previous works for relay selection use the current observed channel conditions to make the relay selection decision for subsequent data transmission. However, this memoryless channel assumption is not realistic in the time-varying radio environments. Finite-state Markov models have been considered as an effective approach to characterize the time-varying nature of the radio environments.

In this paper, we propose an energy-efficient relay selection scheme that jointly considers best relay selection and dynamic power allocation in order to maximize SC and minimize energy consumption. The main contributions of this paper are summarized as follows.

(1) A scenario is considered in which a secondary transmitter communicates with a secondary destination with the help of the best relay in the presence of different numbers of PUs and eavesdroppers.

(2) An energy-efficient relay selection scheme that jointly considers best relay selection and dynamic power allocation is proposed to maximize SC and minimize energy consumption.

(3) In order to accurately describe the time-varying characteristics, the spectrum occupancy state, the channel state information (CSI) of the related channels, and the residual relay energy are modeled as finite-state Markov models.

(4) The relay selection and dynamic power allocation scheme is formulated as a restless bandit problem, which is solved by the primal-dual index heuristic algorithm. The obtained optimal relay selection policy has an indexability property that dramatically reduces the computational complexity.

Simulation results show that the proposed scheme outperforms the existing ones in terms of the achievable secrecy capacity and energy consumption.

The remainder of this paper is organized as follows. In Section 2, the system model is described. Section 3 formulates the relay selection and dynamic power allocation scheme as a restless bandit problem and solves the problem with the primal-dual index heuristic algorithm. Extensive simulation results are presented and analyzed for performance evaluation in Section 4. Finally, Section 5 concludes the paper.

#### 2. System Model and Secrecy Capacity

We consider an underlay CR system with the coexistence of primary and secondary networks. As depicted in Figure 1, in the primary network, a primary transmitter (PT) communicates with primary destinations (PDs) denoted by . Meanwhile, in the secondary network, a secondary transmitter () wants to send confidential information to a secondary destination () assisted by the best relay selected from the candidate relay set, , over the spectrum band that is licensed to the primary network. At the same time, eavesdroppers, denoted by , try to eavesdrop and intercept the message sent by and relay nodes. A Rayleigh block-fading channel is assumed in this paper. We define , , , , , , and , as the channel coefficient of link, link, link, link, link, link, and link, respectively, where , , and . In addition, the global channel state information (CSI) is assumed to be available, and even the eavesdroppers’ channels are known when the eavesdropper is also a user of the secondary network, but it is not the intended destination for some particular confidential information [22, 23].

##### 2.1. Cooperative Relaying Protocol and Secrecy Capacity

We consider the decode and forward (DF) relaying protocol with two stages. In the first stage, transmits its encoded information with transmission power to the relay nodes. In the second stage, the selected relay reencodes the message and forwards it to with transmission power . Meanwhile, the eavesdroppers can overhear the information at the two stages due to the broadcast nature of wireless medium. For the secondary transmission in the presence of eavesdroppers, the secrecy capacity is characterized as where ; and are the achievable rates at and , respectively. The achievable rate at can be written as

In this paper, we assume that eavesdroppers independently perform their tasks to intercept the secondary transmission. The overall rate of the wiretap links is the maximum of individual rates achieved at eavesdroppers. Thus, the overall rate can be obtained as where is the noise power of all the links.
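The relationship between the main-link rate, the per-eavesdropper rates, and the secrecy capacity can be sketched as follows. This is a minimal illustration, assuming Shannon-type rates computed from linear SNR values; the function names and the omission of any relaying prelog factor are assumptions of the sketch, not the exact expressions of this paper.

```python
import math

def rate(snr):
    """Shannon rate in bit/s/Hz for a linear (not dB) SNR; assumed form."""
    return math.log2(1.0 + snr)

def secrecy_capacity(main_snr, eavesdropper_snrs):
    """Nonnegative gap between the main-link rate and the wiretap rate."""
    main_rate = rate(main_snr)
    # Independent eavesdroppers: the strongest one sets the overall wiretap rate.
    wiretap_rate = max((rate(s) for s in eavesdropper_snrs), default=0.0)
    # The secrecy capacity is clipped at zero when the wiretap link is stronger.
    return max(0.0, main_rate - wiretap_rate)
```

For example, a main-link SNR of 15 against eavesdropper SNRs of 3 and 1 gives rates of 4 and 2 bit/s/Hz, so the sketch returns a secrecy capacity of 2 bit/s/Hz; if the strongest eavesdropper has a higher SNR than the destination, the result is zero.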

To guarantee the QoS of PDs, the transmission power of and is limited by the interference temperature limit ; that is, where is the maximum transmission power limit.
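One common way to express such an underlay power cap is sketched below. It assumes that the interference received at each PD equals the transmit power multiplied by the channel power gain towards that PD, so the admissible power is the smaller of the hardware limit and the tightest interference-limited value. The function name, its arguments, and this interference model are hypothetical and only illustrate the constraint.

```python
def allowed_power(p_max, interference_limit, pd_channel_gains):
    """Largest transmit power that respects every PD's interference limit.

    p_max              -- maximum transmission power (same unit as the limit)
    interference_limit -- interference temperature limit at each PD
    pd_channel_gains   -- channel power gains |h|^2 towards each PD (assumed known)
    """
    # Each PD with a nonzero gain imposes the cap: power * gain <= limit.
    caps = [interference_limit / g for g in pd_channel_gains if g > 0]
    return min([p_max] + caps)
```

With a 150 mW hardware limit, a 10 mW interference limit, and gains of 0.5 and 0.1, the first PD is the binding one and the sketch returns 20 mW.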

##### 2.2. Finite-State Markov Channel Model

In this paper, the finite-state Markov channel (FSMC) model is used to characterize the Rayleigh block-fading channel. The range of the channel gain is quantized into discrete levels, each corresponding to a state in the Markov chain. Specifically, the channel gain of the considered links is modeled as a random variable evolving according to a finite-state Markov chain which is characterized by a -state set . Here, belongs to . Let denote the probability that transits from state to state at time . The channel state transition probability matrix is defined as where .
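A finite-state Markov channel of this kind can be simulated with a few lines of code. The sketch below is illustrative only: the two-state chain and its 0.7/0.3 probabilities mirror the "bad"/"good" setting used in the numerical results, while the function name and the integer state encoding are assumptions.

```python
import random

def simulate_chain(transition, start, steps, rng):
    """Sample a state path from a row-stochastic transition matrix."""
    path = [start]
    for _ in range(steps):
        row = transition[path[-1]]
        # Draw the next state according to the current state's row.
        path.append(rng.choices(range(len(row)), weights=row)[0])
    return path

# Two-state example: 0 = "bad", 1 = "good"; values are illustrative.
P = [[0.7, 0.3],
     [0.3, 0.7]]
path = simulate_chain(P, start=0, steps=10, rng=random.Random(0))
```

Each row of the matrix sums to one, and a seeded generator keeps the sampled path reproducible across runs.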

##### 2.3. Finite-State Markov Spectrum Occupancy and Energy Model

In CR relay networks, the radio spectrum is either occupied by the primary users or not. The spectrum state evolves according to a two-state Markov model, where means that the spectrum is occupied by the primary users and shows that the spectrum is idle. Let denote the probability that transits from state to state at time . The spectrum occupancy state transition probability matrix is defined as where .

The residual energy of the battery-powered relay can also be modeled by a finite-state Markov energy model [24]. In this model, the continuous battery residual energy is divided into discrete levels denoted by , each corresponding to an energy state in the Markov chain; is the number of energy state levels. Let denote the probability that residual energy of transits from state to state at time . The energy state transition probability matrix is defined as where .

We need to find out the optimal relay selection and power allocation scheme, which can set one relay to be active at time slot according to the relays’ states that contain their channel state , where belongs to , residual energy state , and the spectrum state . Our optimization objective is to maximize the secrecy capacity as well as to minimize the energy consumption.

#### 3. Stochastic Formulation

In this section, we propose the relay selection and power allocation scheme to defend against eavesdropping attacks and to reduce energy consumption. The proposed scheme can be formulated as a restless bandit problem, which has been widely used to solve stochastic selection problems [25]. In the restless bandit system, the relay is equivalent to the arm, the relay selection and power allocation are the actions of the arm, and the secrecy capacity and energy consumption correspond to the reward. The restless bandit problem can be solved according to the indices of the arms, which are calculated by a primal-dual index heuristic algorithm.

##### 3.1. Formulation of the Restless Bandit Problem

###### 3.1.1. Action Space of Relay

At time slot , each relay node decides whether to cooperate with the confidential communication between and or not and then decides how much power is provided if it joins the cooperation. Thus, the action of relay in time slot is represented by , where , 0 denotes that the relay is passive and 1 denotes that the relay is active. If the relay is active, is the corresponding power allocation which must satisfy the power constraint in (5). For relays in time slot , the action space is . In our proposed scheme, we only select a single relay to assist with data transmission. Hence, the relay selection satisfies .

###### 3.1.2. State Space and Transition Probabilities

The state of relay in time slot is determined by the channel states, the spectrum state, and the residual energy state. Consequently, the state of relay can be modeled as

The changes of the channel states , , , , and , spectrum state , and residual energy state are independent of each other. The relay state evolves in a Markov fashion with a finite-state space , . The state transition probability matrix of relay is defined as where , , , , , and are defined in (6), (7), and (8), respectively, and . The element of is , denoting the transition probability that the state of relay transits from to , where and .
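Because the component chains evolve independently, a joint transition matrix can be assembled as a Kronecker product of the component matrices. The sketch below assumes that construction and uses a hypothetical two-state channel chain and two-state spectrum chain; the helper names and the example values are illustrative.

```python
from functools import reduce

def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [[x * y for x in row_a for y in row_b]
            for row_a in a for row_b in b]

def joint_transition(matrices):
    """Joint transition matrix of independently evolving Markov chains."""
    return reduce(kron, matrices)

channel = [[0.7, 0.3], [0.3, 0.7]]   # illustrative channel chain
spectrum = [[0.7, 0.3], [0.3, 0.7]]  # illustrative spectrum chain
joint = joint_transition([channel, spectrum])  # 4 x 4, rows still sum to 1
```

The joint state space grows as the product of the component state counts, which is why the per-relay index table discussed later is computed offline.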

###### 3.1.3. System Reward

The goal of our proposed relay selection and power allocation scheme is to maximize SC and to minimize energy consumption in CR relay networks. Thus, we formulate the system reward to be the function of the SC, the residual relay energy, and the energy consumption. At time slot , if relay , in state , takes action , then the immediate reward is earned: where , , and are weights and is the achievable secrecy capacity and calculated by (1) while and are the residual energy and power consumption.

The immediate reward is earned when relay takes action in state . For a stochastic process, a maximum immediate value is not equivalent to the maximum expected long-term accumulated value. We assume that the duration of the whole communication is long enough and that is approximately infinite. We denote by the discount factor and denote by the set of admissible Markovian policies. The relay selection and power allocation problem is to find an optimal scheduling policy that maximizes the expected total discounted reward over an infinite horizon and compute its optimum value: where is the optimal expected total discounted reward. The discount factor is required to be to ensure that the expected total discounted reward converges over an infinite horizon.
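The role of the discount factor can be illustrated with a short sketch. It assumes the usual condition that the discount factor lies in [0, 1), under which a reward sequence bounded by some maximum value has a discounted sum of at most that maximum divided by one minus the discount factor. The function name and the sample rewards are hypothetical.

```python
def discounted_return(rewards, beta):
    """Sum of beta**t * reward_t over the given reward sequence."""
    if not 0 <= beta < 1:
        # Assumed standard condition for convergence with bounded rewards.
        raise ValueError("beta must lie in [0, 1)")
    total = 0.0
    for t, r in enumerate(rewards):
        total += (beta ** t) * r
    return total
```

With a discount factor of 0.7, as used in the numerical results, a constant unit reward accumulates to roughly 1 / (1 - 0.7), i.e. about 3.33, no matter how long the horizon becomes.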

##### 3.2. Solution to the Restless Bandit Problem

The restless bandit problem mentioned above can be solved by the primal-dual index heuristic algorithm based on the first-order LP relaxation, which has been demonstrated to have less complexity and very close performance compared to the optimal one [25].

###### 3.2.1. Linear Programming (LP) Relaxation

In order to formulate the restless bandit problem as a linear program we introduce performance measures:
where is an admissible scheduling policy,
Notice that represents the expected total discounted time that in state takes action under policy , where if action is taken in time and otherwise. The corresponding performance region, spanned by performance vector under all admissible policies, is denoted by :
Reference [25] proved that the performance region is the *restless bandit polytope*. The restless bandit problem can thus be formulated as the linear program:
The approach developed in [25] is to construct relaxations of polytope so as to yield polynomial-size relaxations of linear program. Denote by the relaxations not on the original variables , but in a higher-dimensional space that includes new auxiliary variables. Define , which is precisely the projection of the *restless bandit polytope* over the space of the variable for . A complete formulation of is given by [25]:
where denotes the probability that the initial state of relay is . According to Whittle’s condition, the average number of active relay can be written as
In our scheme, only one relay is selected at each time slot, so .

Therefore, the first-order relaxation can be formulated as the linear program There are variables and constraints of this linear program , with the polynomial size in the problem dimensions.
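For orientation, the first-order relaxation of a discounted restless bandit with one active arm per time slot is usually written in the form below, following the construction in [25]. The symbols are assumptions chosen for this sketch: $x^{a}_{i_n}$ is the performance measure of relay $n$ in state $i_n$ under action $a$, $R^{a}_{i_n}$ the reward, $p^{a}_{i_n j_n}$ the transition probability, $\alpha_{j_n}$ the initial state probability, $S_n$ the state space of relay $n$, and $\beta$ the discount factor. The action is reduced to passive/active ($a \in \{0,1\}$), with the power level left out for brevity.

```latex
\begin{aligned}
\max_{x}\quad
  & \sum_{n=1}^{N} \sum_{i_n \in S_n} \sum_{a \in \{0,1\}} R^{a}_{i_n}\, x^{a}_{i_n} \\
\text{s.t.}\quad
  & \sum_{a \in \{0,1\}} x^{a}_{j_n}
    = \alpha_{j_n} + \beta \sum_{i_n \in S_n} \sum_{a \in \{0,1\}} p^{a}_{i_n j_n}\, x^{a}_{i_n},
    \qquad j_n \in S_n,\ n = 1, \dots, N, \\
  & \sum_{n=1}^{N} \sum_{i_n \in S_n} x^{1}_{i_n} = \frac{1}{1-\beta}, \\
  & x^{a}_{i_n} \ge 0 .
\end{aligned}
```

Under this form there are two variables per relay state, one balance constraint per relay state, and a single coupling constraint that fixes the expected discounted number of active relays, so the program grows only linearly with the total number of relay states.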

###### 3.2.2. Primal-Dual Priority Index Heuristic

In this section, we present a heuristic for the restless bandit problem, which uses information contained in optimal primal and dual solutions to the first-order relaxation . The primal-dual heuristic is interpreted as a priority-index heuristic as well. The dual of linear program is Let and be an optimal primal and dual solution pair to the first-order relaxation () and its dual . The corresponding optimal reduced cost coefficients are defined as which must be nonnegative. and are the rates of decrease in the objective value of linear program (19) per unit increase in the value of variables and , respectively. Based on the cost coefficients, the index of relay in state is defined as The priority-index rule is that the relay with the smallest index is selected to be active.
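In the primal-dual heuristic of [25], the index of a state is commonly written as the difference between the optimal reduced costs of its active and passive variables. The sketch below assumes that form; the function name and the reduced-cost values are hypothetical and would in practice come from an LP solver applied to the first-order relaxation and its dual.

```python
def priority_indices(gamma_passive, gamma_active):
    """Index per state: active reduced cost minus passive reduced cost."""
    return {s: gamma_active[s] - gamma_passive[s] for s in gamma_active}

# Hypothetical reduced costs for one relay with two states.
gamma_passive = {"good": 0.4, "bad": 0.0}
gamma_active = {"good": 0.0, "bad": 0.9}
indices = priority_indices(gamma_passive, gamma_active)
preferred_state = min(indices, key=indices.get)  # state with the smallest index
```

A state whose active variable carries no reduced cost receives a nonpositive index and is therefore favoured by the smallest-index rule, whereas a state that is costly to activate receives a large index.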

##### 3.3. Process of Relay Selection and Power Allocation Scheme

In this section, we present the indexable relay selection and power allocation scheme in CR relay networks. Our proposed scheme is divided into offline computation and online selection. The specific procedure is given in Algorithm 1.

*Algorithm 1 (process of relay selection and power allocation scheme).* Consider the following steps.

*Step 1 (offline computation).*

(1) According to the spectrum state, channel state, and residual relay energy state, determine the state space and the transition probability matrices under different actions.

(2) Input the state transition probabilities, the rewards, the discount factor, and the initial state probabilities, and then compute the priority indices according to (20)–(22); the indices are stored in an index table.

(3) Each relay stores this index table.

*Step 2 (online selection).*

(1) At the beginning of each time slot, all the candidate relays sense the spectrum occupancy state, estimate the channel gain, and detect the residual energy to obtain the spectrum state, channel state, and residual energy state.

(2) Each candidate relay shares its state with the others.

(3) Each candidate relay looks up the indices of all relays in the index table; the relay with the smallest index is selected to be active, and the corresponding power allocation can be obtained.
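The online selection step amounts to a table lookup followed by a minimum search. The sketch below is a minimal illustration under assumed data structures: the index table maps each relay and state to a pair of index and transmit power, and all relay names, states, and numbers are hypothetical.

```python
def select_relay(index_table, observed_states):
    """Pick the relay whose current state has the smallest stored index.

    index_table[relay][state] -> (index, power_mw), filled in offline.
    observed_states[relay]    -> state shared by that relay in this slot.
    """
    entries = {r: index_table[r][s] for r, s in observed_states.items()}
    best = min(entries, key=lambda r: entries[r][0])
    _, power_mw = entries[best]
    return best, power_mw

# Hypothetical table for two relays; a state is (channel, spectrum, energy).
table = {
    "R1": {("good", "idle", "high"): (-0.8, 150.0),
           ("bad", "busy", "low"): (0.5, 20.0)},
    "R2": {("good", "idle", "high"): (-0.6, 150.0),
           ("bad", "busy", "low"): (0.7, 20.0)},
}
slot_states = {"R1": ("bad", "busy", "low"), "R2": ("good", "idle", "high")}
chosen, power = select_relay(table, slot_states)
```

Since every relay holds the same table and receives the same shared states, each one reaches the same decision locally, and the per-slot work is one lookup per candidate relay.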

#### 4. Numerical Results and Analysis

In this section, numerical results are provided to show the physical-layer security and energy consumption improvement by exploiting the proposed relay selection and power allocation scheme. The maximum transmission power limit for and is set to 150 mW; the battery capacity of each relay is set to 1000 mAh with an output voltage of 1 V. The discount factor is 0.7. The links , and are divided into “bad” and “good” states; the spectrum occupancy state is “busy” and “idle.” For both and , the transition probability between the different states is 0.3 and the probability of staying in the same state is 0.7. The residual energy of each relay is divided into “high,” “low,” and “dead” states; set to be the residual energy state transition probability matrix when the relay is active and when it is passive; that is,

The following methods are simulated for comparison:

(i) the proposed relay selection and power allocation scheme;

(ii) the memoryless relay selection scheme, in which the relay node is selected for subsequent data transmission according to the current channel condition;

(iii) the traditional relay selection scheme [8], in which the eavesdroppers’ channel condition is not taken into account;

(iv) the arbitrary relay selection scheme.

The computational complexity of the proposed relay selection and power allocation scheme and that of those existing schemes are tabulated in Table 1. denotes the number of candidate relays and denotes the time horizon. Compared to the memoryless selection scheme with order of operations, the complexity of the proposed selection scheme is reduced due to the indexability property. It also shows that the complexity of the arbitrary selection scheme is independent of the number of candidate relays.

##### 4.1. Secrecy Capacity Performance Improvement

This subsection presents the numerical secrecy capacity results of the proposed relay selection scheme. Energy is not considered here; it is addressed in Section 4.2. Thus, the weights can be specified as , , and .

Figure 2 shows the average secrecy capacity improvement of the proposed scheme with different numbers of candidate relays. We assume that the interference temperature limit mW and eavesdropper . As the number of candidate relays increases, the probability that there exists a candidate relay in a better state rises, so the relay selection schemes are more likely to find a good candidate. It can also be seen that the proposed scheme always achieves a larger average secrecy capacity than the memoryless scheme, the traditional scheme, and the arbitrary scheme. This is because the memoryless scheme selects the relay node for subsequent data transmission according to the current channel condition, which may change during the subsequent data transmission. The traditional scheme does not take the eavesdropper channels into account, so it is not able to support systems with a secrecy constraint. The arbitrary selection scheme has the worst secrecy capacity performance.

Figure 3 shows the average secrecy capacity versus the number of eavesdroppers for different schemes with candidate relays and interference temperature limit mW. We can see that as the number of eavesdroppers increases, the achievable secrecy capacity of all the schemes is significantly reduced. This is because, as the number of eavesdroppers increases, it becomes more likely that some wiretap link is much better than the main link. As a result, the eavesdroppers will most likely succeed in intercepting the legitimate transmission. However, the proposed scheme defends more effectively against eavesdropping attacks than the existing schemes, which confirms the advantage of the proposed scheme.

Figure 4 illustrates the average secrecy capacity under different interference temperature limits with candidate relays and eavesdropper . We can observe that the average secrecy capacity of all schemes changes only slightly when dBm and increases with the increasing when dBm. This is due to the fact that when the interference temperature limit is less than 10 dBm and the spectrum is sensed to be “busy,” ’s transmission power directly depends on to guarantee the QoS of primary users.
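Since power values in this section appear both in mW and in dBm, a small conversion helper makes the comparison explicit. The function names are illustrative; the conversion itself is the standard definition of dBm as decibels relative to 1 mW.

```python
import math

def mw_to_dbm(p_mw):
    """Convert a power in milliwatts to dBm."""
    return 10.0 * math.log10(p_mw)

def dbm_to_mw(p_dbm):
    """Convert a power in dBm to milliwatts."""
    return 10.0 ** (p_dbm / 10.0)
```

For instance, the 10 dBm level mentioned above corresponds to 10 mW, and the 150 mW maximum transmission power corresponds to about 21.76 dBm.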

##### 4.2. Energy Consumption Improvement

In this subsection, we demonstrate the energy consumption improvement of the proposed scheme. We set the weights as , , and . For a fair comparison, the memoryless selection scheme is revised to select the relay with the highest residual energy without considering the energy consumption. The traditional selection scheme selects the relay that maximizes the achievable data rate and minimizes the energy consumption without considering eavesdroppers, while the arbitrary selection scheme selects a relay among the alive relay nodes.

Figure 5 shows the average reward comparison among the proposed scheme, the memoryless scheme, the traditional scheme, and the arbitrary scheme. There are candidate relays and potential eavesdropper. Since more and more relays run out of energy as the simulation time increases, the number of available relays decreases. As a result, the average reward of all the schemes declines with time. The proposed scheme outperforms the other three schemes because both the achievable secrecy capacity and the energy consumption contribute to the reward. The proposed scheme selects the relay node that costs less energy at the decision time, whereas the memoryless and arbitrary schemes do not take the energy consumption into consideration. The traditional scheme aims to maximize the achievable data rate and minimize the energy consumption without considering eavesdroppers, so it cannot support secure secondary transmission.

The energy consumption also affects the average secrecy capacity. As shown in Figure 6, the average secrecy capacity declines with increasing simulation time. This is because more and more relay nodes run out of energy after transmitting data for some time slots. It can be seen that there is hardly any live relay at about 1000 s, and the proposed scheme achieves a higher average secrecy capacity than the other selection schemes.

Figure 7 compares the energy consumption for different relaying schemes with available relays and potential eavesdropper. We can see that the energy of the memoryless selection scheme, the traditional selection scheme, and the arbitrary selection scheme runs out earlier than that of the proposed scheme, which further confirms the advantage of the proposed scheme.

Figure 8 reveals the average network lifetime of different relaying schemes with available relays and potential eavesdropper. In this paper, the network lifetime is defined as the time at which the number of dead relays reaches a threshold, th, beyond which the considered cognitive network can no longer achieve the target secrecy performance. As expected, the network lifetime of all schemes increases with th. In addition, our proposed scheme always has the best performance.
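The lifetime definition can be made concrete with a short sketch. It assumes that the time at which each relay runs out of energy is known from a simulation run; the function name and the sample times are hypothetical.

```python
def network_lifetime(death_times, threshold):
    """Time at which the number of dead relays first reaches `threshold`."""
    if not 1 <= threshold <= len(death_times):
        raise ValueError("threshold must be between 1 and the number of relays")
    # The threshold-th earliest death marks the end of the network lifetime.
    return sorted(death_times)[threshold - 1]
```

For relays that die at 400, 250, 900, and 600 s, a threshold of one dead relay gives a lifetime of 250 s and a threshold of three gives 600 s, which matches the observation that the lifetime grows with th.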

#### 5. Conclusion

In this paper, we have explored the physical layer security and efficient energy consumption of the secondary transmission and proposed a best relay selection and dynamic power allocation scheme. The spectrum occupancy state, the wireless channels, and the residual relay energy are characterized as finite-state Markov models in order to accurately describe the time-varying radio environment. Specifically, we formulated the relay selection and power allocation problem as a restless bandit system and solved this stochastic control problem with a primal-dual index heuristic algorithm. Finally, simulation results have been presented to illustrate that the proposed relay selection and power allocation scheme can significantly increase the secrecy capacity and reduce the energy consumption compared to the existing schemes.

#### Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

#### Acknowledgments

This work is supported by the National Natural Science Foundation of China (no. 61471060), National High Technology Research and Development Program of China (863 Program) (no. 2014AA01A706), and Funds for Creative Research Groups of China (no. 61421061).