Cognitive wireless mesh networks offer great flexibility for improving spectrum resource utilization: secondary users (SUs) can opportunistically access the authorized frequency bands while complying with the interference constraint as well as the QoS (Quality-of-Service) requirement of primary users (PUs). In this paper, we consider intercluster connection between neighboring clusters under the framework of cognitive wireless mesh networks. For colocated clusters, a data flow that includes the exchange of control channel messages usually needs four time slots in traditional relaying schemes, since all involved nodes operate in half-duplex mode, resulting in a significant loss of bandwidth efficiency. The situation is even worse at the gateway node connecting the two colocated clusters. A novel scheme based on network coding is proposed in this paper, which needs only two time slots to exchange the same amount of information. Our simulation shows that the network coding-based intercluster connection achieves higher bandwidth efficiency than the traditional strategy. Furthermore, we discuss how to choose an optimal relaying transmission power level at the gateway node in an environment of coexisting primary and secondary users. We present intelligent approaches based on reinforcement learning to solve the problem. Theoretical analysis and simulation results both show that the intelligent approaches can achieve optimal throughput for the intercluster relaying in the long run.
1. Introduction
Wireless mesh networks (WMNs) are experiencing rapid growth around the world. As the demand for wireless communications increases, the limited spectrum resource and conventional allocation methods increasingly result in overcrowding. On the other hand, it has already been observed that most of the authorized spectrum is significantly underutilized due to the traditional static spectrum allocation [1]. Cognitive radio (CR) is a promising wireless communication paradigm proposed to improve the inefficient spectrum usage [2, 3]. It is suitable for opportunistic access to various licensed or unlicensed spectrum bands, making it specifically applicable to the heavy spectrum access requirements seen in a dynamic wireless mesh networking environment. Research on CR has already penetrated into different types of wireless networking scenarios, covering almost every aspect of wireless communications [4–8].
In this paper, we focus on the cognitive wireless mesh networking framework named CogMesh, which is described in more detail in [4]. As illustrated in Figure 1, CogMesh is a self-organized and self-configured hierarchical network architecture combining cognitive radio access technologies with the distributed mesh structure. It provides an integrated service platform over a wide range of converged heterogeneous networks, which will enable opportunistic spectrum access in various licensed and unlicensed frequency bands. Basically, the CogMesh networking configuration is restricted by the activity of primary users, depending on the locally perceived spectrum availability and the spatial-temporal variations of the primary users’ behavior. This fundamental feature inherently leads to the natural partitioning of the network architecture. The wireless network will be partitioned into clusters within which the involved secondary users agree on one or more common control channels for networking configuration based on the locally varying spectrum availability. The clusters themselves can be reconfigured subject to the presence of the primary users. Accordingly, the CogMesh network is built by interconnecting a number of clusters through various gateway nodes, as shown in Figure 2. The gateway nodes transfer data, including control channel messages, between any two neighboring clusters.
Figure 1: Cognitive wireless mesh networking (CogMesh) scenarios.
Figure 2: Cluster-based network formation in CogMesh.
There are two typical cases for intercluster connection: the two neighboring clusters are either overlapping or nonoverlapping. In the first case, the gateway node is a one-hop neighbor of the two corresponding clusterheads. As depicted in Figure 2, A and B are the clusterheads of two neighboring clusters, and a common one-hop neighbor of both is selected as the gateway node, interconnecting the two clusters. When clusterhead A has information (e.g., a control channel message) to send to clusterhead B, it first sends the information to the gateway node, which then relays it to clusterhead B. On the reverse path, clusterhead B sends the information (e.g., a control channel message) to the gateway node, which relays it to clusterhead A. In the second case, if the two clusters are nonoverlapping but there are nodes belonging to the two clusters that can hear each other, these nodes are chosen as the gateway nodes to interconnect the two clusters. Because the coordination of the two gateway nodes needs one more hop, the information exchange in this case is a little more complex but still follows the same principle and procedures.
This paper studies the first case; the relevant results can be easily extended to the second case. We model such an intercluster connection as a two-way relaying channel [9]. In the basic scenario, two clusterheads A and B (i.e., two source stations) exchange data, including the control channel message, through the gateway node (i.e., the relay). A direct link between A and B is impossible because they are too far away from each other. The traditional approach, discussed in the previous paragraph, uses a time-division multirelaying scheme which usually needs four time slots to complete a round of message exchange (Figure 3(a)). Recently, network coding, which was first introduced by Ahlswede et al. [10], has inspired intensive research activities in the context of wired and wireless networks [11–13]. Network coding can offer network throughput improvement for two-way communication flows [11, 12].
Figure 3: Intercluster connection in CogMesh.
Moreover, by applying the idea of network coding, the authors in [11] have proposed a method to reduce the number of required time slots from four to three for internode data exchange. In this method (Figure 3(b)), clusterhead A first sends its message to the gateway node during time slot 1, and the gateway node decodes it. During time slot 2, clusterhead B sends its message to the gateway node, which decodes it as well. In time slot 3, the gateway node broadcasts to A and B a new message which consists of the bits obtained by a bit-wise exclusive-or (XOR) operation over the two decoded messages. Since A knows its own message, A can recover its desired message by decoding the broadcast and then XORing the result with its own message. Similarly, B can recover the message sent by A. The principle of network coding has been further investigated in [12], where the proposed scheme is named analogue network coding (ANC). In comparison, this scheme lets A and B send signals simultaneously in the first time slot. Then, after amplification, the gateway node broadcasts a scaled signal in the second time slot to both A and B (see Figure 3(c)).
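The three-slot XOR exchange described above can be sketched in a few lines of Python. This is a minimal illustration only, assuming equal-length byte messages and error-free decoding at every node; the payloads and the helper name `xor_bytes` are hypothetical and are not taken from [11].

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bit-wise XOR of two equal-length messages."""
    if len(a) != len(b):
        raise ValueError("messages must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


# Slots 1 and 2: each clusterhead sends its message to the gateway node.
msg_a = b"ctrl-from-A"  # hypothetical payload from clusterhead A
msg_b = b"ctrl-from-B"  # hypothetical payload from clusterhead B

# Slot 3: the gateway node broadcasts the XOR of the two decoded messages.
coded = xor_bytes(msg_a, msg_b)

# Each clusterhead XORs the broadcast with the message it already knows.
recovered_at_a = xor_bytes(coded, msg_a)  # A obtains B's message
recovered_at_b = xor_bytes(coded, msg_b)  # B obtains A's message
```

The broadcast is no longer than a single message, which is why one slot replaces the two separate forwarding slots of the traditional scheme.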
In our paper, we take advantage of the ANC-based network coding scheme to enhance the data flows across neighboring clusters. The obvious advantage of network coding is that it effectively utilizes the broadcast nature of wireless communications to complete the data exchange in two time slots. Generally, the aforementioned network coding approaches are mainly carried out in interference-free wireline and wireless networking scenarios. However, due to the PUs’ presence in the context of CogMesh networks, the data flows, including the control channel message exchange between any two neighboring clusters, should not violate the interference and QoS constraints of the locally coexisting PUs. This poses a unique challenge for implementing the network coding scheme and is dealt with specifically in the following sections of this paper.
A large amount of research work on cognitive radio-enabled dynamic spectrum access has concentrated on two major technical issues. The first issue is the detection of spectrum opportunities (“spectrum holes”) that can be used by the secondary users for transmission. The second is the development of resource allocation solutions for efficient usage of the detected “spectrum holes” by the secondary users while realizing peaceful spectrum sharing with the primary users. In this paper, another subject is addressed as the third challenge. In parallel with the aforementioned ANC-based approach, we pay special attention to the interaction of a cognitive wireless user (i.e., the gateway node) with its local wireless environment via a learning process. We focus on developing intelligent solutions that can be employed by the gateway node to improve its relaying performance in the CogMesh framework. In particular, we aim at exploring how to efficiently predict the impact of these solutions on the future value function and then determine the transmission power level and the associated relaying strategy over time, based on information about the current spectrum opportunities, the transmit power and channel characteristics, and the interaction with the clustering environment.
Accordingly, unlike the previous work on spectrum sensing and resource management, our main concern is how users can predict, adapt to and learn from their wireless communication environment and optimize the associated transmission strategies given networking “dynamics” experienced during the multiple-round interactions. Corresponding to the colocated multiple clusters in the CogMesh framework, we apply advanced learning techniques to the gateway node to improve its relaying performance for effectively increasing the data flows including the control channel message exchange under various dynamic wireless environmental constraints, resulting from variations in the behavior of the wireless sources, such as the stochastic behavior of the primary users.
Through repeated interaction, the gateway node can obtain partial historical information about the outcome of the data flows, from which the impact on the expected future rewards can be estimated using different types of interactive learning. In this paper, we focus on reinforcement learning because it allows the gateway node to improve its strategy based only on the knowledge of its own past received payoffs. Our proposed best response learning policies are inspired by Dynamic Programming (DP) and ε-greedy learning for a single agent interacting with its environment. Unlike the aforementioned two learning policies, the proposed best response learning explicitly considers the interaction and coupling between the environment and the gateway node. By applying the best response learning policies, the gateway node can strategically predict the impact of current actions on future performance and then make its decision optimally.
Our work in this paper mainly includes two parts. The first part gives a detailed theoretical analysis of the Traditional Intercluster Connection (TIC) and the Network Coding-based Intercluster Connection (NCIC) in CogMesh. In the second part, we present reinforcement learning-based policies by which the gateway node selects an appropriate transmission power level. An intelligent gateway node learns from interactions with the environment how to behave in order to achieve the goal of optimal relaying throughput in the long run. Accordingly, our contribution has three main aspects. First, we investigate the intercluster connection within the framework of CogMesh. Secondly, network coding is applied to enhance the connection between the neighboring clusters. Thirdly, by further applying reinforcement learning to select the transmission power level at the gateway node, we obtain optimal relaying throughput in an interference-restricted environment. This paper is organized as follows. Section 2 discusses the traditional and network coding-based intercluster connection. Section 3 presents how to obtain policies for selecting the transmission power level based on reinforcement learning. Simulations and results are provided in Section 4. The conclusion is given in Section 5.
2. Intercluster Connection in CogMesh
As shown in Figure 4, we consider a typical scenario which has one specific PU link and two neighboring clusters. By applying opportunistic spectrum access techniques, the PU and SUs may share the same frequency band There are two intercluster communication flows, and , respectively. The gateway node performs Amplifying-and-Forwarding (AF) operation in CogMesh in order to relay the data flows across the two neighboring clusters. All SU nodes are half-duplex within each cluster. is the signal transmitted from the secondary user in time slot . If only one node is transmitting, the received signal at node in time slot is
where is the channel coefficient between the primary transmitter (PT) and the secondary receivers is the additive white Gaussian noise (AWGN) with zero mean and variance . The transmitted signal has zero mean and a variance , and denotes the transmitted signal from the PT with zero mean and variance is the channel coefficient between and and for analytical simplicity, is assumed to be flat and symmetric in the local cluster area, which implies
If and transmit simultaneously, receives
Figure 4: Two-way relay channel of cognitive users coexisting with PU.
Furthermore, the channel coefficient is denoted by here, between the secondary user and the primary receiver (PR). is the channel coefficient between PT and PR. In order to find the routing-rate, we assume that the time-invariant channels and their coefficients are perfectly known by all SUs.
In this paper, we are particularly interested in how to improve the relaying performance of the gateway node and to increase the routing-rate during the data flow exchange by exploring the network coding scheme.
Definition 1. During time slot (ts), receives bits reliably from and receives bits reliably from , then the routing-rate is given by
In order to ensure the feasibility of data relaying, the colocated clusters have to satisfy the following constraints.
(1) Mean-squared error (MSE) constraint. The interference caused by SUs to the PU should not exceed a certain threshold. The MSE derived by memory-less estimation of the primary signal at the primary receiver should be less than or equal to a predefined value, which also represents the acceptable QoS level required by the primary user as indicated in reference [8].
(2) Maximum transmit power constraint. The transmit power of an SU should not exceed a maximum value.
In this paper, for the sake of simplicity, we assume the following.
(a) The maximum transmit power is the same for all SUs. It is easy to extend the discussion to the case where the maximum transmit power is user dependent.
(b) The clusterheads and can transmit with the maximum transmit power without violating constraint . Since in this paper we place our emphasis on the gateway node’s performance, this assumption is especially suitable for the targeted scenario that PUs appear in the overlap area of two clusters. PUs are nearer to the gateway node than the clusterheads such that the transmission power of the gateway node is constrained by and in while the two clusterheads can transmit with the maximally permitted power and still maintain constraint at the same time. Our future work will discuss other scenarios where the transmission power of the clusterheads and the gateway node needs to fully satisfy both and .
In what follows, we compare the Network Coding-based Intercluster Connection with the Traditional Intercluster Connection. The theoretical analysis of the achievable routing-rates is given in detail below.
2.1. Traditional Intercluster Connection
As mentioned above, the clusterhead transmits in time slot to the gateway node at first. Then relays the received signal by an amplifying factor under the constraints and . In this case, the optimal amplifying factor for increasing the relaying throughput can be obtained as
that is
where the detailed derivation of (5) is given in the appendix. Clusterhead receives a scaled signal in next time slot :
Therefore can receive
Similarly, clusterhead receives
where
Since the total duration is 4 time slots, then the routing-rate for the Traditional Intercluster Connection is
2.2. Network Coding-Based Intercluster Connection
The clusterheads and simultaneously transmit in time slot . receives and the variance of it is denoted by
Then following the same optimization approach as above, the gateway node can relay by an optimal amplifying factor :
in complying with the constraints and , that is,
and broadcast it to the clusterheads and at the same time. receives in the next time slot
Since knows its own transmitted signal, it can subtract the back-propagating-self-interference and obtain
which implies that can receive
Similarly, receives
The total duration is 2 time slots in this scheme, so the achieved routing-rate is
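The two-slot exchange of this subsection can be illustrated with a deliberately simplified sketch. It assumes a noise-free, real-valued, symmetric channel with no PU interference and BPSK symbols; the gain `h`, the amplifying factor `beta`, and the function name `anc_exchange` are hypothetical values chosen for illustration, not the optimal factor derived in the paper.

```python
def anc_exchange(x_a, x_b, h=0.8, beta=2.0):
    """Noise-free two-slot exchange over a symmetric real channel.

    h is an assumed clusterhead-gateway channel gain and beta an assumed
    amplifying factor at the gateway node (both illustrative values).
    """
    # Slot 1: both clusterheads transmit; the gateway hears the superposition.
    y_g = [h * a + h * b for a, b in zip(x_a, x_b)]
    # Slot 2: the gateway broadcasts an amplified copy to both clusterheads.
    tx_g = [beta * y for y in y_g]
    rx_a = [h * t for t in tx_g]
    rx_b = [h * t for t in tx_g]
    # Each clusterhead subtracts its own back-propagated signal.
    gain = h * beta * h
    residual_at_a = [r - gain * a for r, a in zip(rx_a, x_a)]
    residual_at_b = [r - gain * b for r, b in zip(rx_b, x_b)]

    def decide(v):
        # BPSK decision: the sign of the residual.
        return 1 if v >= 0 else -1

    return [decide(v) for v in residual_at_a], [decide(v) for v in residual_at_b]


x_a = [1, -1, -1, 1]   # BPSK symbols sent by clusterhead A
x_b = [-1, -1, 1, 1]   # BPSK symbols sent by clusterhead B
at_a, at_b = anc_exchange(x_a, x_b)
```

With noise and PU interference added, the residual would carry extra terms, which is consistent with the BER penalty of NCIC reported in Section 4.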
3. Intercluster Relaying Based on Reinforcement Learning
Reinforcement learning has been successfully used in cognitive radio networks for channel assignment and has been shown to be computationally simple and efficient. The signal amplification at the gateway node in a dynamic CogMesh environment can be viewed as a reinforcement learning problem [14]. In this section, we briefly explain the reinforcement learning agent in the Network Coding-based Intercluster Connection, and then we present an intelligent approach based on reinforcement learning to solve the signal amplification problem.
3.1. Preliminaries of Reinforcement Learning and Problem Formulation
Here we briefly introduce the concept of reinforcement learning. Inspired by psychological theory, reinforcement learning is a subarea of machine learning concerned with how an agent takes actions in an environment in order to maximize a numerical reward [14]. The dynamic environment evaluates every action selected by the agent, and a reward is sent back to the agent accordingly. The next action is chosen based on the result of learning. The agent is not told which actions to take, but instead must discover which actions yield the most reward by trying them. Reinforcement learning algorithms are designed to find a policy that maps states of the environment to the best actions of an agent. The environment is typically formulated as a finite-state Markov decision process (MDP). Formally, a particular reinforcement learning model consists of [15]
(A) a set of environment states,
(B) a set of actions,
(C) a set of scalar (real-valued) rewards.
Regarding the intercluster connection, a reinforcement learning agent (the gateway node) learns from its interaction with the environment how to behave in order to achieve the goal of maximum relaying throughput. We consider the PU’s transmit power as the environment state, the selection of the transmission power level for data relaying at the gateway node as the agent’s action, and the achieved routing-rate as the reward gained by the gateway node.
The agent and environment interact in a sequence of discrete message exchange rounds, At each round , the agent senses the environment state, , where is the set of PU’s transmit powers; the agent selects an action , where is the set of actions available in state . Corresponding to the CogMesh environment, we specify appropriate transmit power levels: , here . denotes that the PU’s transmit power is , at round , then = And we specify N transmission power levels: , here . denotes that the transmission power level of the gateway node is at round , then . At the next round, in part as a consequence of its action, the agent achieve
where denotes that and denotes that finds itself in a new environment state, . At each round , the agent’s policy is the probability that if .
Formally, the value of a state under a policy is defined as
where denotes the expected value given that the agent follows policy , and is a parameter called the discount rate, . Similarly, we define the value of taking action in state under a policy , denoted as the expected return starting from , taking the action , and thereafter following policy :
For any policy and any state , the following condition holds between the value of and the value of its possible successor state:
where is the transition probability and is the expected value of next received bits.
Solving the task of selecting an appropriate transmission power level means, roughly, finding a policy that achieves maximum relaying throughput over the long run. A policy is defined to be better than or equal to a policy if its expected return is greater than or equal to that of for all states. In other words, if and only if for all . There is always at least one policy that is better than or equal to all other policies, which is an optimal policy. Although there may be more than one, we denote all the optimal policies by They share the same state-value function, called the optimal state-value function, denoted by and defined as
for all . Optimal policies also share the same optimal action-value function, denoted by , and defined as
for all and . For the state-action pair , this function gives the expected return for taking action in state and thereafter following an optimal policy.
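For reference, the quantities defined in this subsection take the following standard textbook forms. The notation is assumed to follow [14]: $\pi(s,a)$ is the probability of taking action $a$ in state $s$, $\gamma$ is the discount rate, $r_t$ is the reward at round $t$, $P^{a}_{ss'}$ is the transition probability, and $R^{a}_{ss'}$ is the expected next reward. This is a sketch of the generic expressions and may differ in detail from the numbered equations of the paper.

```latex
\begin{align}
V^{\pi}(s) &= E_{\pi}\Big\{ \sum_{k=0}^{\infty} \gamma^{k} r_{t+k+1} \,\Big|\, s_t = s \Big\}, \\
Q^{\pi}(s,a) &= E_{\pi}\Big\{ \sum_{k=0}^{\infty} \gamma^{k} r_{t+k+1} \,\Big|\, s_t = s,\ a_t = a \Big\}, \\
V^{\pi}(s) &= \sum_{a} \pi(s,a) \sum_{s'} P^{a}_{ss'} \big[ R^{a}_{ss'} + \gamma V^{\pi}(s') \big], \\
V^{*}(s) &= \max_{\pi} V^{\pi}(s), \qquad Q^{*}(s,a) = \max_{\pi} Q^{\pi}(s,a).
\end{align}
```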
3.2. Relaying Signal Amplification Based on Reinforcement Learning
3.2.1. Dynamic Programming (DP)
The reason to compute the value function for a policy is to help find better policies. Suppose that we have determined the value function for an arbitrary deterministic policy. For some state, we would like to know whether or not it is better to choose an action other than the one the policy prescribes. The criterion is whether the value of taking that action once and thereafter following the policy is greater than or less than the value of the state under the policy. If it is greater, that is, if it is better to select the action once in that state and thereafter follow the policy than to follow the policy all the time, then we would expect that it is better to select the action every time that state is encountered, and that the new policy would be a better one.
Since the policy has been improved to yield a better policy, we can then compute the value function of the new policy and improve it again to produce an even better policy. We can thus obtain a sequence of monotonically improving policies and value functions [14]:
where denotes a policy evaluation and denotes a policy improvement. This process must converge to an optimal policy and optimal value function in a finite number of iterations, because a finite MDP has only a finite number of policies. This way of finding an optimal policy is called dynamic programming. A complete algorithm is given; see Algorithm 1.
Algorithm 1: Selection of transmission power level based on DP.
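The alternation of policy evaluation and policy improvement can be made concrete with a minimal sketch of generic policy iteration. It is not a reproduction of Algorithm 1: the three-state, three-action MDP below, its transition matrix `T`, and its reward table `R` are hypothetical numbers, and the sketch further assumes that the PU state evolves independently of the gateway node's action.

```python
def policy_iteration(P, R, gamma=0.9, tol=1e-9):
    """Generic policy iteration on a finite MDP.

    P[s][a][s2] is a transition probability and R[s][a] an expected
    immediate reward; both are supplied by the caller.
    """
    n_states, n_actions = len(P), len(P[0])
    policy = [0] * n_states
    V = [0.0] * n_states

    def q(s, a):
        # One-step lookahead value of taking action a in state s.
        return R[s][a] + gamma * sum(P[s][a][s2] * V[s2] for s2 in range(n_states))

    while True:
        # Policy evaluation: sweep until the value function stops changing.
        while True:
            delta = 0.0
            for s in range(n_states):
                v_new = q(s, policy[s])
                delta = max(delta, abs(v_new - V[s]))
                V[s] = v_new
            if delta < tol:
                break
        # Policy improvement: switch only on a strict gain to avoid cycling.
        stable = True
        for s in range(n_states):
            best = max(range(n_actions), key=lambda a: q(s, a))
            if q(s, best) > q(s, policy[s]) + 1e-12:
                policy[s] = best
                stable = False
        if stable:
            return policy, V


# Hypothetical example: states are PU power levels, actions are gateway levels.
T = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]
R = [[1.0, 2.0, 3.0], [1.0, 2.0, 0.5], [1.0, 0.2, 0.1]]
P = [[T[s] for _ in range(3)] for s in range(3)]
policy, V = policy_iteration(P, R)
```

Because the assumed transitions do not depend on the action, the improved policy here simply picks the highest-reward action in each state; action-dependent transitions would exercise the lookahead term as well.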
3.2.2. ε-Greedy Policy
The ε-greedy policy chooses an action that has maximal estimated action value most of the time. However, with probability ε it selects an action at random. That is, all nongreedy actions are given the minimal probability of selection, $\epsilon/|\mathcal{A}(s)|$, and the remaining probability, $1-\epsilon+\epsilon/|\mathcal{A}(s)|$, is given to the greedy action [14]. Let be the intelligent policy, then
The algorithm is given in Algorithm 2.
Algorithm 2: Selection of transmission power level based on ε-greedy policy.
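The action-selection rule of the ε-greedy policy can be sketched as follows. This is a minimal illustration of the selection step only, not of the full Algorithm 2; the function names and the example action values are hypothetical.

```python
import random


def epsilon_greedy_probs(q_values, epsilon):
    """Selection probabilities: epsilon/n for every action, plus the
    remaining 1 - epsilon for the greedy (highest-valued) action."""
    n = len(q_values)
    greedy = max(range(n), key=lambda a: q_values[a])
    probs = [epsilon / n] * n
    probs[greedy] += 1.0 - epsilon
    return probs


def select_action(q_values, epsilon, rng=random):
    """Draw one action index according to the epsilon-greedy probabilities."""
    probs = epsilon_greedy_probs(q_values, epsilon)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]


# Hypothetical action values for three gateway power levels.
example_probs = epsilon_greedy_probs([0.1, 0.5, 0.2], epsilon=0.1)
```

Every action keeps a nonzero selection probability as long as ε is positive, which is what allows the agent to keep exploring.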
4. Numerical Results
In this section, we present simulation-based experiments for testing the intercluster connection in Figure 4. First, we compare the performances of TIC (Traditional Intercluster Connection) and NCIC (Network Coding based Intercluster Connection). Secondly, we quantify the performance of our proposed learning algorithms. We assume that the channel coefficients are perfectly known to all nodes in the simulation. The channel coefficients are given by
where is the physical distance between nodes and , and is the path loss exponent. In the simulation, the path loss exponent is assumed to be 4. Rewriting in (5) as
we derive
Since even without any channel output, the MSE in estimating the primary transmitted signal is at most , that is, . If , the SU transmission is no longer constrained by the PU. Therefore, in simulation, the value assigned to must satisfy
4.1. Performance Comparison between TIC and NCIC
In this subsection, we study the performance of TIC and NCIC. We assume that the frequency bandwidth , the transmission power of PU , the variance of AWGN , and Binary Frequency Shift Keying (BFSK) and Binary Phase Shift Keying (BPSK) are chosen as the modulation schemes. We use following metrics to compare NCIC with TIC:
(i) Bit Error Rate (BER): the percentage of erroneous bits in relayed packets.
(ii) Routing-Rate: the total number of relayed bits per time slot.
Figure 5 depicts the BERs of TIC and NCIC with different modulation schemes (BPSK and BFSK) versus the transmit power of the gateway node. It can be observed that the BER performance of NCIC is worse than that of TIC. Figure 6 shows the routing-rates of TIC and NCIC, where NCIC outperforms TIC. Interestingly, the curves in both figures approach constant values as the transmit power at the gateway node increases; for example, an error floor appears in Figure 5. This is because the interference caused by SUs to PUs increases as the gateway node raises its transmission power, so that the MSE constraint imposed by the PUs eventually dominates and restricts the available transmission power level of the gateway node.
Figure 5: BER versus
Figure 6: System throughput versus
As illustrated in Figures 5 and 6, NCIC substantially outperforms TIC in terms of data relaying throughput across the neighboring clusters. NCIC is therefore more suitable than TIC when relaying throughput is the primary concern during data flow.
On the other hand, in the initial cluster setup stage of CogMesh network formation, especially if we want to guarantee reliability for the critical control channel message exchange, TIC is preferable because it provides robust message exchange in the interference-deteriorated channel even though it loses some routing-rate.
4.2. Impact of Dynamic Environment on Learning Policies
We present numerical results to compare the performance of the intelligent relaying signal amplification based on the DP and ε-greedy policies. Throughout the simulation, we specify 3 transmission power levels of the PU: 20 dBm, 25 dBm, 30 dBm, with the corresponding state set and specify 20 transmission power levels of the gateway node: 11 dBm, 12 dBm, 13 dBm, ..., 30 dBm, with the corresponding action set The other parameters are set as follows: QoS requirement discount rate and
In Figure 7, we characterize the convergence behavior of the state value functions for the DP-based policy. It can be seen that convergence takes no more than 100 iterations. Figure 8 shows the convergence behavior of the probabilities of the optimal policies in different states for the ε-greedy policy.
Figure 7: State value function versus for DP-based policy.
Figure 8: Probability of optimal policy at different states for ε-greedy-based policy.
The BER dynamics of the DP-based policy and the ε-greedy policy are shown in Figure 9, and the routing-rate dynamics are shown in Figure 10. We can see that the ε-greedy policy cannot achieve better performance than the DP-based policy, since it always reserves a probability ε for selecting among the available actions at random.
Figure 9: BER comparison between DP-based policy and ε-greedy policy.
Figure 10: Relay rate comparison between MDP-based policy and ε-greedy MC-based policy.
5. Conclusion
This paper investigates the intercluster connection issue within the framework of CogMesh networks. For the distributed secondary users, all transmissions should satisfy the QoS and interference constraints imposed by the primary users. The Traditional Intercluster Connection scheme cannot schedule and route multiple data flows at the same time because they may interfere with each other. Therefore, the Network Coding-based Intercluster Connection scheme, which allows multiple data flows to be transmitted simultaneously across the neighboring clusters under the QoS and interference constraints imposed by PUs, is proposed. Our simulation experiments show that the Network Coding-based Intercluster Connection has a significant advantage over the Traditional Intercluster Connection in the data relaying procedure. However, in the initial cluster formation stage, especially concerning the critical control channel message exchange, the Traditional Intercluster Connection is preferable because it provides robust data relaying in the interference-restricted channel even though it loses some routing-rate.
Moreover, based on reinforcement learning, we address the problem of how to choose the optimal transmission power level at the gateway node for enhancing the data relaying throughput. Two intelligent policies, namely, the DP-based policy and the ε-greedy policy, which take the clustering environment status into account, are investigated. The novel feature of the intelligent policies is that, without perfect knowledge of the primary user’s transmit power and QoS requirement, the gateway node can optimize the relaying throughput in the long run by interacting with the environment. Because it always reserves some probability for selecting among the available actions at random in each environment state, the ε-greedy policy approaches, but never attains, the performance of the DP-based policy.
Appendix
Derivation of C1 in (5)
In this section, we introduce a simplified channel model; as shown in Figure 7, the PU receives signal
where denotes the sampled discrete time, and is the AWGN with zero mean and variance .
Let be an unknown random variable, and let be a known random variable. What is the best guess of given in the MMSE sense? That is, we want to find a function such that we can minimize
The expectation is taken over both and . In this paper, we restrict the functional form of to be homogeneous linear; that is, and we want to minimize
Equation (A.3) can be expressed in a compact form
where
The solution for can be found out from , that is,
where and . Thus we get
Combining (A.7) and (A.4), the minimum MSE is given
Next, we present a detailed derivation of the cross-correlation matrix and the autocorrelation matrix. Here, we assume that the transmitted signals are uncorrelated; then
In the same way, we can derive
The inverse of is
Hence, by combining (A.8), (A.9), and (A.11), the minimum MSE can be expressed as
If the PU imposes a QoS requirement on the MMSE, the PU’s MMSE should not exceed a predefined threshold. Finally, the constraint in (5)
is obtained.
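The way an MMSE requirement turns into a cap on SU transmit power can be checked numerically in a simplified scalar setting. The sketch below assumes a single real-valued observation at the primary receiver, one interfering SU, and mutually uncorrelated zero-mean signals; the function and parameter names (`lmmse`, `max_su_power`, `h_pp`, `g_sp`, `d_max`) are hypothetical, and the closed form is the scalar special case rather than the matrix expression of this appendix.

```python
def lmmse(h_pp, p_pu, g_sp, p_su, noise_var):
    """Best homogeneous linear estimate of the primary signal x_p from
    y = h_pp * x_p + g_sp * x_s + n, with uncorrelated zero-mean terms.

    Returns the optimal coefficient and the resulting minimum MSE.
    """
    r_xy = h_pp * p_pu                                    # E[x_p * y]
    r_yy = h_pp**2 * p_pu + g_sp**2 * p_su + noise_var    # E[y^2]
    weight = r_xy / r_yy
    mmse = p_pu - r_xy**2 / r_yy
    return weight, mmse


def max_su_power(h_pp, p_pu, g_sp, noise_var, d_max):
    """Largest SU power that keeps the MMSE at or below d_max.

    Requires 0 < d_max < p_pu; a threshold at or above p_pu imposes no
    constraint. Returns 0.0 when noise alone already exceeds the budget.
    """
    if not 0 < d_max < p_pu:
        raise ValueError("d_max must lie strictly between 0 and p_pu")
    # Total interference-plus-noise power allowed at the primary receiver.
    budget = d_max * h_pp**2 * p_pu / (p_pu - d_max)
    return max(0.0, (budget - noise_var) / g_sp**2)


# Hypothetical numbers: unit PU power and gain, SU cross gain of 0.5.
p_cap = max_su_power(h_pp=1.0, p_pu=1.0, g_sp=0.5, noise_var=0.1, d_max=0.5)
_, mmse_at_cap = lmmse(h_pp=1.0, p_pu=1.0, g_sp=0.5, p_su=p_cap, noise_var=0.1)
```

Transmitting exactly at the cap meets the threshold with equality, and the MMSE never exceeds the PU signal variance, which matches the bound on the threshold discussed in Section 4.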