Review Article

Application of Reinforcement Learning in Cognitive Radio Networks: Models and Algorithms

Table 1

RL models that directly apply the traditional RL approach to various schemes in CR networks.

Each entry below lists the reference, followed by the purpose, state, action, and reward/cost of the RL model.

(A1) Dynamic channel selection (DCS)
Tang et al. [2]
Purpose: Each SU (agent) selects the operating channel with the least channel utilization level by PUs in order to improve throughput and to reduce end-to-end delay and the number of channel switches.
State: Not applicable.
Action: Selecting an available channel for data transmission.
Reward/cost: Fixed positive/negative values awarded as reward/punishment for successful/unsuccessful transmission.
Li [6]
Purpose: Each SU (agent) selects an operating channel different from those of the other SUs in order to reduce channel contention.
State: Not applicable.
Action: Selecting an available channel for data transmission.
Reward/cost: Amount of successfully transmitted data packets.
Yao and Feng [19]
Purpose: The SU base station (agent) selects an available channel and a power level for data transmission in order to improve its SNR. This scheme aims to increase the packet delivery rate.
State: Three-tuple information: (i) SU hosts of the SU base station, (ii) transmitting SU hosts, (iii) received power on each channel.
Action: Selecting a set of actions (see Section 3.2): (i) an available channel for data transmission, (ii) a transmission power level.
Reward/cost: SNR level.
Li et al. [18]
Purpose: Each SU link (agent) aims to maximize its individual SNR level. Note that the agent is a SU link, rather than the SU itself as in the other schemes.
State: The availability of a channel for data transmission; the two states indicate that the channel is idle and busy, respectively.
Action: Selecting an available channel for data transmission.
Reward/cost: SNR level, which takes into account the interference from neighboring SUs.
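To make the (A1) models concrete, the following is a minimal sketch of stateless Q-learning for dynamic channel selection in the spirit of Tang et al. [2]: one Q-value per channel and a fixed +1/-1 reward for successful/unsuccessful transmission. The epsilon-greedy rule, all parameter values, and the PU utilization model are illustrative assumptions, not the authors' implementation.

```python
import random

NUM_CHANNELS = 5
ALPHA = 0.1        # learning rate (assumed)
EPSILON = 0.1      # exploration probability (assumed)
PU_BUSY_PROB = [0.9, 0.7, 0.5, 0.3, 0.1]  # assumed PU utilization per channel

q = [0.0] * NUM_CHANNELS  # one Q-value per channel (no state)

def select_channel():
    """Epsilon-greedy selection over per-channel Q-values."""
    if random.random() < EPSILON:
        return random.randrange(NUM_CHANNELS)
    return max(range(NUM_CHANNELS), key=lambda c: q[c])

for t in range(10_000):
    ch = select_channel()
    success = random.random() > PU_BUSY_PROB[ch]  # transmission succeeds if PU absent
    reward = 1.0 if success else -1.0             # fixed reward/punishment
    q[ch] += ALPHA * (reward - q[ch])             # stateless Q-update

print("Learned Q-values:", [round(v, 2) for v in q])
```

Because the model keeps no state, each Q-value tracks a running estimate of a channel's transmission success, so the SU gravitates toward the channel least utilized by PUs.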

(A2) Channel sensing
Lo and Akyildiz [3]
Purpose: Each SU (agent) (i) finds a set of neighboring SUs for cooperative channel sensing and (ii) minimizes the cooperative channel sensing delay. This scheme aims to increase the probability of PU detection.
State: A set of SU neighbor nodes that may cooperate with the SU agent to perform cooperative channel sensing.
Action: Selecting the SU neighbor nodes that cooperate with the SU agent. The SU neighbor nodes cooperate by sending their respective local sensing outcomes to the SU agent.
Reward/cost: Dependent on the reporting delay, which is the time between the SU agent requesting cooperation from a SU neighbor node and the arrival of that node's sensing outcome.
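A rough sketch of how an SU agent could learn which neighbors to invite for cooperative sensing, following the delay-based cost above from Lo and Akyildiz [3]. The exponential delay model, neighbor names, and all parameters are assumptions for illustration.

```python
import random

NEIGHBORS = ["n1", "n2", "n3", "n4"]
MEAN_DELAY = {"n1": 5.0, "n2": 2.0, "n3": 8.0, "n4": 3.0}  # assumed mean reporting delays (ms)
ALPHA = 0.2   # learning rate (assumed)
K = 2         # number of cooperating neighbors selected per round (assumed)

est_delay = {n: 0.0 for n in NEIGHBORS}  # learned delay estimate per neighbor

for t in range(5_000):
    # Invite the K neighbors currently believed to report fastest,
    # with occasional random exploration.
    if random.random() < 0.1:
        chosen = random.sample(NEIGHBORS, K)
    else:
        chosen = sorted(NEIGHBORS, key=lambda n: est_delay[n])[:K]
    for n in chosen:
        delay = random.expovariate(1.0 / MEAN_DELAY[n])  # observed reporting delay
        est_delay[n] += ALPHA * (delay - est_delay[n])   # update the cost estimate

print({n: round(d, 1) for n, d in est_delay.items()})
```

Neighbors with consistently low reporting delay end up being selected, which shrinks the overall cooperative sensing delay.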

(A4) Energy efficiency enhancement
Zheng and Li [15]
Purpose: Each SU (agent) selects a suitable action (transmit, idle, sleep, or sense channel) whenever it does not have any packets to send in order to reduce energy consumption.
State: Four-tuple information: (i) operation mode (transmit, idle, or sleep), (ii) number of packets in the buffer, (iii) presence of PU activity, (iv) countdown timer for periodic channel sensing.
Action: Selecting an action: transmit, idle, sleep, or sense channel.
Reward/cost: Amount of energy consumed by each operation mode throughout the duration of that mode.
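The following sketch illustrates a Q-learning formulation of the energy model above from Zheng and Li [15], with the four-tuple state reduced to hashable Python tuples. The energy figures, transition probabilities, and parameters are assumed for illustration and are not taken from the paper.

```python
import random
from collections import defaultdict

ACTIONS = ["transmit", "idle", "sleep", "sense"]
ENERGY = {"transmit": 1.0, "idle": 0.3, "sleep": 0.05, "sense": 0.5}  # assumed units
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1  # assumed learning parameters

Q = defaultdict(float)  # key: (state, action)

def step(state):
    """One assumed transition over the state (mode, buffered, pu_busy, timer)."""
    mode, buffered, pu_busy, timer = state
    if random.random() < EPSILON:
        action = random.choice(ACTIONS)
    else:
        action = max(ACTIONS, key=lambda a: Q[(state, a)])
    cost = ENERGY[action]  # cost is the energy drawn by the chosen mode
    next_state = (
        action if action != "sense" else mode,  # sensing keeps the current mode
        random.random() < 0.5,                  # new packet arrivals (assumed)
        random.random() < 0.3,                  # PU activity (assumed)
        (timer - 1) % 4,                        # periodic sensing countdown
    )
    # Q-learning update on the negative cost (energy consumed).
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += ALPHA * (-cost + GAMMA * best_next - Q[(state, action)])
    return next_state

state = ("idle", False, False, 3)
for _ in range(20_000):
    state = step(state)
```

In this toy setup the agent simply learns to prefer low-energy modes; the actual scheme trades this preference off against buffered packets and periodic sensing duties.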

(A7) Routing
Peng et al. [4]
Purpose: Each SU (agent) selects a SU neighbor node (or next hop) for data transmission toward the SU destination node in order to reduce end-to-end delay and energy consumption.
State: A set of SU next hops.
Action: Selecting a SU next hop.
Reward/cost: Ratio of the residual energy of the SU next hop to the energy consumed by sending, receiving, encoding, and decoding data while transmitting to that next hop.
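Finally, a sketch of energy-aware next-hop selection using the reward above, in the spirit of Peng et al. [4]: the reward for forwarding via a neighbor is the ratio of its residual energy to the transmission energy cost. The topology, energy values, and parameters are illustrative assumptions.

```python
import random

NEXT_HOPS = ["a", "b", "c"]
RESIDUAL = {"a": 80.0, "b": 40.0, "c": 95.0}   # assumed residual energy per relay
TX_COST = {"a": 2.0, "b": 1.0, "c": 4.0}       # assumed per-packet energy cost
ALPHA, EPSILON = 0.1, 0.1                      # assumed learning parameters

q = {h: 0.0 for h in NEXT_HOPS}  # one Q-value per candidate next hop

for packet in range(2_000):
    if random.random() < EPSILON:
        hop = random.choice(NEXT_HOPS)
    else:
        hop = max(NEXT_HOPS, key=lambda h: q[h])
    RESIDUAL[hop] -= TX_COST[hop]                    # forwarding drains the relay
    reward = max(RESIDUAL[hop], 0.0) / TX_COST[hop]  # residual energy / energy cost
    q[hop] += ALPHA * (reward - q[hop])

print({h: round(v, 1) for h, v in q.items()})
```

Relays with high residual energy and cheap transmissions are preferred, and the preference shifts as their batteries drain, which is the load-balancing effect the reward is designed to produce.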