Review Article

Application of Reinforcement Learning in Cognitive Radio Networks: Models and Algorithms

Table 1

RL models that directly apply the traditional RL approach to various schemes in CR networks.

Each entry below lists the reference, followed by the purpose, state, action, and reward/cost of the RL model.

(A1) Dynamic channel selection (DCS)
Tang et al. [2]
Purpose: Each SU (agent) selects the operating channel with the least channel utilization level by PUs in order to improve throughput and to reduce end-to-end delay and the number of channel switches.
State: Not applicable.
Action: Selecting an available channel for data transmission.
Reward/cost: Fixed positive/negative values awarded as reward/punishment for successful/unsuccessful transmission.
Li [6]
Purpose: Each SU (agent) selects an operating channel different from those of the other SUs in order to reduce channel contention.
State: Not applicable.
Action: Selecting an available channel for data transmission.
Reward/cost: Amount of successfully transmitted data packets.
Yao and Feng [19]
Purpose: The SU base station (agent) selects an available channel and a power level for data transmission in order to improve its SNR. This scheme aims to increase the packet delivery rate.
State: Three-tuple information: (i) SU hosts of the SU base station, (ii) transmitting SU hosts, (iii) received power on each channel.
Action: Selecting a set of actions (see Section 3.2): (i) an available channel for data transmission, (ii) a transmission power level.
Reward/cost: SNR level.
Li et al. [18]
Purpose: Each SU link (agent) aims to maximize its individual SNR level. Note that the agent is a SU link, rather than the SU itself as in the other schemes.
State: The availability of a channel for data transmission; the two states indicate that the channel is idle and busy, respectively.
Action: Selecting an available channel for data transmission.
Reward/cost: SNR level, which takes into account the interference from neighboring SUs.
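To make the (A1) models concrete, the following is a minimal sketch of stateless Q-learning for dynamic channel selection in the spirit of Tang et al. [2]: one Q-value per channel and a fixed +1/-1 reward for successful/unsuccessful transmission. The epsilon-greedy rule, all parameter values, and the PU utilization model are illustrative assumptions, not the authors' implementation.

```python
import random

NUM_CHANNELS = 5
ALPHA = 0.1        # learning rate (assumed)
EPSILON = 0.1      # exploration probability (assumed)
PU_BUSY_PROB = [0.9, 0.7, 0.5, 0.3, 0.1]  # assumed PU utilization per channel

q = [0.0] * NUM_CHANNELS  # one Q-value per channel (no state)

def select_channel():
    """Epsilon-greedy selection over per-channel Q-values."""
    if random.random() < EPSILON:
        return random.randrange(NUM_CHANNELS)
    return max(range(NUM_CHANNELS), key=lambda c: q[c])

for t in range(10_000):
    ch = select_channel()
    success = random.random() > PU_BUSY_PROB[ch]  # transmission succeeds if PU absent
    reward = 1.0 if success else -1.0             # fixed reward/punishment
    q[ch] += ALPHA * (reward - q[ch])             # stateless Q-update

print("Learned Q-values:", [round(v, 2) for v in q])
```

Because the model keeps no state, each Q-value tracks a running estimate of a channel's transmission success, so the SU gravitates toward the channel least utilized by PUs.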

(A2) Channel sensing
Lo and Akyildiz [3]
Purpose: Each SU (agent) (i) finds a set of neighboring SUs for cooperative channel sensing and (ii) minimizes the cooperative channel sensing delay. This scheme aims to increase the probability of PU detection.
State: A set of SU neighbor nodes that may cooperate with the SU agent to perform cooperative channel sensing.
Action: Selecting the SU neighbor nodes that cooperate with the SU agent. The SU neighbor nodes cooperate by sending their respective local sensing outcomes to the SU agent.
Reward/cost: Dependent on the reporting delay, which is the time between the SU agent requesting cooperation from a SU neighbor node and the arrival of that node's sensing outcome.
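A rough sketch of how an SU agent could learn which neighbors to invite for cooperative sensing, following the delay-based cost above from Lo and Akyildiz [3]. The exponential delay model, neighbor names, and all parameters are assumptions for illustration.

```python
import random

NEIGHBORS = ["n1", "n2", "n3", "n4"]
MEAN_DELAY = {"n1": 5.0, "n2": 2.0, "n3": 8.0, "n4": 3.0}  # assumed mean reporting delays (ms)
ALPHA = 0.2   # learning rate (assumed)
K = 2         # number of cooperating neighbors selected per round (assumed)

est_delay = {n: 0.0 for n in NEIGHBORS}  # learned delay estimate per neighbor

for t in range(5_000):
    # Invite the K neighbors currently believed to report fastest,
    # with occasional random exploration.
    if random.random() < 0.1:
        chosen = random.sample(NEIGHBORS, K)
    else:
        chosen = sorted(NEIGHBORS, key=lambda n: est_delay[n])[:K]
    for n in chosen:
        delay = random.expovariate(1.0 / MEAN_DELAY[n])  # observed reporting delay
        est_delay[n] += ALPHA * (delay - est_delay[n])   # update the cost estimate

print({n: round(d, 1) for n, d in est_delay.items()})
```

Neighbors with consistently low reporting delay end up being selected, which shrinks the overall cooperative sensing delay.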

(A4) Energy efficiency enhancement
Zheng and Li [15]
Purpose: Each SU (agent) selects a suitable action (transmit, idle, sleep, or sense channel) whenever it does not have any packets to send in order to reduce energy consumption.
State: Four-tuple information: (i) operation mode (transmit, idle, or sleep), (ii) number of packets in the buffer, (iii) presence of PU activity, (iv) countdown timer for periodic channel sensing.
Action: Selecting an action: transmit, idle, sleep, or sense channel.
Reward/cost: Amount of energy consumed by each operation mode throughout the duration of that mode.
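The following sketch illustrates a Q-learning formulation of the energy model above from Zheng and Li [15], with the four-tuple state reduced to hashable Python tuples. The energy figures, transition probabilities, and parameters are assumed for illustration and are not taken from the paper.

```python
import random
from collections import defaultdict

ACTIONS = ["transmit", "idle", "sleep", "sense"]
ENERGY = {"transmit": 1.0, "idle": 0.3, "sleep": 0.05, "sense": 0.5}  # assumed units
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1  # assumed learning parameters

Q = defaultdict(float)  # key: (state, action)

def step(state):
    """One assumed transition over the state (mode, buffered, pu_busy, timer)."""
    mode, buffered, pu_busy, timer = state
    if random.random() < EPSILON:
        action = random.choice(ACTIONS)
    else:
        action = max(ACTIONS, key=lambda a: Q[(state, a)])
    cost = ENERGY[action]  # cost is the energy drawn by the chosen mode
    next_state = (
        action if action != "sense" else mode,  # sensing keeps the current mode
        random.random() < 0.5,                  # new packet arrivals (assumed)
        random.random() < 0.3,                  # PU activity (assumed)
        (timer - 1) % 4,                        # periodic sensing countdown
    )
    # Q-learning update on the negative cost (energy consumed).
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += ALPHA * (-cost + GAMMA * best_next - Q[(state, action)])
    return next_state

state = ("idle", False, False, 3)
for _ in range(20_000):
    state = step(state)
```

In this toy setup the agent simply learns to prefer low-energy modes; the actual scheme trades this preference off against buffered packets and periodic sensing duties.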

(A7) Routing
Peng et al. [4]
Purpose: Each SU (agent) selects a SU neighbor node (or next hop) for data transmission toward the SU destination node in order to reduce end-to-end delay and energy consumption.
State: A set of SU next hops.
Action: Selecting a SU next hop.
Reward/cost: Ratio of the residual energy of the SU next hop to the energy consumed by sending, receiving, encoding, and decoding data while transmitting to that next hop.
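Finally, a sketch of energy-aware next-hop selection using the reward above, in the spirit of Peng et al. [4]: the reward for forwarding via a neighbor is the ratio of its residual energy to the transmission energy cost. The topology, energy values, and parameters are illustrative assumptions.

```python
import random

NEXT_HOPS = ["a", "b", "c"]
RESIDUAL = {"a": 80.0, "b": 40.0, "c": 95.0}   # assumed residual energy per relay
TX_COST = {"a": 2.0, "b": 1.0, "c": 4.0}       # assumed per-packet energy cost
ALPHA, EPSILON = 0.1, 0.1                      # assumed learning parameters

q = {h: 0.0 for h in NEXT_HOPS}  # one Q-value per candidate next hop

for packet in range(2_000):
    if random.random() < EPSILON:
        hop = random.choice(NEXT_HOPS)
    else:
        hop = max(NEXT_HOPS, key=lambda h: q[h])
    RESIDUAL[hop] -= TX_COST[hop]                    # forwarding drains the relay
    reward = max(RESIDUAL[hop], 0.0) / TX_COST[hop]  # residual energy / energy cost
    q[hop] += ALPHA * (reward - q[hop])

print({h: round(v, 1) for h, v in q.items()})
```

Relays with high residual energy and cheap transmissions are preferred, and the preference shifts as their batteries drain, which is the load-balancing effect the reward is designed to produce.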