Review Article

Application of Reinforcement Learning in Cognitive Radio Networks: Models and Algorithms

Table 2: Summary of RL models and algorithms for various schemes in CR networks.

| Model | Purpose | References |
| --- | --- | --- |
| Model with γ = 0 in Q-function | Uses γ = 0 so that the Q-value has no dependency on future rewards | Li et al. [10, 17, 18] |
| Model with a set of Q-functions | Uses a set of distinct Q-functions to keep track of the Q-values of different actions | Di Felice et al. [11, 21] |
| Dual Q-function model | Updates two Q-functions, for the next and previous states respectively, simultaneously in order to expedite the learning process | Xia et al. [33] |
| Partially observable model | Computes the belief state, the probability that the environment is operating in a particular state, for dynamic and uncertain operating environments | Bkassiny et al. [34] |
| Actor-critic model | Adjusts the delayed reward value using reward corrections in order to expedite the learning process | Vucevic et al. [13] |
| Auction model | Allows agents to place bids in auctions conducted by a centralized entity so that the winning agents receive rewards | Chen and Qiu [16], Jayaweera et al. [36], Fu and van der Schaar [37], Xiao et al. [38] |
| Internal self-learning model | Enables an agent to continuously exchange virtual actions for rewards generated by a simulated internal environment within the agent itself in order to expedite the learning process | Bernardo et al. [27] |
| Collaborative model | Enables an agent to collaborate with its neighboring agents and subsequently make local decisions independently in distributed networks; a local decision is part of an optimal joint action, which comprises the actions taken by all agents in the network | Lundén et al. [20], Liu et al. [39] |
| Competitive model | Enables an agent to compete with its neighboring agents and make local decisions independently in worst-case scenarios, in the presence of competitor agents that attempt to minimize the agent's accumulated rewards | Wang et al. [14] |
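To make a few of these mechanisms concrete, the sketches below illustrate them in Python. First, the role of γ in the Q-function: this is a minimal sketch of the one-step Q-learning update, where the channel-selection framing and all identifiers (`q_update`, the `Q` table) are illustrative assumptions rather than code from the cited works. Setting gamma = 0 collapses the update target to the immediate reward, which is the "no dependency on future rewards" property noted in the first row.

```python
from collections import defaultdict

def q_update(Q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.0):
    """One Q-learning step: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)).

    With gamma = 0 the target collapses to the immediate reward, so the
    learned Q-values carry no dependency on future rewards (row 1 of Table 2).
    """
    best_next = max(Q[(next_state, a)] for a in actions)
    target = reward + gamma * best_next
    Q[(state, action)] += alpha * (target - Q[(state, action)])

# Hypothetical CR use: learn which channel yields the fewest collisions.
Q = defaultdict(float)          # Q-table over (state, action) pairs
channels = [0, 1, 2]            # actions: candidate channels to sense/transmit on
q_update(Q, state="idle", action=1, reward=1.0,
         next_state="idle", actions=channels, gamma=0.0)
```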
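For the partially observable model, the belief state is typically maintained with the standard Bayesian filter b'(s') ∝ O(o | s', a) Σ_s T(s' | s, a) b(s). The sketch below implements that update; the two-state channel (idle/busy) and the specific transition and observation probabilities are made-up values for illustration, not parameters from [34].

```python
import numpy as np

def belief_update(belief, T, O, action, observation):
    """Bayesian belief update for a POMDP:
    b'(s') is proportional to O[a, s', o] * sum_s T[a, s, s'] * b(s)."""
    predicted = belief @ T[action]                   # predict next-state distribution
    unnorm = predicted * O[action][:, observation]   # weight by observation likelihood
    return unnorm / unnorm.sum()                     # renormalize to a probability vector

# Hypothetical two-state channel: state 0 = idle, 1 = busy; one sensing action.
T = np.array([[[0.9, 0.1],       # T[a, s, s']: channel occupancy dynamics
               [0.3, 0.7]]])
O = np.array([[[0.8, 0.2],       # O[a, s', o]: sensing accuracy in each state
               [0.2, 0.8]]])
b = np.array([0.5, 0.5])         # initial belief: state unknown
b = belief_update(b, T, O, action=0, observation=1)  # sensed "busy"
print(b)                         # belief shifts toward the busy state
```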
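The actor-critic row's "reward corrections" can be read as the critic's temporal-difference error, which corrects the raw delayed reward before it drives learning. Below is a minimal tabular sketch under that reading; the value table, preference table, and step sizes are illustrative assumptions, not the scheme of [13].

```python
import numpy as np

def actor_critic_step(V, prefs, state, action, reward, next_state,
                      alpha_v=0.1, alpha_p=0.1, gamma=0.9):
    """One tabular actor-critic step: the critic forms the TD error
    (the corrected reward signal) and both components learn from it."""
    td_error = reward + gamma * V[next_state] - V[state]  # critic's correction
    V[state] += alpha_v * td_error                        # critic: value update
    prefs[state][action] += alpha_p * td_error            # actor: preference update
    return td_error

def softmax_action(prefs, state, rng=np.random.default_rng()):
    """Actor's policy: pick an action with probability proportional to exp(preference)."""
    p = np.exp(prefs[state] - prefs[state].max())
    return rng.choice(len(p), p=p / p.sum())

V = {"s0": 0.0, "s1": 0.0}                      # critic's state values
prefs = {"s0": np.zeros(2), "s1": np.zeros(2)}  # actor's action preferences
a = softmax_action(prefs, "s0")
actor_critic_step(V, prefs, "s0", a, reward=1.0, next_state="s1")
```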
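Finally, the auction model: the cited works differ in their exact mechanisms, so this sketch shows only a generic centralized sealed-bid auction in which the highest bidder wins the resource. Charging the runner-up's bid (second price) is a common design choice assumed here, not necessarily what [16, 36, 37, 38] use.

```python
def run_auction(bids):
    """Centralized sealed-bid auction: the highest bidder wins the channel.
    Charging the runner-up's bid (second price) encourages truthful bidding."""
    winner = max(bids, key=bids.get)
    others = [b for agent, b in bids.items() if agent != winner]
    price = max(others) if others else bids[winner]
    return winner, price

# Hypothetical agents bidding for a vacant channel.
bids = {"agent_a": 3.0, "agent_b": 5.0, "agent_c": 4.0}
winner, price = run_auction(bids)   # agent_b wins and pays 4.0
```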