Review Article

Application of Reinforcement Learning in Cognitive Radio Networks: Models and Algorithms

Table 2: Summary of RL models and algorithms for various schemes in CR networks.

| Model | Purpose | References |
| --- | --- | --- |
| Model with γ = 0 in Q-function | Uses γ = 0 so that the Q-value has no dependency on future rewards | Li et al. [10, 17, 18] |
| Model with a set of Q-functions | Uses a set of distinct Q-functions to keep track of the Q-values of different actions | Di Felice et al. [11, 21] |
| Dual Q-function model | Updates two Q-functions, for the next and previous states respectively, simultaneously in order to expedite the learning process | Xia et al. [33] |
| Partially observable model | Computes the belief state, the probability that the environment is operating in a particular state, for dynamic and uncertain operating environments | Bkassiny et al. [34] |
| Actor-critic model | Adjusts the delayed reward value using reward corrections in order to expedite the learning process | Vucevic et al. [13] |
| Auction model | Allows agents to place bids in auctions conducted by a centralized entity so that the winning agents receive rewards | Chen and Qiu [16], Jayaweera et al. [36], Fu and van der Schaar [37], Xiao et al. [38] |
| Internal self-learning model | Enables an agent to continuously exchange virtual actions for rewards generated by a simulated internal environment within the agent itself in order to expedite the learning process | Bernardo et al. [27] |
| Collaborative model | Enables an agent to collaborate with its neighboring agents and subsequently make local decisions independently in distributed networks; a local decision is part of an optimal joint action, which comprises the actions taken by all agents in the network | Lundén et al. [20], Liu et al. [39] |
| Competitive model | Enables an agent to compete with its neighboring agents and make local decisions independently in worst-case scenarios, in the presence of competitor agents that attempt to minimize the agent's accumulated rewards | Wang et al. [14] |
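To make a few of these mechanisms concrete, the sketches below illustrate them in Python. First, the role of γ in the Q-function: this is a minimal sketch of the one-step Q-learning update, where the channel-selection framing and all identifiers (`q_update`, the `Q` table) are illustrative assumptions rather than code from the cited works. Setting gamma = 0 collapses the update target to the immediate reward, which is the "no dependency on future rewards" property noted in the first row.

```python
from collections import defaultdict

def q_update(Q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.0):
    """One Q-learning step: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)).

    With gamma = 0 the target collapses to the immediate reward, so the
    learned Q-values carry no dependency on future rewards (row 1 of Table 2).
    """
    best_next = max(Q[(next_state, a)] for a in actions)
    target = reward + gamma * best_next
    Q[(state, action)] += alpha * (target - Q[(state, action)])

# Hypothetical CR use: learn which channel yields the fewest collisions.
Q = defaultdict(float)          # Q-table over (state, action) pairs
channels = [0, 1, 2]            # actions: candidate channels to sense/transmit on
q_update(Q, state="idle", action=1, reward=1.0,
         next_state="idle", actions=channels, gamma=0.0)
```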
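For the partially observable model, the belief state is typically maintained with the standard Bayesian filter b'(s') ∝ O(o | s', a) Σ_s T(s' | s, a) b(s). The sketch below implements that update; the two-state channel (idle/busy) and the specific transition and observation probabilities are made-up values for illustration, not parameters from [34].

```python
import numpy as np

def belief_update(belief, T, O, action, observation):
    """Bayesian belief update for a POMDP:
    b'(s') is proportional to O[a, s', o] * sum_s T[a, s, s'] * b(s)."""
    predicted = belief @ T[action]                   # predict next-state distribution
    unnorm = predicted * O[action][:, observation]   # weight by observation likelihood
    return unnorm / unnorm.sum()                     # renormalize to a probability vector

# Hypothetical two-state channel: state 0 = idle, 1 = busy; one sensing action.
T = np.array([[[0.9, 0.1],       # T[a, s, s']: channel occupancy dynamics
               [0.3, 0.7]]])
O = np.array([[[0.8, 0.2],       # O[a, s', o]: sensing accuracy in each state
               [0.2, 0.8]]])
b = np.array([0.5, 0.5])         # initial belief: state unknown
b = belief_update(b, T, O, action=0, observation=1)  # sensed "busy"
print(b)                         # belief shifts toward the busy state
```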
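The actor-critic row's "reward corrections" can be read as the critic's temporal-difference error, which corrects the raw delayed reward before it drives learning. Below is a minimal tabular sketch under that reading; the value table, preference table, and step sizes are illustrative assumptions, not the scheme of [13].

```python
import numpy as np

def actor_critic_step(V, prefs, state, action, reward, next_state,
                      alpha_v=0.1, alpha_p=0.1, gamma=0.9):
    """One tabular actor-critic step: the critic forms the TD error
    (the corrected reward signal) and both components learn from it."""
    td_error = reward + gamma * V[next_state] - V[state]  # critic's correction
    V[state] += alpha_v * td_error                        # critic: value update
    prefs[state][action] += alpha_p * td_error            # actor: preference update
    return td_error

def softmax_action(prefs, state, rng=np.random.default_rng()):
    """Actor's policy: pick an action with probability proportional to exp(preference)."""
    p = np.exp(prefs[state] - prefs[state].max())
    return rng.choice(len(p), p=p / p.sum())

V = {"s0": 0.0, "s1": 0.0}                      # critic's state values
prefs = {"s0": np.zeros(2), "s1": np.zeros(2)}  # actor's action preferences
a = softmax_action(prefs, "s0")
actor_critic_step(V, prefs, "s0", a, reward=1.0, next_state="s1")
```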
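Finally, the auction model: the cited works differ in their exact mechanisms, so this sketch shows only a generic centralized sealed-bid auction in which the highest bidder wins the resource. Charging the runner-up's bid (second price) is a common design choice assumed here, not necessarily what [16, 36, 37, 38] use.

```python
def run_auction(bids):
    """Centralized sealed-bid auction: the highest bidder wins the channel.
    Charging the runner-up's bid (second price) encourages truthful bidding."""
    winner = max(bids, key=bids.get)
    others = [b for agent, b in bids.items() if agent != winner]
    price = max(others) if others else bids[winner]
    return winner, price

# Hypothetical agents bidding for a vacant channel.
bids = {"agent_a": 3.0, "agent_b": 5.0, "agent_c": 4.0}
winner, price = run_auction(bids)   # agent_b wins and pays 4.0
```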