Research Article
Task Offloading with Power Control for Mobile Edge Computing Using Reinforcement Learning-Based Markov Decision Process
Algorithm 1: Q-learning algorithm with ε-greedy policy.
Initialization: …; …, …; …, and …;
while t has not reached the maximum number of iterations do
    for each UE in … do
        if exploration then
            choose an action a arbitrarily with probability ε;
        else (exploitation)
            choose the greedy action a, i.e., the one with the largest Q-value in the current state;
        end if
        perform a and obtain a reward and a successor state;
        update the Q-value function according to equation (15);
        set the current state to the successor state;
    end for
    t ← t + 1;
end while
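
The loop in Algorithm 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. Equation (15) is not reproduced in this section, so the standard tabular Q-learning update is assumed in its place. The two-state environment `toy_step`, the choice of one independent Q-table per UE, and all parameter values (`num_ues`, `max_iters`, `epsilon`, `alpha`, `gamma`) are hypothetical and chosen only to make the example runnable.

```python
import random
from collections import defaultdict


def epsilon_greedy(q, state, actions, epsilon, rng):
    """Explore with probability epsilon, otherwise pick a greedy action."""
    if rng.random() < epsilon:
        return rng.choice(actions)                     # exploration
    return max(actions, key=lambda a: q[(state, a)])   # exploitation


def q_update(q, state, action, reward, next_state, actions, alpha, gamma):
    """Standard tabular Q-learning update (ASSUMED form of equation (15))."""
    best_next = max(q[(next_state, a)] for a in actions)
    td_error = reward + gamma * best_next - q[(state, action)]
    q[(state, action)] += alpha * td_error


def toy_step(state, action):
    """Hypothetical stand-in environment with two states and two actions.

    Action 1 is rewarded in state 0 and action 0 is rewarded in state 1;
    the state alternates on every step regardless of the action.
    """
    reward = 1.0 if action == 1 - state else 0.0
    return reward, 1 - state


def train(num_ues=2, max_iters=5000, epsilon=0.1, alpha=0.5, gamma=0.9, seed=0):
    """Run the Algorithm 1 loop with one Q-table per UE (an assumption)."""
    rng = random.Random(seed)
    actions = [0, 1]
    q_tables = [defaultdict(float) for _ in range(num_ues)]  # Q-values start at 0
    states = [0] * num_ues
    for _ in range(max_iters):                # while t < maximum iterations
        for ue in range(num_ues):             # for each UE
            q, s = q_tables[ue], states[ue]
            a = epsilon_greedy(q, s, actions, epsilon, rng)
            r, s_next = toy_step(s, a)        # perform a, observe reward and successor
            q_update(q, s, a, r, s_next, actions, alpha, gamma)
            states[ue] = s_next               # move to the successor state
    return q_tables


if __name__ == "__main__":
    for ue, q in enumerate(train()):
        policy = {s: max([0, 1], key=lambda a: q[(s, a)]) for s in (0, 1)}
        print(f"UE {ue}: greedy policy {policy}")
```

`train()` returns one Q-table per UE; the greedy policy is read off by taking, for each state, the action with the largest Q-value. In a real offloading setting the state, action, and reward definitions would come from the paper's system model rather than from `toy_step`.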