Research Article

Task Offloading with Power Control for Mobile Edge Computing Using Reinforcement Learning-Based Markov Decision Process

Algorithm 1

Q-learning algorithm with $\varepsilon$-greedy policy.
Initialization:
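initialize the Q-table $Q(s, a)$ for all state-action pairs; set the learning rate $\alpha$ and the discount factor $\gamma$; set the exploration probability $\varepsilon$ and the iteration counter $t$;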
while $t \le T$ ($T$ is the maximum number of iterations)
for each UE in the set of UEs
  if exploration
   chooses an action $a$ arbitrarily from the action set with probability $\varepsilon$;
  else exploitation
   chooses an action $a = \arg\max_{a'} Q(s, a')$;
  end if
  perform $a$ and get a reward $r$ and a successor state $s'$;
  update the Q-value function according to equation (15);
  set the current state to the successor state: $s \leftarrow s'$;
end for
$t \leftarrow t + 1$;
end while
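
The loop in Algorithm 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. Equation (15) is not reproduced in this section, so the sketch uses the textbook Q-learning update $Q(s,a) \leftarrow Q(s,a) + \alpha\,[r + \gamma \max_{a'} Q(s',a') - Q(s,a)]$ in its place. The environment `toy_step`, the state names `good`/`bad`, the action names `local`/`offload`, the reward values, the hyperparameter defaults, and the choice of one Q-table shared by all UEs are hypothetical and chosen only to make the example runnable.

```python
import random
from collections import defaultdict

ACTIONS = ["local", "offload"]  # hypothetical action set


def epsilon_greedy(Q, state, actions, epsilon, rng):
    """With probability epsilon explore; otherwise exploit the best known action."""
    if rng.random() < epsilon:
        return rng.choice(actions)                    # exploration
    return max(actions, key=lambda a: Q[(state, a)])  # exploitation


def toy_step(state, action):
    """Hypothetical environment: offloading pays off only on a good channel.

    The channel alternates deterministically between 'good' and 'bad'.
    """
    if action == "offload":
        reward = 1.0 if state == "good" else -1.0
    else:
        reward = 0.2
    next_state = "bad" if state == "good" else "good"
    return reward, next_state


def q_learning(step, ue_states, actions, T=1000,
               alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    Q = defaultdict(float)    # Q(s, a) starts at 0 (assumption)
    states = dict(ue_states)  # current state of each UE
    for _ in range(T):                        # while t <= T
        for ue in list(states):               # for each UE
            s = states[ue]
            a = epsilon_greedy(Q, s, actions, epsilon, rng)
            r, s_next = step(s, a)            # reward and successor state
            best_next = max(Q[(s_next, b)] for b in actions)
            # Textbook Q-learning update, standing in for equation (15).
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
            states[ue] = s_next               # s <- s'
    return Q


Q = q_learning(toy_step, {"ue0": "good", "ue1": "bad"}, ACTIONS)
policy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in ("good", "bad")}
print(policy)
```

In this toy setting the learned greedy policy should prefer offloading on the good channel and local execution on the bad one. A per-UE Q-table would be an equally plausible reading of the algorithm and only requires keying `Q` by UE as well.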