Research Article
Service Migration Policy Optimization Considering User Mobility for E-Healthcare Applications
Algorithm 1
SMPI(policy, env, max_iter, max_step = 100, tol = 1e-6)
(1) env = Env()  // environment initialization: an MDP with state s, action a, and reward r
(2) initialize a random policy; st = time.time(); last_V = None
(3) for i = 1 to max_iter do
(4)     V = value_evaluate(policy, env, max_step, tol)  // evaluate the current policy
(5)     policy = policy_improvement(env, V)  // greedily improve the policy with respect to V
(6)     mean_values.append(np.mean(V))  // store the mean state value under the current policy
(7)     run_times.append(time.time() - st)  // store the elapsed run time
(8)     if last_V is not None and np.sum(np.abs(V - last_V)) < tol then
(9)         break  // the value-function update is small enough, so stop
(10)    last_V = V
end for
(11) return V, mean_values, policy, run_times  // return the state values, mean values, optimal policy, and run times
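The algorithm above is standard policy iteration with per-iteration bookkeeping. A minimal runnable sketch is given below; the function names `value_evaluate`, `policy_improvement`, and `smpi` mirror the pseudocode, but the tabular MDP interface (transition tensor `P[s, a, s']`, reward matrix `R[s, a]`), the discount factor `gamma = 0.9`, and the toy two-state service-migration example are illustrative assumptions, not details taken from the paper.

```python
import time
import numpy as np


def value_evaluate(policy, P, R, gamma=0.9, max_step=100, tol=1e-6):
    """Iteratively evaluate a deterministic policy on a tabular MDP.

    P[s, a, s'] is the transition probability, R[s, a] the expected reward.
    """
    n_states = P.shape[0]
    V = np.zeros(n_states)
    for _ in range(max_step):
        V_new = np.array([R[s, policy[s]] + gamma * P[s, policy[s]] @ V
                          for s in range(n_states)])
        if np.sum(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new
    return V


def policy_improvement(P, R, V, gamma=0.9):
    """Return the greedy policy: argmax_a [R(s,a) + gamma * sum_s' P(s,a,s') V(s')]."""
    Q = R + gamma * np.einsum('sap,p->sa', P, V)
    return np.argmax(Q, axis=1)


def smpi(P, R, gamma=0.9, max_iter=100, max_step=100, tol=1e-6):
    """Policy iteration with the bookkeeping of Algorithm 1."""
    n_states, n_actions = R.shape
    rng = np.random.default_rng(0)
    policy = rng.integers(n_actions, size=n_states)  # random initial policy
    mean_values, run_times, last_V = [], [], None
    st = time.time()
    for _ in range(max_iter):
        V = value_evaluate(policy, P, R, gamma, max_step, tol)
        policy = policy_improvement(P, R, V, gamma)
        mean_values.append(np.mean(V))        # mean state value of current policy
        run_times.append(time.time() - st)    # elapsed time so far
        if last_V is not None and np.sum(np.abs(V - last_V)) < tol:
            break                             # value function has converged
        last_V = V
    return V, mean_values, policy, run_times


# Toy illustration (assumed, not from the paper): two states
# (0 = service co-located with the user, 1 = service remote) and two
# actions (0 = stay, 1 = migrate); migration flips the state and costs 0.5.
P = np.zeros((2, 2, 2))
P[0, 0, 0] = P[1, 0, 1] = 1.0   # "stay" keeps the current state
P[0, 1, 1] = P[1, 1, 0] = 1.0   # "migrate" switches the state
R = np.array([[1.0, 0.5],       # co-located: high reward for staying
              [0.0, 0.5]])      # remote: migrating back pays off
V, mean_values, policy, run_times = smpi(P, R)
# learned policy: stay when co-located, migrate when remote
```

Note that the greedy improvement step uses the full Q-table (`np.einsum` computes the expected next-state value for every state-action pair), so each outer iteration is O(|S|^2 |A|); for the small state spaces typical of per-user migration decisions this is negligible.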