Research Article

Service Migration Policy Optimization considering User Mobility for E-Healthcare Applications

Algorithm 3

Policy_improvement (env, V).
(1) Initialize policy
(2)  for all s = (i, j) in S:
(3)    Initialize qs
(4)    for a in range(N_ACTIONS):
(5)      n_s = env.P[s, a]  # next state reached from s under action a
(6)      r = env.R[s, a]  # immediate reward for taking action a in s
(7)      qs[a] = r + gamma * V[n_s[0], n_s[1]]
(8)    p = (np.abs(qs - np.max(qs)) < 1e-6)  # greedy strategy: mark every action whose value ties the maximum
(9)    p = np.array(p, dtype=np.float32) / np.sum(p)  # convert to float type and normalize to a probability distribution
(10)   policy[i, j] = p
(11)  return policy
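
The listing above can be sketched as runnable Python. This is a minimal illustration, not the authors' implementation: the `GridEnv` class, the array layout of `env.P` and `env.R`, the four-action move set, and the default `gamma = 0.9` are all assumptions introduced here so that the function has something to run on.

```python
import numpy as np

N_ACTIONS = 4  # assumed action set: up, down, left, right


class GridEnv:
    """Hypothetical deterministic grid world, used only for illustration."""

    def __init__(self, n_rows=3, n_cols=3, goal=(2, 2)):
        self.shape = (n_rows, n_cols)
        moves = [(-1, 0), (1, 0), (0, -1), (0, 1)]
        # Assumed layout: P[i, j, a] is the (row, col) of the next state,
        # R[i, j, a] is the immediate reward.
        self.P = np.zeros((n_rows, n_cols, N_ACTIONS, 2), dtype=int)
        self.R = np.full((n_rows, n_cols, N_ACTIONS), -1.0)
        for i in range(n_rows):
            for j in range(n_cols):
                for a, (di, dj) in enumerate(moves):
                    if (i, j) == goal:
                        # The goal is absorbing and yields zero reward.
                        self.P[i, j, a] = (i, j)
                        self.R[i, j, a] = 0.0
                    else:
                        # Moves that would leave the grid are clipped.
                        ni = min(max(i + di, 0), n_rows - 1)
                        nj = min(max(j + dj, 0), n_cols - 1)
                        self.P[i, j, a] = (ni, nj)


def policy_improvement(env, V, gamma=0.9):
    """Return the greedy policy with respect to the state values V."""
    n_rows, n_cols = env.shape
    policy = np.zeros((n_rows, n_cols, N_ACTIONS), dtype=np.float32)
    for i in range(n_rows):
        for j in range(n_cols):
            qs = np.zeros(N_ACTIONS)
            for a in range(N_ACTIONS):
                n_s = env.P[i, j, a]
                r = env.R[i, j, a]
                # One-step lookahead: reward plus discounted next-state value.
                qs[a] = r + gamma * V[n_s[0], n_s[1]]
            # Mark every action that ties the maximum, then split the
            # probability mass evenly among the tied actions.
            p = np.abs(qs - np.max(qs)) < 1e-6
            policy[i, j] = p.astype(np.float32) / np.sum(p)
    return policy


env = GridEnv()
V = np.zeros(env.shape)  # placeholder values; normally produced by policy evaluation
policy = policy_improvement(env, V)
```

Splitting the probability evenly among tied actions, rather than taking a single `argmax`, keeps the policy stochastic wherever several actions are equally good, which matches steps (8) and (9) of the listing.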