Research Article
U-Model-Based Adaptive Sliding Mode Control Using a Deep Deterministic Policy Gradient
Algorithm 1
Deep deterministic policy gradient.
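(1) Initialize policy network $\mu_\theta$, critic network $Q_\phi$, and empty replay buffer $\mathcal{D}$
(2) Set target policy network $\mu_{\theta_{\text{targ}}}$ and target critic network $Q_{\phi_{\text{targ}}}$, with $\theta_{\text{targ}} \leftarrow \theta$, $\phi_{\text{targ}} \leftarrow \phi$
(3) repeat
(4)   Observe state $s$ and execute action $a = \text{clip}(\mu_\theta(s) + \epsilon, a_{\text{Low}}, a_{\text{High}})$, where $\epsilon \sim \mathcal{N}$
(5)   Observe next state $s'$, reward $r$, and done signal $d$ to indicate whether $s'$ is terminal
(6)   Store $(s, a, r, s', d)$ in the replay buffer $\mathcal{D}$
(7)   If $s'$ is terminal, reset environment state
(8)   if it is time to update then
(9)     for the number of updates do
(10)      Randomly sample a batch of transitions, $B = \{(s, a, r, s', d)\}$, from $\mathcal{D}$
(11)      Compute targets $y(r, s', d) = r + \gamma (1 - d)\, Q_{\phi_{\text{targ}}}(s', \mu_{\theta_{\text{targ}}}(s'))$
(12)      Update Q-function by one step of gradient descent using $\nabla_\phi \frac{1}{|B|} \sum_{(s, a, r, s', d) \in B} \left( Q_\phi(s, a) - y(r, s', d) \right)^2$
(13)      Update policy by one step of gradient ascent using $\nabla_\theta \frac{1}{|B|} \sum_{s \in B} Q_\phi(s, \mu_\theta(s))$
(14)      Update target networks with $\phi_{\text{targ}} \leftarrow \rho\, \phi_{\text{targ}} + (1 - \rho)\, \phi$, $\theta_{\text{targ}} \leftarrow \rho\, \theta_{\text{targ}} + (1 - \rho)\, \theta$
(15)    end for
(16)  end if
(17) until convergence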
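The network-independent steps of Algorithm 1 (exploration with clipping, replay storage and sampling, target computation, and the soft target update) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the implementation used in the article: states and actions are scalars, the policy and critic are passed in as ordinary callables, and the names `ReplayBuffer`, `noisy_action`, `td_targets`, `polyak_update`, `noise_std`, `gamma`, and `rho` are hypothetical. The gradient steps (12) and (13) depend on the chosen network library and are omitted.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity store of (s, a, r, s_next, d) transitions (steps 6 and 10)."""

    def __init__(self, capacity):
        # Oldest transitions are discarded once capacity is reached.
        self.data = deque(maxlen=capacity)

    def store(self, s, a, r, s_next, d):
        self.data.append((s, a, r, s_next, d))

    def sample(self, batch_size, rng=random):
        # Uniform sampling without replacement from the stored transitions.
        return rng.sample(list(self.data), batch_size)


def noisy_action(mu, s, noise_std, a_low, a_high, rng=random):
    """Step 4: policy output plus Gaussian exploration noise, clipped to the bounds."""
    a = mu(s) + rng.gauss(0.0, noise_std)
    return max(a_low, min(a_high, a))


def td_targets(batch, q_targ, mu_targ, gamma):
    """Step 11: y = r + gamma * (1 - d) * Q_targ(s', mu_targ(s'))."""
    return [r + gamma * (1.0 - d) * q_targ(s2, mu_targ(s2))
            for (_, _, r, s2, d) in batch]


def polyak_update(target_params, params, rho):
    """Step 14: target <- rho * target + (1 - rho) * online, elementwise."""
    return [rho * t + (1.0 - rho) * p for t, p in zip(target_params, params)]


# Toy usage with hypothetical stand-ins for the target networks.
mu_targ = lambda s: 0.5 * s      # assumed toy target policy
q_targ = lambda s, a: s + a      # assumed toy target critic

buffer = ReplayBuffer(capacity=1000)
state = 1.0
for _ in range(8):
    action = noisy_action(mu_targ, state, noise_std=0.1, a_low=-1.0, a_high=1.0)
    next_state, reward, done = state + action, -abs(state), 0  # made-up dynamics
    buffer.store(state, action, reward, next_state, done)
    state = next_state

batch = buffer.sample(batch_size=4)
targets = td_targets(batch, q_targ, mu_targ, gamma=0.99)
new_target_params = polyak_update([0.0, 0.0], [1.0, 2.0], rho=0.995)
```

Keeping the target update as a separate pure function makes the role of $\rho$ explicit: values close to 1 let the target parameters track the online parameters slowly, which is what stabilizes the bootstrapped targets in step (11).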