Mathematical Problems in Engineering

Research Article

U-Model-Based Adaptive Sliding Mode Control Using a Deep Deterministic Policy Gradient

Deep deterministic policy gradient.

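(1)	Initialize policy network $\mu_{\theta}$, critic network $Q_{\phi}$, and empty replay buffer $\mathcal{D}$
(2)	Set target policy network $\mu_{\theta_{\text{targ}}}$ and target critic network $Q_{\phi_{\text{targ}}}$, with $\theta_{\text{targ}} \leftarrow \theta$, $\phi_{\text{targ}} \leftarrow \phi$
(3)	repeat
(4)	Observe state $s$ and execute action $a = \text{clip}(\mu_{\theta}(s) + \epsilon, a_{\text{low}}, a_{\text{high}})$, where $\epsilon \sim \mathcal{N}$ is Gaussian exploration noise
(5)	Observe next state $s'$, reward $r$, and done signal $d$ to indicate whether $s'$ is terminal
(6)	Store $(s, a, r, s', d)$ in the replay buffer $\mathcal{D}$
(7)	If $s'$ is terminal, reset the environment state
(8)	if it is time to update then
(9)	for the number of updates do
(10)	Randomly sample a batch of transitions, $B = \{(s, a, r, s', d)\}$, from $\mathcal{D}$
(11)	Compute targets $y(r, s', d) = r + \gamma (1 - d)\, Q_{\phi_{\text{targ}}}(s', \mu_{\theta_{\text{targ}}}(s'))$
(12)	Update Q-function by one step of gradient descent using $\nabla_{\phi} \frac{1}{|B|} \sum_{(s, a, r, s', d) \in B} \left( Q_{\phi}(s, a) - y(r, s', d) \right)^2$
(13)	Update policy by one step of gradient ascent using $\nabla_{\theta} \frac{1}{|B|} \sum_{s \in B} Q_{\phi}(s, \mu_{\theta}(s))$
(14)	Update target networks with $\phi_{\text{targ}} \leftarrow \rho \phi_{\text{targ}} + (1 - \rho) \phi$, $\theta_{\text{targ}} \leftarrow \rho \theta_{\text{targ}} + (1 - \rho) \theta$
(15)	end for
(16)	end if
(17)	until convergence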
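The numerical parts of steps (4), (11), (12), and (14) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the networks are replaced by plain callables, actions and states are scalars, parameters are flat lists of floats, and all names (select_action, compute_targets, critic_loss, polyak_update, noise_std, gamma, rho) are assumptions introduced here for illustration. The gradient steps themselves, which would normally be handled by an automatic differentiation library, are omitted.

```python
import random


def select_action(mu, s, noise_std, a_low, a_high, rng=random):
    """Step (4): policy output plus Gaussian noise, clipped to the action range.

    Assumes a scalar action; mu is any callable standing in for the policy network.
    """
    a = mu(s) + rng.gauss(0.0, noise_std)
    return max(a_low, min(a_high, a))


def compute_targets(batch, q_targ, mu_targ, gamma):
    """Step (11): y = r + gamma * (1 - d) * Q_targ(s', mu_targ(s')).

    batch is a list of (s, a, r, s_next, d) tuples; d is 1.0 for terminal s_next.
    """
    return [
        r + gamma * (1.0 - d) * q_targ(s_next, mu_targ(s_next))
        for (_, _, r, s_next, d) in batch
    ]


def critic_loss(batch, targets, q):
    """Step (12): mean squared error between Q(s, a) and the targets.

    Only the loss is computed here; minimizing it is left to an optimizer.
    """
    total = sum((q(s, a) - y) ** 2 for (s, a, _, _, _), y in zip(batch, targets))
    return total / len(batch)


def polyak_update(target_params, params, rho):
    """Step (14): target <- rho * target + (1 - rho) * main, element-wise."""
    return [rho * t + (1.0 - rho) * p for t, p in zip(target_params, params)]


if __name__ == "__main__":
    # Toy stand-ins for the target critic and target policy (hypothetical).
    q_fn = lambda s, a: s + a
    mu_fn = lambda s: 2.0 * s
    batch = [(0.0, 0.5, 1.0, 1.0, 0.0), (1.0, -0.5, 2.0, 2.0, 1.0)]
    ys = compute_targets(batch, q_fn, mu_fn, gamma=0.9)
    print(ys, critic_loss(batch, ys, q_fn))
    print(polyak_update([1.0, 2.0], [0.0, 4.0], rho=0.9))
```

With a mixing coefficient close to 1, the target parameters track the main parameters slowly, which keeps the targets in step (11) from shifting abruptly between updates. The (1 - d) factor removes the bootstrap term for terminal transitions, so the target reduces to the reward alone.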