6: Add noise to the action: $\epsilon$-greedy on the discrete part and a Gaussian distribution with mean $0$ on the continuous part.
7: Get action $a_t$ with exploration variance $\sigma^2$.
8: Take action $a_t$, observe reward $r_t$ and next state $s_{t+1}$.
9: Store transition $(s_t, a_t, r_t, s_{t+1})$ in $\mathcal{D}$.
10: Sample a random batch of $N$ transitions $(s_i, a_i, r_i, s_{i+1})$ from $\mathcal{D}$.
11: Set $y_i = r_i + \gamma Q'\big(s_{i+1}, \mu'(s_{i+1} \mid \theta^{\mu'}) \mid \theta^{Q'}\big)$.
12: Update the critic by minimizing the loss $L$ given by Equation (18).
13: According to the loss $L$, update the actor through the continuous-part training phase and the discrete-part training phase by Equations (15) and (16).
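
A minimal sketch of the exploration in steps 6-8, assuming a hybrid action made of one discrete choice plus a continuous parameter vector; the names (`explore`, `epsilon`, `sigma`, the clipping bounds) are illustrative, not the paper's notation.

```python
# Hedged sketch of steps 6-8: epsilon-greedy on the discrete action,
# zero-mean Gaussian noise with variance sigma**2 on the continuous one.
import numpy as np

rng = np.random.default_rng(0)

def explore(q_values: np.ndarray, x: np.ndarray,
            epsilon: float, sigma: float,
            low: float = -1.0, high: float = 1.0):
    """Return a noisy hybrid action (discrete index k, continuous x)."""
    if rng.random() < epsilon:                # step 6: explore the discrete part
        k = int(rng.integers(len(q_values)))
    else:                                     # otherwise exploit the argmax
        k = int(np.argmax(q_values))
    noise = rng.normal(loc=0.0, scale=sigma, size=x.shape)  # mean-0 Gaussian
    return k, np.clip(x + noise, low, high)                  # step 7: a_t
```

Here `q_values` and `x` stand for the discrete scores and continuous parameters produced by the actor; step 8 then simply applies the returned pair to the environment.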
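Steps 9-10 are standard replay-buffer mechanics; a minimal sketch, with capacity and batch size as assumed hyperparameters:

```python
# Minimal replay buffer D for steps 9-10; capacity and batch size are
# assumptions, not values from the paper.
import random
from collections import deque

class ReplayBuffer:
    def __init__(self, capacity: int = 100_000):
        self.buffer = deque(maxlen=capacity)   # oldest transitions drop out

    def store(self, s, a, r, s_next):          # step 9
        self.buffer.append((s, a, r, s_next))

    def sample(self, batch_size: int = 64):    # step 10
        idx = random.sample(range(len(self.buffer)), batch_size)
        s, a, r, s_next = zip(*(self.buffer[i] for i in idx))
        return s, a, r, s_next
```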
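Steps 11-13 can be sketched in PyTorch as below. The target $y_i$ follows the standard DDPG form of step 11, and the two losses are stand-ins for the paper's Equations (18) and (15)-(16); `actor`, `critic`, and their target copies are assumed to be `torch.nn.Module` instances with matching output shapes.

```python
# Hedged sketch of steps 11-13 (PyTorch); losses are placeholders for
# the paper's Equations (18) and (15)-(16).
import torch
import torch.nn.functional as F

def update(batch, actor, critic, target_actor, target_critic,
           actor_opt, critic_opt, gamma: float = 0.99):
    s, a, r, s_next = batch                    # tensors of shape (N, ...)
    # Step 11: bootstrap target y_i from the target networks.
    with torch.no_grad():
        y = r + gamma * target_critic(s_next, target_actor(s_next))
    # Step 12: critic update by minimizing the TD loss.
    critic_loss = F.mse_loss(critic(s, a), y)
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()
    # Step 13: actor update; a hybrid actor would backpropagate this loss
    # through both its continuous and discrete heads.
    actor_loss = -critic(s, actor(s)).mean()
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()
```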