Research Article

The Implementation of Deep Reinforcement Learning in E-Learning and Distance Learning: Remote Practical Work

Algorithm 2

The DQN algorithm and experience replay.
Initialize memory M to capacity C
Initialize action-value function Q with random weights
For episode = 1, N do (for each of the N episodes)
 Initialize state S1 = (X1) (starting computer screen pixels) at the beginning of each episode
  //Pre-process and feed the computer screen to our DQN,
  //which will regress the Q-values of all possible actions in the state.
 For t = 1, T do
  Choose an action using the epsilon-greedy policy.
   //With probability epsilon, we choose a random action a, and with probability 1-epsilon,
   //we choose the action that has the maximum Q-value, that is, At = argmax_a Q(St, a).
  Execute action At in the emulator and observe the reward and image Xt+1
  Set St+1 = (St, At, Xt+1) and pre-process
  Store the transition in M
  Sample random mini-batch of the transitions from M
  Compute the loss function,
  //which is just the squared difference between the target Q and the predicted Q.
  Do gradient descent with respect to our actual network parameters in order to reduce the loss function.
  After every “” iteration, copy our actual network weights to the target network weights.
 End for
End for
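
The inner-loop steps of the algorithm above can be illustrated with a minimal Python sketch. It is not the authors' implementation: a small linear function (one weight vector per action) stands in for the deep network so that the example runs with the standard library only, and the names ReplayMemory, q_values, epsilon_greedy, td_target, train_step and sync_target, the discount factor gamma, the learning rate lr and the done flag are all assumptions introduced here for illustration.

```python
import random
from collections import deque


class ReplayMemory:
    """Fixed-capacity memory M; the oldest transitions are dropped first."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def store(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size, rng):
        # Uniform random mini-batch, capped at the current memory size.
        return rng.sample(list(self.buffer), min(batch_size, len(self.buffer)))


def q_values(weights, state):
    """Linear stand-in for the network (assumption): Q(s, a) = weights[a] . s"""
    return [sum(w * x for w, x in zip(row, state)) for row in weights]


def epsilon_greedy(weights, state, epsilon, rng):
    """Random action with probability epsilon, otherwise the argmax of Q."""
    if rng.random() < epsilon:
        return rng.randrange(len(weights))
    q = q_values(weights, state)
    return max(range(len(q)), key=q.__getitem__)


def td_target(target_weights, reward, next_state, done, gamma):
    """Target Q: r at episode end, else r + gamma * max over a' of Q_target(s', a')."""
    if done:
        return reward
    return reward + gamma * max(q_values(target_weights, next_state))


def train_step(weights, target_weights, batch, gamma, lr):
    """One gradient-descent step on the mean of (target Q - predicted Q)^2."""
    loss = 0.0
    grads = [[0.0] * len(row) for row in weights]
    for state, action, reward, next_state, done in batch:
        target = td_target(target_weights, reward, next_state, done, gamma)
        error = target - q_values(weights, state)[action]
        loss += error ** 2
        for i, x in enumerate(state):
            grads[action][i] += -2.0 * error * x  # d(error^2) / d(weights[action][i])
    n = len(batch)
    for a, row in enumerate(grads):
        for i, g in enumerate(row):
            weights[a][i] -= lr * g / n  # update the online weights in place
    return loss / n


def sync_target(weights):
    """Copy the online network weights into the target network."""
    return [row[:] for row in weights]
```

In a full training loop, each time step would call epsilon_greedy to pick an action, store the observed transition in the memory, draw a mini-batch with sample, and run train_step; sync_target would be called at the fixed interval named in the algorithm. Keeping the target weights as a separate copy means the regression target stays fixed between synchronizations.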