Research Article

A Dynamic Adjusting Reward Function Method for Deep Reinforcement Learning with Adjustable Parameters

Algorithm 1: MSR algorithm (with DQN).
Initialize replay memory $D$ to capacity $N$
Initialize action-value function $Q$ with random weights $\theta$
Initialize target action-value function $\hat{Q}$ with weights $\theta^{-} = \theta$
For episode = 1, $M$ do
 Receive initial observation and initialize
 For t = 1, $T$ do
  With probability $\varepsilon$ select a random action $a_t$
  otherwise select $a_t = \arg\max_{a} Q(s_t, a; \theta)$
  Execute action $a_t$ in an emulator and observe reward $r_t$
  Set , , and
  Update reward
  Set  = ,
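  Store transition $(s_t, a_t, r_t, s_{t+1})$ in $D$
  Sample random minibatch of transitions $(s_j, a_j, r_j, s_{j+1})$ from $D$
  Set $y_j = r_j$ if the episode terminates at step $j+1$, otherwise $y_j = r_j + \gamma \max_{a'} \hat{Q}(s_{j+1}, a'; \theta^{-})$
  Perform a gradient descent step on $(y_j - Q(s_j, a_j; \theta))^2$ with respect to the network parameters $\theta$
  Every $C$ steps reset $\hat{Q} = Q$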
 End For
End For
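
The DQN building blocks that the listing relies on can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `epsilon_greedy`, `adjust_reward`, `td_target`, and `ReplayMemory` are hypothetical, Q-values are plain Python lists rather than network outputs, and `adjust_reward` is only a placeholder, because the reward-update rule itself is not recoverable from the listing.

```python
import random
from collections import deque


def epsilon_greedy(q_values, epsilon, rng):
    """With probability epsilon pick a random action, otherwise the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def adjust_reward(reward):
    """Hypothetical hook for the 'Update reward' step.

    The actual adjustment rule is not given in the listing, so this
    placeholder returns the reward unchanged.
    """
    return reward


def td_target(reward, next_q_values, done, gamma):
    """DQN target: r if the episode ended, else r + gamma * max_a' Q_target(s', a')."""
    if done:
        return reward
    return reward + gamma * max(next_q_values)


class ReplayMemory:
    """Fixed-capacity buffer; the oldest transitions are dropped first."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def store(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size, rng):
        # Sample without replacement, capped at the current buffer size.
        k = min(batch_size, len(self.buffer))
        return rng.sample(list(self.buffer), k)

    def __len__(self):
        return len(self.buffer)


if __name__ == "__main__":
    rng = random.Random(0)
    memory = ReplayMemory(capacity=100)
    action = epsilon_greedy([0.1, 0.9, 0.3], epsilon=0.1, rng=rng)
    memory.store("s0", action, adjust_reward(1.0), "s1", False)
    batch = memory.sample(32, rng)
    targets = [td_target(r, [0.5, 2.0], d, gamma=0.99) for (_, _, r, _, d) in batch]
    print(action, targets)
```

In a full agent, the list passed as `next_q_values` would come from the target network, and the squared difference between each target and the online network's prediction would be minimized by gradient descent.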