Research Article
A Dynamic Adjusting Reward Function Method for Deep Reinforcement Learning with Adjustable Parameters
Algorithm 1: MSR algorithm (with DQN).
| Initialize replay memory to capacity |
| Initialize action-value function Q with random weights |
| Initialize target action-value function with weights |
| For episode = 1, do |
| Receive initial observation and initialize |
| For t = 1, do |
| With probability select a random action |
| otherwise select |
| Execute action in an emulator and observe reward |
| Set , , and |
| Update reward |
| Set = , |
| Store transition in |
| Sample random minibatch of transition from |
| Set |
| Perform a gradient descent step on with respect to the network parameters |
| Every steps reset |
| End For |
| End For |
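The control flow of Algorithm 1 can be illustrated with a minimal Python sketch. It is not the authors' implementation. A small Q-table stands in for the Q-network so that the example runs without a deep learning library, and `ChainEnv`, `update_reward`, `train`, and all hyperparameter values are hypothetical names and settings chosen for illustration. In particular, the rule behind the "Update reward" step is not recoverable from the listing, so `update_reward` is only a placeholder that tracks a running mean and returns the reward unchanged.

```python
import random
from collections import deque


class ChainEnv:
    """Hypothetical toy emulator: walk right along a chain to reach the goal."""

    def __init__(self, n_states=5):
        self.n_states = n_states
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # Action 1 moves right; action 0 moves left (clipped at state 0).
        self.state = max(0, self.state + (1 if action == 1 else -1))
        done = self.state == self.n_states - 1
        return self.state, (1.0 if done else 0.0), done


def update_reward(reward, stats):
    """Placeholder for the 'Update reward' step; NOT the paper's rule.

    Assumption: it only tracks a running mean and returns the reward unchanged.
    """
    stats["count"] += 1
    stats["mean"] += (reward - stats["mean"]) / stats["count"]
    return reward


def train(episodes=200, max_steps=50, capacity=1000, batch_size=16,
          gamma=0.9, epsilon=0.3, lr=0.1, sync_every=20, seed=0):
    rng = random.Random(seed)
    env = ChainEnv()
    n_actions = 2
    memory = deque(maxlen=capacity)            # replay memory
    # Q-table with small random entries (stands in for network weights).
    q = [[rng.uniform(-0.01, 0.01) for _ in range(n_actions)]
         for _ in range(env.n_states)]
    q_target = [row[:] for row in q]           # target action-value function
    stats = {"count": 0, "mean": 0.0}
    steps = 0
    for _ in range(episodes):
        s = env.reset()
        for _ in range(max_steps):
            # Epsilon-greedy action selection.
            if rng.random() < epsilon:
                a = rng.randrange(n_actions)
            else:
                a = max(range(n_actions), key=lambda i: q[s][i])
            s2, r, done = env.step(a)          # execute action, observe reward
            r = update_reward(r, stats)        # reward-update hook (placeholder)
            memory.append((s, a, r, s2, done)) # store transition
            batch = rng.sample(list(memory), min(batch_size, len(memory)))
            for bs, ba, br, bs2, bdone in batch:
                # Bootstrapped target computed from the target table.
                y = br if bdone else br + gamma * max(q_target[bs2])
                # Gradient step on 0.5 * (y - Q)^2 for the sampled entry.
                q[bs][ba] += lr * (y - q[bs][ba])
            steps += 1
            if steps % sync_every == 0:        # periodic target reset
                q_target = [row[:] for row in q]
            s = s2
            if done:
                break
    return q, stats
```

Calling `train()` returns the learned table and the running reward statistics. Swapping the body of `update_reward` is the only change needed to experiment with a different reward-adjustment rule, which keeps that step separate from the standard DQN machinery.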