Mathematical Problems in Engineering

Research Article

A Dynamic Adjusting Reward Function Method for Deep Reinforcement Learning with Adjustable Parameters

MSR algorithm (with DQN).

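	Initialize replay memory $D$ to capacity $N$
	Initialize action-value function $Q$ with random weights $\theta$
	Initialize target action-value function $\hat{Q}$ with weights $\theta^{-} = \theta$
	For episode = 1, $M$ do
		Receive initial observation $s_1$ and initialize
		For $t$ = 1, $T$ do
			With probability $\varepsilon$ select a random action $a_t$
			otherwise select $a_t = \arg\max_{a} Q(s_t, a; \theta)$
			Execute action $a_t$ in an emulator and observe reward $r_t$
			Set , , and
			Update reward
			Set = ,
			Store transition $(s_t, a_t, r_t, s_{t+1})$ in $D$
			Sample random minibatch of transitions $(s_j, a_j, r_j, s_{j+1})$ from $D$
			Set $y_j = r_j$ if the episode terminates at step $j+1$, otherwise $y_j = r_j + \gamma \max_{a'} \hat{Q}(s_{j+1}, a'; \theta^{-})$
			Perform a gradient descent step on $(y_j - Q(s_j, a_j; \theta))^2$ with respect to the network parameters $\theta$
			Every $C$ steps reset $\hat{Q} = Q$
		End For
	End For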
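
The loop structure above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy chain environment `env_step`, the one-hot linear Q model, and all hyperparameter values are assumptions made for illustration, and `update_reward` is a hypothetical placeholder that returns the reward unchanged, because the reward update rule of the MSR step is not reproduced in this listing.

```python
import random
from collections import deque

N_STATES, N_ACTIONS = 5, 2      # toy chain (assumption): action 1 moves right, 0 moves left
GAMMA, LR = 0.9, 0.1            # discount factor and learning rate (illustrative values)
CAPACITY, BATCH, SYNC_EVERY = 500, 16, 20


def env_step(s, a):
    """Toy emulator (assumption): reward 1.0 on reaching the right end, else 0.0."""
    s_next = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
    done = s_next == N_STATES - 1
    return s_next, (1.0 if done else 0.0), done


def update_reward(r, t):
    """Hypothetical placeholder for the 'Update reward' step.

    The rule used by the MSR algorithm is not given here, so r is returned unchanged.
    """
    return r


def train(episodes=200, max_steps=30, epsilon=0.3, seed=0):
    rng = random.Random(seed)
    memory = deque(maxlen=CAPACITY)                        # replay memory D
    theta = [[rng.uniform(-0.01, 0.01) for _ in range(N_ACTIONS)]
             for _ in range(N_STATES)]                     # Q weights, one row per state
    theta_target = [row[:] for row in theta]               # target weights, copied from theta
    step = 0
    for _ in range(episodes):
        s = 0                                              # initial observation
        for t in range(max_steps):
            # Epsilon-greedy action selection
            if rng.random() < epsilon:
                a = rng.randrange(N_ACTIONS)
            else:
                a = max(range(N_ACTIONS), key=lambda i: theta[s][i])
            s_next, r, done = env_step(s, a)
            r = update_reward(r, t)                        # placeholder hook, see above
            memory.append((s, a, r, s_next, done))
            # Sample a minibatch and take one gradient step per sampled transition
            batch = rng.sample(list(memory), min(BATCH, len(memory)))
            for sj, aj, rj, sj1, dj in batch:
                y = rj if dj else rj + GAMMA * max(theta_target[sj1])
                # Gradient step on (y - Q)^2; the factor 2 is absorbed into LR
                theta[sj][aj] += LR * (y - theta[sj][aj])
            step += 1
            if step % SYNC_EVERY == 0:                     # periodic target reset
                theta_target = [row[:] for row in theta]
            s = s_next
            if done:
                break
    return theta
```

Calling `train()` returns the learned weight table, from which a greedy policy is read off as the arg max over each row. Because the reward in this toy chain is sparse, a larger `epsilon` makes it more likely that the rewarding state is visited at all. Any reward reshaping would go into `update_reward`, leaving the rest of the loop untouched.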