Research Article
Deep Q-Network with Predictive State Models in Partially Observable Domains
Input: learning rate
Output: network parameters

(1) Randomly select actions to generate N trajectories
(2) Compute the sufficient features of every trajectory: the history, future-action, future-observation, and extended-future feature vectors (n denotes the trajectory)
(3) Establish the recurrent predictive state representation:
(4) Initialize the PSR by two-stage regression:
(5)   Use the kernel Bayes rule to estimate the state update (filtering) function
(6)   Apply the least squares method to formula (2) to compute the model weights W
(7)   Set the initial predictive state q_1 to the average of the estimated states
(8) Local optimization:
(9) for i = 1, N do
(10)   Initialize the state q_1
(11)   for t = 1, m do
(12)     Compute the predictive observation ô_t from the current state q_t
(13)     Perform a gradient descent step on the prediction error ||ô_t - o_t||^2
(14)     Update the state: q_{t+1} = f(q_t, a_t, o_t)
(15)   end for
(16) end for
(17) Optimize the policy network:
(18) Initialize the reactive policy (the Q-network parameters θ) randomly
(19) for episode = 1, M do
(20)   Initialize the state q_1
(21)   for t = 1, T do
(22)     With probability 1 - ε select a_t = argmax_a Q(q_t, a; θ); otherwise select a random action a_t
(23)     Execute action a_t in the emulator and observe reward r_t and observation o_{t+1}
(24)     Set the filtered state q_{t+1} = f(q_t, a_t, o_{t+1})
(25)     Store the transition (q_t, a_t, r_t, q_{t+1}) in replay memory D
(26)     Sample a random minibatch of transitions (q_j, a_j, r_j, q_{j+1}) from D
(27)     Set y_j = r_j if the episode terminates at step j + 1; otherwise set y_j = r_j + γ max_{a'} Q(q_{j+1}, a'; θ)
(28)     Perform a gradient descent step on (y_j - Q(q_j, a_j; θ))^2 with respect to θ
(29)   end for
(30) end for
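Steps (1)-(7) initialize the predictive state model from the exploration trajectories by two-stage regression. The sketch below illustrates the generic two-stage least-squares recipe in Python; the feature matrices, the `ridge` helper, and the regularization constant are illustrative assumptions rather than the paper's exact construction, and the kernel Bayes rule estimate of step (5) is omitted.

```python
import numpy as np

def ridge(X, Y, reg=1e-3):
    """Closed-form solve for W minimizing ||W X - Y||^2 + reg ||W||^2."""
    d = X.shape[0]
    return Y @ X.T @ np.linalg.inv(X @ X.T + reg * np.eye(d))

def two_stage_regression(hist, future, ext_future, reg=1e-3):
    """hist: (d_h, T) history features; future: (d_f, T) future features;
    ext_future: (d_e, T) extended-future features, with the time steps of
    all N trajectories pooled into the columns."""
    # Stage 1: denoise the future and extended-future features by
    # regressing each of them on the history features.
    states = ridge(hist, future, reg) @ hist          # predictive states q_t
    ext_states = ridge(hist, ext_future, reg) @ hist  # extended states
    # Stage 2: least squares from states to extended states gives the
    # model weights W of step (6).
    W = ridge(states, ext_states, reg)
    # Step (7): initialize the predictive state as the average estimated state.
    q0 = states.mean(axis=1)
    return W, q0
```

Pooling the time steps of all N trajectories into the columns of each feature matrix keeps stage 1 a single least-squares solve, and the same `ridge` helper then serves for the stage-2 regression of step (6).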
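Steps (8)-(16) refine the regression initialization by gradient descent on the one-step observation prediction error, backpropagating through the filter. Below is a minimal PyTorch sketch assuming a tanh state update and a linear observation predictor, both hypothetical simplifications of the recurrent predictive state representation; for brevity it takes one gradient step per trajectory, whereas step (13) takes one per time step.

```python
import torch
import torch.nn as nn

class PSRFilter(nn.Module):
    """State update q_{t+1} = f(q_t, a_t, o_t) plus an observation predictor."""
    def __init__(self, state_dim, act_dim, obs_dim):
        super().__init__()
        self.update = nn.Linear(state_dim + act_dim + obs_dim, state_dim)
        self.predict = nn.Linear(state_dim, obs_dim)

    def forward(self, q, a, o):
        return torch.tanh(self.update(torch.cat([q, a, o], dim=-1)))

def local_optimization(model, trajectories, q0, lr=1e-3):
    """Steps (9)-(16): descend the summed one-step prediction error."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for actions, observations in trajectories:      # i = 1..N
        q = q0.clone()                              # step (10): initialize state
        loss = torch.zeros(())
        for a, o in zip(actions, observations):     # t = 1..m
            o_hat = model.predict(q)                # step (12): predictive observation
            loss = loss + ((o_hat - o) ** 2).sum()  # accumulate prediction error
            q = model(q, a, o)                      # step (14): update the state
        opt.zero_grad()
        loss.backward()                             # step (13): gradient descent step
        opt.step()
```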
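Steps (17)-(30) are a standard DQN training loop in which the Q-network consumes the filtered predictive state q_t instead of the raw observation. The sketch below assumes an `env` with `reset()` and a `step()` that returns a tensor observation, a scalar reward, and a done flag; the network sizes, replay capacity, target network synchronization, and hyperparameters are illustrative choices, not the paper's.

```python
import copy
import random
from collections import deque

import torch
import torch.nn as nn

def one_hot(i, n):
    v = torch.zeros(n)
    v[i] = 1.0
    return v

def train_dqn(env, psr_filter, q0, n_actions, episodes=500, horizon=200,
              gamma=0.99, eps=0.1, batch_size=32, lr=1e-3):
    state_dim = q0.numel()
    qnet = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                         nn.Linear(64, n_actions))
    qtarget = copy.deepcopy(qnet)            # frozen copy used for the targets y_j
    opt = torch.optim.Adam(qnet.parameters(), lr=lr)
    replay = deque(maxlen=10_000)            # replay memory D

    for episode in range(episodes):          # episode = 1..M (step 19)
        env.reset()
        q = q0.clone()                       # initialize state (step 20)
        for t in range(horizon):             # t = 1..T (step 21)
            # eps-greedy action over the predictive state (step 22)
            if random.random() > eps:
                a = qnet(q).argmax().item()
            else:
                a = random.randrange(n_actions)
            o, r, done = env.step(a)         # execute in emulator (step 23)
            # filter the new observation into the next state (step 24)
            q_next = psr_filter(q, one_hot(a, n_actions), o).detach()
            replay.append((q, a, r, q_next, done))    # store transition (step 25)
            q = q_next
            if len(replay) >= batch_size:
                batch = random.sample(replay, batch_size)   # minibatch (step 26)
                qs, acts, rs, qns, dones = zip(*batch)
                qs, qns = torch.stack(qs), torch.stack(qns)
                rs = torch.tensor(rs, dtype=torch.float32)
                cont = torch.tensor([not d for d in dones], dtype=torch.float32)
                with torch.no_grad():        # DQN target y_j (step 27)
                    y = rs + gamma * cont * qtarget(qns).max(dim=1).values
                pred = qnet(qs)[torch.arange(batch_size), torch.tensor(acts)]
                loss = ((y - pred) ** 2).mean()
                opt.zero_grad()
                loss.backward()              # gradient step on (y - Q)^2 (step 28)
                opt.step()
            if done:
                break
        if episode % 10 == 0:                # periodically sync the target network
            qtarget.load_state_dict(qnet.state_dict())
    return qnet
```

Because the filter output is detached before being stored, the replay memory holds fixed predictive states and only the Q-network parameters receive gradients in step (28); the target network is a common stabilization choice and is an assumption here, since the flattened listing does not specify one.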