Research Article

Optimal Wireless Information and Power Transfer Using Deep Q-Network

Algorithm 1

Deep Q-network algorithm training process.
(1) Randomly initialize the weight parameters θ of the evaluation Q-network. The target Q-network clones the weight parameters, θ⁻ ← θ.
(2) At the beginning of time slot t, randomly generate a probability p ∈ [0, 1].
If p > ε:
choose the action greedily, a_t = arg max_a Q(s_t, a; θ).
If p ≤ ε:
randomly choose the action a_t from the action set 𝒜.
The transmitter transmits with the selected beam pattern.
(3) Throughout the whole time slot, the RF energy is accumulated in each harvester's energy buffer. At the end of the time slot, each harvester feeds back its energy level to the transmitter, and the system state is updated to s_{t+1}.
(4) Store the experience tuple (s_t, a_t, r_t, s_{t+1}) in the experience pool. If the pool has reached its maximum capacity, its size remains constant and the new experience replaces the oldest one; otherwise, the pool size increases by one.
(5) After the experience pool accumulates enough data, randomly select a mini-batch of experiences to train the neural network. The backpropagation method is applied to minimize the loss function L(θ) = E[(r_t + γ max_{a′} Q(s_{t+1}, a′; θ⁻) − Q(s_t, a_t; θ))²]. Clone the weight parameters from the evaluation network to the target network, θ⁻ ← θ, after every several time intervals.
(6) If the time slot has ended, update the index t ← t + 1:
if t reaches the maximum number of time slots, the algorithm terminates; otherwise, go back to step 2.
If the time slot has not yet ended:
go to step 3.
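The steps of Algorithm 1 can be sketched as a minimal training loop. This is an illustrative sketch only: a linear Q-function approximator stands in for the paper's deep network, and the toy environment, reward, and every hyperparameter value (EPSILON, GAMMA, pool capacity, cloning interval, and so on) are assumptions, not details taken from the paper.

```python
# Sketch of the DQN training process in Algorithm 1.
# Assumptions: toy state/action spaces, a linear Q-approximator instead of a
# deep network, and made-up hyperparameters.
import random
import numpy as np

N_STATES, N_ACTIONS = 4, 3            # toy sizes (assumed)
EPSILON, GAMMA, LR = 0.1, 0.9, 0.05   # exploration rate, discount, step size
POOL_CAP, BATCH, CLONE_EVERY = 200, 16, 20

rng = np.random.default_rng(0)
theta = rng.normal(0.0, 0.1, (N_STATES, N_ACTIONS))  # evaluation weights
theta_target = theta.copy()                          # step 1: clone to target

def q_values(s, w):
    """Q(s, .; w) for a one-hot state under a linear approximator."""
    return w[s]

pool = []                                            # experience pool
state = 0
for t in range(1, 501):
    # Step 2: epsilon-greedy action selection.
    if rng.random() > EPSILON:
        action = int(np.argmax(q_values(state, theta)))
    else:
        action = int(rng.integers(N_ACTIONS))
    # Step 3: toy transition and harvested-energy reward (both assumed).
    next_state = (state + action + 1) % N_STATES
    reward = 1.0 if action == state % N_ACTIONS else 0.0
    # Step 4: store the experience; pool size saturates at its maximum.
    pool.append((state, action, reward, next_state))
    if len(pool) > POOL_CAP:
        pool.pop(0)                                  # drop the oldest
    # Step 5: sample a mini-batch and descend the TD loss.
    if len(pool) >= BATCH:
        for s, a, r, s2 in random.sample(pool, BATCH):
            y = r + GAMMA * np.max(q_values(s2, theta_target))  # TD target
            td_error = y - q_values(s, theta)[a]
            theta[s, a] += LR * td_error             # gradient step
        if t % CLONE_EVERY == 0:
            theta_target = theta.copy()              # periodic cloning
    state = next_state                               # step 6: next time slot
```

The target network is held fixed between cloning intervals so that the TD target does not chase the evaluation network within a mini-batch, which is the standard stabilization trick the algorithm's step 5 relies on.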