Optimal Wireless Information and Power Transfer Using Deep Q-Network
Algorithm 1
Deep Q-network algorithm training process.
(1)
Randomly generate the weight parameters θ for the evaluation network Q(s, a; θ). The target network Q̂ clones the weight parameters, θ⁻ = θ.
(2)
At the beginning of time slot t, randomly generate a probability p ∈ [0, 1].
If p ≥ ε (exploitation): choose the action as a_t = argmax_a Q(s_t, a; θ).
If p < ε (exploration): randomly choose the action a_t from the action set A.
The transmitter transmits with the selected beam pattern.
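The ε-greedy selection in step 2 can be sketched as follows. This is a minimal sketch: the list of Q-values stands in for the network's output over the beam-pattern action set, and the tie-breaking and threshold conventions are assumptions not stated in the algorithm.

```python
import random

def epsilon_greedy(q_values, epsilon):
    """With probability epsilon, explore: pick a random beam-pattern
    index. Otherwise exploit: pick the index with the largest
    Q-value estimate for the current state."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

With epsilon = 0 the choice is purely greedy; with epsilon = 1 it is purely random, so annealing epsilon over time trades exploration for exploitation.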
(3)
Throughout the whole time slot, the RF energy is accumulated in each harvester's energy buffer. At the end of the time slot, each harvester feeds back its energy level to the transmitter, and the system state is updated to s_{t+1}.
(4)
Store the transition (s_t, a_t, r_t, s_{t+1}) in the experience pool. If the number of stored experiences reaches the maximum capacity of the pool, the pool size remains constant and the oldest experience is replaced by the new one; otherwise, the new experience is appended and the pool grows by one.
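A fixed-capacity experience pool with the replace-the-oldest behavior of step 4 might look like the sketch below; the capacity value and the (s, a, r, s') tuple layout are assumptions for illustration.

```python
import random
from collections import deque

class ExperiencePool:
    """Stores transitions (s, a, r, s_next). Once the pool is full,
    appending a new transition evicts the oldest one, so the pool
    size remains constant at its maximum capacity."""
    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def store(self, s, a, r, s_next):
        self.buffer.append((s, a, r, s_next))

    def sample(self, batch_size):
        # uniform random minibatch for the training step
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)
```

The deque with `maxlen` implements the "pool size remains constant" behavior for free: no explicit eviction logic is needed once capacity is reached.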
(5)
After the experience pool has accumulated enough data, randomly select a minibatch of experiences to train the neural network Q(s, a; θ). Backpropagation is applied to minimize the loss function L(θ) = E[(y_t − Q(s_t, a_t; θ))²], where y_t = r_t + γ max_{a'} Q̂(s_{t+1}, a'; θ⁻). Clone the weight parameters from Q to Q̂ (θ⁻ ← θ) after several time intervals.
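The loss minimized in step 5 is the standard DQN temporal-difference loss. A minimal sketch of the target computation follows; the discount factor γ and the array shapes are assumptions, since the paper's specific reward definition is not shown here.

```python
import numpy as np

def td_targets(rewards, q_next_target, gamma):
    """Compute y_j = r_j + gamma * max_a' Q_hat(s'_j, a'; theta^-).
    rewards: shape (batch,); q_next_target: shape (batch, num_actions),
    produced by the cloned target network, not the trained one."""
    return rewards + gamma * q_next_target.max(axis=1)

# Backpropagation then minimizes the squared error
# (y_j - Q(s_j, a_j; theta))^2 over the minibatch; every several
# intervals the target weights are cloned: theta_minus = theta (a copy).
```

Using the frozen target network Q̂ to compute y_j, rather than the network being trained, is what stabilizes the regression target between cloning intervals.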
(6)
Update the time slot index, t ← t + 1. If t reaches the maximum number of time slots, the algorithm terminates; otherwise, go back to step (2).