Research Article

Energy-Efficient UAV Trajectory Design with Information Freshness Constraint via Deep Reinforcement Learning

Algorithm 1 Deep Q-network-based trajectory design scheme for energy-efficiency maximization.
(1) Initialize the learning parameters, memory size, batch size, maximal number of episodes, observed time, and  = 0.
(2) for each episode, up to the maximal episode, do
(3) Initialize the environment and the state.
(4) for each time step of the episode do
(5) If random() , select an action using (20). Otherwise, select a random action from the action set.
(6) Execute the action, compute the EE, remaining energy, and AoI, and obtain the next position to form the next state. Compute the reward according to (18). Store the transition in the experience-replay memory.
(7) If  == 0, copy the parameters of the estimate neural network to the target neural network.
(8) Train the neural network using the loss function in (23) to optimize the network parameters.
(9) end for
(10) end for
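
The control flow of Algorithm 1 can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration and is not taken from the article: the environment is a hypothetical six-position line rather than the UAV model, a lookup table stands in for the estimate and target neural networks, the greedy branch stands in for (20), a fixed placeholder stands in for the reward in (18), a squared temporal-difference update stands in for the loss in (23), and names such as `step`, `train`, and `sync_every` are invented here. The direction of the exploration test (`random() > epsilon` selects the greedy action) is also assumed.

```python
import random
from collections import deque

N_POS = 6            # hypothetical: positions 0..5 on a line, goal at position 5
ACTIONS = (-1, +1)   # hypothetical action set: move left / move right


def step(pos, action_idx):
    """Toy stand-in for the environment: returns (next_pos, reward, done)."""
    nxt = min(max(pos + ACTIONS[action_idx], 0), N_POS - 1)
    done = nxt == N_POS - 1
    reward = 1.0 if done else -0.05   # placeholder for the reward in (18)
    return nxt, reward, done


def train(episodes=300, max_steps=30, gamma=0.9, lr=0.2, epsilon=0.2,
          memory_size=500, batch_size=16, sync_every=20, seed=0):
    rng = random.Random(seed)
    # Step 1: initialize parameters, the replay memory, and the step counter.
    q_est = [[0.0] * len(ACTIONS) for _ in range(N_POS)]   # estimate "network"
    q_tgt = [row[:] for row in q_est]                       # target "network"
    memory = deque(maxlen=memory_size)
    t = 0
    for _ in range(episodes):                 # Step 2: loop over episodes
        pos = 0                               # Step 3: reset environment and state
        for _ in range(max_steps):            # Step 4: loop over time steps
            # Step 5: epsilon-greedy selection (greedy branch stands in for (20)).
            if rng.random() > epsilon:
                a = max(range(len(ACTIONS)), key=lambda i: q_est[pos][i])
            else:
                a = rng.randrange(len(ACTIONS))
            # Step 6: execute the action, observe, and store the transition.
            nxt, r, done = step(pos, a)
            memory.append((pos, a, r, nxt, done))
            pos = nxt
            t += 1
            # Step 7: periodically copy the estimate parameters to the target.
            if t % sync_every == 0:
                q_tgt = [row[:] for row in q_est]
            # Step 8: update on a sampled minibatch using the TD error.
            if len(memory) >= batch_size:
                for s, a_b, r_b, s2, d in rng.sample(list(memory), batch_size):
                    target = r_b if d else r_b + gamma * max(q_tgt[s2])
                    q_est[s][a_b] += lr * (target - q_est[s][a_b])
            if done:
                break
    return q_est


q = train()
# Greedy action index per non-terminal position (1 means "move right").
greedy = [max(range(len(ACTIONS)), key=lambda i: q[s][i]) for s in range(N_POS - 1)]
```

In this toy setting the learned greedy policy should move right from every non-terminal position. Keeping a separate target table that is refreshed only every `sync_every` steps mirrors Step 7: the bootstrap targets stay fixed between synchronizations, which is the usual motivation for a target network in deep Q-learning.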