Mobile Information Systems

Research Article

Energy-Efficient UAV Trajectory Design with Information Freshness Constraint via Deep Reinforcement Learning

Algorithm 1 Deep Q-network-based trajectory design scheme for energy-efficiency maximization.

(1)	Initialize the learning parameters, the memory size, the batch size, the maximal number of episodes, and the observed time, and set the step counter to 0;
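(2)	for each episode, up to the maximal number of episodes, do
(3)	Initialize the environment and the initial state.
(4)	for each time step within the observed time do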
(5)	Draw random() and compare it with the exploration threshold. If the comparison favors exploitation, select an action using (20). Otherwise, select a random action from the action set.
(6)	Execute the action, compute the EE, the remaining energy, and the AoI, and obtain the next position to form the next state. Compute the reward according to (18). Store the resulting transition in the experience-replay memory.
(7)	If the step counter is a multiple of the target-update period, copy the parameters of the estimate neural network to the target neural network.
(8)	Train the estimate neural network by minimizing the loss function in (23) to optimize its parameters.
(9)	end for
(10)	end for
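
The control flow of Algorithm 1 can be illustrated with a minimal Python sketch. It is not the authors' implementation. A toy one-dimensional world replaces the UAV environment, so EE, remaining energy, and AoI are not modeled. A linear Q-function over one-hot states stands in for the neural network, and a per-sample gradient step on the squared temporal-difference error stands in for the loss in (23). All names and hyperparameter values (`ACTIONS`, `N_CELLS`, `step`, `sync_every`, and so on) are hypothetical. The exploration test assumes the common convention of exploring with probability `epsilon`, since the exact comparison in step (5) is not recoverable from the listing.

```python
import random
from collections import deque

ACTIONS = (-1, 0, 1)   # hypothetical moves on a line: left, hover, right
N_CELLS = 5            # hypothetical toy world, not the paper's UAV model

def features(state):
    """One-hot encoding of the position (stand-in for the paper's state)."""
    return [1.0 if i == state else 0.0 for i in range(N_CELLS)]

def q_values(weights, state):
    """Linear Q-function, one weight row per action (stand-in for the network)."""
    x = features(state)
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def step(state, action_idx):
    """Toy environment: reward 1 at the right-most cell, small cost otherwise."""
    nxt = min(max(state + ACTIONS[action_idx], 0), N_CELLS - 1)
    done = nxt == N_CELLS - 1
    return nxt, (1.0 if done else -0.01), done

def select_action(weights, state, epsilon, rng):
    """Epsilon-greedy choice: explore with probability epsilon (assumed convention)."""
    if rng.random() < epsilon:
        return rng.randrange(len(ACTIONS))
    q = q_values(weights, state)
    return q.index(max(q))

def train(episodes=200, max_steps=20, memory_size=500, batch_size=16,
          gamma=0.9, lr=0.1, epsilon=0.2, sync_every=10, seed=0):
    rng = random.Random(seed)
    online = [[0.0] * N_CELLS for _ in ACTIONS]    # step (1): initialization
    target = [row[:] for row in online]
    memory = deque(maxlen=memory_size)             # experience-replay memory
    t = 0
    for _ in range(episodes):                      # step (2)
        state = 0                                  # step (3)
        for _ in range(max_steps):                 # step (4)
            a = select_action(online, state, epsilon, rng)   # step (5)
            nxt, reward, done = step(state, a)               # step (6)
            memory.append((state, a, reward, nxt, done))
            t += 1
            if t % sync_every == 0:                # step (7): sync target copy
                target = [row[:] for row in online]
            if len(memory) >= batch_size:          # step (8): train on a minibatch
                for s, act, r, s2, d in rng.sample(list(memory), batch_size):
                    y = r if d else r + gamma * max(q_values(target, s2))
                    td = y - q_values(online, s)[act]
                    x = features(s)
                    for i in range(N_CELLS):
                        online[act][i] += lr * td * x[i]
            state = nxt
            if done:
                break
    return online
```

Calling `train()` returns the learned weight rows, and `q_values(weights, s)` then gives the action values for position `s`. Keeping a separate `target` copy that is refreshed only every `sync_every` steps mirrors step (7): the bootstrap target changes slowly, which is the usual motivation for a target network in deep Q-learning.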