Research Article

Deep Reinforcement Learning-Based UAV Data Collection and Offloading in NOMA-Enabled Marine IoT Systems

Algorithm 1

TD3-based trajectory optimization algorithm (TTO).
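1. Initialize critic networks $Q_{\theta_1}$, $Q_{\theta_2}$, and actor network $\pi_{\phi}$ with random parameters $\theta_1$, $\theta_2$, and $\phi$.
2. Initialize target networks $\theta_1' \leftarrow \theta_1$, $\theta_2' \leftarrow \theta_2$, and $\phi' \leftarrow \phi$.
3. Initialize experience replay buffer $\mathcal{B}$.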
4. for episode = 0 to the maximum number of episodes do
5.  Initialize the environment, the state $s$, and the terminated flag.
6.  for each epoch in the episode do
7.   Select action $a = \pi_{\phi}(s) + \epsilon$, $\epsilon \sim \mathcal{N}(0, \sigma)$, and observe reward $r$ and next state $s'$.
8.   if the UAV flies beyond the target area then
9.    . Then cancel the UAV’s action and update
     , based on the current state.
10.   end if
11.   if then
12.    Let , and start the data offloading.
13.   else
14.    Let , and continue the data collection.
15.   end if
16.   if then
17.    , and let .
18.   end if
19.   Store transition tuple $(s, a, r, s')$ in $\mathcal{B}$.
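20.   if then
21.    Sample mini-batch of $N$ transitions $(s, a, r, s')$ from $\mathcal{B}$.
22.    $\tilde{a} \leftarrow \pi_{\phi'}(s') + \epsilon$, $\epsilon \sim \text{clip}(\mathcal{N}(0, \tilde{\sigma}), -c, c)$.
23.    $y \leftarrow r + \gamma \min_{i=1,2} Q_{\theta_i'}(s', \tilde{a})$.
24.    Update critics:
25.    $\theta_i \leftarrow \arg\min_{\theta_i} N^{-1} \sum (y - Q_{\theta_i}(s, a))^2$.
26.    Update the actor policy by the deterministic policy gradient:
27.    $\nabla_{\phi} J(\phi) = N^{-1} \sum \nabla_a Q_{\theta_1}(s, a)|_{a = \pi_{\phi}(s)} \nabla_{\phi} \pi_{\phi}(s)$.
28.    Update target networks:
29.    $\theta_i' \leftarrow \tau \theta_i + (1 - \tau) \theta_i'$.
30.    $\phi' \leftarrow \tau \phi + (1 - \tau) \phi'$.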
31.   end if
32.  end for
33. end for
34. return the UAV trajectory
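
The numeric steps of Algorithm 1 can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the rectangular target area, the symmetric action bound `a_max`, and the masking of the bootstrap term by the terminated flag are all assumptions made for illustration. The neural networks are left out, so the critic values are passed in as plain numbers.

```python
import random


def clip(x, lo, hi):
    """Clamp x to the closed interval [lo, hi]."""
    return max(lo, min(hi, x))


def move_within_area(pos, move, x_max, y_max):
    """Steps 8-10: cancel the move if it would leave the target area.

    Assumes a rectangular area [0, x_max] x [0, y_max] (hypothetical).
    Returns the resulting position and a flag telling whether the
    action was cancelled, so the caller can adjust the reward.
    """
    nx, ny = pos[0] + move[0], pos[1] + move[1]
    if 0.0 <= nx <= x_max and 0.0 <= ny <= y_max:
        return (nx, ny), False
    return pos, True


def explore_action(policy_action, sigma, a_max, rng):
    """Step 7: add Gaussian exploration noise to the actor output.

    Clamping to [-a_max, a_max] is an assumption, not part of the source.
    """
    return [clip(a + rng.gauss(0.0, sigma), -a_max, a_max)
            for a in policy_action]


def smoothed_target_action(target_action, sigma, c, a_max, rng):
    """Step 22: target policy smoothing with noise clipped to [-c, c]."""
    noisy = []
    for a in target_action:
        eps = clip(rng.gauss(0.0, sigma), -c, c)
        noisy.append(clip(a + eps, -a_max, a_max))
    return noisy


def td3_target(reward, q1_next, q2_next, gamma, done):
    """Step 23: clipped double-Q target using the smaller critic value.

    Dropping the bootstrap term when done is True is a common
    convention and an assumption here.
    """
    bootstrap = 0.0 if done else gamma * min(q1_next, q2_next)
    return reward + bootstrap


def soft_update(target_params, params, tau):
    """Steps 29-30: move target parameters toward the online ones."""
    return [tau * p + (1.0 - tau) * tp
            for p, tp in zip(params, target_params)]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the sketch is repeatable
    print(move_within_area((1.0, 1.0), (5.0, 0.0), 4.0, 4.0))
    print(explore_action([0.0, 0.5], sigma=0.1, a_max=1.0, rng=rng))
    print(td3_target(1.0, q1_next=2.0, q2_next=3.0, gamma=0.5, done=False))
    print(soft_update([0.0, 0.0], [1.0, 2.0], tau=0.1))
```

Taking the minimum of the two target critics in `td3_target` is what limits value overestimation. A small `tau` in `soft_update` makes the target networks change slowly, which keeps the regression target in step 25 stable.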