Wireless Communications and Mobile Computing

Research Article

Deep Reinforcement Learning-Based UAV Data Collection and Offloading in NOMA-Enabled Marine IoT Systems

TD3-based trajectory optimization algorithm (TTO).

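1. Initialize critic networks $Q_{\theta_1}$, $Q_{\theta_2}$, and actor network $\pi_{\phi}$ with random parameters $\theta_1$, $\theta_2$, and $\phi$.
2. Initialize target networks $\theta_1' \leftarrow \theta_1$, $\theta_2' \leftarrow \theta_2$, and $\phi' \leftarrow \phi$.
3. Initialize experience replay buffer $\mathcal{B}$.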
4. for episode = 0 to the maximum number of episodes do
5. Initialize the environment and state , and the terminated flag .
6. for each epoch do
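7. Select action with exploration noise $a \sim \pi_{\phi}(s) + \epsilon$, $\epsilon \sim \mathcal{N}(0, \sigma)$, and observe reward $r$ and next state $s'$.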
8. if the UAV flies beyond the target area then
9. . Then cancel the UAV’s action and update
, based on the current state.
10. end if
11. if then
12. Let , and start the data offloading.
13. else
14. Let , and continue the data collection.
15. end if
16. if then
17. , and let .
18. end if
19. Store transition tuple $(s, a, r, s')$ in $\mathcal{B}$.
20. if then
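21. Sample mini-batch of $N$ transitions $(s, a, r, s')$ from $\mathcal{B}$.
22. $\tilde{a} \leftarrow \pi_{\phi'}(s') + \epsilon$, $\epsilon \sim \operatorname{clip}(\mathcal{N}(0, \tilde{\sigma}), -c, c)$.
23. $y \leftarrow r + \gamma \min_{i=1,2} Q_{\theta_i'}(s', \tilde{a})$.
24. Update critics:
25. $\theta_i \leftarrow \arg\min_{\theta_i} N^{-1} \sum \left(y - Q_{\theta_i}(s, a)\right)^2$.
26. Update the actor policy by the deterministic policy gradient:
27. $\nabla_{\phi} J(\phi) = N^{-1} \sum \nabla_a Q_{\theta_1}(s, a)\big|_{a = \pi_{\phi}(s)} \nabla_{\phi} \pi_{\phi}(s)$.
28. Update target networks:
29. $\theta_i' \leftarrow \tau \theta_i + (1 - \tau)\theta_i'$.
30. $\phi' \leftarrow \tau \phi + (1 - \tau)\phi'$.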
31. end if
32. end for
33. end for
34. return the UAV trajectory.
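
As an illustration of three steps in the listing above (cancelling an action that leaves the target area, forming the critic target from the smaller of the two target critics with clipped noise on the target action, and softly updating the target networks), the following Python sketch may be helpful. It is a minimal sketch under stated assumptions, not the article's implementation. All function and parameter names (`step_within_area`, `smoothed_target_action`, `td3_target`, `soft_update`, `sigma`, `noise_clip`, `max_action`, `done`) are hypothetical. The rectangular target area, the clipping of the smoothed action to `[-max_action, max_action]`, and the use of a `done` flag to drop the bootstrap term are assumptions commonly made in TD3 implementations and are not confirmed by the listing. Networks are stood in for by plain Python callables and flat parameter lists.

```python
import random


def clip(x, lo, hi):
    """Clamp a scalar to the closed interval [lo, hi]."""
    return max(lo, min(hi, x))


def step_within_area(position, move, area_min, area_max):
    """Apply a move unless it would leave the (assumed rectangular) area.

    Returns (new_position, violated). If the move is cancelled, the UAV
    stays at its current position and violated is True.
    """
    candidate = [p + m for p, m in zip(position, move)]
    inside = all(lo <= c <= hi
                 for c, lo, hi in zip(candidate, area_min, area_max))
    return (candidate, False) if inside else (list(position), True)


def smoothed_target_action(target_actor, next_state, sigma=0.2,
                           noise_clip=0.5, max_action=1.0, rng=random):
    """Target policy smoothing: add clipped Gaussian noise to each
    component of the target actor's action, then clamp the result
    to the assumed action range."""
    noisy = []
    for a in target_actor(next_state):
        eps = clip(rng.gauss(0.0, sigma), -noise_clip, noise_clip)
        noisy.append(clip(a + eps, -max_action, max_action))
    return noisy


def td3_target(reward, next_state, target_actor, target_q1, target_q2,
               gamma=0.99, done=False, **smoothing_kwargs):
    """Clipped double-Q target: r + gamma * min(Q1', Q2') at the smoothed
    target action. The done flag (an assumption) removes the bootstrap."""
    a_next = smoothed_target_action(target_actor, next_state,
                                    **smoothing_kwargs)
    q_min = min(target_q1(next_state, a_next),
                target_q2(next_state, a_next))
    return reward + gamma * (0.0 if done else 1.0) * q_min


def soft_update(target_params, online_params, tau=0.005):
    """Polyak averaging: tau * online + (1 - tau) * target, per parameter."""
    return [tau * w + (1.0 - tau) * wt
            for w, wt in zip(online_params, target_params)]
```

In a full training loop, `td3_target` would be evaluated for every transition in the sampled mini-batch, and `soft_update` would be applied to both critics and the actor after each update. A deep learning framework would replace the plain callables and lists used here.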