Research Article
Deep Reinforcement Learning-Based UAV Data Collection and Offloading in NOMA-Enabled Marine IoT Systems
Algorithm 1
TD3-based trajectory optimization algorithm (TTO).
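1. Initialize critic networks $Q_{\theta_1}$, $Q_{\theta_2}$, and actor network $\pi_\phi$ with random parameters $\theta_1$, $\theta_2$, and $\phi$.
2. Initialize target networks $\theta'_1 \leftarrow \theta_1$, $\theta'_2 \leftarrow \theta_2$, and $\phi' \leftarrow \phi$.
3. Initialize experience replay buffer $\mathcal{B}$.
4. for episode = 0 to … do
5.   Initialize the environment and state $s$, and the terminated flag ….
6.   for epoch … do
7.     Select action $a = \pi_\phi(s) + \epsilon$, $\epsilon \sim \mathcal{N}(0, \sigma)$, and observe reward $r$ and next state $s'$.
8.     if the UAV flies beyond the target area then
9.       …. Then cancel the UAV's action and update …, … based on the current state.
10.    end if
11.    if … then
12.      Let …, and start the data offloading.
13.    else
14.      Let …, and continue the data collection.
15.    end if
16.    if … then
17.      …, and let ….
18.    end if
19.    Store transition tuple $(s, a, r, s')$ in $\mathcal{B}$.
20.    if … then
21.      Sample mini-batch of $N$ transitions $(s, a, r, s')$ from $\mathcal{B}$.
22.      $\tilde{a} \leftarrow \pi_{\phi'}(s') + \epsilon$, $\epsilon \sim \mathrm{clip}(\mathcal{N}(0, \tilde{\sigma}), -c, c)$.
23.      $y \leftarrow r + \gamma \min_{i=1,2} Q_{\theta'_i}(s', \tilde{a})$.
24.      Update critics:
25.      $\theta_i \leftarrow \arg\min_{\theta_i} N^{-1} \sum \left(y - Q_{\theta_i}(s, a)\right)^2$.
26.      Update the actor policy by the deterministic policy gradient:
27.      $\nabla_\phi J(\phi) = N^{-1} \sum \nabla_a Q_{\theta_1}(s, a)\big|_{a = \pi_\phi(s)} \nabla_\phi \pi_\phi(s)$.
28.      Update target networks:
29.      $\theta'_i \leftarrow \tau \theta_i + (1 - \tau) \theta'_i$.
30.      $\phi' \leftarrow \tau \phi + (1 - \tau) \phi'$.
31.    end if
32.  end for
33. end for
34. return the UAV trajectory.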
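Steps 8 to 10 of Algorithm 1 (cancelling an action that would take the UAV outside the target area) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the article: the rectangular area `(x_max, y_max)`, the action parametrization as a heading angle and a flight distance, the function name `apply_move`, and the fixed `penalty` value.

```python
import math

def apply_move(position, heading, distance, area, penalty=1.0):
    """Hypothetical boundary check for one UAV move.

    position: (x, y) of the UAV before the move.
    heading:  flight direction in radians (assumed action component).
    distance: flight distance for this step (assumed action component).
    area:     (x_max, y_max) of an assumed rectangular target area
              with its origin at (0, 0).
    Returns (new_position, reward_penalty, out_of_bounds).
    """
    x, y = position
    nx = x + distance * math.cos(heading)
    ny = y + distance * math.sin(heading)
    x_max, y_max = area
    if not (0.0 <= nx <= x_max and 0.0 <= ny <= y_max):
        # The move would leave the target area: cancel it, keep the
        # current position, and report an (assumed) reward penalty.
        return (x, y), penalty, True
    return (nx, ny), 0.0, False
```

With this sketch, the caller would subtract the returned penalty from the step reward and build the next state from the unchanged position whenever `out_of_bounds` is true.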
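The update rules in steps 22 to 30 of Algorithm 1 follow the usual TD3 pattern: target policy smoothing, a clipped double-Q target, and soft target-network updates. The sketch below shows these three pieces in plain Python under stated assumptions. The callables `target_actor`, `target_q1`, and `target_q2` are hypothetical stand-ins for the target networks. The hyperparameter defaults are common TD3 choices, not values from the article. Masking the bootstrap term with `done` and clipping the smoothed action to `[a_low, a_high]` are common implementation conventions assumed here.

```python
import random

def smoothed_target_action(target_actor, next_state, sigma=0.2, noise_clip=0.5,
                           a_low=-1.0, a_high=1.0, rng=random):
    """Target policy smoothing: add clipped Gaussian noise to the target action."""
    smoothed = []
    for a in target_actor(next_state):
        # Draw noise and clip it to [-noise_clip, noise_clip].
        eps = max(-noise_clip, min(noise_clip, rng.gauss(0.0, sigma)))
        # Keep the perturbed action inside the assumed action bounds.
        smoothed.append(max(a_low, min(a_high, a + eps)))
    return smoothed

def td3_target(reward, next_state, done, target_actor, target_q1, target_q2,
               gamma=0.99, **smooth_kwargs):
    """Clipped double-Q target: y = r + gamma * (1 - done) * min(Q1', Q2')."""
    a_next = smoothed_target_action(target_actor, next_state, **smooth_kwargs)
    q_min = min(target_q1(next_state, a_next), target_q2(next_state, a_next))
    return reward + gamma * (0.0 if done else 1.0) * q_min

def soft_update(target_params, online_params, tau=0.005):
    """Polyak averaging: theta' <- tau * theta + (1 - tau) * theta'."""
    return [tau * p + (1.0 - tau) * tp
            for p, tp in zip(online_params, target_params)]
```

Taking the minimum of the two target critics counters the overestimation bias of a single critic, and the small `tau` keeps the target networks changing slowly, which stabilizes the bootstrapped target.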