Research Article
Reinforcement Learning for Distributed Energy Efficiency Optimization in Underwater Acoustic Communication Networks
Algorithm 1
Q-learning-based UACNs resource allocation algorithm for node
Initialization:
(1) Set , .
(2) Initialize .
Repeated Learning (for each episode):
(3) Look up the Q-table and select the state , i.e.,
(4) Execute the ε-greedy [29] method to select the action
(5) Calculate the reward function based on equation (22).
(6) Calculate the current Q-value function.
(7) Update the Q-table according to equation (17).
(8) Update the state .
(9) Go back to step (3) until the state is the final state.
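The learning loop of Algorithm 1 can be illustrated with a minimal tabular Q-learning sketch. It is not the authors' implementation. The update in `q_update` is the standard Q-learning rule and is only assumed to match equation (17); the reward returned by the hypothetical `toy_step` environment is a placeholder for equation (22), which is not reproduced here. The names `alpha`, `gamma`, `epsilon`, `train`, and `toy_step`, along with all numeric values, are illustrative assumptions.

```python
import random


def epsilon_greedy(q_row, epsilon, rng):
    """Step (4): explore with probability epsilon, otherwise act greedily."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    return q_row.index(max(q_row))  # first maximal action on ties


def q_update(q, s, a, r, s_next, alpha, gamma):
    """Steps (6)-(7): standard Q-learning update (assumed form of eq. (17))."""
    td_target = r + gamma * max(q[s_next])
    q[s][a] += alpha * (td_target - q[s][a])


def train(n_states, n_actions, step_fn, episodes,
          alpha=0.1, gamma=0.9, epsilon=0.1, seed=0, max_steps=1000):
    """Run the episode loop of Algorithm 1 on an abstract environment."""
    rng = random.Random(seed)
    # Steps (1)-(2): fix the hyperparameters and start from an all-zero Q-table.
    q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s = 0  # step (3): assumed initial state for every episode
        for _ in range(max_steps):
            a = epsilon_greedy(q[s], epsilon, rng)
            r, s_next, done = step_fn(s, a)  # step (5): observe the reward
            q_update(q, s, a, r, s_next, alpha, gamma)
            s = s_next  # step (8): move to the next state
            if done:    # step (9): stop at the final state
                break
    return q


def toy_step(s, a):
    """Hypothetical 4-state chain: action 1 moves right, action 0 stays.

    Reaching state 3 ends the episode with reward 1.0. This reward is a
    placeholder and does not reproduce the paper's equation (22).
    """
    s_next = min(s + 1, 3) if a == 1 else s
    done = s_next == 3
    return (1.0 if done else 0.0), s_next, done


if __name__ == "__main__":
    table = train(n_states=4, n_actions=2, step_fn=toy_step, episodes=200)
    for state, row in enumerate(table):
        print(state, [round(v, 3) for v in row])
```

In a UACN setting, each node would run its own copy of this loop, with states and actions encoding its local resource-allocation choices and `step_fn` replaced by the reward computed from the network model.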