Research Article

Reinforcement Learning for Distributed Energy Efficiency Optimization in Underwater Acoustic Communication Networks

Algorithm 1

Q-learning-based UACNs resource allocation algorithm for a single node.
Initialization:
(1)Set , .
(2)Initialize the Q-table.
Repeated Learning: (for each episode)
(3)Look up the Q-table and select the current state.
(4)Execute the ε-greedy [29] method to select the action.
(5)Calculate the reward function based on equation (22).
(6)Calculate the current Q-value function.
(7)Update the Q-table according to equation (17).
(8)Update the state.
(9)Return to step (3) until the final state is reached.
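
The loop in Algorithm 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: equations (17) and (22) are not reproduced in this listing, so the update below is the standard tabular Q-learning rule and the reward comes from a hypothetical toy environment (`toy_step`) rather than the energy-efficiency reward of the paper. The names `q_learning`, `epsilon_greedy`, and `toy_step`, and the values chosen for `alpha`, `gamma`, and `epsilon`, are likewise illustrative.

```python
import random


def epsilon_greedy(q_row, epsilon, rng):
    """Step (4): explore with probability epsilon, otherwise act greedily."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    best = max(q_row)
    # Break ties between equally good actions at random.
    return rng.choice([a for a, q in enumerate(q_row) if q == best])


def q_learning(n_states, n_actions, step, episodes=500, alpha=0.1,
               gamma=0.9, epsilon=0.1, start_state=0, max_steps=1000, seed=0):
    """Tabular Q-learning for one node (assumed textbook update rule)."""
    rng = random.Random(seed)
    # Steps (1)-(2): fix the learning parameters and zero the Q-table.
    q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        state = start_state                                  # step (3)
        for _ in range(max_steps):
            action = epsilon_greedy(q[state], epsilon, rng)  # step (4)
            reward, next_state, done = step(state, action)   # step (5)
            # Steps (6)-(7): temporal-difference target and table update.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state                               # step (8)
            if done:                                         # step (9)
                break
    return q


def toy_step(state, action):
    """Hypothetical 4-state chain: action 1 moves right, action 0 moves left.

    Reaching state 3 ends the episode with reward 1; all other moves give 0.
    """
    next_state = min(state + 1, 3) if action == 1 else max(state - 1, 0)
    done = next_state == 3
    return (1.0 if done else 0.0), next_state, done


q_table = q_learning(n_states=4, n_actions=2, step=toy_step)
# Greedy policy read off the learned table for the non-terminal states.
policy = [max(range(2), key=lambda a: q_table[s][a]) for s in range(3)]
```

In this toy setting the learned greedy policy moves right in every non-terminal state. Replacing `toy_step` with a function that returns the reward of equation (22) and the next network state would adapt the sketch to the resource allocation problem.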