Input: start_node, end_node, graph, R, users_req_list.
Output: the optimal path for each user request.
1. Initialize the Q-table, learning rate α, discount factor γ, exploration rate ε, and the number of episodes Episodes = M.
2. for user_req in users_req_list: |
3. flag = PCA(start_node, end_node, graph).
4. if flag == 1:
5. for i = 1 to Episodes:
6. current_state = start_node. |
7. while current_state != end_node:
8. Select the action a based on Eq. (27). |
9. Compute the reward component contributed by each parameter according to Eqs. (18)-(21).
10. Obtain the total reward r based on Eq. (22). |
11. The agent moves to the next state s'.
12. Update the Q-table based on Eq. (26). |
13. Let current_state = s'.
14. end while |
15. end for |
16. Select the optimal path from the converged Q-table based on Eq. (25).
17. Update the reward matrix R based on the consumption of link resources. |
18. Update the structure of the satellite network graph based on the consumption of link resources.
19. else:
20. There is no path from start_node to end_node.
21. break
22. end if
23. end for
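
For concreteness, the following Python sketch renders the listing above in executable form. Because Eqs. (18)-(22) and (25)-(27) are defined elsewhere in the paper, the sketch assumes their standard shapes: an ε-greedy policy for action selection (Eq. (27)), the classical Q-learning update (Eq. (26)), a combined scalar reward looked up from R (Eqs. (18)-(22)), and a greedy walk over the converged Q-table for path extraction (Eq. (25)). PCA is modelled as a plain reachability check, graph as a dict mapping every node to its neighbour list, and R as a nested dict of per-link rewards; the names pca and q_route are illustrative, not the paper's.

import random

def pca(start, end, graph):
    # Connectivity pre-check; the paper's PCA procedure is assumed
    # to reduce to a reachability test on the current topology.
    visited, stack = {start}, [start]
    while stack:
        node = stack.pop()
        if node == end:
            return 1
        for nbr in graph.get(node, ()):
            if nbr not in visited:
                visited.add(nbr)
                stack.append(nbr)
    return 0

def q_route(start, end, graph, R, episodes=500, alpha=0.1, gamma=0.9, eps=0.1):
    # Q[s][a] estimates the value of hopping from node s to neighbour a;
    # every non-terminal node is assumed to have at least one neighbour.
    Q = {s: {a: 0.0 for a in graph[s]} for s in graph}
    for _ in range(episodes):
        s = start
        while s != end:
            # epsilon-greedy action selection (assumed form of Eq. (27))
            if random.random() < eps:
                a = random.choice(list(graph[s]))
            else:
                a = max(Q[s], key=Q[s].get)
            r = R[s][a]          # total reward r, standing in for Eq. (22)
            s_next = a           # the agent moves to the next state s'
            best_next = max(Q.get(s_next, {}).values(), default=0.0)
            # standard Q-learning update (assumed form of Eq. (26))
            Q[s][a] += alpha * (r + gamma * best_next - Q[s][a])
            s = s_next
    # greedy path extraction from the converged Q-table (Eq. (25));
    # the length cap guards against cycles in an under-trained table.
    path, s = [start], start
    while s != end and len(path) <= len(graph):
        s = max(Q[s], key=Q[s].get)
        path.append(s)
    return path

A driver mirroring the outer loop over users_req_list might look as follows; the bookkeeping of steps 17-18 (decrementing link resources in R and the graph) is application-specific and only indicated by a comment.

for start_node, end_node in users_req_list:
    if pca(start_node, end_node, graph) == 1:
        path = q_route(start_node, end_node, graph, R)
        # ... update R and the graph here as link resources are consumed ...
    else:
        # no path from start_node to end_node; stop serving requests
        break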