Research Article

Reinforcement Learning-Based Routing Algorithm in Satellite-Terrestrial Integrated Networks

Algorithm 2: Q-learning-based routing algorithm (QLRA).
Input: start_node, end_node, graph, R, users_req_list.
Output: the optimal paths.
1. Initialize Q-table, α, γ, ε, Episodes=M.
2. for user_req in users_req_list:
3. flag = PCA(start_node, end_node, graph).
4. if flag == 1:
5.  for i=1 to Episodes:
6.   current_state = start_node.
7.   while current_state!= end_node:
8.    Select the action a based on Eq. (27).
9.    Compute the reward component of each parameter according to Eqs. (18), (19), (20) and (21).
10.    Obtain the total reward r based on Eq. (22).
11.    The agent moves to the next state s'.
12.    Update the Q-table based on Eq. (26).
13.    Let current_state = s'.
14.   end while
15.  end for
16.  Select the optimal path from the converged Q-table based on Eq. (25).
17.  Update the reward matrix R based on the consumption of link resources.
18.  Update the structure of the satellite network graph based on the consumption of link resources.
19. end if
20. else:
21.  There is no path from start_node to end_node.
22.  Break.
23. end else
24. end for
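
The inner loop of Algorithm 2 (steps 3 to 16) can be illustrated for a single request with the minimal Python sketch below. It is not the authors' implementation. Eqs. (18) to (27) are not reproduced in this section, so the sketch assumes that Eq. (27) is an ε-greedy action selection, that Eq. (26) is the standard Q-learning update, and that Eq. (25) extracts the path greedily from the learned Q-table. The `reward` dictionary stands in for the total reward r of Eq. (22), `path_exists` is a hypothetical breadth-first stand-in for PCA, and all function names, default parameter values, and the `max_steps` guard are illustrative. The resource-consumption updates of steps 17 and 18 are omitted.

```python
import random
from collections import deque


def path_exists(graph, start, end):
    """Hypothetical stand-in for PCA: breadth-first reachability check."""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        if node == end:
            return True
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def qlra_route(graph, reward, start, end, alpha=0.1, gamma=0.9,
               epsilon=0.2, episodes=500, max_steps=10_000, seed=0):
    """Learn a route for one request; returns a node list or None.

    graph  : dict mapping every node to a list of neighbour nodes
    reward : dict mapping (node, neighbour) to the total link reward
             (assumed to play the role of r in Eq. (22))
    """
    if not path_exists(graph, start, end):        # steps 3-4 and 20-22
        return None
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in graph for a in graph[s]}   # step 1

    for _ in range(episodes):                     # step 5
        state = start                             # step 6
        for _ in range(max_steps):                # guard added in this sketch
            if state == end:                      # step 7
                break
            actions = graph[state]
            if not actions:                       # dead end, abandon episode
                break
            # Step 8: epsilon-greedy selection (assumed form of Eq. (27)).
            if rng.random() < epsilon:
                action = rng.choice(actions)
            else:
                action = max(actions, key=lambda a: q[(state, a)])
            r = reward[(state, action)]           # steps 9-10
            next_state = action                   # step 11
            # Step 12: standard Q-learning update (assumed form of Eq. (26)).
            if next_state == end:
                future = 0.0                      # terminal state
            else:
                future = max((q[(next_state, a)] for a in graph[next_state]),
                             default=0.0)
            q[(state, action)] += alpha * (r + gamma * future
                                           - q[(state, action)])
            state = next_state                    # step 13

    # Step 16: greedy extraction (assumed form of Eq. (25)); visited nodes
    # are skipped so the extracted path cannot loop.
    path, visited, state = [start], {start}, start
    while state != end:
        candidates = [a for a in graph[state] if a not in visited]
        if not candidates:
            return None
        state = max(candidates, key=lambda a, s=state: q[(s, a)])
        path.append(state)
        visited.add(state)
    return path


# Toy four-node topology with made-up link rewards.
graph = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"], "D": ["B", "C"]}
reward = {("A", "B"): -1, ("B", "A"): -1, ("A", "C"): -5, ("C", "A"): -5,
          ("B", "D"): 10, ("C", "D"): 10, ("D", "B"): -1, ("D", "C"): -1}
print(qlra_route(graph, reward, "A", "D"))
```

In this toy example the converged values are Q(A, B) = -1 + 0.9 × 10 = 8 and Q(A, C) = -5 + 0.9 × 10 = 4, so the greedy extraction favours the route through B. Handling a list of requests as in Algorithm 2 would wrap `qlra_route` in a loop and adjust `reward` and `graph` after each routed request.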