Input: start_node, end_node, graph, R, users_req_list.
Output: the optimal path for each user request.
1. Initialize the Q-table, learning rate α, discount factor γ, exploration rate ε, and the number of episodes Episodes = M.
2. for user_req in users_req_list: |
3. flag = PCA(start_node, end_node, graph).
4. if flag == 1:
5. for i = 1 to Episodes:
6. current_state = start_node. |
7. while current_state != end_node:
8. Select the action a based on Eq. (27). |
9. Compute the reward component contributed by each parameter according to Eqs. (18)-(21).
10. Obtain the total reward r based on Eq. (22). |
11. The agent moves to the next state s'.
12. Update the Q-table based on Eq. (26). |
13. Let current_state = s'.
14. end while |
15. end for |
16. Select the optimal path from the converged Q-table based on Eq. (25).
17. Update the reward matrix R based on the consumption of link resources. |
18. Update the structure of the satellite network graph based on the consumption of link resources.
19. else:
20. There is no path from start_node to end_node.
21. break
22. end if
23. end for
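
For concreteness, the following Python sketch renders the listing above in executable form. Because Eqs. (18)-(22) and (25)-(27) are defined elsewhere in the paper, the sketch assumes their standard shapes: an ε-greedy policy for action selection (Eq. (27)), the classical Q-learning update (Eq. (26)), a combined scalar reward looked up from R (Eqs. (18)-(22)), and a greedy walk over the converged Q-table for path extraction (Eq. (25)). PCA is modelled as a plain reachability check, graph as a dict mapping every node to its neighbour list, and R as a nested dict of per-link rewards; the names pca and q_route are illustrative, not the paper's.

import random

def pca(start, end, graph):
    # Connectivity pre-check; the paper's PCA procedure is assumed
    # to reduce to a reachability test on the current topology.
    visited, stack = {start}, [start]
    while stack:
        node = stack.pop()
        if node == end:
            return 1
        for nbr in graph.get(node, ()):
            if nbr not in visited:
                visited.add(nbr)
                stack.append(nbr)
    return 0

def q_route(start, end, graph, R, episodes=500, alpha=0.1, gamma=0.9, eps=0.1):
    # Q[s][a] estimates the value of hopping from node s to neighbour a;
    # every non-terminal node is assumed to have at least one neighbour.
    Q = {s: {a: 0.0 for a in graph[s]} for s in graph}
    for _ in range(episodes):
        s = start
        while s != end:
            # epsilon-greedy action selection (assumed form of Eq. (27))
            if random.random() < eps:
                a = random.choice(list(graph[s]))
            else:
                a = max(Q[s], key=Q[s].get)
            r = R[s][a]          # total reward r, standing in for Eq. (22)
            s_next = a           # the agent moves to the next state s'
            best_next = max(Q.get(s_next, {}).values(), default=0.0)
            # standard Q-learning update (assumed form of Eq. (26))
            Q[s][a] += alpha * (r + gamma * best_next - Q[s][a])
            s = s_next
    # greedy path extraction from the converged Q-table (Eq. (25));
    # the length cap guards against cycles in an under-trained table.
    path, s = [start], start
    while s != end and len(path) <= len(graph):
        s = max(Q[s], key=Q[s].get)
        path.append(s)
    return path

A driver mirroring the outer loop over users_req_list might look as follows; the bookkeeping of steps 17-18 (decrementing link resources in R and the graph) is application-specific and only indicated by a comment.

for start_node, end_node in users_req_list:
    if pca(start_node, end_node, graph) == 1:
        path = q_route(start_node, end_node, graph, R)
        # ... update R and the graph here as link resources are consumed ...
    else:
        # no path from start_node to end_node; stop serving requests
        break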