Research Article
LEO Satellite Channel Allocation Scheme Based on Reinforcement Learning
| Step | Operation |
|---|---|
| | Initialize system parameters |
| 1 | Preallocation: assign M channels to each beam |
| 2 | for each service request time t = 1 : T |
| 3 | if resources are sufficient: recycle surplus resources |
| 4 | else (resources are insufficient): dynamic allocation |
| 5 | Allocate resources from the resource pool |
| 6 | Initialize parameters: learning rate, discount factor, initial exploration probability, Q table |
| 7 | Reconstruct the state based on the service request |
| 8 | for episode = 1 : max_episode |
| 9 | while the current state is not terminal |
| 10 | Confirm the initial state |
| 11 | Update the exploration probability |
| 12 | Choose the best action (exploit) or choose an action randomly (explore) |
| 13 | Execute the action and obtain the reward |
| 14 | Update the Q table |
| 15 | Move to the next state |
| 16 | End |
| 17 | End of training; output the Q table |
| 18 | Choose the best strategy according to the Q table |
| 19 | Channel allocation |
| 20 | End |
| 21 | End |
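The allocation procedure in the table above can be sketched in Python as follows. This is a minimal, hypothetical illustration, not the authors' implementation: the state, action, and reward definitions, the beam demands, and all hyperparameter values (`alpha`, `gamma`, `eps0`, `eps_min`, `decay`) are assumptions chosen for a toy example. It covers a single service request time (one iteration of the loop in step 2), and the helper names `preallocate`, `train_q_table`, and `allocate` are illustrative only.

```python
import random

# Toy model (hypothetical, not from the article):
#   state  = (channels left in the shared pool, tuple of per-beam shortfalls)
#   action = index of the beam that receives the next pooled channel
#   reward = +1 if that beam still needs a channel, -1 if the channel is wasted


def preallocate(demands, m, free_channels=0):
    """Steps 1-5: give every beam m channels, recycle the surplus of
    resource-rich beams into a pool, and record the shortfall of resource-poor beams."""
    pool = free_channels
    shortfall = []
    for d in demands:
        if d <= m:                      # resource-rich beam: recycle surplus channels
            pool += m - d
            shortfall.append(0)
        else:                           # resource-poor beam: needs dynamic allocation
            shortfall.append(d - m)
    return pool, shortfall


def is_terminal(state):
    pool, need = state
    return pool == 0 or all(x == 0 for x in need)


def step(state, beam):
    """Hand one pooled channel to `beam`; return (next_state, reward)."""
    pool, need = state
    need = list(need)
    if need[beam] > 0:
        need[beam] -= 1
        reward = 1.0
    else:
        reward = -1.0                   # channel given to a beam that did not need it
    return (pool - 1, tuple(need)), reward


def train_q_table(pool, shortfall, episodes=500, alpha=0.1, gamma=0.9,
                  eps0=1.0, eps_min=0.05, decay=0.99, seed=0):
    """Steps 6-17: tabular Q-learning with a decaying epsilon-greedy policy."""
    rng = random.Random(seed)
    n = len(shortfall)
    q = {}                              # Q table: (state, action) -> value, default 0
    for ep in range(episodes):
        state = (pool, tuple(shortfall))            # confirm initial state
        eps = max(eps_min, eps0 * decay ** ep)      # update exploration probability
        while not is_terminal(state):
            if rng.random() < eps:                  # explore: random beam
                a = rng.randrange(n)
            else:                                   # exploit: best known beam
                a = max(range(n), key=lambda b: q.get((state, b), 0.0))
            nxt, r = step(state, a)                 # execute action, get reward
            target = r
            if not is_terminal(nxt):
                target += gamma * max(q.get((nxt, b), 0.0) for b in range(n))
            old = q.get((state, a), 0.0)
            q[(state, a)] = old + alpha * (target - old)  # update Q table
            state = nxt                             # jump to next state
    return q


def allocate(q, pool, shortfall):
    """Steps 18-19: follow the greedy policy from the learned Q table."""
    n = len(shortfall)
    state = (pool, tuple(shortfall))
    grants = [0] * n
    while not is_terminal(state):
        a = max(range(n), key=lambda b: q.get((state, b), 0.0))
        grants[a] += 1
        state, _ = step(state, a)
    return grants


demands = [2, 6, 3, 5]                  # hypothetical channel demand per beam
pool, shortfall = preallocate(demands, m=4)
q = train_q_table(pool, shortfall)
grants = allocate(q, pool, shortfall)
print(pool, shortfall, grants)
```

A dictionary keeps the Q table sparse, so only visited (state, action) pairs are stored, and the decaying epsilon mirrors step 11 by shifting the policy from exploration toward exploitation as training proceeds. Inspect `grants` to see which resource-poor beams received the pooled channels.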