(a)

| Q-learning parameters | Value |
|---|---|
| Learning rate | 0.2 |
| Discount factor | 0.995 |
| Number of states | 46656 |
| Number of actions | 16 |
| Initial exploration probability (ε-greedy) | 1 |
| Minimum exploration probability (ε-greedy) | 0.05 |
| Exploration decay rate (ε-greedy) | 0.9995 |
| (-learning) | 0.5 |
| Number of planning steps (Dyna-Q learning) | 100 |

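The settings in table (a) can be read as a tabular Dyna-Q agent with ε-greedy exploration. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation. It assumes that ε decays multiplicatively once per agent step and is clipped at the minimum, that the environment model is deterministic (as in standard tabular Dyna-Q), and that states are any hashable value. The 0.5 parameter is not used because its role cannot be recovered from the table. The names `DynaQ`, `decayed_epsilon` and `steps_to_min_epsilon` are hypothetical.

```python
import math
import random
from collections import defaultdict

# Values taken from table (a)
ALPHA = 0.2            # learning rate
GAMMA = 0.995          # discount factor
N_ACTIONS = 16         # number of actions
EPS_INIT, EPS_MIN, EPS_DECAY = 1.0, 0.05, 0.9995
N_PLANNING = 100       # planning updates per real step (Dyna-Q)


def decayed_epsilon(step: int) -> float:
    """Multiplicative epsilon decay clipped at EPS_MIN (assumed: one decay per step)."""
    return max(EPS_MIN, EPS_INIT * EPS_DECAY ** step)


def steps_to_min_epsilon() -> int:
    """Number of decay steps until epsilon first reaches EPS_MIN."""
    return math.ceil(math.log(EPS_MIN / EPS_INIT) / math.log(EPS_DECAY))


class DynaQ:
    """Hypothetical tabular Dyna-Q agent; assumes a deterministic environment."""

    def __init__(self, seed: int = 0):
        self.q = defaultdict(lambda: [0.0] * N_ACTIONS)  # Q[s][a], created on first visit
        self.model = {}                                  # (s, a) -> (r, s_next)
        self.rng = random.Random(seed)
        self.step = 0

    def act(self, s) -> int:
        """Epsilon-greedy action selection with the decayed epsilon."""
        if self.rng.random() < decayed_epsilon(self.step):
            return self.rng.randrange(N_ACTIONS)
        values = self.q[s]
        return max(range(N_ACTIONS), key=values.__getitem__)

    def _update(self, s, a, r, s_next) -> None:
        """One-step Q-learning update."""
        target = r + GAMMA * max(self.q[s_next])
        self.q[s][a] += ALPHA * (target - self.q[s][a])

    def learn(self, s, a, r, s_next) -> None:
        # 1) direct RL update from the real transition
        self._update(s, a, r, s_next)
        # 2) model learning: remember the last observed outcome of (s, a)
        self.model[(s, a)] = (r, s_next)
        # 3) planning: replay N_PLANNING transitions sampled from the model
        keys = list(self.model)
        for _ in range(N_PLANNING):
            ps, pa = self.rng.choice(keys)
            pr, ps_next = self.model[(ps, pa)]
            self._update(ps, pa, pr, ps_next)
        self.step += 1
```

With these values, ε falls from 1 to its 0.05 floor after about 5990 decay steps (ln 0.05 / ln 0.9995 ≈ 5990). A dense Q-table would hold 46656 × 16 = 746,496 entries; the sketch uses a dictionary so only visited states are stored.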
(b)

| Factorization parameters | Value |
|---|---|
| Number of latent factors | 8 |
| Maximum iterations | 500 |
| Optimization algorithm | Stochastic gradient descent |
| Linear regularization | |
| Regularization | |
| Factorization threshold | |

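Table (b) specifies an SGD-based factorization with 8 latent factors and 500 iterations. It does not give the regularization values or say which matrix is factorized. The following is a generic sketch of SGD matrix factorization, not the authors' procedure. It assumes that each of the 500 iterations is one full pass over the observed entries and uses a single L2 penalty. `LR` and `REG` are hypothetical placeholders rather than values from the source, and the factorization threshold is not modeled.

```python
import random

K = 8            # number of latent factors (table b)
MAX_ITERS = 500  # passes over the observed entries (table b; assumed to mean epochs)
LR = 0.02        # hypothetical SGD step size, not given in the table
REG = 0.01       # hypothetical L2 penalty, table value missing


def factorize(entries, n_rows, n_cols, seed=0):
    """Fit R[i][j] ~ P[i] . Q[j] on observed (i, j, r) triples with plain SGD."""
    rng = random.Random(seed)
    P = [[rng.gauss(0, 0.1) for _ in range(K)] for _ in range(n_rows)]
    Q = [[rng.gauss(0, 0.1) for _ in range(K)] for _ in range(n_cols)]
    data = list(entries)
    for _ in range(MAX_ITERS):
        rng.shuffle(data)  # visit observed entries in random order each pass
        for i, j, r in data:
            err = r - predict(P, Q, i, j)
            for f in range(K):
                pif, qjf = P[i][f], Q[j][f]
                # gradient step on squared error plus L2 penalty
                P[i][f] += LR * (err * qjf - REG * pif)
                Q[j][f] += LR * (err * pif - REG * qjf)
    return P, Q


def predict(P, Q, i, j):
    """Reconstructed entry (i, j) as the dot product of the latent vectors."""
    return sum(p * q for p, q in zip(P[i], Q[j]))
```

Once fitted, `predict` fills in unobserved entries from the learned latent vectors. This is the usual reason for choosing a low-rank model with a small number of factors such as 8.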
(c)

| Network parameters | Value |
|---|---|
| SINR target | 0 dB |
| UE movement speed | 2 km/h |

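Table (c) gives the SINR target in decibels and the UE speed in km/h. The sketch below converts both to linear and SI units. A target of 0 dB corresponds to a linear SINR of 1, meaning the received signal power equals interference plus noise, and 2 km/h is about 0.56 m/s. The helper `meets_sinr_target` is hypothetical and only illustrates the comparison against the target.

```python
SINR_TARGET_DB = 0.0   # table (c)
UE_SPEED_KMH = 2.0     # table (c)


def db_to_linear(db: float) -> float:
    """Convert a power ratio from dB to linear scale."""
    return 10 ** (db / 10)


def kmh_to_ms(kmh: float) -> float:
    """Convert km/h to m/s."""
    return kmh * 1000 / 3600


sinr_target_linear = db_to_linear(SINR_TARGET_DB)  # 1.0
ue_speed_ms = kmh_to_ms(UE_SPEED_KMH)              # about 0.556 m/s


def meets_sinr_target(signal_w: float, interference_w: float, noise_w: float) -> bool:
    """Hypothetical check: SINR = S / (I + N) compared with the linear target."""
    sinr = signal_w / (interference_w + noise_w)
    return sinr >= sinr_target_linear
```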