| Parameter | Value |
| --- | --- |
| Episodes | 20 |
| Number of time slots per episode | 5500 |
| State history length () | 16 |
| Experience-replay pool size | 1000 |
| Experience-replay minibatch size | 64 |
| Discount factor | 0.9 |
| Learning rate | 0.001 |
| Maximum exploration probability | 0.8 |
| Minimum exploration probability | 0.001 |
| Decay factor | 0.001 |
| Target network update frequency | 100 |
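
As a rough illustration of how the settings in the table could be carried into code, the sketch below collects them in a single configuration object. The class name `TrainingConfig`, its field names, and the two helper functions are hypothetical and do not come from the source. In particular, the table does not state the form of the exploration schedule, so the exponential decay from the maximum to the minimum exploration probability is only an assumption, as is the reading of the target network update frequency as a count of training steps.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingConfig:
    """Hyperparameters copied from the table (field names are hypothetical)."""
    episodes: int = 20
    slots_per_episode: int = 5500
    state_history_length: int = 16
    replay_pool_size: int = 1000
    minibatch_size: int = 64
    discount_factor: float = 0.9
    learning_rate: float = 0.001
    epsilon_max: float = 0.8
    epsilon_min: float = 0.001
    decay_factor: float = 0.001
    target_update_frequency: int = 100


def epsilon_at(step: int, cfg: TrainingConfig) -> float:
    """Exploration probability after `step` training steps.

    ASSUMPTION: exponential decay from epsilon_max towards epsilon_min.
    The source gives only the three numbers, not the schedule itself.
    """
    span = cfg.epsilon_max - cfg.epsilon_min
    return cfg.epsilon_min + span * math.exp(-cfg.decay_factor * step)


def should_update_target(step: int, cfg: TrainingConfig) -> bool:
    """True when the target network would be synchronised.

    ASSUMPTION: the update frequency is counted in training steps.
    """
    return step > 0 and step % cfg.target_update_frequency == 0


if __name__ == "__main__":
    cfg = TrainingConfig()
    total_slots = cfg.episodes * cfg.slots_per_episode
    print("total time slots:", total_slots)
    for step in (0, 1000, cfg.slots_per_episode):
        print(step, round(epsilon_at(step, cfg), 4))
```

Under this assumed schedule, the exploration probability starts at the maximum value and approaches the minimum value well within the first episode. A different schedule, for example linear or per-episode decay, would only require changing `epsilon_at`.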