Research Article
Adaptive Traffic Signal Control Model on Intersections Based on Deep Reinforcement Learning
Table 1
Parameter settings for the model.
| Parameter | Value | Meaning |
| --- | --- | --- |
| Learning rate α | 0.9 | Extent to which new information overrides old information. |
| Discount factor γ | 0.9 | Importance of future rewards. |
| Exploration rate | 0.1 | 90% of the time the agent chooses the optimal strategy; the remaining 10% of the time it explores randomly. |
| replay_memory size N | 1000 | Maximum size of the memory pool. |
| batch_size | 32 | Number of memories sampled from the pool for each learning step. |
| Constant c in reward function | 0.15 | Upper bound of the reward. |
| Threshold in reward function | 60 | Threshold value at which the reward becomes negative. |
| Update interval C | 200 | Frequency with which the parameters of the target_net are updated. |
| Observe step n | 100 | Number of steps to observe before the training process starts. |
| Training time | 40000 | Number of steps for which the agent trains. |
| Episode number M | 200 | Maximum number of episodes. |
| Iteration number T | 200 | Maximum number of iterations in each episode. |
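
For readers who wish to reproduce the setup, the settings in Table 1 can be collected into a single configuration object. The following is a minimal sketch, not the authors' implementation: the class `DQNConfig`, its field names, and the helpers `choose_action`, `td_target`, and `should_sync_target` are hypothetical names introduced here for illustration. The helpers assume a standard DQN-style agent (epsilon-greedy exploration, a one-step bootstrapped target, and a periodically synchronized target network); only the numeric values are taken from the table.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class DQNConfig:
    """Values copied from Table 1; the field names are illustrative."""
    learning_rate: float = 0.9          # alpha
    discount_factor: float = 0.9        # gamma
    exploration_rate: float = 0.1       # probability of a random action
    replay_memory_size: int = 1000      # N
    batch_size: int = 32
    reward_constant_c: float = 0.15     # upper bound of the reward
    reward_threshold: float = 60        # reward turns negative at this value
    target_update_interval: int = 200   # C
    observe_steps: int = 100            # n
    training_steps: int = 40000
    max_episodes: int = 200             # M
    max_iterations: int = 200           # T


def choose_action(q_values, cfg, rng):
    """Epsilon-greedy selection (assumed): explore with probability
    cfg.exploration_rate, otherwise pick the highest-valued action."""
    if rng.random() < cfg.exploration_rate:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)


def td_target(reward, next_q_values, cfg, terminal=False):
    """One-step target r + gamma * max_a Q(s', a) (assumed standard DQN form)."""
    if terminal:
        return reward
    return reward + cfg.discount_factor * max(next_q_values)


def should_sync_target(step, cfg):
    """True on every C-th step, when the target network would be refreshed
    (assumed interpretation of the update interval)."""
    return step % cfg.target_update_interval == 0


if __name__ == "__main__":
    cfg = DQNConfig()
    rng = random.Random(0)
    print(choose_action([0.0, 1.0, 0.5], cfg, rng))
    print(td_target(cfg.reward_constant_c, [1.0, 2.0], cfg))
```

Keeping the settings in a frozen dataclass makes each run's configuration explicit and prevents accidental modification during training.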