Research Article

Adaptive Traffic Signal Control Model on Intersections Based on Deep Reinforcement Learning

Table 1

Parameter settings for the model.

| Parameter | Value | Meaning |
| --- | --- | --- |
| Learning rate α | 0.9 | Extent to which new information overrides old information. |
| Discount factor γ | 0.9 | Importance of future rewards. |
| Exploration probability (ε-greedy) | 0.1 | 90% of the time the agent chooses the optimal strategy; 10% of the time it explores randomly. |
| replay_memory size N | 1000 | Maximum size of the memory pool. |
| batch_size | 32 | Number of experiences sampled from the memory pool for each learning step. |
| Constant c in reward function | 0.15 | Upper bound of the reward. |
| Threshold in reward function | 60 | Value beyond which the reward becomes negative. |
| Update interval C | 200 | Number of steps between updates of the target_net parameters. |
| Observe step n | 100 | Number of steps to observe before the training process starts. |
| Training time | 40000 | Number of steps for which the agent trains. |
| Episode number M | 200 | Maximum number of episodes. |
| Iteration number T | 200 | Maximum number of iterations in each episode. |
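
The settings in Table 1 can be collected into a single configuration object, which makes the relationships between them easier to check. The following is a minimal sketch, not the authors' implementation: the class name `DQNConfig`, all field names, and the helpers `select_action` and `should_sync_target` are hypothetical, and only the numeric values come from the table. It assumes that the random-exploration probability is 0.1, that the training time equals M × T (the table values are consistent with this, since 200 × 200 = 40000), and that target_net updates begin only after the observation phase; the exact boundary handling is a guess.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class DQNConfig:
    """Values copied from Table 1; the field names are illustrative."""
    learning_rate: float = 0.9          # alpha
    discount_factor: float = 0.9        # gamma
    epsilon: float = 0.1                # probability of a random action
    replay_memory_size: int = 1000      # N, capacity of the memory pool
    batch_size: int = 32                # experiences sampled per learning step
    reward_constant: float = 0.15       # c, upper bound of the reward
    reward_threshold: float = 60.0      # reward turns negative beyond this
    target_update_interval: int = 200   # C, steps between target_net updates
    observe_steps: int = 100            # n, steps observed before training
    training_steps: int = 40000         # total number of training steps
    max_episodes: int = 200             # M
    max_iterations: int = 200           # T, per episode


def select_action(q_values, epsilon, rng):
    """Epsilon-greedy choice: explore with probability epsilon, else exploit."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # uniform random action
    # Greedy action: index of the largest estimated Q-value.
    return max(range(len(q_values)), key=lambda a: q_values[a])


def should_sync_target(step, cfg):
    """True on steps where target_net would be updated (assumed schedule)."""
    return step > cfg.observe_steps and step % cfg.target_update_interval == 0


cfg = DQNConfig()
# Consistency check under the stated assumption: training time = M * T.
assert cfg.max_episodes * cfg.max_iterations == cfg.training_steps
```

With `epsilon=0.0` the helper always returns the greedy action, so `select_action([0.1, 0.5, 0.2], 0.0, random.Random(0))` selects index 1; with the table's value of 0.1, roughly one call in ten returns a uniformly random action instead.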