Research Article

Reducing Entropy Overestimation in Soft Actor Critic Using Dual Policy Network

Table 2: Max evaluation reward achieved during learning.

Environment       Algorithm   Max value   S.D.

Ant-v2            SAC         4536.18     1425.31
Ant-v2            TD3         4360.79     2081.05
Ant-v2            Ours        5151.27     1600.29

HalfCheetah-v2    SAC         8308.59     4298.95
HalfCheetah-v2    TD3         -1.65       0
HalfCheetah-v2    Ours        9158.21     4414.13

Hopper-v2         SAC         2905.15     1388.34
Hopper-v2         TD3         2622.66     1245.89
Hopper-v2         Ours        2812.76     1311.20

Walker2d-v2       SAC         3611.99     1670.66
Walker2d-v2       TD3         3513.84     1635.82
Walker2d-v2       Ours        4357.33     2090.74