Count-Based Exploration via Embedded State Space for Deep Reinforcement Learning
Table 1
Atari 2600: average total reward after training for 50M time steps. Boldface numbers indicate the best results. Italic numbers indicate the best results among count-based exploration methods.