Research Article

IoT-Based Reinforcement Learning Using Probabilistic Model for Determining Extensive Exploration through Computational Intelligence for Next-Generation Techniques

Table 3

Cumulative regret of each algorithm in 10,000 episodes.

Variable settingsDQNBDQNBBDQN

K = 10N = 109914.88599.65649.29
N = 209922.029914.601590.03
N = 309912.369910.965203.65

N = 20K = 109922.029914.021590.03
K = 209922.029912.131901.18
K = 309922.021857.491784.71