Research Article

Combining Multiple Strategies for Multiarmed Bandit Problems and Asymptotic Optimality

Figure 4

Average regret of tuned ε_t-comb({ε_t-greedy, UCB1-tuned}) and of tuned Exp4 for the distributions (0.9, 0.8, 0.8, 0.8, 0.7, 0.7, 0.7, 0.6, 0.6, 0.6) and (0.9, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7).