Combining Multiple Strategies for Multiarmed Bandit Problems and Asymptotic Optimality

<div>Average regret of tuned <span id="M344"><i>ε</i><sub><i>t</i></sub></span>-comb({<span id="M345"><i>ε</i><sub><i>t</i></sub></span>-greedy, UCB1-tuned}) and of tuned Exp4 for the distributions (0.9, 0.8, 0.8, 0.8, 0.7, 0.7, 0.7, 0.6, 0.6, 0.6) and (0.9, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7).</div>
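The caption compares strategies by average regret on two ten-armed distributions. As a hedged illustration of how such a curve can be produced for one of the component strategies, the sketch below runs UCB1-Tuned (the index policy of Auer, Cesa-Bianchi, and Fischer, 2002) and averages its pseudo-regret, n·μ* − Σ<sub>t</sub> μ<sub>I<sub>t</sub></sub>, over independent runs. It assumes Bernoulli rewards with the listed means; the function names `ucb1_tuned_pseudo_regret` and `average_regret` and the horizon and run counts are hypothetical choices for illustration. It does not implement ε<sub>t</sub>-comb or Exp4, and it is not the parameter tuning used for this figure.

```python
import math
import random


def ucb1_tuned_pseudo_regret(means, horizon, seed=0):
    """Run UCB1-Tuned once on arms with the given means and return the
    pseudo-regret sum_t (mu* - mu_{I_t}). Bernoulli rewards are an
    assumption of this sketch, not taken from the figure."""
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k     # n_j: times arm j has been played
    sums = [0.0] * k     # sum of rewards observed from arm j
    sq_sums = [0.0] * k  # sum of squared rewards observed from arm j
    best = max(means)
    regret = 0.0

    def index(j, n):
        # UCB1-Tuned index for arm j after n total plays.
        n_j = counts[j]
        mean = sums[j] / n_j
        # Upper confidence bound on the variance of arm j (V_j).
        v = sq_sums[j] / n_j - mean * mean + math.sqrt(2 * math.log(n) / n_j)
        return mean + math.sqrt(math.log(n) / n_j * min(0.25, max(0.0, v)))

    for t in range(horizon):  # t = number of plays made so far
        if t < k:
            arm = t  # initialisation: play each arm once
        else:
            arm = max(range(k), key=lambda j: index(j, t))
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        sq_sums[arm] += reward * reward
        regret += best - means[arm]
    return regret


def average_regret(means, horizon, runs=10):
    """Average pseudo-regret over independent runs (seeds 0..runs-1)."""
    return sum(ucb1_tuned_pseudo_regret(means, horizon, seed=s)
               for s in range(runs)) / runs


if __name__ == "__main__":
    # The two distributions listed in the caption (means of the 10 arms).
    dist_1 = (0.9, 0.8, 0.8, 0.8, 0.7, 0.7, 0.7, 0.6, 0.6, 0.6)
    dist_2 = (0.9,) + (0.7,) * 9
    for dist in (dist_1, dist_2):
        print(average_regret(dist, horizon=5_000, runs=5))
```

Pseudo-regret (expected loss given the arms actually chosen) is used instead of realized reward shortfall because it removes reward noise from the curve; averaging over more runs and longer horizons would give smoother estimates.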

Journal of Control Science and Engineering

Figure 4