Efficient Actor-Critic Algorithm with Hierarchical Model Learning and Planning

<div>Local model planning ($\theta, \beta, \Gamma, e, \mathrm{number}, P_l$).</div>

Computational Intelligence and Neuroscience

Algorithm 3: Efficient Actor-Critic Algorithm with Hierarchical Model Learning and Planning