Research Article

Dynamical Motor Control Learned with Deep Deterministic Policy Gradient

Figure 3

The HF learning of the RNN actor with the deterministic policy gradient. The RNN actor was unfolded in time to show the updates of the neural activity, the action, and the gradient propagation through time. The gradients of the network weights were obtained from the gradients of the value function propagated back from the critic in DDPG (see (6)). Symbols in the figure denote the input, the recurrent, and the output weights, respectively. Note that the task information (the start and target states) was fed to the network only at the initial time step.
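The unrolled-in-time gradient flow the figure depicts can be illustrated with a minimal sketch: a linear RNN actor that receives its input only at the first time step, with the critic's action gradient propagated backward through the recurrence (backpropagation through time). This is an illustrative toy, not the paper's implementation; the names `W_in`, `W_rec`, `W_out`, and the fixed linear critic `q` are assumptions standing in for the actual networks.

```python
import numpy as np

# Toy setup (all shapes and names are illustrative assumptions):
# hidden state  h_0 = W_in @ x0          (task input fed only at t = 0)
#               h_t = W_rec @ h_{t-1}
# action        u_t = W_out @ h_t
# linear "critic"  Q(u) = q @ u,  objective  J = sum_t Q(u_t)
rng = np.random.default_rng(0)
n_in, n_h, n_out, T = 3, 4, 2, 5

W_in = rng.normal(size=(n_h, n_in)) * 0.5
W_rec = rng.normal(size=(n_h, n_h)) * 0.3
W_out = rng.normal(size=(n_out, n_h)) * 0.5
q = rng.normal(size=n_out)       # fixed critic gradient dQ/du_t
x0 = rng.normal(size=n_in)       # task information (start/target encoding)

def rollout(W_in, W_rec, W_out):
    """Unroll the RNN; return the summed critic value and hidden states."""
    h = W_in @ x0
    hs, J = [], 0.0
    for _ in range(T):
        hs.append(h)
        J += q @ (W_out @ h)     # critic value of the action at this step
        h = W_rec @ h
    return J, hs

J, hs = rollout(W_in, W_rec, W_out)

# Backpropagation through time: the critic's action gradient q enters
# through W_out at every step and flows backward through W_rec.
g_in = np.zeros_like(W_in)
g_rec = np.zeros_like(W_rec)
g_out = np.zeros_like(W_out)
delta = np.zeros(n_h)            # dJ/dh_t, accumulated backward in time
for t in reversed(range(T)):
    delta = W_out.T @ q + W_rec.T @ delta
    g_out += np.outer(q, hs[t])
    if t > 0:
        g_rec += np.outer(delta, hs[t - 1])
    else:
        g_in += np.outer(delta, x0)

# DDPG ascends the value estimate, so a plain gradient step would be
# e.g.  W_out += lr * g_out  (HF optimization would instead use these
# gradients inside a curvature-aware update).
```

The gradients can be sanity-checked against finite differences of `rollout`, which is a common way to validate a hand-written BPTT pass.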