Learning from Demonstrations and Human Evaluative Feedbacks: Handling Sparsity and Imperfection Using Inverse Reinforcement Learning Approach

<div>(a) The relation between the sparsity level of the demonstrations in stage one and the number of feedbacks needed to reach an “EV” score of 1.17 using <span class="nowrap">IRLDC.</span> (b) IRLDC’s stage-two performance in the face of optimal demonstrations with different sparsity levels in stage one (points “B1”, …, “B10” in Figure <a href="../fig3/">3</a>) versus the number of evaluative feedbacks. The black curve has no initial demonstration (point “C” in Figure <a href="../fig3/">3</a>).</div>

Journal of Robotics

Figure 6
