Learning from Demonstrations and Human Evaluative Feedbacks: Handling Sparsity and Imperfection Using Inverse Reinforcement Learning Approach

<div>The performance of <svg height="8.98583pt" id="M347" style="vertical-align:-0.2324905pt" version="1.1" viewbox="-0.0498162 -8.75334 38.1759 8.98583" width="38.1759pt" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink"><g transform="matrix(.013,0,0,-0.013,0,0)"><path d="M303 0V28C221 34 213 39 213 125V525C213 610 221 616 303 622V650H38V622C120 616 128 610 128 525V125C128 40 120 34 38 28V0H303Z"></path></g><g transform="matrix(.013,0,0,-0.013,4.433,0)"><path d="M631 18C609 24 585 35 559 65C534 91 514 117 478 169C448 214 406 281 389 313C462 346 516 399 516 485C516 545 490 590 449 616C412 641 363 650 290 650H42V622C120 615 128 612 128 527V125C128 40 120 34 38 28V0H300V28C221 34 212 40 212 125V284H244C295 284 312 272 329 244C359 195 395 133 430 84C475 19 516 -3 592 -7C603 -8 615 -8 627 -8L631 18ZM212 316V563C212 591 215 602 223 607C231 613 248 617 277 617C352 617 423 577 423 469C423 415 407 375 368 345C343 324 310 316 260 316H212Z"></path></g><g transform="matrix(.013,0,0,-0.013,12.506,0)"><path d="M495 163C480 117 462 85 444 65C421 39 387 34 332 34C290 34 256 36 236 47C218 57 213 77 213 131V526C213 612 222 616 301 622V650H40V622C122 616 128 611 128 526V126C128 41 120 34 36 28V0H489C498 31 519 126 525 157L495 163Z"></path></g><g transform="matrix(.013,0,0,-0.013,19.5,0)"><path d="M43 650V622C120 616 128 612 128 526V124C128 39 120 33 34 27V0H270C392 0 492 25 567 83C643 141 690 230 690 350C690 444 655 517 605 565C543 625 450 650 323 650H43ZM213 547C213 587 217 598 226 604C236 612 262 617 304 617C371 617 429 604 474 576C554 529 592 439 592 336C592 176 505 36 319 36C246 36 213 55 213 131V547Z"></path></g><g transform="matrix(.013,0,0,-0.013,29.29,0)"><path d="M614 175C564 76 510 21 408 21C256 21 146 149 146 336C146 488 235 629 402 629C510 629 570 586 597 480L626 488C620 541 614 582 606 638C578 643 510 665 429 665C206 665 44 527 44 316C44 157 153 -15 402 -15C474 -15 558 5 586 11C604 45 629 119 643 165L614 175Z"></path></g></svg> under different exploration policy types used in the interactive phase (stage-two) when 100 state-action pairs of 60% optimal demonstrations are given (see Table <a href="../tab2/">2</a>).</div>

Journal of Robotics

fig8

Figure 8

Figure 8: Learning from Demonstrations and Human Evaluative Feedbacks: Handling Sparsity and Imperfection Using Inverse Reinforcement Learning Approach 

Figure 8 | Learning from Demonstrations and Human Evaluative Feedbacks: Handling Sparsity and Imperfection Using Inverse Reinforcement Learning Approach