Research Article

Learning from Demonstrations and Human Evaluative Feedbacks: Handling Sparsity and Imperfection Using Inverse Reinforcement Learning Approach

Table 4

Results of the navigation experiment with the E-puck robot in two different environments, with a fixed number of interaction steps in the environment before the learner policy is updated. In experiments 1 and 3, demonstrations and feedbacks are provided in the same environment (Environment 1), and the performance of the reward function learned there is then examined in the second environment. In experiment 2, sparse demonstrations are provided in one environment and feedbacks in the other.

| Experiment | Demonstration type | No. of demonstrations | Env. 1: feedbacks (all / negative) | Env. 1: EV of learned policy | Env. 1: EV of teacher policy | Env. 2: feedbacks (all / negative) | Env. 2: EV of learned policy | Env. 2: EV of teacher policy |
|---|---|---|---|---|---|---|---|---|
| 1 | Abundant, nonoptimal | 100 | 135 / 51 | 2.735 | 2.824 | — | 3.145 | 3.279 |
| 2 | Sparse, near-optimal | 2 | — | 2.749 | 2.824 | 239 / 70 | 3.157 | 3.279 |
| 3 | No demo | — | 311 / 181 | 2.761 | 2.824 | — | 3.188 | 3.279 |
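
For context, the EV columns appear to report the expected value (expected discounted return) of each policy, which allows the policy induced by the learned reward function to be compared against the teacher policy in each environment. The following is a minimal sketch of how such an estimate is commonly obtained via Monte Carlo rollouts; the environment interface (`reset`/`step`), discount factor, and rollout counts are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def expected_value(policy, env, n_rollouts=100, horizon=200, gamma=0.99):
    """Estimate a policy's expected discounted return by Monte Carlo rollouts.

    `policy` maps a state to an action. `env` is assumed to expose
    reset() -> state and step(action) -> (state, reward, done); this is
    a hypothetical interface, not the paper's simulator API.
    """
    returns = []
    for _ in range(n_rollouts):
        state = env.reset()
        total, discount = 0.0, 1.0
        for _ in range(horizon):
            state, reward, done = env.step(policy(state))
            total += discount * reward   # accumulate discounted reward
            discount *= gamma
            if done:
                break
        returns.append(total)
    return float(np.mean(returns))

# A comparison analogous to the table's columns (names are placeholders):
# ev_learned = expected_value(learned_policy, env_2)
# ev_teacher = expected_value(teacher_policy, env_2)
```

Under this reading, the small gap between the learned-policy and teacher-policy EVs in both environments is what supports the claim that the recovered reward function transfers to the second environment.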