Research Article

Learning from Demonstrations and Human Evaluative Feedbacks: Handling Sparsity and Imperfection Using Inverse Reinforcement Learning Approach

Table 2

Settings used to study different aspects of our framework in the simulated navigation domain.

| Aspect | Demonstration: step number | Demonstration: optimality degree | Feedback error | Learner policy type |
|---|---|---|---|---|
| Comparison (5.1.1) | 100 | 60% (point A5) | 0 | Probabilistic |
| Comparison (5.1.1) | 20 | 100% (point B2) | 0 | Probabilistic |
| Nonoptimal demonstration effect (5.1.2) | 100 | Different (points A1–A8, B10, C) | 0 | Probabilistic |
| Sparse demonstration effect (5.1.3) | Different (points B1–B10, C) | 100% | 0 | Probabilistic |
| Learn only from feedback (5.1.4) | No demonstration (point C) | n/a | 0 | Probabilistic |
| Effect of the policy execution (5.1.5) | 100 | 60% (point A5) | 0 | Probabilistic, greedy, random |
| Effect of feedback error (5.1.6) | 100 | 60% (point A5) | ε = 0, 0.1, 0.5 | Probabilistic |

Note: learning from the collected feedback is done in batch learning mode.
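The experimental grid above is straightforward to encode programmatically. The following is a minimal sketch of Table 2 as a Python data structure; the names `Setting` and `SETTINGS` are ours for illustration, not identifiers from the paper's implementation.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical encoding of the Table 2 settings (illustrative names only).
@dataclass
class Setting:
    aspect: str                      # which aspect of the framework is studied
    demo_steps: Optional[str]        # demonstration length ("100", "20", varying, or None)
    demo_optimality: Optional[str]   # optimality degree of the demonstration
    feedback_errors: List[float]     # probabilities (epsilon) of erroneous feedback
    policies: List[str]              # learner policy type(s) used during execution

SETTINGS = [
    Setting("Comparison (5.1.1)", "100", "60% (point A5)", [0.0], ["probabilistic"]),
    Setting("Comparison (5.1.1)", "20", "100% (point B2)", [0.0], ["probabilistic"]),
    Setting("Nonoptimal demonstration effect (5.1.2)", "100",
            "varying (points A1-A8, B10, C)", [0.0], ["probabilistic"]),
    Setting("Sparse demonstration effect (5.1.3)",
            "varying (points B1-B10, C)", "100%", [0.0], ["probabilistic"]),
    Setting("Learn only from feedback (5.1.4)", None, None, [0.0], ["probabilistic"]),
    Setting("Effect of the policy execution (5.1.5)", "100", "60% (point A5)",
            [0.0], ["probabilistic", "greedy", "random"]),
    Setting("Effect of feedback error (5.1.6)", "100", "60% (point A5)",
            [0.0, 0.1, 0.5], ["probabilistic"]),
]
```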
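Two of the table's columns, the learner policy type and the feedback error ε, also admit natural minimal implementations. The sketch below assumes a softmax form for the probabilistic policy and a sign-flip model for erroneous binary feedback; both are our assumptions for illustration, since the table itself does not reproduce the paper's exact definitions.

```python
import numpy as np

def select_action(q_values: np.ndarray, policy: str,
                  temperature: float = 1.0,
                  rng: "np.random.Generator | None" = None) -> int:
    """Pick an action under one of the three execution policies in Table 2."""
    rng = rng or np.random.default_rng()
    if policy == "greedy":
        return int(np.argmax(q_values))           # always exploit the current best action
    if policy == "random":
        return int(rng.integers(len(q_values)))   # ignore the learned values entirely
    # "probabilistic": sample actions via a softmax over the values (assumed form);
    # shifting by the max keeps the exponentials numerically stable.
    logits = (q_values - q_values.max()) / temperature
    probs = np.exp(logits) / np.exp(logits).sum()
    return int(rng.choice(len(q_values), p=probs))

def corrupt_feedback(feedback: int, epsilon: float,
                     rng: "np.random.Generator | None" = None) -> int:
    """With probability epsilon, flip the trainer's evaluative feedback (+1/-1)."""
    rng = rng or np.random.default_rng()
    return -feedback if rng.random() < epsilon else feedback
```

Under these assumptions, running the grid amounts to iterating over `SETTINGS` and, for each ε and policy type, executing the learner with `select_action` while passing its collected feedback through `corrupt_feedback` before the batch update.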