Research Article
Learning from Demonstrations and Human Evaluative Feedbacks: Handling Sparsity and Imperfection Using Inverse Reinforcement Learning Approach
Table 2
Settings used to study different aspects of our framework in the simulated navigation domain.
| Aspect | Demonstration step number | Demonstration optimality degree | Feedback error | Learner policy type |
| --- | --- | --- | --- | --- |
| Comparison (5.1.1) | 100 | 60% (point A5) | 0 | Probabilistic |
| Comparison (5.1.1) | 20 | 100% (point B2) | 0 | Probabilistic |
| Nonoptimal demonstration effect (5.1.2) | 100 | Different (points A1–A8, B10, C) | 0 | Probabilistic |
| Sparse demonstration effect (5.1.3) | Different (points B1–B10, C) | 100% | 0 | Probabilistic |
| Learn only from feedback (5.1.4) | No demo (point C) | — | 0 | Probabilistic |
| Effect of the policy execution (5.1.5) | 100 | 60% (point A5) | 0 | Probabilistic, greedy, random |
| Effect of feedback error (5.1.6) | 100 | 60% (point A5) | ε = 0, 0.1, 0.5 | Probabilistic |
Note: learning from the collected feedback is done in batch mode.
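As a minimal sketch of how the feedback-error parameter ε in the table can be simulated, the snippet below flips the sign of a binary evaluative feedback with probability ε. The function name and the +1/−1 feedback encoding are assumptions for illustration, not details from the article; ε = 0 corresponds to error-free feedback and ε = 0.5 to uninformative noise.

```python
import random


def noisy_feedback(true_feedback: int, epsilon: float, rng: random.Random) -> int:
    """Return the trainer's evaluative feedback (+1 or -1).

    With probability epsilon the sign is flipped, modeling the
    feedback-error settings (epsilon = 0, 0.1, 0.5) listed in Table 2.
    """
    return -true_feedback if rng.random() < epsilon else true_feedback


rng = random.Random(0)
# With epsilon = 0 the feedback is always the true evaluation.
print(noisy_feedback(+1, 0.0, rng))
```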