Research Article
Learning from Demonstrations and Human Evaluative Feedbacks: Handling Sparsity and Imperfection Using Inverse Reinforcement Learning Approach
Figure 8
Performance under the different exploration policy types used in the interactive phase (stage two) when 100 state-action pairs of 60% optimal demonstrations are given (see Table 2).