Research Article

Learning from Demonstrations and Human Evaluative Feedbacks: Handling Sparsity and Imperfection Using Inverse Reinforcement Learning Approach

Table 4

Results of the navigation experiment with the E-puck robot in two different environments, with a fixed number of interaction steps in the environment before the learner policy is updated. In experiments 1 and 3, demonstrations and feedbacks are provided in the same environment (Environment 1), and the performance of the reward function learned there is then examined in the second environment. In experiment 2, sparse demonstrations are provided in one environment and feedbacks in the other.

| Experiment | Demonstration type | No. of demonstrations | Env. 1: feedbacks (all / negative) | Env. 1: EV of learned policy | Env. 1: EV of teacher policy | Env. 2: feedbacks (all / negative) | Env. 2: EV of learned policy | Env. 2: EV of teacher policy |
|---|---|---|---|---|---|---|---|---|
| 1 | Abundant, nonoptimal | 100 | 135 / 51 | 2.735 | 2.824 | — | 3.145 | 3.279 |
| 2 | Sparse, near-optimal | 2 | — | 2.749 | 2.824 | 239 / 70 | 3.157 | 3.279 |
| 3 | No demo | — | 311 / 181 | 2.761 | 2.824 | — | 3.188 | 3.279 |
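
For context, the EV columns appear to report the expected value (expected discounted return) of each policy, which allows the policy induced by the learned reward function to be compared against the teacher policy in each environment. The following is a minimal sketch of how such an estimate is commonly obtained via Monte Carlo rollouts; the environment interface (`reset`/`step`), discount factor, and rollout counts are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def expected_value(policy, env, n_rollouts=100, horizon=200, gamma=0.99):
    """Estimate a policy's expected discounted return by Monte Carlo rollouts.

    `policy` maps a state to an action. `env` is assumed to expose
    reset() -> state and step(action) -> (state, reward, done); this is
    a hypothetical interface, not the paper's simulator API.
    """
    returns = []
    for _ in range(n_rollouts):
        state = env.reset()
        total, discount = 0.0, 1.0
        for _ in range(horizon):
            state, reward, done = env.step(policy(state))
            total += discount * reward   # accumulate discounted reward
            discount *= gamma
            if done:
                break
        returns.append(total)
    return float(np.mean(returns))

# A comparison analogous to the table's columns (names are placeholders):
# ev_learned = expected_value(learned_policy, env_2)
# ev_teacher = expected_value(teacher_policy, env_2)
```

Under this reading, the small gap between the learned-policy and teacher-policy EVs in both environments is what supports the claim that the recovered reward function transfers to the second environment.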