Research Article

A Study of Continuous Maximum Entropy Deep Inverse Reinforcement Learning

Algorithm 1: Nonlinear IRL with gradient descent.

Input:
Output: optimal reward function weight
for iteration k = 1 to K do
    Sample demonstration batch
    Sample background batch
    Append demonstration batch to background batch
    set and Estimate using and
    Update parameters θ using gradient
end for
return optimized parameters
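
The loop in Algorithm 1 can be illustrated with a minimal sketch. The symbols of the original listing are not available here, so everything below is an assumption made for illustration: the objective is taken to be the standard sample-based maximum entropy negative log-likelihood (negative mean demonstration reward plus the log of the summed exponentiated rewards over the sampled batch), the reward is a hypothetical one-parameter nonlinear function of a scalar state, and the appended batch is treated as if it were drawn uniformly (importance weights are omitted for brevity). The names reward, reward_grad, estimate_gradient, and train are hypothetical and do not come from the article.

```python
import math
import random


def reward(theta, x):
    """Hypothetical nonlinear reward (assumption): peaks where x == theta."""
    return -(x - theta) ** 2


def reward_grad(theta, x):
    """Derivative of reward(theta, x) with respect to theta."""
    return 2.0 * (x - theta)


def estimate_gradient(theta, demo_batch, sample_batch):
    """Gradient of L = -mean_demo r + log(sum_samples exp(r)) w.r.t. theta.

    Assumption: samples are weighted by exp(r) only; importance weights
    for the sampling distribution are left out to keep the sketch short.
    """
    demo_term = sum(reward_grad(theta, x) for x in demo_batch) / len(demo_batch)
    rewards = [reward(theta, x) for x in sample_batch]
    r_max = max(rewards)  # subtract the max for numerical stability
    weights = [math.exp(r - r_max) for r in rewards]
    z = sum(weights)
    model_term = sum(
        w * reward_grad(theta, x) for w, x in zip(weights, sample_batch)
    ) / z
    return -demo_term + model_term


def train(demos, background, iterations=500, demo_batch_size=16,
          background_batch_size=64, lr=0.05, seed=0):
    """Run the loop of Algorithm 1 with plain gradient descent."""
    rng = random.Random(seed)
    theta = 0.0  # initial parameter value (assumption)
    for _ in range(iterations):
        demo_batch = rng.sample(demos, demo_batch_size)
        background_batch = rng.sample(background, background_batch_size)
        # Append the demonstration batch to the background batch.
        sample_batch = demo_batch + background_batch
        grad = estimate_gradient(theta, demo_batch, sample_batch)
        theta -= lr * grad  # gradient descent update
    return theta


# Synthetic data (assumption): demonstrations cluster around x = 2,
# background samples cover the state range uniformly.
data_rng = random.Random(1)
demos = [data_rng.gauss(2.0, 0.3) for _ in range(200)]
background = [data_rng.uniform(-5.0, 5.0) for _ in range(1000)]
theta = train(demos, background)
print(f"learned reward peak: {theta:.2f}")
```

In this toy setting the learned peak should settle near the centre of the demonstrations, because the gradient vanishes when the demonstration average of the reward gradient matches its reward-weighted average over the appended batch. A deep reward network would replace reward and reward_grad with a forward pass and automatic differentiation.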