Smooth Imitation Learning via Smooth Costs and Smooth Policies

Published in "5th Joint International Conference on Data Science & Management of Data (CODS-COMAD 2022)"
Sapana Chauhary , B Ravindran

Imitation learning (IL) is a popular approach in the continuous control setting as among other reasons it circumvents the problems of reward mis-specification and exploration in reinforcement learning (RL). In IL from demonstrations, an important challenge is to obtain agent policies that are smooth with respect to the inputs. Learning through imitation a policy that is smooth as a function of a large state-action ($s$-$a$)space (typical of high dimensional continuous control environments) can be challenging. We take a first step towards tackling this issue by using smoothness inducing regularizers on \textit{both} the policy and the cost models of adversarial imitation learning. Our regularizers work by ensuring that the cost function changes in a controlled manner as a function of $s$-$a$ space; and the agent policy is well behaved with respect to the state space. We call our new smooth IL algorithm \textit{Smooth Policy and Cost Imitation Learning} (SPaCIL, pronounced ``Special''). We introduce a novel metric to quantify the smoothness of the learned policies. We demonstrate SPaCIL’s superior performance on continuous control tasks from MuJoCo. The algorithm not just outperforms the state-of-the-art IL algorithm on our proposed smoothness metric, but, enjoys added benefits of faster learning and substantially higher average return.