Cost-Sensitive Trees for Interpretable Reinforcement Learning

To appear in CODS-COMAD 2024
Siddharth Nishtala, Balaraman Ravindran

Decision trees have emerged as the most popular choice of intrinsically interpretable models for representing policies in reinforcement learning. However, directly learning a tree policy is difficult, so existing approaches instead use a neural network policy to generate a dataset on which a tree-based model is trained in a supervised manner. These approaches, however, assume that the action suggested by the neural network policy is the sole optimal action and that all other actions are equally sub-optimal. This work presents a novel perspective: it associates a different cost with the prediction of each action. By adopting a cost-sensitive approach to tree construction, we demonstrate that the resulting policies outperform those produced by their cost-insensitive counterparts. To validate our findings, we develop cost-sensitive variants of two established methods, VIPER and MoET, and provide empirical evidence of their superiority over the original methods across diverse environments.
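
To make the idea concrete, here is a minimal sketch (not the paper's exact algorithm) of cost-sensitive imitation: a tree is fit to the actions chosen by a neural network policy, with each state weighted by how much its Q-values suggest a wrong action would cost. Standard tree learners do not accept per-action cost matrices, so the sketch collapses the costs into per-state sample weights, in the spirit of VIPER's weighted loss; `cost_sensitive_tree`, `q_values`, and the specific cost definition are illustrative assumptions, not the paper's formulation.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def cost_sensitive_tree(states: np.ndarray, q_values: np.ndarray,
                        max_depth: int = 8) -> DecisionTreeClassifier:
    """Fit a tree that imitates the argmax-Q action in each state,
    weighting each state by how costly a mis-prediction is there."""
    labels = q_values.argmax(axis=1)  # teacher's action per state
    # Per-state misprediction cost: the spread of the Q-values. States
    # where all actions are near-equivalent contribute little; states
    # where a wrong action is very costly dominate the split criterion.
    costs = q_values.max(axis=1) - q_values.min(axis=1)
    tree = DecisionTreeClassifier(max_depth=max_depth)
    tree.fit(states, labels, sample_weight=costs)
    return tree

# Hypothetical usage with random stand-ins for rollout data:
states = np.random.rand(1000, 4)    # observations gathered from the neural policy
q_values = np.random.rand(1000, 2)  # Q(s, a) estimates from the trained network
tree = cost_sensitive_tree(states, q_values)
```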