Topic: Demystifying Approximate Value-based RL with ε-greedy exploration: a Differential Inclusion View
Q-learning and SARSA(0) with ε-greedy exploration are leading reinforcement learning methods, and their tabular forms are known to converge to the optimal Q-function under reasonable conditions. However, with function approximation, they exhibit unexpected behaviors such as (i) policy oscillation and chattering, (ii) multiple attractors, and (iii) convergence to the worst policy, apart from the textbook instability. Accordingly, a theory explaining these phenomena, even for basic linear function approximation, has been a long-standing open problem (Sutton, 1999). In this talk, we will use the theory of differential inclusions to provide the first framework for resolving this problem. We will also discuss numerical examples illustrating how this framework helps identify and interpret these algorithms' asymptotic behaviors. This is joint work with Aditya Gopalan (IISc).
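For readers unfamiliar with the algorithms named in the abstract, the following is a minimal illustrative sketch (not part of the talk) of Q-learning with linear function approximation and ε-greedy exploration, run on a hypothetical small random MDP; the environment, feature map, and step sizes are all assumptions made purely for illustration.

```python
# Minimal sketch: semi-gradient Q-learning with linear function approximation
# and ε-greedy exploration, on a toy random MDP (all quantities illustrative).
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical finite MDP: transition kernel P[s, a, s'], rewards R[s, a],
# and a random linear feature map φ(s, a) ∈ R^{n_features}.
n_states, n_actions, n_features = 6, 2, 3
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
R = rng.standard_normal((n_states, n_actions))
phi = rng.standard_normal((n_states, n_actions, n_features))

gamma, alpha, epsilon = 0.9, 0.05, 0.1
w = np.zeros(n_features)  # Q(s, a) is approximated by φ(s, a)ᵀ w


def q(s, a=None):
    # Approximate Q-values for all actions at s, or for a specific (s, a).
    return phi[s] @ w if a is None else phi[s, a] @ w


def eps_greedy(s):
    # With probability ε act uniformly at random, otherwise act greedily w.r.t. Q.
    if rng.random() < epsilon:
        return int(rng.integers(n_actions))
    return int(np.argmax(q(s)))


s = int(rng.integers(n_states))
for t in range(50_000):
    a = eps_greedy(s)
    s_next = rng.choice(n_states, p=P[s, a])
    # Q-learning bootstraps with the max over next actions; SARSA(0) would
    # instead use the action actually chosen by the ε-greedy policy at s_next.
    target = R[s, a] + gamma * np.max(q(s_next))
    w += alpha * (target - q(s, a)) * phi[s, a]  # semi-gradient update
    s = s_next

print("learned weights:", w)
```

The talk concerns the asymptotic behavior of exactly this kind of iteration, where the ε-greedy policy changes discontinuously with the weight vector w, which is what motivates the differential-inclusion analysis.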
Gugan Thoppe is an Assistant Professor in the Department of Computer Science and Automation, Indian Institute of Science (IISc). He is also an Associate Researcher at the Robert Bosch Centre, IIT Madras. He received his M.S. and Ph.D. from TIFR Mumbai, India, and has held two postdoctoral positions: one at the Technion, Israel, and the other at Duke University, USA. His work has been recognized with the Pratiksha Trust Young Investigator award, the TIFR award for best Ph.D. thesis, and two IBM Ph.D. fellowships. He has also won the IISc award for excellence in teaching. His research interests include stochastic approximation and random topology, along with their applications to reinforcement learning and data analysis.