To join this seminar virtually via Zoom, please request connection details from headsec [at] stat.ubc.ca
Title: Optimal methods for reinforcement learning: Efficient algorithms with instance-dependent guarantees
Abstract: Reinforcement learning (RL) is a pillar of modern artificial intelligence and data-driven decision making. Compared to classical statistical learning, RL problems give rise to several new statistical phenomena, leading to different trade-offs in the choice of estimators, the tuning of their parameters, and the design of computational algorithms. In many settings, asymptotic and/or worst-case theory fails to provide the relevant guidance.
In this talk, I present recent advances in optimal algorithms for reinforcement learning. The bulk of the talk focuses on function approximation methods for policy evaluation. I establish a novel class of optimal, instance-dependent oracle inequalities for projected Bellman equations, as well as efficient computational algorithms achieving them in different settings. Among other results, I will highlight how the instance-dependent guarantees guide the selection of tuning parameters in temporal difference methods. Drawing on this perspective, I will also discuss a novel class of stochastic approximation methods that yield optimal statistical guarantees for solving the Bellman optimality equation. At the end of the talk, I will discuss additional work on optimal, instance-dependent guarantees for functional estimation with off-policy data.