To join via Zoom: please request connection details from headsec [at] stat.ubc.ca
Title: Statistically efficient offline reinforcement learning
Abstract: Despite the empirical success of reinforcement learning (RL) in gaming, as demonstrated by AlphaGo and OpenAI Five, RL has not seen this level of success in many scientific domains, because running experiments involving human interaction is often costly and risky. Statistically efficient offline RL (i.e., sample-efficient sequential decision-making using offline data) is therefore key to overcoming this limitation. In this talk, I will showcase my research on statistically efficient offline RL. Primarily, I will explain our unified “double minimax RL framework” for offline policy evaluation, which satisfies several desiderata: (1) it can integrate rich function approximators such as deep neural networks, and (2) it is statistically efficient (i.e., it attains the semiparametric efficiency bound). In the remaining time, I will discuss model-based offline RL with general function approximation. I will present a new algorithm, constrained pessimistic policy optimization (CPPO), to address the most challenging problem in offline RL, “distributional shift,” which occurs when the coverage of the offline data is insufficient. CPPO can learn high-quality policies even under such insufficient coverage.