News & Events

Optimal methods for reinforcement learning: Efficient algorithms with instance-dependent guarantees

Thursday, March 2, 2023 - 10:30 to 11:30
Wenlong Mou, PhD student, Department of Electrical Engineering and Computer Sciences, University of California, Berkeley
Statistics Seminar
ESB 4192 / Zoom

To join via Zoom: please request connection details for this seminar from headsec [at] stat.ubc.ca

Title: Optimal methods for reinforcement learning: Efficient algorithms with instance-dependent guarantees

Abstract: Reinforcement learning (RL) is a pillar of modern artificial intelligence and data-driven decision making. Compared to classical statistical learning, several new statistical phenomena arise in RL problems, leading to different trade-offs in the choice of estimators, the tuning of their parameters, and the design of computational algorithms. In many settings, asymptotic and/or worst-case theory fails to provide the relevant guidance.

In this talk, I present recent advances in optimal algorithms for reinforcement learning. The bulk of the talk focuses on function approximation methods for policy evaluation. I establish a novel class of optimal, instance-dependent oracle inequalities for projected Bellman equations, as well as efficient computational algorithms that achieve them in various settings. Among other results, I will highlight how instance-dependent guarantees guide the selection of tuning parameters in temporal difference methods. Drawing on this perspective, I will also discuss a novel class of stochastic approximation methods that yields optimal statistical guarantees for solving the Bellman optimality equation. At the end of the talk, I will discuss additional work on optimal, instance-dependent guarantees for functional estimation with off-policy data.
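
For readers unfamiliar with the policy-evaluation setting the abstract refers to, the sketch below shows a generic TD(0) update with linear function approximation, the textbook temporal difference method whose fixed point solves a projected Bellman equation. It is an illustrative toy in Python, not the speaker's algorithms; the feature map, step size, and discount factor are arbitrary placeholder choices.

    import numpy as np

    def td0_linear(transitions, dim, step_size=0.05, gamma=0.9):
        """Generic TD(0) policy evaluation with a linear value model V(s) ~ theta @ phi(s).

        `transitions` is an iterable of (phi_s, reward, phi_s_next) tuples observed
        while following the policy being evaluated; the fixed point of this update
        solves the projected Bellman equation for the chosen feature map.
        """
        theta = np.zeros(dim)
        for phi_s, reward, phi_s_next in transitions:
            td_error = reward + gamma * theta @ phi_s_next - theta @ phi_s
            theta += step_size * td_error * phi_s  # stochastic-approximation step
        return theta

    # Toy usage: random features on a 3-state cycle, purely for illustration.
    rng = np.random.default_rng(0)
    feats = rng.standard_normal((3, 4))
    data = [(feats[i % 3], 1.0, feats[(i + 1) % 3]) for i in range(500)]
    theta_hat = td0_linear(data, dim=4)

The step size in such an update is exactly the kind of tuning parameter that the instance-dependent guarantees described in the abstract are meant to inform.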