This tutorial will give an overview of the theoretical foundations of interactive decision making (high-dimensional/contextual bandits, reinforcement learning, and beyond), a promising paradigm for developing AI systems capable of intelligently exploring unknown environments. The tutorial will focus on connections and parallels between supervised learning/estimation and decision making, and will build on recent research which provides (i) sample complexity measures for interactive decision making that are necessary and sufficient for sample-efficient learning, and (ii) unified algorithm design principles that achieve optimal sample complexity. Using this unified approach as a foundation, the main aim of the tutorial will be to give a bird’s-eye view of the statistical landscape of reinforcement learning (e.g., what modeling assumptions lead to sample-efficient algorithms). Topics covered will range from basic challenges and solutions (exploration in tabular RL, contextual bandits) to the current frontier of understanding. We will also highlight practical algorithms.
Recording (ICML)
Slides