Iliad Intensive Curriculum

Overview

We provide a brief introduction to Reinforcement Learning from the fundamentals, covering tabular RL (chapters 2-4 of Sutton and Barto) in two streams. The empirical stream directly follows Day 1 of ARENA and covers implementing policy iteration/evaluation, Q-learning, and SARSA for toy gridworld environments in Python. The theory stream proves a series of results including the Bellman equations, the convergence of policy iteration and its rate, and the convergence of Q-learning, and derives an analytic solution to the Bellman equation.
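As a rough illustration of two items mentioned above, the sketch below evaluates a fixed policy both analytically, by solving the linear Bellman equation v = r + γPv, and iteratively, by repeated Bellman backups. It is a minimal pure-Python sketch and is not taken from the ARENA exercises or the problem sheet: the two-state transition matrix `P`, the reward vector `r`, the discount `gamma`, and all function names are hypothetical choices made for illustration.

```python
def solve_linear(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix [A | b]
    for col in range(n):
        # Move the row with the largest entry in this column into pivot position.
        pivot = max(range(col, n), key=lambda k: abs(M[k][col]))
        M[col], M[pivot] = M[pivot], M[col]
        # Eliminate this column from every other row.
        for k in range(n):
            if k != col:
                factor = M[k][col] / M[col][col]
                M[k] = [x - factor * y for x, y in zip(M[k], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def evaluate_exact(P, r, gamma):
    """Analytic policy evaluation: solve (I - gamma * P) v = r."""
    n = len(r)
    A = [[(1.0 if i == j else 0.0) - gamma * P[i][j] for j in range(n)]
         for i in range(n)]
    return solve_linear(A, r)


def evaluate_iterative(P, r, gamma, tol=1e-10):
    """Iterative policy evaluation: apply v <- r + gamma * P v until it settles."""
    n = len(r)
    v = [0.0] * n
    while True:
        new_v = [r[i] + gamma * sum(P[i][j] * v[j] for j in range(n))
                 for i in range(n)]
        if max(abs(x - y) for x, y in zip(new_v, v)) < tol:
            return new_v
        v = new_v


# Hypothetical two-state example. P[i][j] is the probability of moving from
# state i to state j under the fixed policy; r[i] is the expected reward in i.
P = [[0.9, 0.1],
     [0.5, 0.5]]
r = [1.0, 0.0]
gamma = 0.9

v_exact = evaluate_exact(P, r, gamma)
v_iter = evaluate_iterative(P, r, gamma)
print("exact:    ", v_exact)
print("iterative:", v_iter)
```

Both methods should agree up to the tolerance. The analytic route needs a linear solve over all states, which is why the iterative version is usually preferred once the state space grows.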

Theory Workshop material produced by Leon Lang and David Quarel
Lecture slides taken from the ARENA program (authored by David Quarel)
Empirical exercises taken from Day 1 of the ARENA program
Content was delivered by David Quarel

Prerequisites

No prior RL knowledge is assumed. All the RL material is self-contained.
The theory track requires familiarity with proofs.

Content

Fast track

This material is pretty hard to fast-track. The best approach I can suggest is to read and understand the lecture slides, and then:

Empirical track: Run the code and understand the solutions for policy iteration/improvement (not the exact form), then read and run Q-learning.
Theory track: Read the solutions for Problems 1 and 4 and make sure you understand each step.

Main content

Self-contained lecture notes
- This has everything that both the empirical and theory tracks should need for reference.
Empirical track: Work through ARENA RL Day 1
Theory track: Work through the problem sheet

(static compiled versions of lecture slides + theory exercises here)

Learn more

Sutton & Barto
- Chapter 3: Sections 3.1, 3.2, 3.3, 3.4, 3.5, 3.6
- Chapter 4: Sections 4.1, 4.2, 4.3, 4.4
- Chapter 6: Sections 6.1, 6.3 (especially Example 6.4)
  - Note that Section 6.1 talks about temporal difference (TD) updates for the value function V. We will instead be using TD updates for the Q-value Q.
  - Don't worry about the references to Monte Carlo in Chapter 5.
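To make the note on Section 6.1 concrete, here is a minimal sketch of what a TD update on Q (rather than V) can look like, in both its SARSA and Q-learning forms. It is not the ARENA solution code: the nested-list Q-table, the function names, and the numbers are hypothetical, and terminal-state handling is omitted for brevity.

```python
def sarsa_update(Q, s, a, reward, s_next, a_next, alpha, gamma):
    """On-policy TD update: bootstrap from the action actually taken next."""
    td_target = reward + gamma * Q[s_next][a_next]
    Q[s][a] += alpha * (td_target - Q[s][a])


def q_learning_update(Q, s, a, reward, s_next, alpha, gamma):
    """Off-policy TD update: bootstrap from the greedy action in the next state."""
    td_target = reward + gamma * max(Q[s_next])
    Q[s][a] += alpha * (td_target - Q[s][a])


# Hypothetical Q-tables with 2 states and 2 actions, indexed as Q[state][action].
Q_sarsa = [[0.0, 0.0], [1.0, 2.0]]
Q_qlearn = [[0.0, 0.0], [1.0, 2.0]]

# The same transition (s=0, a=1, reward=1, s_next=1) is applied to both tables.
# SARSA also needs the next action; here the agent happened to pick action 0.
sarsa_update(Q_sarsa, s=0, a=1, reward=1.0, s_next=1, a_next=0,
             alpha=0.5, gamma=0.9)
q_learning_update(Q_qlearn, s=0, a=1, reward=1.0, s_next=1,
                  alpha=0.5, gamma=0.9)

print("SARSA      Q[0][1]:", Q_sarsa[0][1])
print("Q-learning Q[0][1]:", Q_qlearn[0][1])
```

The only difference between the two is the bootstrap term: SARSA uses the Q-value of the next action the agent actually takes, while Q-learning uses the maximum Q-value over actions in the next state.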
Q-Learning: The original paper in which Q-learning was first described. Kind of a hard read, and probably not worth bothering with.
David Silver’s RL Lecture series
- Parts 1 and 2
David Quarel’s RL lecture for ARENA (I (David) delivered this exact same lecture on the day)
The entire RL chapter of ARENA
A good summary of SOTA algorithms for RL