Iliad

Reinforcement Learning

Cluster D

Foundations of reinforcement learning — Markov decision processes, the Bellman equation, policy improvement, and the SARSA/Q-learning algorithms, with parallel empirical and theory tracks.

By David Quarel (Australian National University), Leon Lang (Iliad)

What you’ll learn

  • Understand Markov Decision Processes and the agent's goal in known environments.
  • Understand the Bellman equation.
  • Understand the policy improvement theorem, and how we can use it to iteratively solve for an optimal policy.
  • Empirical track: Understand the anatomy of a gym.Env, so that you feel comfortable using existing environments and writing your own.
  • Empirical track: Understand SARSA and Q-learning and the difference between on-policy and off-policy methods.
  • Empirical track: Implement SARSA and Q-learning, and compare them on different environments.
  • Empirical track: Understand the TD(λ) algorithm, and how it can be used to mix short- and long-timescale updates.
  • Theory track: Prove properties involving the Bellman equations, including the existence of optimal policies, the policy improvement theorem, the rate of convergence of Bellman updates, and the convergence of Q-learning.
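The on-policy/off-policy distinction between SARSA and Q-learning comes down to which value the update bootstraps from. The sketch below is a minimal illustration, assuming a hypothetical 5-state corridor written as plain functions (not a gym.Env) and illustrative hyperparameters (alpha=0.5, gamma=0.9, epsilon=0.1). The names `step`, `sarsa_target`, `q_learning_target`, `train` and `greedy_policy` are invented for this example and are not taken from the ARENA exercises.

```python
import random

# Hypothetical toy environment (plain functions, not a gym.Env): a 5-state corridor.
# States 0..4, state 4 is terminal. Action 0 = left, action 1 = right.
# Entering state 4 gives reward 1; every other transition gives reward 0.
N_STATES, N_ACTIONS, GOAL = 5, 2, 4


def step(state, action):
    """Deterministic dynamics; returns (next_state, reward, done). Left from 0 stays at 0."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL


def epsilon_greedy(q, state, epsilon, rng):
    """Uniformly random action with probability epsilon, else greedy (ties broken at random)."""
    if rng.random() < epsilon:
        return rng.randrange(N_ACTIONS)
    best = max(q[state])
    return rng.choice([a for a in range(N_ACTIONS) if q[state][a] == best])


def sarsa_target(q, reward, next_state, next_action, done, gamma):
    # On-policy: bootstrap from the action the behaviour policy actually takes next.
    return reward if done else reward + gamma * q[next_state][next_action]


def q_learning_target(q, reward, next_state, done, gamma):
    # Off-policy: bootstrap from the greedy action, regardless of what is taken next.
    return reward if done else reward + gamma * max(q[next_state])


def train(method, episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0, max_steps=100):
    """Tabular control with epsilon-greedy behaviour; method is "sarsa" or "q_learning"."""
    rng = random.Random(seed)
    q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        action = epsilon_greedy(q, state, epsilon, rng)
        for _ in range(max_steps):
            next_state, reward, done = step(state, action)
            # Choose A' before updating Q(S, A), as in the textbook SARSA loop.
            next_action = epsilon_greedy(q, next_state, epsilon, rng)
            if method == "sarsa":
                target = sarsa_target(q, reward, next_state, next_action, done, gamma)
            else:
                target = q_learning_target(q, reward, next_state, done, gamma)
            q[state][action] += alpha * (target - q[state][action])
            if done:
                break
            state, action = next_state, next_action
    return q


def greedy_policy(q):
    """Greedy action in each non-terminal state (ties go to the lower action index)."""
    return [max(range(N_ACTIONS), key=lambda a: q[s][a]) for s in range(GOAL)]


q_ql = train("q_learning")
q_sarsa = train("sarsa")
print(greedy_policy(q_ql), greedy_policy(q_sarsa))
```

Both agents share the same ε-greedy behaviour policy; only the target differs. On a corridor like this one, where exploratory moves are cheap, the two usually end up preferring the same actions, so the difference is easier to see in environments where exploration is costly.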

Overview

We provide a brief introduction to Reinforcement Learning from first principles, covering tabular RL (chapters 2-4 of Sutton and Barto) in two tracks. The empirical track directly follows Day 1 of ARENA and covers implementing policy evaluation/iteration, Q-learning, and SARSA for toy gridworld environments in Python. The theory track proves a series of results, including the Bellman equations, the convergence of policy iteration and its rate, and the convergence of Q-learning, and derives an analytic solution to the Bellman equation.
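Policy evaluation, its analytic solution, and policy improvement fit together in a few lines of NumPy. This is a minimal sketch, assuming a tabular MDP stored as arrays `T[s, a, s']` (transition probabilities) and `R[s, a, s']` (rewards), with deterministic policies stored as integer arrays; these shapes and the function names are assumptions for illustration and are not claimed to match the ARENA exercise signatures. For a fixed policy π the Bellman equation is linear, V = r_π + γP_πV, so for γ < 1 it has the analytic solution V = (I − γP_π)^{-1} r_π, which `policy_eval_exact` computes with a linear solve rather than an explicit inverse.

```python
import numpy as np

# Assumed MDP encoding: T[s, a, s'] = transition probability, R[s, a, s'] = reward,
# and a deterministic policy pi stored as an integer array with pi[s] = action.


def policy_matrices(T, R, pi):
    """P_pi[s, s'] = T[s, pi[s], s'] and r_pi[s] = expected one-step reward under pi."""
    idx = np.arange(T.shape[0])
    P = T[idx, pi]
    r = (P * R[idx, pi]).sum(axis=1)
    return P, r


def policy_eval_iterative(T, R, pi, gamma, tol=1e-10, max_iters=10_000):
    """Apply the Bellman expectation backup V <- r_pi + gamma * P_pi V until it stops changing."""
    P, r = policy_matrices(T, R, pi)
    V = np.zeros(T.shape[0])
    for _ in range(max_iters):
        V_new = r + gamma * P @ V
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new
    return V


def policy_eval_exact(T, R, pi, gamma):
    """Solve the linear Bellman equation (I - gamma * P_pi) V = r_pi directly."""
    P, r = policy_matrices(T, R, pi)
    return np.linalg.solve(np.eye(T.shape[0]) - gamma * P, r)


def policy_improvement(T, R, V, gamma):
    """Greedy w.r.t. Q(s, a) = sum_s' T[s,a,s'] (R[s,a,s'] + gamma V[s']); ties -> lowest action."""
    Q = (T * (R + gamma * V[None, None, :])).sum(axis=2)
    return Q.argmax(axis=1)


def policy_iteration(T, R, gamma, max_iters=100):
    """Alternate exact evaluation and greedy improvement until the policy is stable."""
    pi = np.zeros(T.shape[0], dtype=int)
    for _ in range(max_iters):
        new_pi = policy_improvement(T, R, policy_eval_exact(T, R, pi, gamma), gamma)
        if np.array_equal(new_pi, pi):
            break
        pi = new_pi
    return pi, policy_eval_exact(T, R, pi, gamma)


# Hypothetical 3-state corridor: action 0 = left, action 1 = right; state 2 is absorbing.
T = np.zeros((3, 2, 3))
for s in range(2):
    T[s, 0, max(s - 1, 0)] = 1.0
    T[s, 1, s + 1] = 1.0
T[2, :, 2] = 1.0
R = np.zeros((3, 2, 3))
R[1, 1, 2] = 1.0  # reward 1 for stepping right from state 1 into state 2

pi, V = policy_iteration(T, R, gamma=0.9)
print(pi, V)  # expected: pi = [1 1 0], V close to [0.9 1.0 0.0]
```

Starting from the all-left policy, each improvement step switches at least one state to "right" until the policy stops changing; the iterative and exact evaluations agree on the final policy up to the stopping tolerance.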

  • Theory Workshop material produced by Leon Lang and David Quarel

  • Lecture slides taken from the ARENA program (authored by David Quarel)

  • Empirical exercises taken from Day 1 of the ARENA program

  • Content was delivered by David Quarel

Prerequisites

  • No prerequisites are assumed. All the RL material is self-contained.

  • The theory track requires familiarity with proofs.

Content

Fast track

This material is hard to fast-track. The best approach I'd suggest is to read and understand the lecture slides, and then:

  • Empirical track: Run the code, understand the solutions for policy iteration/improvement (the exact form can be skipped), and read and run the Q-learning code

  • Theory track: Just read the solutions for Problems 1 and 4 and make sure you understand each step

Main content

(static compiled versions of lecture slides + theory exercises here)

Learn more