Iliad Intensive Curriculum

Overview

This day is split into two parts.

Part (1) on AIXI closely follows An Introduction to Universal Artificial Intelligence (UAI). This theory module is built on a set of self-contained exercises that introduce the history-based RL framework used in UAI. We prove some results about interaction measures over histories and Bayesian mixtures, define the optimal Bayesian agent AIXI, and prove well-definedness of the optimal value and existence of optimal policies. We then build up to three main results: the Bayesian mixture converges on-policy to the true environment, AIXI cannot be fooled by deterministic environments, and AIXI can learn to perform well in environments in which learning is possible (the self-optimizing property).
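To give a rough feel for the first of these results, here is a minimal Python sketch of a Bayesian mixture concentrating on the true environment. It is a heavily simplified assumption-laden toy, not the construction used in UAI: there are no actions, the environment is an i.i.d. Bernoulli bit source, and the hypothesis class, the uniform prior, and the name bayes_mixture_demo are all hypothetical choices made for illustration.

```python
import random


def bayes_mixture_demo(true_p=0.7, hypotheses=(0.1, 0.3, 0.5, 0.7, 0.9),
                       steps=500, seed=0):
    """Update a Bayesian mixture over Bernoulli hypotheses on sampled bits."""
    rng = random.Random(seed)
    # Uniform prior over the (hypothetical) finite hypothesis class.
    weights = [1.0 / len(hypotheses)] * len(hypotheses)
    for _ in range(steps):
        # Sample the next percept from the true environment.
        x = 1 if rng.random() < true_p else 0
        # Likelihood each hypothesis assigns to the observed bit.
        likelihoods = [p if x == 1 else 1.0 - p for p in hypotheses]
        # Mixture probability of the observed bit (the normaliser).
        evidence = sum(w * l for w, l in zip(weights, likelihoods))
        # Bayes' rule: posterior weight is proportional to prior times likelihood.
        weights = [w * l / evidence for w, l in zip(weights, likelihoods)]
    # The mixture's predicted probability that the next bit is 1.
    prediction = sum(w * p for w, p in zip(weights, hypotheses))
    return weights, prediction


weights, prediction = bayes_mixture_demo()
print(weights, prediction)
```

Because the true parameter is in the hypothesis class, the posterior weight should pile up on it and the mixture's prediction should approach the true probability. The exercises prove the general, history-based and on-policy version of this statement.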

Part (2) introduces preferences as a general way to encode the goals and/or desires of general agents. It then presents the necessary and sufficient assumptions (in the form of axioms) for representing preferences as maximising (i) a utility function, (ii) an expected utility, and (iii) an expected discounted future reward, and explores the consequences of dropping different axioms. It also briefly discusses how these results on the representation of preferences differ from stronger results pertaining to coherence and selection of agents.
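As a small illustration of the quantities in (ii) and (iii), the following Python sketch computes an expected utility over a lottery and a discounted sum of rewards. It says nothing about the axioms themselves; the outcome names, utility values, discount factor gamma, and function names are hypothetical and chosen only for the example.

```python
def expected_utility(lottery, utility):
    """Expected utility of a lottery given as {outcome: probability}."""
    return sum(prob * utility[outcome] for outcome, prob in lottery.items())


def discounted_return(rewards, gamma):
    """Discounted sum of a finite reward sequence: sum_t gamma^t * r_t."""
    return sum(gamma ** t * r for t, r in enumerate(rewards))


# Hypothetical utilities over three outcomes.
utility = {"apple": 1.0, "banana": 0.5, "nothing": 0.0}

# Two lotteries over those outcomes.
lottery_a = {"apple": 0.5, "nothing": 0.5}
lottery_b = {"banana": 0.8, "nothing": 0.2}

# An expected-utility maximiser prefers the lottery with the larger value.
eu_a = expected_utility(lottery_a, utility)
eu_b = expected_utility(lottery_b, utility)
preferred = "a" if eu_a > eu_b else "b"

# Discounted return of three unit rewards with gamma = 0.5.
ret = discounted_return([1.0, 1.0, 1.0], gamma=0.5)
print(eu_a, eu_b, preferred, ret)
```

In this toy setting the agent's preference between lotteries is read off by comparing the two expected utilities. The representation results in the notes go the other way: they start from the preferences and ask when such a utility or reward function exists.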

David Quarel wrote the exercises/slides for the AIXI section, taking heavily from the book.
Fernando Rosas wrote notes related to preferences and rewards, which combine ideas from this LW post and this paper.

Prerequisites

Some material from the reinforcement learning module: the agent-environment interaction loop; the definitions of return, reward, value, and policy; the Bellman equation; and what it means for a policy to be optimal or better than another. You don't need to have done any coding, and you don't need Q-learning.
Everything is redefined in the AIXI worksheet, which is very self-contained.

Content

Fast track

For AIXI: it is best to read the solution sheet and try to understand each statement and its proof. It is hard to speedrun this section any faster than just doing the exercises. Stop at Problem 6, and skip Problem 5 and every exercise marked with (*).
For preferences and rewards: read the notes, skip the math and the consequences of dropping the axioms.

Main content

AIXI, all content. The lecture slides are self-contained. Students work together in pairs on the AIXI exercises.
From preferences to rewards lecture notes.

Learn more

For AIXI, see the references in the slides and the worksheets.
Introduction to Universal Artificial Intelligence: Chapter 2 just covers background; Chapter 3.1, 3.2, 3.9; Chapter 6.1, 6.2, 6.6; Chapter 7.
For 'From preferences to rewards':
- LessWrong post on the independence axiom
- Talk about the fifth axiom that turns vNM utility into rewards (also includes other good discussions regarding RL)
- Coherent decisions imply consistent utility
- Selection theorems
- You may also take a look at the references at the end of the lecture notes.

Idealised Agency