Iliad Intensive Curriculum

Overview

Human values are expressed in terms of latent variables in our world models rather than in terms of low-level physical states. Transferring values to an AI therefore requires establishing a correspondence between the human's and the AI's internal representations of the world, which is tractable only if different agents converge on the same abstractions. This module develops the formal theory of natural latents — latent variables pinned down by mediation and redundancy conditions — and the complementary condensation framework, which explains how world models decompose into discrete, interpretable conceptual structure. Together they provide agreement and translatability theorems grounding the hope that value-relevant concepts are shared across agents.

Daniel Chiang and Satya Benson created this day’s content. Daniel taught it in person.

Prerequisites

Background in statistical mechanics is useful for understanding the motivations for abstractions in general.
Familiarity with basic properties of information theory and of KL divergence is assumed.
Familiarity with Bayesian networks is helpful for understanding the diagrammatic proofs in natural latents.
Some background in measure theory is helpful for understanding condensation rigorously.

Content

Fast track

To achieve the most important learning outcomes within one hour, read the following three pieces in order:

The pointers problem: This motivates why abstractions matter for alignment: human values are expressed in terms of latent variables in our world models, and transferring those values to an AI requires establishing a correspondence between the two agents' internal representations of the world.
Natural latents: the concepts: This introduces the two core conditions (mediation and redundancy) that pin down a unique natural latent, and the guaranteed translatability theorem that makes value translation tractable.
Condensation by Abram Demski: This covers the complementary question of how a world model should be organised into discrete, interpretable concepts rather than merely compressed, and the agreement result showing that different agents will decompose their models into approximately the same conceptual pieces.
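
For readers who prefer a concrete check before reading, the mediation and redundancy conditions can be tried out on a toy distribution. The sketch below is illustrative only and rests on assumptions that are not from the readings themselves: it takes mediation to mean I(X1; X2 | Λ) = 0 and redundancy to mean I(Λ; X2 | X1) = 0 and I(Λ; X1 | X2) = 0, and it uses a made-up world with one shared fair bit plus independent noise bits. The helper names `marginal` and `cond_mutual_info` are hypothetical.

```python
from collections import defaultdict
from itertools import product
from math import log2


def marginal(joint, keep):
    """Sum a joint distribution {outcome tuple: prob} down to the indices in `keep`."""
    out = defaultdict(float)
    for outcome, p in joint.items():
        out[tuple(outcome[i] for i in keep)] += p
    return out


def cond_mutual_info(joint, a, b, c):
    """I(A; B | C) in bits; a, b, c are tuples of indices into each outcome."""
    p_abc = marginal(joint, a + b + c)
    p_ac = marginal(joint, a + c)
    p_bc = marginal(joint, b + c)
    p_c = marginal(joint, c)
    total = 0.0
    for outcome, p in p_abc.items():
        if p == 0.0:
            continue
        va = outcome[:len(a)]
        vb = outcome[len(a):len(a) + len(b)]
        vc = outcome[len(a) + len(b):]
        total += p * log2(p * p_c[vc] / (p_ac[va + vc] * p_bc[vb + vc]))
    return total


# Assumed toy world: shared fair bit lam, independent fair noise bits n1, n2.
# Observables are X1 = (lam, n1) and X2 = (lam, n2).
LAT, X1, X2 = (0,), (1,), (2,)  # outcome layout: (latent, X1, X2)

# Candidate latent 1: the shared bit itself.
shared = {}
for lam, n1, n2 in product((0, 1), repeat=3):
    shared[(lam, (lam, n1), (lam, n2))] = 1 / 8

# Candidate latent 2: a copy of X1, which also carries X1's private noise.
copy_x1 = {}
for lam, n1, n2 in product((0, 1), repeat=3):
    copy_x1[((lam, n1), (lam, n1), (lam, n2))] = 1 / 8

for name, joint in (("shared bit", shared), ("copy of X1", copy_x1)):
    mediation = cond_mutual_info(joint, X1, X2, LAT)       # I(X1; X2 | latent)
    redundancy_1 = cond_mutual_info(joint, LAT, X2, X1)    # I(latent; X2 | X1)
    redundancy_2 = cond_mutual_info(joint, LAT, X1, X2)    # I(latent; X1 | X2)
    print(name, mediation, redundancy_1, redundancy_2)
```

In this toy world both candidates make X1 and X2 conditionally independent, but only the shared bit can be recovered from either observable alone. The copy of X1 includes noise that X2 knows nothing about, so it fails one of the two redundancy checks. This is the sense in which the two conditions together single out the shared information.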

Main content

The content for abstractions day is also hosted on the website. In particular, the website contains a 'Why abstractions' section, which gives the motivation for and an introduction to abstractions, as well as refreshers on mathematical prerequisites such as information theory. Neither is included in the links below, which otherwise cover all of the website’s abstractions content, plus additional slides that are not found on the website.

Start with the lecture slides. The lecture slides give a bird's-eye view of the abstractions landscape, the key open problems, and why abstractions matter for alignment. Continue with the motivation readings below:

Motivation

Each of the following posts explains a different aspect of abstractions and why we care about them. Pick the post that seems most interesting to you and read it.

The pointers problem
Condensation motivation
Natural abstractions: Key claims, theorems, critiques – Read the Introduction section, Key high-level claims, and How is the natural abstractions agenda relevant to alignment?
Ontology Identification
Why care about natural latents
What is abstraction
Understanding abstraction as a robust bottleneck – Read section 3 (How is understanding abstraction a bottleneck to any alignment approach at all?)

Natural latents and condensation

Natural latents and condensation are two different frameworks for formalising abstractions. Read the posts under both, and think about how the two frameworks relate to each other — where they overlap, where they differ, and what each captures that the other doesn't. After understanding some of the motivations and intuitions, you may also jump straight to the exercises underneath, which are mathematically self-contained.

Natural latents

Condensation

Next, do the exercises to test your understanding of the readings.

Read the takeaways for exercise 2 after the exercises.

Learn more

To go deeper, each of the following frameworks is relevant to different aspects of abstractions but may not have been developed with an alignment motivation. Pick one of the optional readings and think about its relationship with foundational questions in abstractions more broadly.

Each of the following posts extends the readings in the main content or provides underlying motivation for the frameworks. For instance, 'A Solomonoff inductor walks into a bar' reformulates natural latents using algorithmic information theory instead of Shannon information theory.

Abstractions and Latents

What you’ll learn