Iliad

Abstractions and Latents

Cluster C

Formal frameworks for abstraction in alignment — the pointers problem, natural latents (mediation and redundancy), and condensation, with uniqueness and agreement guarantees.

By Daniel Chiang (Independent), Satya Benson (Williams College)

What you’ll learn

  • Explain the alignment motivation for a mathematical theory of abstraction: why human values, as latent variables in world models, require solving the pointers problem to transfer goals to AI systems.
  • Explain why convergence of different agents onto the same abstractions (uniqueness/agreement) is important for making value translation tractable.
  • Explain the mediation and redundancy conditions that define a natural latent, and the role that they play.
  • Explain intuitively why the combination of mediation and redundancy pins down a unique natural latent (up to isomorphism), rather than leaving many candidates.
  • Explain how condensation, understood as organising knowledge into interpretable modular structure, differs from mere compression.
  • State the perfect condensation agreement result and explain its connection to the natural latent uniqueness guarantee.

Overview

Human values are expressed in terms of latent variables in our world models rather than in terms of low-level physical states. Transferring values to an AI therefore requires establishing a correspondence between the human's and the AI's internal representations of the world, which is tractable only if different agents converge on the same abstractions. This module develops the formal theory of natural latents — latent variables pinned down by mediation and redundancy conditions — and the complementary condensation framework, which explains how world models decompose into discrete, interpretable conceptual structure. Together they provide agreement and translatability theorems grounding the hope that value-relevant concepts are shared across agents.

Daniel Chiang and Satya Benson created this day’s content. Daniel taught it in-person.

Prerequisites

  • Background in statistical mechanics is useful for understanding the motivations for abstractions in general.

  • Familiarity with basic properties of information theory and of KL divergence is assumed.

  • Familiarity with Bayesian networks is helpful for understanding the diagrammatic proofs used for natural latents.

  • Some background in measure theory is helpful for understanding condensation rigorously.

Content

Fast track

To achieve the most important learning outcomes within one hour, read the following three pieces in order:

  • The pointers problem: This motivates why abstractions matter for alignment: human values are expressed in terms of latent variables in our world models, and transferring those values to an AI requires establishing a correspondence between the two agents' internal representations of the world.

  • Natural latents: the concepts: This introduces the two core conditions (mediation and redundancy) that pin down a unique natural latent, and the guaranteed translatability theorem that makes value translation tractable.

  • Condensation by Abram Demski: This covers the complementary question of how a world model should be organised into discrete, interpretable concepts rather than merely compressed, and the agreement result showing that different agents will decompose their models into approximately the same conceptual pieces.
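To make the mediation and redundancy conditions concrete, here is a minimal Python sketch on a toy distribution. It assumes the exact (zero-error), two-variable form of the conditions written as conditional mutual informations: mediation asks that I(X1; X2 | Λ) = 0, and redundancy asks that Λ be recoverable from either observable alone, i.e. I(Λ; X2 | X1) = 0 and I(Λ; X1 | X2) = 0. The readings work with approximate versions and error bounds, so treat this as an illustration rather than the formal definition. The toy model is an assumption chosen for illustration: a shared fair bit B plus independent noise bits N1 and N2, with X1 = (B, N1) and X2 = (B, N2). The helper names `cond_mutual_info` and `natural_latent_errors` are hypothetical.

```python
from collections import defaultdict
from itertools import product
from math import log2


def cond_mutual_info(joint, a, b, c):
    """I(A; B | C) in bits for a discrete joint distribution.

    `joint` maps outcome tuples to probabilities; `a`, `b`, `c` are tuples of
    positions selecting which variables of each outcome form A, B and C
    (`c` may be empty, which gives plain mutual information I(A; B)).
    """
    def marginal(idx):
        m = defaultdict(float)
        for outcome, p in joint.items():
            m[tuple(outcome[i] for i in idx)] += p
        return m

    p_abc, p_ac, p_bc, p_c = marginal(a + b + c), marginal(a + c), marginal(b + c), marginal(c)
    total = 0.0
    for key, p in p_abc.items():
        if p == 0:
            continue
        ka, kb, kc = key[:len(a)], key[len(a):len(a) + len(b)], key[len(a) + len(b):]
        total += p * log2(p * p_c[kc] / (p_ac[ka + kc] * p_bc[kb + kc]))
    return total


def natural_latent_errors(latent_fn):
    """Mediation and redundancy errors (in bits) of a candidate latent.

    Toy model (assumed for illustration): B, N1, N2 are independent fair bits,
    X1 = (B, N1), X2 = (B, N2), and the candidate latent is latent_fn(X1, X2).
    """
    joint = defaultdict(float)
    for b, n1, n2 in product((0, 1), repeat=3):
        x1, x2 = (b, n1), (b, n2)
        joint[(x1, x2, latent_fn(x1, x2))] += 1 / 8
    X1, X2, LAM = (0,), (1,), (2,)
    return {
        "mediation": cond_mutual_info(joint, X1, X2, LAM),     # I(X1; X2 | Λ)
        "redundancy_1": cond_mutual_info(joint, LAM, X2, X1),  # I(Λ; X2 | X1): Λ recoverable from X1
        "redundancy_2": cond_mutual_info(joint, LAM, X1, X2),  # I(Λ; X1 | X2): Λ recoverable from X2
    }


candidates = {
    "shared bit B": lambda x1, x2: x1[0],
    "relabelled bit 1 - B": lambda x1, x2: 1 - x1[0],
    "constant": lambda x1, x2: 0,
    "all data (X1, X2)": lambda x1, x2: (x1, x2),
}
for name, fn in candidates.items():
    errs = natural_latent_errors(fn)
    print(f"{name:22s}", {k: round(v, 3) for k, v in errs.items()})
```

In this toy model the shared bit B has zero error on all three terms, and so does the relabelled bit 1 - B, which illustrates what "unique up to isomorphism" allows. The constant latent satisfies redundancy trivially but carries 1 bit of mediation error, because X1 and X2 still share B. The full data (X1, X2) satisfies mediation trivially but carries 1 bit of error on each redundancy term. Among these candidates, only latents equivalent to B pass both conditions, which is the intuition behind mediation and redundancy together pinning down a single latent.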

Main content

The content for abstractions day is also hosted on the website. In particular, the website contains a 'Why abstractions' section that motivates and introduces abstractions, as well as refreshers on mathematical prerequisites such as information theory. These are not included in the links below, which otherwise contain all of the website's abstractions content, plus additional slides that are not on the website.

Start with the lecture slides. The lecture slides give a bird's-eye view of the abstractions landscape, the key open problems, and why abstractions matter for alignment. Continue with the motivation readings below:

Motivation

Each of the following posts explains a different aspect of abstractions and why we care about them. Pick the post that seems most interesting to you and read it.

Natural latents and condensation

Natural latents and condensation are two different frameworks for formalising abstractions. Read the posts under both headings, and think about how the two frameworks relate to each other: where they overlap, where they differ, and what each captures that the other doesn't. Once you understand some of the motivations and intuitions, you may also jump straight to the exercises below, which are mathematically self-contained.

Natural latents

Condensation

Next, do the exercises to test your understanding of the readings.

After completing the exercises, read the takeaways for exercise 2.

Learn more

To go deeper, pick one of the optional readings below and think about its relationship to foundational questions about abstractions more broadly. Each of the following frameworks is relevant to a different aspect of abstractions, though it may not have been developed with an alignment motivation.

Each of the following posts extends the readings in the main content or provides underlying motivation for the frameworks. For instance, 'A Solomonoff inductor walks into a bar' reformulates natural latents using algorithmic information theory instead of Shannon information theory.