Iliad Intensive Curriculum

The Iliad Intensive is a month-long, full-time AI alignment course for students with strong mathematics, physics, or theoretical-CS backgrounds. These are the materials from the April 2026 cohort — mathematical exercises, self-contained lecture notes on topics from singular learning theory to debate, and pointers for further study. About 20 contributors developed them. We share them to invite feedback and enable independent study.

Foundations

  • Prerequisites

    A curated reading list of background worldview material and technical prerequisites (math, CS, deep learning) recommended before the Iliad Intensive.

A — Alignment

  • AI Alignment Introduction

    An opinionated tour of the AI alignment problem: alignment targets, problem decompositions, goal-directedness and instrumental convergence, the risk landscape, and high-level solution approaches.

  • Alignment in practice

    A tour of how developers attempt to align frontier LLMs in practice: interventions at pre-training, post-training, and three stages of deployment.

  • Reward Learning Theory

    The theoretical foundations of reward learning: how RLHF can, under strong assumptions, recover an aligned objective, why underspecification and misspecification break that story, and how reward learning can be embedded into the framework of assistance games.

B — Learning

  • Principles of Learning

    A broad overview of the three fundamental barriers to universal learning (approximation, generalization, and optimization) and how Solomonoff induction overcomes two of them but fails at the third.

  • Mysteries of Deep Learning

    A survey of the empirical mysteries of deep learning — generalization despite overparameterization, optimization on non-convex landscapes, representational alignment, and in-context learning — and the program-synthesis hypothesis as a potential explanation.

  • Singular Learning Theory

    An invitation to singular learning theory — parameter–function map degeneracy, the local learning coefficient via volume scaling, and Watanabe's free energy formula for Bayesian inference.

  • Training Dynamics

    Implicit regularization and emergence in deep learning: loss landscapes of deep linear networks, lazy vs rich regimes, grokking, dynamical mean field theory, and training-time phase transitions.

  • Data Attribution

    Data attribution for alignment via three connected frameworks — influence functions, Bayesian influence functions, and unrolling — for measuring how reweighting training points counterfactually affects model behavior.

C — Abstractions, Representations, and Interpretability

  • Intro to ML Engineering

    Hands-on ML engineering for AI safety experiments: PyTorch tensors, training loops, optimizers, model architectures, the LLM lifecycle, and the practical tooling (Runpod, W&B) around empirical research.

  • Mechanistic Interpretability

    Mechanistic interpretability for neural networks — features and circuits, methods for discovery, the frontier of knowledge, and critiques of the field.

  • Computational Mechanics

    Computational mechanics for AI safety — hidden Markov models, generalised HMMs, belief states, mixed-state presentations, and evidence that transformers represent belief geometry in their residual streams.

  • Abstractions and Latents

    Formal frameworks for abstraction in alignment — the pointers problem, natural latents (mediation and redundancy), and condensation, with uniqueness and agreement guarantees.

D — Agency

  • Reinforcement Learning

    Foundations of reinforcement learning — Markov decision processes, the Bellman equation, policy improvement, and the SARSA/Q-learning algorithms, with parallel empirical and theory tracks.

  • Idealised Agency

    Idealised agency through AIXI and decision theory — history-based RL, the self-optimizing theorem, and how preferences relate to utility and reward.

  • Agent Foundations

    An introduction to agent foundations — embedded agency, reflective stability, Löb's obstacle, and the prescriptive vs descriptive research agendas for understanding agents.

  • World Models

    Theoretical and practical foundations of world models — from control theory, neuroscience, and RL framings (POMDPs, belief MDPs, transducers) to modern systems (Ha & Schmidhuber, Dreamer, Genie, JEPA) and the role of symmetries in building abstractions.

E — Safety Guarantees and their Limits

  • Debate

    AI safety via debate as scalable oversight — proof checkers, the PSPACE debate protocol, cross-examination up to NEXP, obfuscated arguments, prover-estimator debate, and the AISI safety case.

  • Steganography & Backdoors

    Hidden communication in LLM outputs: perfect vs computational steganography, a practical PRF-based scheme, and how steganography relates to chain-of-thought monitoring; the module also covers hidden backdoors.

  • Worst-Case Interpretability

    The limits of average-case interpretability evaluation, compact proofs as an idealised alternative, and ARC's heuristic arguments agenda as a potentially tractable middle path.