Iliad Intensive Curriculum
The Iliad Intensive is a month-long, full-time AI alignment course for students with strong mathematics, physics, or theoretical-CS backgrounds. These are the materials from the April 2026 cohort — mathematical exercises, self-contained lecture notes on topics from singular learning theory to debate, and pointers for further study. About 20 contributors developed them. We share them to invite feedback and enable independent study.
Foundations
- Prerequisites
A curated reading list of background worldview material and technical prerequisites (math, CS, deep learning) recommended before the Iliad Intensive.
A — Alignment
- AI Alignment Introduction
An opinionated tour of the AI alignment problem: alignment targets, problem decompositions, goal-directedness and instrumental convergence, the risk landscape, and high-level solution approaches.
- Alignment in Practice
A tour of how developers attempt to align frontier LLMs in practice: interventions at pre-training, post-training, and three stages of deployment.
- Reward Learning Theory
The theoretical foundations of reward learning: how RLHF can, under strong assumptions, recover an aligned objective, why underspecification and misspecification break that story, and how reward learning can be embedded into the framework of assistance games.
B — Learning
- Principles of Learning
A broad overview of the three fundamental barriers to universal learning (approximation, generalization, and optimization) and how Solomonoff induction solves two of them but fails at the third.
- Mysteries of Deep Learning
A survey of the empirical mysteries of deep learning — generalization despite overparameterization, optimization on non-convex landscapes, representational alignment, and in-context learning — and the program-synthesis hypothesis as a potential explanation.
- Singular Learning Theory
An invitation to singular learning theory — parameter–function map degeneracy, the local learning coefficient via volume scaling, and Watanabe's free energy formula for Bayesian inference.
- Training Dynamics
Implicit regularization and emergence in deep learning: loss landscapes of deep linear networks, lazy vs rich regimes, grokking, dynamical mean field theory, and training-time phase transitions.
- Data Attribution
Data attribution for alignment via three connected frameworks — influence functions, Bayesian influence functions, and unrolling — for measuring how reweighting training points counterfactually affects model behavior.
C — Abstractions, Representations, and Interpretability
- Intro to ML Engineering
Hands-on ML engineering for AI safety experiments — PyTorch tensors, training loops, optimizers, model architectures, LLM lifecycle, and the practical tooling (Runpod, W&B) around empirical research.
- Mechanistic Interpretability
Mechanistic interpretability for neural networks — features and circuits, methods for discovery, the frontier of knowledge, and critiques of the field.
- Computational Mechanics
Computational mechanics for AI safety — hidden Markov models, generalised HMMs, belief states, mixed-state presentations, and evidence that transformers represent belief geometry in their residual streams.
- Abstractions and Latents
Formal frameworks for abstraction in alignment — the pointers problem, natural latents (mediation and redundancy), and condensation, with uniqueness and agreement guarantees.
D — Agency
- Reinforcement Learning
Foundations of reinforcement learning — Markov decision processes, the Bellman equation, policy improvement, and the SARSA/Q-learning algorithms, with parallel empirical and theory tracks.
- Idealised Agency
Idealised agency through AIXI and decision theory — history-based RL, the self-optimizing theorem, and how preferences relate to utility and reward.
- Agent Foundations
An introduction to agent foundations — embedded agency, reflective stability, Löb's obstacle, and the prescriptive vs descriptive research agendas for understanding agents.
- World Models
Theoretical and practical foundations of world models — from control theory, neuroscience, and RL framings (POMDPs, belief MDPs, transducers) to modern systems (Ha & Schmidhuber, Dreamer, Genie, JEPA) and the role of symmetries in building abstractions.
E — Safety Guarantees and their Limits
- Debate
AI safety via debate as scalable oversight — proof checkers, the PSPACE debate protocol, cross-examination up to NEXP, obfuscated arguments, prover-estimator debate, and the AISI safety case.
- Steganography & Backdoors
Hidden communication in LLM outputs: perfect vs computational steganography, a practical PRF-based scheme, and relations with chain-of-thought monitoring, plus a treatment of hidden backdoors.
- Worst-Case Interpretability
The limits of average-case interpretability evaluation, compact proofs as an idealised alternative, and ARC's heuristic arguments agenda as a potentially tractable middle path.