Principles of Learning
A broad overview of the three fundamental barriers to universal learning — approximation, generalization, and optimization — and how Solomonoff induction solves too of them but fails at the third.
By Zach Furman (The University of Melbourne)
What you’ll learn
- Students can explain the three fundamental barriers to building a universal learning machine: approximation, generalization, and optimization, and articulate why they are sometimes/naively in tension with each other
- Students can explain at a high level what Solomonoff induction is, how it resolves the approximation and generalization barriers via a universal hypothesis class and simplicity prior, and why it fails on optimization
- Students are aware of the optimality result for Solomonoff induction
- Students can explain the bias-variance tradeoff and why it might suggest universal learning is impossible, and how the simplicity prior in Solomonoff induction sidesteps this
- Students can state the no free lunch theorem and explain what it does and does not imply about practical learning
- Students can explain, at a high level, why even polynomial-time hypothesis classes can be hard to learn over efficiently (the PRG/cryptographic hardness argument)
- Students can articulate why learning requires structure in the world, and what this implies about what we should expect from powerful learning machines
- Students understand the relevance of this material to AGI safety: if AGI is a learning machine, the three barriers constrain what it can look like, and Solomonoff induction serves as a theoretical reference point for reasoning about its capabilities
Overview
We ask: if AGI is a learning machine, what can we say about it in principle? We draw on the fields of statistical learning theory and algorithmic information theory. We first explore three fundamental barriers that any powerful learning system must overcome — approximation (can the hypothesis class express the truth?), generalization (can finite data distinguish truth from noise?), and optimization (can the learner find good hypotheses efficiently?) — and examine why these barriers are in tension with each other. We then introduce Solomonoff induction, which elegantly resolves the first two barriers via a universal hypothesis class and a simplicity prior, but completely fails on the third. Along the way, we cover the bias-variance tradeoff, the no free lunch theorem, and cryptographic hardness arguments for why efficient universal learning is impossible in the worst case. This is necessarily a lightning tour of large fields; the goal is to establish a shared conceptual vocabulary and theoretical reference point that the rest of the deep learning theory cluster builds on.
Prerequisites
-
Probability theory basics: probability distributions, conditional probability, Bayes' theorem
-
Comfort with mathematical proofs and formal definitions
-
Some familiarity with the concept of Turing machines and computability (at the level of "know what a Turing machine is, why it's a universal model of computation, and what it means for a function to be computable")
-
No machine learning background assumed; no coding required
-
Helpful but not required: basic statistical concepts (bias, variance, estimators)
Content
Fast track
The content is already a very high-level overview, but to get the absolute basics, read the lecture slides and Sections 2-2.8 of Shane Legg's PhD thesis. Ideally this would still involve a decent amount of discussion (possibly with LLMs).
Main content
Main lecture:
Readings and exercises (in order):
-
Introduction to Solomonoff induction
- Shane Legg's PhD thesis (Sections 2-2.8)
-
Solomonoff induction exercises
- Solutions here
-
Understanding Machine Learning: From Theory to Algorithms
- Section 8.4 (Hardness of learning)
Learn more
-
A Semitechnical Introductory Dialogue on Solomonoff Induction
-
A Technical Introduction to Solomonoff Induction without K-Complexity
-
The Speed Prior: A New Simplicity Measure Yielding Near-Optimal Computable Predictions
-
An Introduction to Kolmogorov Complexity and Its Applications
- Standard AIT textbook
-
An Introduction to Universal Artificial Intelligence
- Hutter's SI / AIXI book