---
title: Principles of Learning
cluster: B
contributors:
  - Zach Furman (The University of Melbourne)
summary: A broad overview of the three fundamental barriers to universal
  learning — approximation, generalization, and optimization — and how
  Solomonoff induction solves too of them but fails at the third.
learningOutcomes:
  - "Students can explain the three fundamental barriers to building a universal
    learning machine: approximation, generalization, and optimization, and
    articulate why they are sometimes/naively in tension with each other"
  - Students can explain at a high level what Solomonoff induction is, how it
    resolves the approximation and generalization barriers via a universal
    hypothesis class and simplicity prior, and why it fails on optimization
  - Students are aware of the optimality result for Solomonoff induction
  - Students can explain the bias-variance tradeoff and why it might suggest
    universal learning is impossible, and how the simplicity prior in Solomonoff
    induction sidesteps this
  - Students can state the no free lunch theorem and explain what it does and
    does not imply about practical learning
  - Students can explain, at a high level, why even polynomial-time hypothesis
    classes can be hard to learn over efficiently (the PRG/cryptographic
    hardness argument)
  - Students can articulate why learning requires structure in the world, and
    what this implies about what we should expect from powerful learning
    machines
  - "Students understand the relevance of this material to AGI safety: if AGI is
    a learning machine, the three barriers constrain what it can look like, and
    Solomonoff induction serves as a theoretical reference point for reasoning
    about its capabilities"
---
## Overview

We ask: if AGI is a learning machine, what can we say about it in principle? We draw on the fields of statistical learning theory and algorithmic information theory. We first explore three fundamental barriers that any powerful learning system must overcome — approximation (can the hypothesis class express the truth?), generalization (can finite data distinguish truth from noise?), and optimization (can the learner find good hypotheses efficiently?) — and examine why these barriers are in tension with each other. We then introduce Solomonoff induction, which elegantly resolves the first two barriers via a universal hypothesis class and a simplicity prior, but completely fails on the third. Along the way, we cover the bias-variance tradeoff, the no free lunch theorem, and cryptographic hardness arguments for why efficient universal learning is impossible in the worst case. This is necessarily a lightning tour of large fields; the goal is to establish a shared conceptual vocabulary and theoretical reference point that the rest of the deep learning theory cluster builds on.​​​​​​​​​​​​​​​​

## Prerequisites

-   Probability theory basics: probability distributions, conditional probability, Bayes' theorem
    
-   Comfort with mathematical proofs and formal definitions
    
-   Some familiarity with the concept of Turing machines and computability (at the level of "know what a Turing machine is, why it's a universal model of computation, and what it means for a function to be computable")
    
-   No machine learning background assumed; no coding required
    
-   Helpful but not required: basic statistical concepts (bias, variance, estimators)
    

## Content

### Fast track

The content is already a very high-level overview, but to get the absolute basics, read the lecture slides and Sections 2-2.8 of Shane Legg's PhD thesis. Ideally this would still involve a decent amount of discussion (possibly with LLMs).

### Main content

Main lecture:

-   [PDF](/uploads/principles/Slides.pdf)
    

Readings and exercises (in order):

-   [Bitter Lesson](http://www.incompleteideas.net/IncIdeas/BitterLesson.html)
    
-   Introduction to Solomonoff induction
    
    -   [Shane Legg's PhD thesis](https://www.vetta.org/documents/Machine_Super_Intelligence.pdf) (Sections 2-2.8)
        
-   [Solomonoff induction exercises](/uploads/principles/Iliad_Intensive_Solomonoff_Exercises.pdf)
    
    -   Solutions [here](/uploads/principles/Iliad_Intensive_Solomonoff_Exercises_Solutions.pdf)
        
-   [Understanding the Bias-Variance tradeoff](https://scott.fortmann-roe.com/docs/BiasVariance.html)
    
-   [The No Free Lunch theorems and their Razor](https://www.lesswrong.com/posts/sdrCBWpqyvNBJSZH5/the-no-free-lunch-theorems-and-their-razor)
    
-   [Understanding Machine Learning: From Theory to Algorithms](https://www.cambridge.org/core/books/understanding-machine-learning/3059695661405D25673058E43C8BE2A6)
    
    -   Section 8.4 (Hardness of learning)
        

### Learn more

-   [An intuitive explanation of Solomonoff induction](https://www.lesswrong.com/posts/Kyc5dFDzBg4WccrbK/an-intuitive-explanation-of-solomonoff-induction)
    
-   [A Semitechnical Introductory Dialogue on Solomonoff Induction](https://www.lesswrong.com/posts/EL4HNa92Z95FKL9R2/a-semitechnical-introductory-dialogue-on-solomonoff-1)
    
-   [A Technical Introduction to Solomonoff Induction without K-Complexity](https://www.lesswrong.com/posts/HSDumToH57nSRdLST/a-technical-introduction-to-solomonoff-induction-without-k)
    
-   [The Parable of Predict-O-Matic](https://www.lesswrong.com/posts/SwcyMEgLyd4C3Dern/the-parable-of-predict-o-matic)
    
-   [The Solomonoff Prior is Malign](https://www.lesswrong.com/posts/Tr7tAyt5zZpdTwTQK/the-solomonoff-prior-is-malign)
    
-   [The Speed Prior: A New Simplicity Measure Yielding Near-Optimal Computable Predictions](/uploads/principles/2002-schmidhuber.pdf)
    
-   [An Introduction to Kolmogorov Complexity and Its Applications](https://link.springer.com/book/10.1007/978-0-387-49820-1)
    
    -   Standard AIT textbook
        
-   [An Introduction to Universal Artificial Intelligence](/uploads/principles/uaibook2.pdf)
    
    -   Hutter's SI / AIXI book
        
-   [Understanding Machine Learning: From Theory to Algorithms](https://www.cambridge.org/core/books/understanding-machine-learning/3059695661405D25673058E43C8BE2A6)