Mysteries of Deep Learning
A survey of the empirical mysteries of deep learning — generalization despite overparameterization, optimization on non-convex landscapes, representational alignment, and in-context learning — and the program-synthesis hypothesis as a potential explanation.
By Zach Furman (The University of Melbourne)
What you’ll learn
- Students can explain how the three barriers from the day on the Principles of Learning (approximation, generalization, optimization) manifest specifically in the context of deep learning, and why classical theoretical frameworks (e.g. bias-variance tradeoff, universal approximation) actively counterpredicted deep learning's success
- Students are aware of the key empirical mysteries of deep learning: data-dependent generalization despite overparameterization, effectiveness of SGD on non-convex landscapes, representational alignment across architectures, and in-context learning
- Students have encountered at least one candidate explanation for each mystery and can articulate what it does and doesn't explain
- Students understand the "program synthesis" hypothesis as one proposed framework connecting deep learning to Solomonoff induction, and can evaluate its strengths and limitations
- Students can articulate why solving these mysteries matters for AI safety: understanding the basic mechanisms by which deep learning works is necessary for any systematic alignment interventions or measurements (that is, ones that generalize out of distribution) to even be possible
Overview
We turn from learning in principle to learning in practice. Deep learning appears to overcome all three barriers from the Principles of Learning module in ways that classical statistical learning theory failed to predict, or even actively counterpredicted: overparameterized networks generalize despite having the capacity to memorize, SGD finds good solutions on non-convex landscapes, and networks compactly represent functions in millions of dimensions. Beyond these, deep learning exhibits further empirical phenomena that the classical framework does not even address, such as learned representations converging across different architectures and training setups, and models displaying in-context learning abilities that were never explicitly trained. We survey these mysteries and explore candidate explanations, including the hypothesis that deep learning may be overcoming these barriers through mechanisms similar to those of Solomonoff induction. As with Day B.1, this is a broad lightning overview; the subsequent case study days (SLT, training dynamics, data attribution) each develop one specific line of attack on these mysteries in depth.
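The "capacity to memorize" point can be made concrete with a toy sketch. The example below is an illustration under stated assumptions, not a result from the course materials: it uses a linear model on fixed random ReLU features as a stand-in for a wide network, and a minimum-norm least-squares fit instead of SGD. All names and sizes (`random_relu_features`, `n_features`, and so on) are hypothetical. With far more features than training points, the same model class fits structured labels and pure-noise labels equally well on the training set, which is why a capacity argument alone cannot explain generalization.

```python
import numpy as np


def random_relu_features(inputs, projection):
    """Map inputs through a fixed random projection followed by a ReLU."""
    return np.maximum(inputs @ projection, 0.0)


def fit_min_norm(features, targets):
    """Least-squares weights; lstsq returns the minimum-norm solution
    when the system is underdetermined (more features than samples)."""
    weights, *_ = np.linalg.lstsq(features, targets, rcond=None)
    return weights


def mse(predictions, targets):
    """Mean squared error as a plain Python float."""
    return float(np.mean((predictions - targets) ** 2))


rng = np.random.default_rng(0)

# Assumed toy sizes: 40 training points, 2000 features (heavily overparameterized).
n_train, n_test, input_dim, n_features = 40, 400, 5, 2000

x_train = rng.normal(size=(n_train, input_dim))
x_test = rng.normal(size=(n_test, input_dim))
projection = rng.normal(size=(input_dim, n_features)) / np.sqrt(input_dim)

phi_train = random_relu_features(x_train, projection)
phi_test = random_relu_features(x_test, projection)

# Structured labels: a simple linear function of the input.
true_direction = rng.normal(size=input_dim)
y_structured = x_train @ true_direction
y_test = x_test @ true_direction

# Random labels: pure noise, unrelated to the input.
y_random = rng.normal(size=n_train)

w_structured = fit_min_norm(phi_train, y_structured)
w_random = fit_min_norm(phi_train, y_random)

# Both fits interpolate the training set: the model can memorize anything.
train_mse_structured = mse(phi_train @ w_structured, y_structured)
train_mse_random = mse(phi_train @ w_random, y_random)

# Test error on the structured task, next to a trivial predict-zero baseline.
test_mse_structured = mse(phi_test @ w_structured, y_test)
baseline_mse = mse(np.zeros(n_test), y_test)

print(f"train MSE (structured labels): {train_mse_structured:.2e}")
print(f"train MSE (random labels):     {train_mse_random:.2e}")
print(f"test MSE (structured labels):  {test_mse_structured:.3f}")
print(f"test MSE (predict zero):       {baseline_mse:.3f}")
```

Both training errors should be numerically zero, so training error cannot distinguish the two cases. Whatever generalization the structured fit shows must come from the interaction between the data and the particular interpolating solution selected (here, the minimum-norm one), which is the kind of data-dependent effect the readings below try to explain.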
Prerequisites
- Module on Principles of Learning: the three barriers (approximation, generalization, optimization), Solomonoff induction and the simplicity prior, no free lunch, bias-variance tradeoff
- Basic understanding of deep learning, sufficient to read non-specialist ML papers
- Knowledge of mechanistic interpretability (Day C.2) is very helpful motivation but not logically necessary
Content
Fast track
Read the lecture slides for the overall framing, then read "Deep Learning as Program Synthesis" (skipping the background section on Solomonoff induction, which was covered on the Principles of Learning day, and optionally deferring the "path forward" section). This gives a high-level overview of the various empirical mysteries. Then skim as many papers on the list as you have time and interest for (possibly none).
Main content
Lecture:
Core readings (likely cover a smaller subset depending on time / audience interest):
- Overview
- Deep Learning as Program Synthesis
  - Note that this post presents an opinionated hypothesis (that deep learning is performing something analogous to Solomonoff induction) alongside a relatively uncontroversial discussion of the empirical mysteries. The post is shared mainly for the latter, though students may find the hypothesis itself pedagogically useful
-
- Approximation
- Generalization
- Optimization
- Representational alignment
- In-context learning
Learn more
-
  - Quite polemical, but nevertheless a very influential "ideas piece"
- Getting aligned on representational alignment
  - A broader overview of the representational alignment phenomenon
- Exact solutions to the nonlinear dynamics of learning in deep linear neural networks
  - Classic paper that finds stagewise learning in neural networks with linear activation functions; will be covered later in the course
- Stagewise Development in Neural Networks
  - A nice paper investigating stagewise learning in small LLMs. Requires some SLT knowledge
- Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs
  - Training on "evil" data in a narrow domain (such as code with security vulnerabilities) generalizes to broadly "evil" behavior (praising Hitler, etc.)