---
title: Singular Learning Theory
cluster: B
contributors:
  - Kai Ogden (University of Oxford)
  - Matthew Farrugia-Roberts (University of Oxford)
  - Zach Furman (University of Melbourne)
summary: An invitation to singular learning theory — parameter–function map
  degeneracy, the local learning coefficient via volume scaling, and Watanabe's
  free energy formula for Bayesian inference.
learningOutcomes:
  - Define degeneracy of parameter–function maps and identify degenerate
    directions in simple models (including deep linear networks and multi-layer
    perceptrons).
  - Define the local learning coefficient via volume scaling asymptotics,
    compute learning coefficients in simple examples, and compare to the regular
    and minimally singular cases.
  - State Watanabe's free energy formula and analyse toy examples of Bayesian
    phase transitions.
---
## Overview

Within a neural network architecture, certain weight vectors correspond to structurally simpler neural networks. These *degeneracies* complicate the relationship between the neural network’s parameter space and the resulting space of functions. In turn, learning in neural networks is substantially richer than learning in classical statistical models. Singular learning theory (SLT) is a theory of learning that places degeneracies at the centre. This module explores qualitative definitions of degeneracy in terms of the parameter–function map, the Fisher information matrix, and the curvature of the loss landscape. We then introduce SLT’s central quantitative definition of degeneracy, the local learning coefficient, from the perspective of volume scaling asymptotics. Finally, we consider Watanabe’s free energy formula for Bayesian inference as a case study on the implications of degeneracy for learning.
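As a minimal sketch of parameter–function map degeneracy, consider a two-layer deep linear network with one scalar weight per layer, f_w(x) = w2 · w1 · x. This is our own toy example, not code from the lecture notes, and the helper names `dln` and `jacobian` are hypothetical. Any two weight vectors with the same product w1 · w2 give the same function, so whole curves in parameter space collapse onto a single function. The Jacobian of the parameter–function map shows the same thing infinitesimally.

```python
import numpy as np

def dln(w1, w2, x):
    """Two-layer scalar deep linear network: f_w(x) = w2 * w1 * x."""
    return w2 * w1 * x

def jacobian(w1, w2, x):
    """Jacobian of w -> (f_w(x_1), ..., f_w(x_N)); columns are df/dw1 and df/dw2."""
    return np.stack([w2 * x, w1 * x], axis=1)

x = np.linspace(-2.0, 2.0, 5)  # a small batch of inputs
w1, w2 = 1.5, 0.8

# Global degeneracy: rescaling (w1, w2) -> (c * w1, w2 / c) leaves the function unchanged.
for c in [0.5, 2.0, -3.0]:
    assert np.allclose(dln(c * w1, w2 / c, x), dln(w1, w2, x))

# Local degeneracy: the Jacobian has rank 1 < 2 parameters away from the origin ...
J = jacobian(w1, w2, x)
print(np.linalg.matrix_rank(J))                      # → 1
# ... with kernel spanned by (w1, -w2), the tangent to the rescaling curve at c = 1.
print(np.allclose(J @ np.array([w1, -w2]), 0.0))     # → True
# At the origin the Jacobian vanishes, so every direction is degenerate.
print(np.linalg.matrix_rank(jacobian(0.0, 0.0, x)))  # → 0
```

If we additionally assume a regression model with unit-variance Gaussian noise on these fixed inputs, the Fisher information matrix is JᵀJ divided by the number of inputs, so it has the same rank deficiency as J.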

Teaching plan and lecture notes by Kai Ogden, Matthew Farrugia-Roberts, and Zach Furman. Lecture slides by Zach Furman.

## Prerequisites

The following is an indicative list of mathematical concepts that will be helpful for reading the tutorial and completing the exercises.

-   **Linear algebra:** vectors, matrices, rank, orthogonal matrices, rank–nullity, positive definiteness, eigenvalues, spectral decomposition.
-   **Calculus:** partial derivatives, gradient, directional derivative, chain rule, Hessian, second-order Taylor expansion and remainder.
-   **Integration and analysis:** multivariate integrals, change of variables, volume in R^d, asymptotic notation (big-O, little-o), computing basic limits and integrals.
-   **Probability:** probability simplex, conditional probability, probability density functions, independence, expectation, Bayes' rule, Gaussians, law of large numbers.

Resources for most of these prerequisites are listed in either the [ARENA Prerequisites: Core concepts](https://learn.arena.education/chapter0_fundamentals/00_prereqs/1-core-concepts-knowledge/) or the prerequisites module.

We also require some specific terminology and notation regarding deep learning, optimisation, and statistical inference:

-   **Deep learning:** parametric function approximation or statistical inference (parameter–function maps, deep linear networks, multi-layer perceptrons, loss functions, likelihood).
-   **Bayesian statistics:** prior, posterior, partition function, Bayesian free energy.

These topics are also mentioned in the prerequisites module or the ARENA materials. The lecture notes for this module also review everything we need, and introduce our notation, in Section 1 ("Preliminaries").
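To make the Bayesian vocabulary above concrete, here is a small grid-based sketch. The model, data, and variable names are our own assumptions, not taken from the lecture notes. It uses a one-parameter Gaussian location model p(x | w) = N(x; w, 1) with a uniform prior φ on [-1, 1], and computes the partition function Z_n = ∫ φ(w) ∏ p(x_i | w) dw, the free energy F_n = −log Z_n, and the posterior φ(w) ∏ p(x_i | w) / Z_n. The notes may use a different normalisation convention for the free energy.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
data = rng.normal(loc=0.3, scale=1.0, size=n)  # toy data x_1, ..., x_n ~ N(0.3, 1)

# Parameter grid on [-1, 1] with a uniform prior density phi(w) = 1/2.
w = np.linspace(-1.0, 1.0, 2001)
dw = w[1] - w[0]
log_prior = np.full_like(w, np.log(0.5))

# Log-likelihood log prod_i p(x_i | w) for the model x ~ N(w, 1), at every grid point.
log_lik = (-0.5 * (data[None, :] - w[:, None]) ** 2).sum(axis=1) - 0.5 * n * np.log(2 * np.pi)

# Partition function Z_n = integral of phi(w) * prod_i p(x_i | w) dw,
# approximated by a Riemann sum computed stably in log space.
log_joint = log_prior + log_lik
shift = log_joint.max()
log_Z = shift + np.log(np.sum(np.exp(log_joint - shift)) * dw)

free_energy = -log_Z                   # Bayesian free energy F_n = -log Z_n
posterior = np.exp(log_joint - log_Z)  # posterior density evaluated on the grid

print(f"F_n = {free_energy:.3f}")
print(f"posterior mass = {posterior.sum() * dw:.6f}")        # equals 1 by construction
print(f"posterior mean = {np.sum(w * posterior) * dw:.3f}")  # close to the sample mean
```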

<Callout type="note">
  Readers with more advanced backgrounds may appreciate occasional references throughout the lecture notes on topics from algebraic geometry, fractal geometry, or statistical physics. Readers without these backgrounds can safely skip them.
</Callout>

## Content

### Fast track

To paraphrase Euclid, there is no royal road to algebro-geometric learning theory. However, it is possible to get a bird's-eye view and the most important intuitions in a comparatively short time. Assuming you are already somewhat comfortable with deep learning and Bayesian inference, skip Section 1 of the lecture notes and refer back to it only as needed. Then proceed as follows:

1.  **To understand parameter–function map versus loss landscape degeneracy:** Read Section 2.1 and complete Exercises 2.1 and 2.2. Complete either Exercise 2.9 or 2.10. Read Section 2.5 and complete Exercise 2.14, Exercise 2.15, or both.
2.  **To understand the local learning coefficient via volume scaling:** Read Section 3.1 and complete Exercises 3.1, 3.2, 3.4, and 3.7. Read Section 3.3 and complete Exercise 3.10.
3.  **To understand the relation between degeneracy and learning in the Bayesian case:** Read all of Section 4 (it is shorter). Complete Exercise 4.2.
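The volume-scaling view of the learning coefficient (item 2) and its role in Watanabe's free energy formula (item 3) can be checked numerically on toy examples. The sketch below is our own illustration, not code from the lecture notes, and all function names are hypothetical. It treats three hand-picked functions K(w) ≥ 0 on w ∈ [-1, 1]^2 as population losses with a uniform prior φ. It then estimates λ in two ways: as the exponent in V(ε) ≈ c · ε^λ, where V(ε) is the prior mass of {w : K(w) < ε}, and as the coefficient of log n in a noiseless toy free energy F(n) = −log ∫ exp(−n K(w)) φ(w) dw. Watanabe's formula says, roughly, F_n = n L_n(w_0) + λ log n − (m − 1) log log n + O_p(1). In these examples K(w_0) = 0 and the multiplicity is m = 1, so both fits should approach the same λ. The expected values follow by rescaling coordinates; for example, substituting w1 = ε^(1/4) u and w2 = ε^(1/2) v in K = w1^4 + w2^2 gives V(ε) = ε^(3/4) V(1). The notes' definitions (for example, localising to a neighbourhood of a minimum) may differ in detail.

```python
import numpy as np

# Toy population losses K(w) >= 0 on w = (w1, w2) in [-1, 1]^2, all zero at the origin,
# paired with the learning coefficient each one should have.
EXAMPLES = {
    "regular": (lambda a, b: a**2 + b**2, 1.0),       # lambda = d/2 = 1
    "minimally singular": (lambda a, b: b**2, 0.5),  # lambda = rank/2 = 1/2
    "singular": (lambda a, b: a**4 + b**2, 0.75),    # lambda = 1/4 + 1/2
}

def lambda_from_volume(K, eps_values, n_samples=1_000_000, seed=0):
    """Fit lambda in V(eps) ~ c * eps**lambda, where V(eps) is the prior mass of
    {w : K(w) < eps} under a uniform prior on [-1, 1]^2 (Monte Carlo estimate)."""
    rng = np.random.default_rng(seed)
    w = rng.uniform(-1.0, 1.0, size=(n_samples, 2))
    k = K(w[:, 0], w[:, 1])
    volumes = np.array([np.mean(k < eps) for eps in eps_values])
    slope, _ = np.polyfit(np.log(eps_values), np.log(volumes), deg=1)
    return slope

def lambda_from_free_energy(K, n_values, grid_size=1001):
    """Fit lambda in F(n) ~ lambda * log n + const, where
    F(n) = -log of the integral of exp(-n K(w)) against a uniform prior on [-1, 1]^2,
    approximated by averaging over a uniform grid."""
    g = np.linspace(-1.0, 1.0, grid_size)
    a, b = np.meshgrid(g, g, indexing="ij")
    k = K(a, b)
    # The grid mean matches the prior-weighted integral up to a constant factor,
    # which only shifts F(n) and does not change the fitted slope.
    free_energies = np.array([-np.log(np.mean(np.exp(-n * k))) for n in n_values])
    slope, _ = np.polyfit(np.log(n_values), free_energies, deg=1)
    return slope

eps_values = np.logspace(-3, -1, 9)  # loss thresholds for the volume fit
n_values = np.logspace(2, 4, 9)      # "sample sizes" for the free-energy fit

results = {}
for name, (K, lam_true) in EXAMPLES.items():
    results[name] = (lambda_from_volume(K, eps_values), lambda_from_free_energy(K, n_values))
    print(f"{name:20s} expected {lam_true:.2f}  volume fit {results[name][0]:.3f}  "
          f"free-energy fit {results[name][1]:.3f}")
```

These fits work cleanly only because each toy loss has an exact power-law volume. For a loss such as K = (w1 w2)^2, which has λ = 1/2 with multiplicity m = 2, V(ε) carries an extra log(1/ε) factor, and a naive log-log fit at moderate ε underestimates λ.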

### Main content

The main content for today centres on the lecture notes: [*'Degeneracy in Deep Learning — An invitation to singular learning theory (Pilot, Spring 2026)'*](/uploads/singular-learning-theory/Ogden_2026_Degeneracy_in_Deep_Learning_-_An_invitation_to_singular_learning_theory__Pilot__Spring_2026_.pdf) by Ogden et al. 2026, plus a framing lecture, *Introduction to Singular Learning Theory* by Furman 2026 (slides: [PDF](/uploads/singular-learning-theory/Furman2026_Introduction_to_Singular_Learning_Theory.pdf) / [Keynote](/uploads/singular-learning-theory/Furman2026_Introduction_to_Singular_Learning_Theory.key)).

### Learn more

See Section 5 of the lecture notes for a list of other introductory readings on singular learning theory and a survey of recent work on singular deep learning. [Timaeus](https://timaeus.co/learn), a non-profit research organisation working on SLT for alignment, maintains a list of introductory resources.