Iliad Intensive Curriculum

Overview

This module equips participants with the practical skills needed to follow the rest of the intensive and run their own empirical experiments. The morning is split into two parallel tracks by prior experience. Between them, the tracks cover the essentials of training neural networks in PyTorch (tensor operations, training loops, autograd, optimizers, loss functions, and core architectures like CNNs and transformers) and the engineering realities of empirical work: renting GPUs via Runpod, tracking experiments with WandB, lightweight experimentation through fine-tuning APIs and Tinker, and managing datasets and checkpoints on Hugging Face. The afternoon turns to LLMs specifically: tokenization, the transformer architecture, and the full lifecycle of a modern language model from pretraining and scaling laws to RLHF and reasoning training, with hands-on exercises implementing an MLP and attention step by step.

Julian Schulz was responsible for Lectures 1a and 2. The slides for Lecture 2 are taken from ML4Good. Adam Newgas was responsible for Lecture 1b.

Prerequisites

You should have some knowledge of coding, and be prepared to lean heavily on AI assistance.
Bring your own laptop. You may wish to run through some of the setup instructions before the course.
Have an AI coding agent available. Options include Cursor (free), Claude Code, or Codex.
You should know linear algebra and understand gradient descent.

Content

There are two parallel lectures/courses in the morning until lunch, separated by prior knowledge, and a single lecture in the afternoon.

Course Selection Quiz

Lecture 1a — ML foundations

Training loop — forward/backward pass, gradient descent, stochastic batching, optimizer role
PyTorch tensors — basic operations, einops, batching & data loading, autograd/computational graph, devices & GPU
Loss functions — classification losses, RL/human-rater losses, sparsity, train vs. test loss, over/underfitting
Parameters, activations & hyperparameters — terminology + optimizers (momentum, RMSProp)
Architectures — activation functions, universal approximation, over/underparameterization, symmetries in architecture design, CNNs, transformers, residual streams
Hyperparameter optimization — sweeps, scaling laws
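
As a rough illustration of the training-loop topics listed above (forward pass, backward pass, stochastic batching, and the optimizer step), here is a minimal sketch in plain Python. It is not taken from the lecture materials: the function name train_linear, the toy data, and the hyperparameters are assumptions chosen for illustration, and the gradients are derived by hand instead of by autograd. In PyTorch the same three steps are usually written as optimizer.zero_grad(), loss.backward(), and optimizer.step().

```python
import random


def train_linear(xs, ys, lr=0.05, epochs=200, batch_size=4, seed=0):
    """Fit y ~ w*x + b with mini-batch SGD on mean squared error.

    Hypothetical helper for illustration only; all defaults are assumptions.
    """
    rng = random.Random(seed)
    w, b = 0.0, 0.0  # parameters, initialised at zero
    data = list(zip(xs, ys))
    for _ in range(epochs):
        rng.shuffle(data)  # stochastic batching: new batch split every epoch
        for i in range(0, len(data), batch_size):
            batch = data[i:i + batch_size]
            # Forward pass: prediction error for each example in the batch.
            errs = [(w * x + b) - y for x, y in batch]
            # Backward pass: gradients of the batch MSE, derived by hand.
            grad_w = 2 * sum(e * x for e, (x, _) in zip(errs, batch)) / len(batch)
            grad_b = 2 * sum(errs) / len(batch)
            # Optimizer step: plain gradient descent, no momentum.
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b


# Toy data generated from y = 2x + 1 without noise.
xs = [i / 10 for i in range(-10, 11)]
ys = [2.0 * x + 1.0 for x in xs]
w, b = train_linear(xs, ys)
print(round(w, 2), round(b, 2))
```

On this noise-free toy data the loop should recover values close to w = 2 and b = 1. Swapping the update rule for one with momentum or RMSProp changes only the two lines in the optimizer step, which is one way to see the optimizer's role as separate from the forward and backward passes.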

Slides

Short exercises in between about:

Lecture 1b — practical ML

Short opener about the changing dividing lines between research, engineering, and agents, followed by a modular selection of self-study/exercise topics from github.com/BorisTheBrave/illiad-ml-eng-track-b/blob/main/docs/:

Lecture 2 — LLMs

Tokenization — autoregressive generation
Transformer Architecture
Lifecycle of an LLM: Pretraining; Scaling laws / compute-optimal training; RLHF / constitutional training; RLVR / reasoning training
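
To make the transformer architecture topic above concrete, the following is a minimal sketch of single-head scaled dot-product attention with a causal mask, written in plain Python. It is an illustration under stated assumptions, not the code used in the course exercises: the names softmax and causal_attention are hypothetical, vectors are plain lists of floats, and learned projections, multiple heads, and batching are all left out.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def causal_attention(q, k, v):
    """Single-head scaled dot-product attention with a causal mask.

    q, k, v are lists of T vectors (lists of floats). q and k share
    dimension d. Hypothetical helper for illustration only.
    """
    d = len(q[0])
    out = []
    for t, q_t in enumerate(q):
        # Causal mask: position t only scores positions 0..t.
        scores = [
            sum(a * b for a, b in zip(q_t, k[s])) / math.sqrt(d)
            for s in range(t + 1)
        ]
        weights = softmax(scores)
        # Output is the attention-weighted average of the visible values.
        out.append([
            sum(w * v[s][j] for s, w in enumerate(weights))
            for j in range(len(v[0]))
        ])
    return out


# With all-zero queries every visible position gets equal weight.
q = [[0.0, 0.0], [0.0, 0.0]]
k = [[1.0, 0.0], [0.0, 1.0]]
v = [[1.0, 2.0], [3.0, 4.0]]
result = causal_attention(q, k, v)
print(result)
```

The first position can only see itself, so its output equals its own value vector; the second position averages both value vectors. The same masking is what makes autoregressive generation possible: each token's representation depends only on earlier tokens.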

Slides: Lecture about Architecture and Lifecycle

Exercises:

Learn more

RLHF Exercise
ARENA Week 0 contains more exercises to try out.
Tips and Code for Empirical Research Workflows
PyTorch tutorial — this explains the code in the setup exercise.
Most tools and libraries come with extensive docs that are quite readable: Python; PyTorch; TRL; Tinker; datasets; transformerlens; Inspect; git.
Deeper on the engineering side: Parallelism (ARENA, types of parallelism); Performance; Architecture; CS Lecture notes (e.g. big-O complexity).

Intro to ML Engineering

What you’ll learn