Intro to ML Engineering
Hands-on ML engineering for AI safety experiments — PyTorch tensors, training loops, optimizers, model architectures, LLM lifecycle, and the practical tooling (Runpod, W&B) around empirical research.
By Julian Schulz (Meridian Research), Adam Newgas (Timaeus)
What you’ll learn
- Minimal: Be able to follow the rest of the intensive and complete its exercises, and understand the considerations and constraints of running empirical AI safety experiments when working with an ML engineer.
  - Use PyTorch and tensor operations.
  - Write training loops, reason about hyperparameters and optimizers, and track experiments with W&B.
  - Understand basic concepts in model architectures.
- Ambitious: Run your own ML experiments with AI assistance and have a working command of the common tools and frameworks.
  - Run hyperparameter optimizations.
  - Implement model architectures in PyTorch.
  - Rent external compute (via Runpod) and run an experiment on it.
Overview
This module equips participants with the practical skills needed to follow the rest of the intensive and run their own empirical experiments. The morning is split into two parallel tracks by prior experience. Participants learn the essentials of training neural networks in PyTorch: tensor operations, training loops, autograd, optimizers, loss functions, and core architectures such as CNNs and transformers. They also meet the engineering realities of empirical work: renting GPUs via Runpod, tracking experiments with W&B, lightweight experimentation through fine-tuning APIs and Tinker, and managing datasets and checkpoints on Hugging Face. The afternoon turns to LLMs specifically: tokenization, the transformer architecture, and the full lifecycle of a modern language model from pretraining and scaling laws to RLHF and reasoning training, with hands-on exercises implementing an MLP and attention step by step.
Julian Schulz was responsible for Lectures 1a and 2; the slides for Lecture 2 are taken from ML4Good. Adam Newgas was responsible for Lecture 1b.
Prerequisites
- You should have some knowledge of coding and be prepared to lean heavily on AI assistance.
- Bring your own laptop. You may wish to run through some of the setup instructions before the course.
- Have an AI coding agent available. Options include Cursor (free), Claude Code, or Codex.
- You should know linear algebra and understand gradient descent.
Content
Two parallel lectures run in the morning until lunch, split by prior knowledge, followed by a single lecture in the afternoon.
Lecture 1a — ML foundations
- Training loop: forward/backward pass, gradient descent, stochastic batching, optimizer role
- PyTorch tensors: basic operations, einops, batching & data loading, autograd/computational graph, devices & GPU
- Loss functions: classification losses, RL/human-rater losses, sparsity, train vs. test loss, over/underfitting
- Parameters, activations & hyperparameters: terminology + optimizers (momentum, RMSProp)
- Architectures: activation functions, universal approximation, over/underparameterization, symmetries in architecture design, CNNs, transformers, residual streams
- Hyperparameter optimization: sweeps, scaling laws
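As a framework-free illustration of the training-loop topics listed above (forward pass, backward pass, stochastic batching, optimizer step), here is a minimal sketch in plain Python. It is not taken from the course materials: the function name `train_linear`, the toy dataset, and every hyperparameter value are assumptions chosen for illustration. Gradients are derived by hand here; in PyTorch, `loss.backward()` and `optimizer.step()` would take over the two steps marked in the comments.

```python
import random


def train_linear(xs, ys, lr=0.05, epochs=200, batch_size=4, seed=0):
    """Fit y = w*x + b with minibatch SGD on mean squared error.

    Hypothetical helper for illustration; all defaults are assumed values.
    """
    rng = random.Random(seed)
    w, b = 0.0, 0.0                       # model parameters
    data = list(zip(xs, ys))
    for _ in range(epochs):
        rng.shuffle(data)                 # stochastic batching: new order each epoch
        for i in range(0, len(data), batch_size):
            batch = data[i:i + batch_size]
            # Forward pass: prediction error for each example in the batch
            errs = [(w * x + b) - y for x, y in batch]
            # Backward pass: gradients of the batch MSE, derived by hand
            grad_w = sum(2 * e * x for e, (x, _) in zip(errs, batch)) / len(batch)
            grad_b = sum(2 * e for e in errs) / len(batch)
            # Optimizer step: plain gradient descent (no momentum)
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b


# Toy, noiseless dataset generated from w=3, b=1 (assumed for illustration)
xs = [i / 10 for i in range(-10, 11)]
ys = [3.0 * x + 1.0 for x in xs]
w, b = train_linear(xs, ys)
```

On this noiseless toy data the fitted `w` and `b` approach 3 and 1. Varying `lr` or `batch_size` by hand is a quick way to see how hyperparameters change convergence before moving on to proper sweeps.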
Short exercises in between about:
Lecture 1b — practical ML
Short opener on the shifting dividing lines between research, engineering, and agents. Modular selection of self-study/exercise topics from github.com/BorisTheBrave/illiad-ml-eng-track-b/blob/main/docs/:
Lecture 2 — LLMs
- Tokenization: autoregressive generation
- Transformer architecture
- Lifecycle of an LLM: pretraining; scaling laws / compute-optimal training; RLHF / constitutional training; RLVR / reasoning training
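To make the attention step of the transformer architecture concrete, the following is a minimal sketch of single-head scaled dot-product attention with a causal mask, written in plain Python so that each operation is visible. It is not the course's exercise code: the function names `softmax` and `causal_attention` and the toy inputs are assumptions for illustration, and a real implementation would use batched PyTorch tensors and learned projection matrices for queries, keys, and values.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def causal_attention(queries, keys, values):
    """Single-head scaled dot-product attention with a causal mask.

    queries and keys are lists of T vectors of length d_k; values is a list
    of T vectors of length d_v. Position t attends only to positions 0..t.
    """
    d_k = len(keys[0])
    d_v = len(values[0])
    outputs = []
    for t, q in enumerate(queries):
        # Causal mask: only keys at positions <= t are visible to position t
        visible_keys = keys[:t + 1]
        visible_values = values[:t + 1]
        # Scaled dot product between the query and each visible key
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in visible_keys]
        weights = softmax(scores)
        # Output is the attention-weighted sum of the visible value vectors
        out = [sum(wt * v[j] for wt, v in zip(weights, visible_values))
               for j in range(d_v)]
        outputs.append(out)
    return outputs


# Toy inputs (assumed for illustration): 3 positions, d_k = d_v = 2
q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
k = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
v = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
out = causal_attention(q, k, v)
```

The causal mask is what makes autoregressive generation possible: the first position can only attend to itself, so its output is exactly its own value vector, and no position ever depends on tokens that come after it.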
Slides: Lecture about Architecture and Lifecycle
Exercises:
Learn more
- ARENA Week 0 contains more exercises to try out.
- PyTorch tutorial: explains the code in the setup exercise.
- Most tools and libraries come with extensive docs that are quite readable: Python; PyTorch; TRL; Tinker; datasets; TransformerLens; Inspect; git.
- Deeper on the engineering side: Parallelism (ARENA, types of parallelism); Performance; Architecture; CS Lecture notes (e.g. big-O complexity).