diff --git a/README.md b/README.md
index 8459259..6f21194 100644
--- a/README.md
+++ b/README.md
@@ -16,6 +16,8 @@ The repo is deliberately kept small and only really has a three files that matte
 
 By design, training runs for a **fixed 5-minute time budget** (wall clock, excluding startup/compilation), regardless of the details of your compute. The metric is **val_bpb** (validation bits per byte) — lower is better, and vocab-size-independent so architectural changes are fairly compared.
 
+If you are new to neural networks, this ["Dummy's Guide"](https://x.com/hooeem/status/2030720614752039185) looks pretty good for a lot more context.
+
 ## Quick start
 
 **Requirements:** A single NVIDIA GPU (tested on H100), Python 3.10+, [uv](https://docs.astral.sh/uv/).