8 Commits

Author SHA1 Message Date
Andrej 32a1460f62 Merge pull request #301 from indianspeedster/master
add AMD ROCm fork to notable forks section
2026-03-16 11:43:26 -07:00
indianspeedster 513fe6fcee add AMD ROCm fork to notable forks section 2026-03-16 11:28:48 -07:00
Andrej c2450add72 Guard against infinite loop when no training shards exist, fix README typo 2026-03-10 22:32:17 -07:00
Andrej 0be1e4fdf9 fix NaN loss not caught by fast-fail check 2026-03-10 22:31:43 -07:00
Contributor ebf357841b fix(train): make NaN fast-fail check explicit 2026-03-11 04:28:08 +00:00
Hugh Brown 09ebea439d Guard against infinite loop when no training shards exist, fix README typo
Add assertion after filtering val_path from parquet_paths for the "train"
split so an empty list fails fast instead of spinning in a silent infinite
loop. Also remove stray article "a" in README ("a three files" → "three
files").

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 21:34:40 -06:00
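The failure mode this commit describes can be sketched in a few lines (a hypothetical, simplified loader, not the repo's actual `_document_batches`): with an empty shard list the inner for-loop body never executes, so the outer while-loop spins forever without yielding anything.

```python
def document_batches(parquet_paths):
    # The fix: fail fast on an empty shard list instead of looping silently.
    assert len(parquet_paths) > 0, "No training shards found."
    epoch = 1
    while True:
        for path in parquet_paths:
            yield epoch, path  # stand-in for reading and tokenizing a shard
        epoch += 1
```

Note that since this sketch is a generator, the assertion fires on the first `next()` call rather than at call time, which is still early enough to abort before the run burns its time budget.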
Andrej c12eef778e Include beginner's guide to neural networks
Added a resource link for beginners in neural networks.
2026-03-09 16:00:55 -07:00
haosenwang1018 b5ba8ac00d fix NaN loss not caught by fast-fail check
`train_loss_f > 100` silently passes on NaN because IEEE 754 NaN
comparisons always return False. When an agent experiment produces
NaN (e.g. from an aggressive LR change), the run wastes the full
5-minute budget instead of failing fast.

`not (x <= 100)` catches both >100 and NaN with no added complexity.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 23:51:02 +08:00
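The IEEE 754 behavior the commit message describes is easy to verify (a minimal Python sketch, not part of the repo):

```python
import math

nan = float("nan")

# Every ordered comparison involving NaN evaluates to False,
# so a guard like `loss > 100` silently passes on NaN.
print(nan > 100)    # False
print(nan <= 100)   # False

# Both forms from the history catch NaN as well as large losses:
print(not (nan <= 100))              # True (this commit's form)
print(math.isnan(nan) or nan > 100)  # True (the explicit form)
```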
3 changed files with 8 additions and 3 deletions
+4 -1
@@ -8,7 +8,7 @@ The idea: give an AI agent a small but real LLM training setup and let it experi
 ## How it works
-The repo is deliberately kept small and only really has a three files that matter:
+The repo is deliberately kept small and only really has three files that matter:
 - **`prepare.py`** — fixed constants, one-time data prep (downloads training data, trains a BPE tokenizer), and runtime utilities (dataloader, evaluation). Not modified.
 - **`train.py`** — the single file the agent edits. Contains the full GPT model, optimizer (Muon + AdamW), and training loop. Everything is fair game: architecture, hyperparameters, optimizer, batch size, etc. **This file is edited and iterated on by the agent**.
@@ -16,6 +16,8 @@ The repo is deliberately kept small and only really has a three files that matte
 By design, training runs for a **fixed 5-minute time budget** (wall clock, excluding startup/compilation), regardless of the details of your compute. The metric is **val_bpb** (validation bits per byte) — lower is better, and vocab-size-independent so architectural changes are fairly compared.
+
+If you are new to neural networks, this ["Dummy's Guide"](https://x.com/hooeem/status/2030720614752039185) looks pretty good for a lot more context.
 ## Quick start
 **Requirements:** A single NVIDIA GPU (tested on H100), Python 3.10+, [uv](https://docs.astral.sh/uv/).
@@ -83,6 +85,7 @@ I think these would be the reasonable hyperparameters to play with. Ask your fav
 - [miolini/autoresearch-macos](https://github.com/miolini/autoresearch-macos) (MacOS)
 - [trevin-creator/autoresearch-mlx](https://github.com/trevin-creator/autoresearch-mlx) (MacOS)
 - [jsegov/autoresearch-win-rtx](https://github.com/jsegov/autoresearch-win-rtx) (Windows)
+- [andyluo7/autoresearch](https://github.com/andyluo7/autoresearch) (AMD)
 ## License
## License
+1
@@ -258,6 +258,7 @@ def _document_batches(split, tokenizer_batch_size=128):
     val_path = os.path.join(DATA_DIR, VAL_FILENAME)
     if split == "train":
         parquet_paths = [p for p in parquet_paths if p != val_path]
+        assert len(parquet_paths) > 0, "No training shards found."
     else:
         parquet_paths = [val_path]
     epoch = 1
+3 -2
@@ -9,6 +9,7 @@ os.environ["PYTORCH_ALLOC_CONF"] = "expandable_segments:True"
 os.environ["HF_HUB_DISABLE_PROGRESS_BARS"] = "1"
 import gc
+import math
 import time
 from dataclasses import dataclass, asdict
@@ -565,8 +566,8 @@ while True:
     train_loss_f = train_loss.item()
-    # Fast fail: abort if loss is exploding
-    if train_loss_f > 100:
+    # Fast fail: abort if loss is exploding or NaN
+    if math.isnan(train_loss_f) or train_loss_f > 100:
         print("FAIL")
         exit(1)