8 Commits

Author SHA1 Message Date
Andrej 32a1460f62 Merge pull request #301 from indianspeedster/master
add AMD ROCm fork to notable forks section
2026-03-16 11:43:26 -07:00
indianspeedster 513fe6fcee add AMD ROCm fork to notable forks section 2026-03-16 11:28:48 -07:00
Andrej c2450add72 Guard against infinite loop when no training shards exist, fix README typo 2026-03-10 22:32:17 -07:00
Andrej 0be1e4fdf9 fix NaN loss not caught by fast-fail check 2026-03-10 22:31:43 -07:00
Contributor ebf357841b fix(train): make NaN fast-fail check explicit 2026-03-11 04:28:08 +00:00
Hugh Brown 09ebea439d Guard against infinite loop when no training shards exist, fix README typo
Add assertion after filtering val_path from parquet_paths for the "train"
split so an empty list fails fast instead of spinning in a silent infinite
loop. Also remove stray article "a" in README ("a three files" → "three
files").

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 21:34:40 -06:00
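The failure mode this commit describes can be sketched in a few lines (a hypothetical, simplified loader, not the repo's actual `_document_batches`): with an empty shard list the inner for-loop body never executes, so the outer while-loop spins forever without yielding anything.

```python
def document_batches(parquet_paths):
    # The fix: fail fast on an empty shard list instead of looping silently.
    assert len(parquet_paths) > 0, "No training shards found."
    epoch = 1
    while True:
        for path in parquet_paths:
            yield epoch, path  # stand-in for reading and tokenizing a shard
        epoch += 1
```

Note that since this sketch is a generator, the assertion fires on the first `next()` call rather than at call time, which is still early enough to abort before the run burns its time budget.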
Andrej c12eef778e Include beginner's guide to neural networks
Added a resource link for beginners in neural networks.
2026-03-09 16:00:55 -07:00
haosenwang1018 b5ba8ac00d fix NaN loss not caught by fast-fail check
`train_loss_f > 100` silently passes on NaN because IEEE 754 NaN
comparisons always return False. When an agent experiment produces
NaN (e.g. from an aggressive LR change), the run wastes the full
5-minute budget instead of failing fast.

`not (x <= 100)` catches both >100 and NaN with no added complexity.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 23:51:02 +08:00
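The IEEE 754 behavior the commit message describes is easy to verify (a minimal Python sketch, not part of the repo):

```python
import math

nan = float("nan")

# Every ordered comparison involving NaN evaluates to False,
# so a guard like `loss > 100` silently passes on NaN.
print(nan > 100)    # False
print(nan <= 100)   # False

# Both forms from the history catch NaN as well as large losses:
print(not (nan <= 100))              # True (this commit's form)
print(math.isnan(nan) or nan > 100)  # True (the explicit form)
```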
3 changed files with 8 additions and 3 deletions
+4 -1
@@ -8,7 +8,7 @@ The idea: give an AI agent a small but real LLM training setup and let it experi
 ## How it works
-The repo is deliberately kept small and only really has a three files that matter:
+The repo is deliberately kept small and only really has three files that matter:
 - **`prepare.py`** — fixed constants, one-time data prep (downloads training data, trains a BPE tokenizer), and runtime utilities (dataloader, evaluation). Not modified.
 - **`train.py`** — the single file the agent edits. Contains the full GPT model, optimizer (Muon + AdamW), and training loop. Everything is fair game: architecture, hyperparameters, optimizer, batch size, etc. **This file is edited and iterated on by the agent**.
@@ -16,6 +16,8 @@ The repo is deliberately kept small and only really has a three files that matte
 By design, training runs for a **fixed 5-minute time budget** (wall clock, excluding startup/compilation), regardless of the details of your compute. The metric is **val_bpb** (validation bits per byte) — lower is better, and vocab-size-independent so architectural changes are fairly compared.
+
+If you are new to neural networks, this ["Dummy's Guide"](https://x.com/hooeem/status/2030720614752039185) looks pretty good for a lot more context.
 ## Quick start
 **Requirements:** A single NVIDIA GPU (tested on H100), Python 3.10+, [uv](https://docs.astral.sh/uv/).
@@ -83,6 +85,7 @@ I think these would be the reasonable hyperparameters to play with. Ask your fav
 - [miolini/autoresearch-macos](https://github.com/miolini/autoresearch-macos) (MacOS)
 - [trevin-creator/autoresearch-mlx](https://github.com/trevin-creator/autoresearch-mlx) (MacOS)
 - [jsegov/autoresearch-win-rtx](https://github.com/jsegov/autoresearch-win-rtx) (Windows)
+- [andyluo7/autoresearch](https://github.com/andyluo7/autoresearch) (AMD)
 ## License
## License
+1
@@ -258,6 +258,7 @@ def _document_batches(split, tokenizer_batch_size=128):
     val_path = os.path.join(DATA_DIR, VAL_FILENAME)
     if split == "train":
         parquet_paths = [p for p in parquet_paths if p != val_path]
+        assert len(parquet_paths) > 0, "No training shards found."
     else:
         parquet_paths = [val_path]
     epoch = 1
+3 -2
@@ -9,6 +9,7 @@ os.environ["PYTORCH_ALLOC_CONF"] = "expandable_segments:True"
 os.environ["HF_HUB_DISABLE_PROGRESS_BARS"] = "1"
 import gc
+import math
 import time
 from dataclasses import dataclass, asdict
@@ -565,8 +566,8 @@ while True:
     train_loss_f = train_loss.item()
-    # Fast fail: abort if loss is exploding
-    if train_loss_f > 100:
+    # Fast fail: abort if loss is exploding or NaN
+    if math.isnan(train_loss_f) or train_loss_f > 100:
         print("FAIL")
         exit(1)