Andrej Karpathy b11d6f283f initial commit
2026-03-06 21:58:52 +00:00


# autoresearch
Autonomous LLM pretraining research, driven by AI agents.
The idea: give an AI agent a small but real LLM training setup and let it run experiments overnight. It modifies the code, trains for 5 minutes, checks if the result improved, keeps or discards, and repeats. You wake up in the morning to a log of experiments and (hopefully) a better model.
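The keep-or-discard step of that loop can be sketched as a tiny harness. A minimal sketch, assuming (hypothetically) that `train.py` prints its final score in a line like `val_bpb: 0.9123`; the actual log format and the git commands the agent uses are up to the agent:

```python
import re

def parse_val_bpb(log_text: str) -> float:
    """Pull the last reported val_bpb out of a training log.
    Hypothetical log format: assumes train.py prints 'val_bpb: <number>'."""
    matches = re.findall(r"val_bpb:\s*([0-9.]+)", log_text)
    return float(matches[-1]) if matches else float("inf")

def keep_edit(new_bpb: float, best_bpb: float) -> bool:
    """Lower bits per byte wins; keep the edit only on improvement
    (e.g. commit train.py if True, revert it if False)."""
    return new_bpb < best_bpb
```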
This particular implementation aims to be the least fancy possible baseline, but it's easy to see how one could adjust the `program.md` file to run more sophisticated research programs with more elaborate instructions. For example, the agent could run small side experiments while the main training job is in progress.
## How it works
The repo is deliberately small and only has a few files:
- **`constants.py`** — fixed rules: sequence length, time budget, eval tokens. Not modified.
- **`prepare.py`** — one-time data prep (downloads training data, trains a BPE tokenizer) and runtime utilities (dataloader, evaluation). Not modified.
- **`train.py`** — the single file the agent edits. Contains the full GPT model, optimizer (Muon + AdamW), and training loop. Everything is fair game: architecture, hyperparameters, optimizer, batch size, etc.
- **`program.md`** — instructions for the agent. Point your agent here and let it go.
Training runs for a **fixed 5-minute time budget** (wall clock, excluding startup/compilation). The metric is **val_bpb** (validation bits per byte) — lower is better, and vocab-size-independent so architectural changes are fairly compared.
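The conversion from ordinary cross-entropy loss to bits per byte is a one-liner. A sketch, assuming the eval loop tracks the mean loss in nats per token and the raw byte length of the eval text (parameter names here are illustrative, not the repo's):

```python
import math

def bits_per_byte(mean_loss_nats: float, n_tokens: int, n_bytes: int) -> float:
    """Convert mean cross-entropy (nats/token) to bits per byte:
    total nats over the eval set, divided by ln(2) to get bits,
    then by the number of raw bytes those tokens decode to."""
    total_bits = mean_loss_nats * n_tokens / math.log(2)
    return total_bits / n_bytes
```

Because the denominator counts bytes of raw text rather than tokens, a model with a larger vocabulary (fewer tokens per byte) gets no free win on the metric.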
## Quick start
**Requirements:** A single NVIDIA GPU (tested on H100), Python 3.10+, [uv](https://docs.astral.sh/uv/).
```bash
# 1. Install dependencies
uv sync
# 2. Download data and train tokenizer (one-time, ~5 min)
uv run prepare.py
# 3. Run a single training experiment (5 min + startup)
uv run train.py
```
## Running the agent
Simply spin up your Claude/Codex or whatever agent you prefer in this repo, then say something like:
```
Hi have a look at program.md and let's kick off a new experiment! let's do the setup first.
```
The `program.md` file is essentially a super lightweight "skill".
## Project structure
```
constants.py — fixed constants (do not modify)
prepare.py — data prep + runtime utilities (do not modify)
train.py — model, optimizer, training loop (agent modifies this)
program.md — agent instructions
spawn.sh — multi-agent launcher
pyproject.toml — dependencies
```
## Design choices
- **Single file to modify.** The agent only touches `train.py`. This keeps the scope manageable and diffs reviewable.
- **Fixed time budget.** Training always runs for exactly 5 minutes. This makes experiments directly comparable regardless of what the agent changes (model size, batch size, architecture, etc).
- **BPB metric.** Bits per byte is independent of tokenizer vocabulary size, so the agent could in principle change the vocab size and still get a fair comparison.
- **Self-contained.** No external dependencies beyond PyTorch and a few small packages. No distributed training, no complex configs. One GPU, one file, one metric.
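The fixed-budget loop has a simple shape. A minimal sketch, where running one untimed warmup step to absorb compilation before starting the clock is an assumption about how startup time could be excluded, not necessarily the repo's actual mechanism:

```python
import time

def train_with_budget(step_fn, budget_s: float = 300.0) -> int:
    """Run training steps until a fixed wall-clock budget expires.
    One warmup step triggers compilation and is excluded from the clock
    (assumption: this is one way to exclude startup time)."""
    step_fn()  # warmup: not counted against the budget
    start = time.monotonic()
    steps = 0
    while time.monotonic() - start < budget_s:
        step_fn()
        steps += 1
    return steps
```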