32a1460f62
Merge pull request #301 from indianspeedster/master
[branch: master]
Andrej
2026-03-16 11:43:26 -07:00
513fe6fcee
add AMD ROCm fork to notable forks section
indianspeedster
2026-03-16 11:28:48 -07:00
c2450add72
Guard against infinite loop when no training shards exist, fix README typo
Andrej
2026-03-10 22:32:17 -07:00
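The infinite-loop guard in the commit above could look like the following minimal sketch: a data loader that cycles over shards forever must fail loudly when the shard list is empty. The function name, signature, and shard naming scheme are assumptions, not taken from the repo.

```python
from pathlib import Path

def iter_shards(data_dir: str, pattern: str = "*.bin"):
    """Yield training shards forever, but fail fast when none exist.

    Hypothetical sketch: without the guard, cycling over an empty list
    would yield nothing and the caller's training loop could spin forever.
    """
    shards = sorted(Path(data_dir).glob(pattern))
    if not shards:
        raise FileNotFoundError(
            f"no training shards matching {pattern!r} in {data_dir}"
        )
    while True:  # repeat the dataset indefinitely
        yield from shards
```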
0be1e4fdf9
fix NaN loss not caught by fast-fail check
Andrej
2026-03-10 22:31:43 -07:00
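The "fast-fail check" these NaN commits refer to could be sketched as below; the function name and its call site in the training loop are assumptions. The point of making the check explicit is that threshold comparisons like `loss > 10.0` are always False for NaN, so a NaN loss can slip past an implicit check.

```python
import math

def check_loss(loss: float, step: int) -> None:
    """Fast-fail on a non-finite loss instead of training through it.

    Hypothetical sketch: math.isnan/math.isinf catch the cases a plain
    threshold comparison silently misses.
    """
    if math.isnan(loss) or math.isinf(loss):
        raise RuntimeError(f"non-finite loss {loss} at step {step}")
```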
ebf357841b
fix(train): make NaN fast-fail check explicit
Contributor
2026-03-11 04:28:08 +00:00
09ebea439d
Guard against infinite loop when no training shards exist, fix README typo
Hugh Brown
2026-03-10 21:34:40 -06:00
c12eef778e
Include beginner's guide to neural networks
Andrej
2026-03-09 16:00:55 -07:00
7004de03bd
hmmm
[branch: agenthub]
Andrej Karpathy
2026-03-09 19:29:59 +00:00
b5ba8ac00d
fix NaN loss not caught by fast-fail check
haosenwang1018
2026-03-09 23:51:02 +08:00
068d93da75
clarify that results.tsv should not be committed, leave untracked
Andrej Karpathy
2026-03-09 05:11:07 +00:00
c92bee55eb
some docs on what to play with to make autoresearch better on smaller computers
Andrej Karpathy
2026-03-09 04:49:15 +00:00
2224cd7cae
reshuffle readme a bit and link to tiny stories for apple silicon guidance
Andrej Karpathy
2026-03-08 23:25:53 +00:00
f16ece488f
clarification to baseline run instruction, there was some language from a previous version that wasn't fully cleaned up
Andrej Karpathy
2026-03-08 23:16:30 +00:00
9264224a3c
add notable fork mlx
Andrej Karpathy
2026-03-08 17:06:29 +00:00
6c087cb5e2
revert interrupted softcap experiment (restore softcap=15)
[branch: exp/H100/mar8]
autoresearch
2026-03-08 16:27:48 +00:00
fedfef398b
add experiment results log (125 experiments, best val_bpb=0.969686)
autoresearch
2026-03-08 16:22:01 +00:00
216eeb8d6e
softcap 15 to 17
autoresearch
2026-03-08 15:50:16 +00:00
438a26e2c3
warmdown 0.7 to 0.75
autoresearch
2026-03-08 15:03:36 +00:00
b1d50048d9
embedding LR 0.8 to 0.9 (re-test with WD)
autoresearch
2026-03-08 14:23:02 +00:00
637f82f215
VE WD 0.002 to 0.003
autoresearch
2026-03-08 13:54:07 +00:00
73c77cad75
VE WD 0.001 to 0.002
autoresearch
2026-03-08 13:48:25 +00:00
1a85362b26
tiny VE WD 0.001
autoresearch
2026-03-08 13:42:34 +00:00
ece91016aa
tiny embedding WD 0.001
autoresearch
2026-03-08 13:30:56 +00:00
97dda8577f
init scale 0.7 to 0.68
autoresearch
2026-03-08 12:44:03 +00:00
f5979a7464
init scale 0.7x
autoresearch
2026-03-08 10:14:06 +00:00
41d50a8539
reduce transformer init scale by 0.8x
autoresearch
2026-03-08 10:02:16 +00:00
264a05b48f
add small WD 0.01 to lm_head (AdamW)
autoresearch
2026-03-08 07:23:33 +00:00
7f63c17076
unembedding LR 0.006 to 0.005
autoresearch
2026-03-08 07:05:48 +00:00
a7aa309dd4
muon momentum warmup 300 to 200 steps
autoresearch
2026-03-08 05:29:56 +00:00
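The Muon momentum warmup shortened above (300 to 200 steps) can be sketched as a linear ramp; the start and end momentum values here (0.85 to 0.95) are illustrative assumptions, not taken from the log.

```python
def muon_momentum(step: int, warmup_steps: int = 200,
                  start: float = 0.85, end: float = 0.95) -> float:
    """Linearly warm up Muon's momentum over the first warmup_steps.

    Hypothetical sketch: ramps from start to end, then holds at end.
    """
    t = min(1.0, step / warmup_steps)
    return start + (end - start) * t
```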
aa8f408f39
unembedding LR 0.004 to 0.006
autoresearch
2026-03-08 04:48:14 +00:00
772dada6cc
FINAL_LR_FRAC 0.0 to 0.05 (small LR floor)
autoresearch
2026-03-08 04:36:38 +00:00
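The FINAL_LR_FRAC change above (0.0 to 0.05) adds a small floor to the learning-rate decay, and the warmdown commits below tune how much of training is spent cooling down. A sketch of such a schedule follows; the parameter names echo the commit messages, but the exact schedule shape used in the code is an assumption.

```python
def lr_scale(step: int, num_steps: int,
             warmdown_frac: float = 0.7, final_lr_frac: float = 0.05) -> float:
    """LR multiplier: constant, then a linear warmdown to a floor.

    Hypothetical sketch: with final_lr_frac=0.0 the LR decays to zero;
    0.05 leaves a small LR floor at the end of training.
    """
    warmdown_start = num_steps * (1.0 - warmdown_frac)
    if step < warmdown_start:
        return 1.0
    progress = (step - warmdown_start) / (num_steps - warmdown_start)
    return 1.0 - (1.0 - final_lr_frac) * progress
```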
0640555a1f
x0_lambda init 0.1 to 0.05
autoresearch
2026-03-08 04:30:50 +00:00
7d047e42f4
embedding LR 0.6 to 0.8
autoresearch
2026-03-08 04:19:15 +00:00
59e9dd9aab
RoPE base frequency 10K to 200K
autoresearch
2026-03-08 04:13:30 +00:00
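The RoPE base change above (10K to 200K) affects the standard rotation-frequency schedule `base**(-2i/head_dim)`: a larger base slows the lowest frequencies, a common way to stretch usable context. The exact usage in the repo (and the head_dim) is an assumption; this is just the standard formulation.

```python
def rope_freqs(head_dim: int, base: float = 200_000.0) -> list:
    """Per-pair rotation frequencies for RoPE: base**(-2i/head_dim).

    Sketch: one frequency per pair of channels, decaying from 1.0 down
    toward base**-1 as i grows.
    """
    return [base ** (-2.0 * i / head_dim) for i in range(head_dim // 2)]
```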
7da0b673a1
short window 1/8 context (256 tokens instead of 1024)
autoresearch
2026-03-08 04:07:44 +00:00
8363d52e8d
SSSSL window pattern (5:1 short:long ratio)
autoresearch
2026-03-08 04:01:58 +00:00
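The two window experiments above (the SSSSL short:long pattern and the short window at 1/8 of context, 256 tokens) can be sketched as assigning per-layer attention window sizes from a repeating pattern. The helper itself is hypothetical, and the 2048-token context is an assumption chosen so that 1/8 of it matches the 256-token short window in the commit message.

```python
def window_sizes(num_layers: int, context: int = 2048,
                 short_frac: int = 8, pattern: str = "SSSSL") -> list:
    """Assign each layer a short or long attention window.

    Hypothetical sketch: layers cycle through the pattern string, with
    'S' layers attending over context // short_frac tokens and the rest
    over the full context.
    """
    short = context // short_frac
    return [short if pattern[i % len(pattern)] == "S" else context
            for i in range(num_layers)]
```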
4e6697f68d
warmdown 0.5 to 0.7 (more cooldown)
autoresearch
2026-03-08 03:56:11 +00:00
7f2a65c9a5
depth 9 aspect_ratio 57 (extra layer, dim ~512)
autoresearch
2026-03-08 03:44:34 +00:00
bea057bc08
halve batch size 524K to 262K for more steps in 5 min
autoresearch
2026-03-08 03:38:47 +00:00
500114a035
Honor --download-workers instead of hardcoding 8 download workers
Andrej
2026-03-07 14:17:45 -08:00
7043095a18
add macos fork
Andrej Karpathy
2026-03-07 22:15:52 +00:00
777e443790
fix(prepare): honor --download-workers
Dipesh Babu
2026-03-07 15:39:17 -05:00
6fdefa7265
instruct the agent to also read README, should be good context
Andrej Karpathy
2026-03-07 20:09:51 +00:00
b0d047425f
clarify note on platforms
Andrej Karpathy
2026-03-07 19:46:27 +00:00
8a5c4869bd
bunch of small changes to docs and files, and a teaser figure with a blooper :)
Andrej Karpathy
2026-03-07 19:00:04 +00:00
032d203695
minor tweaks, pin val shard
Andrej Karpathy
2026-03-07 17:59:52 +00:00
47ec1ade0a
tweaks to docs for both humans and agents
Andrej Karpathy
2026-03-07 17:02:43 +00:00
ada84e5247
soften the language just a bit
Andrej Karpathy
2026-03-07 16:29:59 +00:00
bd75534494
Fix agent crash blindspot by forcing it to read traceback
Andrej
2026-03-07 08:23:51 -08:00
bdf0c0d520
Allow agent to diagnose crashes by reading the python stack trace
dumko2001
2026-03-07 14:46:43 +05:30
bb54287479
Merge pull request #2 from marcinbogdanski/fix/fa3-non-hopper-fallback
Andrej
2026-03-06 21:59:49 -08:00
17b480aa65
add fallback FA3 kernel for non-Hopper GPUs
Marcin Bogdanski
2026-03-07 01:31:48 +00:00
9c383a8c94
add analysis notebook for convenience
Andrej Karpathy
2026-03-06 22:36:37 +00:00
69eb7f9b99
cleanup more references to spawn.sh
Andrej Karpathy
2026-03-06 22:36:20 +00:00
ae81d55904
remove spawn.sh reference from README (file was deleted)
Andrej Karpathy
2026-03-06 22:08:45 +00:00
4ab35a919b
also ref twitter
Andrej Karpathy
2026-03-06 22:06:12 +00:00
1e207aaf21
dam, erase experimental file from before that snuck through in my purge
Andrej Karpathy
2026-03-06 22:03:27 +00:00
2a70301b10
small tweak readme
Andrej Karpathy
2026-03-06 22:02:44 +00:00
b11d6f283f
initial commit
Andrej Karpathy
2026-03-06 21:58:52 +00:00