Commit Graph

36 Commits

Author SHA1 Message Date
autoresearch 97dda8577f init scale 0.7 to 0.68 2026-03-08 12:44:03 +00:00
autoresearch f5979a7464 init scale 0.7x 2026-03-08 10:14:06 +00:00
autoresearch 41d50a8539 reduce transformer init scale by 0.8x 2026-03-08 10:02:16 +00:00
autoresearch 264a05b48f add small WD 0.01 to lm_head (AdamW) 2026-03-08 07:23:33 +00:00
autoresearch 7f63c17076 unembedding LR 0.006 to 0.005 2026-03-08 07:05:48 +00:00
autoresearch a7aa309dd4 muon momentum warmup 300 to 200 steps 2026-03-08 05:29:56 +00:00
autoresearch aa8f408f39 unembedding LR 0.004 to 0.006 2026-03-08 04:48:14 +00:00
autoresearch 772dada6cc FINAL_LR_FRAC 0.0 to 0.05 (small LR floor) 2026-03-08 04:36:38 +00:00
autoresearch 0640555a1f x0_lambda init 0.1 to 0.05 2026-03-08 04:30:50 +00:00
autoresearch 7d047e42f4 embedding LR 0.6 to 0.8 2026-03-08 04:19:15 +00:00
autoresearch 59e9dd9aab RoPE base frequency 10K to 200K 2026-03-08 04:13:30 +00:00
autoresearch 7da0b673a1 short window 1/8 context (256 tokens instead of 1024) 2026-03-08 04:07:44 +00:00
autoresearch 8363d52e8d SSSSL window pattern (5:1 short:long ratio) 2026-03-08 04:01:58 +00:00
autoresearch 4e6697f68d warmdown 0.5 to 0.7 (more cooldown) 2026-03-08 03:56:11 +00:00
autoresearch 7f2a65c9a5 depth 9 aspect_ratio 57 (extra layer, dim ~512) 2026-03-08 03:44:34 +00:00
autoresearch bea057bc08 halve batch size 524K to 262K for more steps in 5 min 2026-03-08 03:38:47 +00:00
Andrej 500114a035 Honor --download-workers instead of hardcoding 8 download workers 2026-03-07 14:17:45 -08:00
Andrej Karpathy 7043095a18 add macos fork 2026-03-07 22:15:52 +00:00
Dipesh Babu 777e443790 fix(prepare): honor --download-workers 2026-03-07 15:39:17 -05:00
Andrej Karpathy 6fdefa7265 instruct the agent to also read README, should be good context 2026-03-07 20:09:51 +00:00
Andrej Karpathy b0d047425f clarify note on platforms 2026-03-07 19:46:27 +00:00
Andrej Karpathy 8a5c4869bd bunch of small changes to docs and files, and a teaser figure with a blooper :) 2026-03-07 19:00:04 +00:00
Andrej Karpathy 032d203695 minor tweaks, pin val shard 2026-03-07 17:59:52 +00:00
Andrej Karpathy 47ec1ade0a tweaks to docs for both humans and agents 2026-03-07 17:02:43 +00:00
Andrej Karpathy ada84e5247 soften the language just a bit 2026-03-07 16:29:59 +00:00
Andrej bd75534494 Fix agent crash blindspot by forcing it to read traceback 2026-03-07 08:23:51 -08:00
dumko2001 bdf0c0d520 Allow agent to diagnose crashes by reading the python stack trace 2026-03-07 14:46:43 +05:30
Andrej bb54287479 Merge pull request #2 from marcinbogdanski/fix/fa3-non-hopper-fallback; add fallback FA3 kernel for non-Hopper GPUs 2026-03-06 21:59:49 -08:00
Marcin Bogdanski 17b480aa65 add fallback FA3 kernel for non-Hopper GPUs 2026-03-07 01:31:48 +00:00
Andrej Karpathy 9c383a8c94 add analysis notebook for convenience 2026-03-06 22:36:37 +00:00
Andrej Karpathy 69eb7f9b99 cleanup more references to spawn.sh 2026-03-06 22:36:20 +00:00
Andrej Karpathy ae81d55904 remove spawn.sh reference from README (file was deleted); Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> 2026-03-06 22:08:45 +00:00
Andrej Karpathy 4ab35a919b also ref twitter 2026-03-06 22:06:12 +00:00
Andrej Karpathy 1e207aaf21 dam, erase experimental file from before that snuck through in my purge 2026-03-06 22:03:27 +00:00
Andrej Karpathy 2a70301b10 small tweak readme 2026-03-06 22:02:44 +00:00
Andrej Karpathy b11d6f283f initial commit 2026-03-06 21:58:52 +00:00