Commit Graph

  • 32a1460f62 Merge pull request #301 from indianspeedster/master [master] (Andrej, 2026-03-16 11:43:26 -07:00)
  • 513fe6fcee add AMD ROCm fork to notable forks section (indianspeedster, 2026-03-16 11:28:48 -07:00)
  • c2450add72 Guard against infinite loop when no training shards exist, fix README typo (Andrej, 2026-03-10 22:32:17 -07:00)
  • 0be1e4fdf9 fix NaN loss not caught by fast-fail check (Andrej, 2026-03-10 22:31:43 -07:00)
  • ebf357841b fix(train): make NaN fast-fail check explicit (Contributor, 2026-03-11 04:28:08 +00:00)
  • 09ebea439d Guard against infinite loop when no training shards exist, fix README typo (Hugh Brown, 2026-03-10 21:34:40 -06:00)
  • c12eef778e Include beginner's guide to neural networks (Andrej, 2026-03-09 16:00:55 -07:00)
  • 7004de03bd hmmm agenthub (Andrej Karpathy, 2026-03-09 19:29:59 +00:00)
  • b5ba8ac00d fix NaN loss not caught by fast-fail check (haosenwang1018, 2026-03-09 23:51:02 +08:00)
  • 068d93da75 clarify that results.tsv should not be committed, leave untracked (Andrej Karpathy, 2026-03-09 05:11:07 +00:00)
  • c92bee55eb some docs on what to play with to make autoresearch better on smaller computers (Andrej Karpathy, 2026-03-09 04:49:15 +00:00)
  • 2224cd7cae reshuffle readme a bit and link to tiny stories for apple silicon guidance (Andrej Karpathy, 2026-03-08 23:25:53 +00:00)
  • f16ece488f clarification to baseline run instruction, there was some language from a previous version that wasn't fully cleaned up (Andrej Karpathy, 2026-03-08 23:16:30 +00:00)
  • 9264224a3c add notable fork mlx (Andrej Karpathy, 2026-03-08 17:06:29 +00:00)
  • 6c087cb5e2 revert interrupted softcap experiment (restore softcap=15) [exp/H100/mar8] (autoresearch, 2026-03-08 16:27:48 +00:00)
  • fedfef398b add experiment results log (125 experiments, best val_bpb=0.969686) (autoresearch, 2026-03-08 16:22:01 +00:00)
  • 216eeb8d6e softcap 15 to 17 (autoresearch, 2026-03-08 15:50:16 +00:00)
  • 438a26e2c3 warmdown 0.7 to 0.75 (autoresearch, 2026-03-08 15:03:36 +00:00)
  • b1d50048d9 embedding LR 0.8 to 0.9 (re-test with WD) (autoresearch, 2026-03-08 14:23:02 +00:00)
  • 637f82f215 VE WD 0.002 to 0.003 (autoresearch, 2026-03-08 13:54:07 +00:00)
  • 73c77cad75 VE WD 0.001 to 0.002 (autoresearch, 2026-03-08 13:48:25 +00:00)
  • 1a85362b26 tiny VE WD 0.001 (autoresearch, 2026-03-08 13:42:34 +00:00)
  • ece91016aa tiny embedding WD 0.001 (autoresearch, 2026-03-08 13:30:56 +00:00)
  • 97dda8577f init scale 0.7 to 0.68 (autoresearch, 2026-03-08 12:44:03 +00:00)
  • f5979a7464 init scale 0.7x (autoresearch, 2026-03-08 10:14:06 +00:00)
  • 41d50a8539 reduce transformer init scale by 0.8x (autoresearch, 2026-03-08 10:02:16 +00:00)
  • 264a05b48f add small WD 0.01 to lm_head (AdamW) (autoresearch, 2026-03-08 07:23:33 +00:00)
  • 7f63c17076 unembedding LR 0.006 to 0.005 (autoresearch, 2026-03-08 07:05:48 +00:00)
  • a7aa309dd4 muon momentum warmup 300 to 200 steps (autoresearch, 2026-03-08 05:29:56 +00:00)
  • aa8f408f39 unembedding LR 0.004 to 0.006 (autoresearch, 2026-03-08 04:48:14 +00:00)
  • 772dada6cc FINAL_LR_FRAC 0.0 to 0.05 (small LR floor) (autoresearch, 2026-03-08 04:36:38 +00:00)
  • 0640555a1f x0_lambda init 0.1 to 0.05 (autoresearch, 2026-03-08 04:30:50 +00:00)
  • 7d047e42f4 embedding LR 0.6 to 0.8 (autoresearch, 2026-03-08 04:19:15 +00:00)
  • 59e9dd9aab RoPE base frequency 10K to 200K (autoresearch, 2026-03-08 04:13:30 +00:00)
  • 7da0b673a1 short window 1/8 context (256 tokens instead of 1024) (autoresearch, 2026-03-08 04:07:44 +00:00)
  • 8363d52e8d SSSSL window pattern (5:1 short:long ratio) (autoresearch, 2026-03-08 04:01:58 +00:00)
  • 4e6697f68d warmdown 0.5 to 0.7 (more cooldown) (autoresearch, 2026-03-08 03:56:11 +00:00)
  • 7f2a65c9a5 depth 9 aspect_ratio 57 (extra layer, dim ~512) (autoresearch, 2026-03-08 03:44:34 +00:00)
  • bea057bc08 halve batch size 524K to 262K for more steps in 5 min (autoresearch, 2026-03-08 03:38:47 +00:00)
  • 500114a035 Honor --download-workers instead of hardcoding 8 download workers (Andrej, 2026-03-07 14:17:45 -08:00)
  • 7043095a18 add macos fork (Andrej Karpathy, 2026-03-07 22:15:52 +00:00)
  • 777e443790 fix(prepare): honor --download-workers (Dipesh Babu, 2026-03-07 15:39:17 -05:00)
  • 6fdefa7265 instruct the agent to also read README, should be good context (Andrej Karpathy, 2026-03-07 20:09:51 +00:00)
  • b0d047425f clarify note on platforms (Andrej Karpathy, 2026-03-07 19:46:27 +00:00)
  • 8a5c4869bd bunch of small changes to docs and files, and a teaser figure with a blooper :) (Andrej Karpathy, 2026-03-07 19:00:04 +00:00)
  • 032d203695 minor tweaks, pin val shard (Andrej Karpathy, 2026-03-07 17:59:52 +00:00)
  • 47ec1ade0a tweaks to docs for both humans and agents (Andrej Karpathy, 2026-03-07 17:02:43 +00:00)
  • ada84e5247 soften the language just a bit (Andrej Karpathy, 2026-03-07 16:29:59 +00:00)
  • bd75534494 Fix agent crash blindspot by forcing it to read traceback (Andrej, 2026-03-07 08:23:51 -08:00)
  • bdf0c0d520 Allow agent to diagnose crashes by reading the python stack trace (dumko2001, 2026-03-07 14:46:43 +05:30)
  • bb54287479 Merge pull request #2 from marcinbogdanski/fix/fa3-non-hopper-fallback (Andrej, 2026-03-06 21:59:49 -08:00)
  • 17b480aa65 add fallback FA3 kernel for non-Hopper GPUs (Marcin Bogdanski, 2026-03-07 01:31:48 +00:00)
  • 9c383a8c94 add analysis notebook for convenience (Andrej Karpathy, 2026-03-06 22:36:37 +00:00)
  • 69eb7f9b99 cleanup more references to spawn.sh (Andrej Karpathy, 2026-03-06 22:36:20 +00:00)
  • ae81d55904 remove spawn.sh reference from README (file was deleted) (Andrej Karpathy, 2026-03-06 22:08:45 +00:00)
  • 4ab35a919b also ref twitter (Andrej Karpathy, 2026-03-06 22:06:12 +00:00)
  • 1e207aaf21 dam, erase experimental file from before that snuck through in my purge (Andrej Karpathy, 2026-03-06 22:03:27 +00:00)
  • 2a70301b10 small tweak readme (Andrej Karpathy, 2026-03-06 22:02:44 +00:00)
  • b11d6f283f initial commit (Andrej Karpathy, 2026-03-06 21:58:52 +00:00)