| Author | Commit | Message | Date |
|---|---|---|---|
| autoresearch | ece91016aa | tiny embedding WD 0.001 | 2026-03-08 13:30:56 +00:00 |
| autoresearch | 97dda8577f | init scale 0.7 to 0.68 | 2026-03-08 12:44:03 +00:00 |
| autoresearch | f5979a7464 | init scale 0.7x | 2026-03-08 10:14:06 +00:00 |
| autoresearch | 41d50a8539 | reduce transformer init scale by 0.8x | 2026-03-08 10:02:16 +00:00 |
| autoresearch | 264a05b48f | add small WD 0.01 to lm_head (AdamW) | 2026-03-08 07:23:33 +00:00 |
| autoresearch | 7f63c17076 | unembedding LR 0.006 to 0.005 | 2026-03-08 07:05:48 +00:00 |
| autoresearch | a7aa309dd4 | muon momentum warmup 300 to 200 steps | 2026-03-08 05:29:56 +00:00 |
| autoresearch | aa8f408f39 | unembedding LR 0.004 to 0.006 | 2026-03-08 04:48:14 +00:00 |
| autoresearch | 772dada6cc | FINAL_LR_FRAC 0.0 to 0.05 (small LR floor) | 2026-03-08 04:36:38 +00:00 |
| autoresearch | 0640555a1f | x0_lambda init 0.1 to 0.05 | 2026-03-08 04:30:50 +00:00 |
| autoresearch | 7d047e42f4 | embedding LR 0.6 to 0.8 | 2026-03-08 04:19:15 +00:00 |
| autoresearch | 59e9dd9aab | RoPE base frequency 10K to 200K | 2026-03-08 04:13:30 +00:00 |
| autoresearch | 7da0b673a1 | short window 1/8 context (256 tokens instead of 1024) | 2026-03-08 04:07:44 +00:00 |
| autoresearch | 8363d52e8d | SSSSL window pattern (5:1 short:long ratio) | 2026-03-08 04:01:58 +00:00 |
| autoresearch | 4e6697f68d | warmdown 0.5 to 0.7 (more cooldown) | 2026-03-08 03:56:11 +00:00 |
| autoresearch | 7f2a65c9a5 | depth 9 aspect_ratio 57 (extra layer, dim ~512) | 2026-03-08 03:44:34 +00:00 |
| autoresearch | bea057bc08 | halve batch size 524K to 262K for more steps in 5 min | 2026-03-08 03:38:47 +00:00 |
| Andrej | 500114a035 | Honor --download-workers instead of hardcoding 8 download workers | 2026-03-07 14:17:45 -08:00 |
| Andrej Karpathy | 7043095a18 | add macos fork | 2026-03-07 22:15:52 +00:00 |
| Dipesh Babu | 777e443790 | fix(prepare): honor --download-workers | 2026-03-07 15:39:17 -05:00 |
| Andrej Karpathy | 6fdefa7265 | instruct the agent to also read README, should be good context | 2026-03-07 20:09:51 +00:00 |
| Andrej Karpathy | b0d047425f | clarify note on platforms | 2026-03-07 19:46:27 +00:00 |
| Andrej Karpathy | 8a5c4869bd | bunch of small changes to docs and files, and a teaser figure with a blooper :) | 2026-03-07 19:00:04 +00:00 |
| Andrej Karpathy | 032d203695 | minor tweaks, pin val shard | 2026-03-07 17:59:52 +00:00 |
| Andrej Karpathy | 47ec1ade0a | tweaks to docs for both humans and agents | 2026-03-07 17:02:43 +00:00 |
| Andrej Karpathy | ada84e5247 | soften the language just a bit | 2026-03-07 16:29:59 +00:00 |
| Andrej | bd75534494 | Fix agent crash blindspot by forcing it to read traceback | 2026-03-07 08:23:51 -08:00 |
| dumko2001 | bdf0c0d520 | Allow agent to diagnose crashes by reading the python stack trace | 2026-03-07 14:46:43 +05:30 |
| Andrej | bb54287479 | Merge pull request #2 from marcinbogdanski/fix/fa3-non-hopper-fallback: add fallback FA3 kernel for non-Hopper GPUs | 2026-03-06 21:59:49 -08:00 |
| Marcin Bogdanski | 17b480aa65 | add fallback FA3 kernel for non-Hopper GPUs | 2026-03-07 01:31:48 +00:00 |
| Andrej Karpathy | 9c383a8c94 | add analysis notebook for convenience | 2026-03-06 22:36:37 +00:00 |
| Andrej Karpathy | 69eb7f9b99 | cleanup more references to spawn.sh | 2026-03-06 22:36:20 +00:00 |
| Andrej Karpathy | ae81d55904 | remove spawn.sh reference from README (file was deleted); Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> | 2026-03-06 22:08:45 +00:00 |
| Andrej Karpathy | 4ab35a919b | also ref twitter | 2026-03-06 22:06:12 +00:00 |
| Andrej Karpathy | 1e207aaf21 | dam, erase experimental file from before that snuck through in my purge | 2026-03-06 22:03:27 +00:00 |
| Andrej Karpathy | 2a70301b10 | small tweak readme | 2026-03-06 22:02:44 +00:00 |
| Andrej Karpathy | b11d6f283f | initial commit | 2026-03-06 21:58:52 +00:00 |