Author | Commit | Message | Date
autoresearch | 59e9dd9aab | RoPE base frequency 10K to 200K | 2026-03-08 04:13:30 +00:00
autoresearch | 7da0b673a1 | short window 1/8 context (256 tokens instead of 1024) | 2026-03-08 04:07:44 +00:00
autoresearch | 8363d52e8d | SSSSL window pattern (5:1 short:long ratio) | 2026-03-08 04:01:58 +00:00
autoresearch | 4e6697f68d | warmdown 0.5 to 0.7 (more cooldown) | 2026-03-08 03:56:11 +00:00
autoresearch | 7f2a65c9a5 | depth 9 aspect_ratio 57 (extra layer, dim ~512) | 2026-03-08 03:44:34 +00:00
autoresearch | bea057bc08 | halve batch size 524K to 262K for more steps in 5 min | 2026-03-08 03:38:47 +00:00
Andrej Karpathy | 8a5c4869bd | bunch of small changes to docs and files, and a teaser figure with a blooper :) | 2026-03-07 19:00:04 +00:00
Marcin Bogdanski | 17b480aa65 | add fallback FA3 kernel for non-Hopper GPUs | 2026-03-07 01:31:48 +00:00
Andrej Karpathy | b11d6f283f | initial commit | 2026-03-06 21:58:52 +00:00