autoresearch

Author	SHA1	Message	Date
autoresearch	97dda8577f	init scale 0.7 to 0.68	2026-03-08 12:44:03 +00:00
autoresearch	f5979a7464	init scale 0.7x	2026-03-08 10:14:06 +00:00
autoresearch	41d50a8539	reduce transformer init scale by 0.8x	2026-03-08 10:02:16 +00:00
autoresearch	264a05b48f	add small WD 0.01 to lm_head (AdamW)	2026-03-08 07:23:33 +00:00
autoresearch	7f63c17076	unembedding LR 0.006 to 0.005	2026-03-08 07:05:48 +00:00
autoresearch	a7aa309dd4	muon momentum warmup 300 to 200 steps	2026-03-08 05:29:56 +00:00
autoresearch	aa8f408f39	unembedding LR 0.004 to 0.006	2026-03-08 04:48:14 +00:00
autoresearch	772dada6cc	FINAL_LR_FRAC 0.0 to 0.05 (small LR floor)	2026-03-08 04:36:38 +00:00
autoresearch	0640555a1f	x0_lambda init 0.1 to 0.05	2026-03-08 04:30:50 +00:00
autoresearch	7d047e42f4	embedding LR 0.6 to 0.8	2026-03-08 04:19:15 +00:00
autoresearch	59e9dd9aab	RoPE base frequency 10K to 200K	2026-03-08 04:13:30 +00:00
autoresearch	7da0b673a1	short window 1/8 context (256 tokens instead of 1024)	2026-03-08 04:07:44 +00:00
autoresearch	8363d52e8d	SSSSL window pattern (5:1 short:long ratio)	2026-03-08 04:01:58 +00:00
autoresearch	4e6697f68d	warmdown 0.5 to 0.7 (more cooldown)	2026-03-08 03:56:11 +00:00
autoresearch	7f2a65c9a5	depth 9 aspect_ratio 57 (extra layer, dim ~512)	2026-03-08 03:44:34 +00:00
autoresearch	bea057bc08	halve batch size 524K to 262K for more steps in 5 min	2026-03-08 03:38:47 +00:00
Andrej Karpathy	8a5c4869bd	bunch of small changes to docs and files, and a teaser figure with a blooper :)	2026-03-07 19:00:04 +00:00
Marcin Bogdanski	17b480aa65	add fallback FA3 kernel for non-Hopper GPUs	2026-03-07 01:31:48 +00:00
Andrej Karpathy	b11d6f283f	initial commit	2026-03-06 21:58:52 +00:00

19 Commits