32a1460f62
Merge pull request #301 from indianspeedster/master
[branch: master]
Andrej
2026-03-16 11:43:26 -07:00
513fe6fcee
add AMD ROCm fork to notable forks section
indianspeedster
2026-03-16 11:28:48 -07:00
c2450add72
Guard against infinite loop when no training shards exist, fix README typo
Andrej
2026-03-10 22:32:17 -07:00
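The infinite-loop guard in the commit above could look like the following minimal sketch: a data loader that cycles over shards forever must fail loudly when the shard list is empty. The function name, signature, and shard naming scheme are assumptions, not taken from the repo.

```python
from pathlib import Path

def iter_shards(data_dir: str, pattern: str = "*.bin"):
    """Yield training shards forever, but fail fast when none exist.

    Hypothetical sketch: without the guard, cycling over an empty list
    would yield nothing and the caller's training loop could spin forever.
    """
    shards = sorted(Path(data_dir).glob(pattern))
    if not shards:
        raise FileNotFoundError(
            f"no training shards matching {pattern!r} in {data_dir}"
        )
    while True:  # repeat the dataset indefinitely
        yield from shards
```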
0be1e4fdf9
fix NaN loss not caught by fast-fail check
Andrej
2026-03-10 22:31:43 -07:00
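The "fast-fail check" these NaN commits refer to could be sketched as below; the function name and its call site in the training loop are assumptions. The point of making the check explicit is that threshold comparisons like `loss > 10.0` are always False for NaN, so a NaN loss can slip past an implicit check.

```python
import math

def check_loss(loss: float, step: int) -> None:
    """Fast-fail on a non-finite loss instead of training through it.

    Hypothetical sketch: math.isnan/math.isinf catch the cases a plain
    threshold comparison silently misses.
    """
    if math.isnan(loss) or math.isinf(loss):
        raise RuntimeError(f"non-finite loss {loss} at step {step}")
```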
ebf357841b
fix(train): make NaN fast-fail check explicit
Contributor
2026-03-11 04:28:08 +00:00
09ebea439d
Guard against infinite loop when no training shards exist, fix README typo
Hugh Brown
2026-03-10 21:34:40 -06:00
c12eef778e
Include beginner's guide to neural networks
Andrej
2026-03-09 16:00:55 -07:00
7004de03bd
hmmm
[branch: agenthub]
Andrej Karpathy
2026-03-09 19:29:59 +00:00
b5ba8ac00d
fix NaN loss not caught by fast-fail check
haosenwang1018
2026-03-09 23:51:02 +08:00
068d93da75
clarify that results.tsv should not be committed, leave untracked
Andrej Karpathy
2026-03-09 05:11:07 +00:00
c92bee55eb
some docs on what to play with to make autoresearch better on smaller computers
Andrej Karpathy
2026-03-09 04:49:15 +00:00
2224cd7cae
reshuffle readme a bit and link to tiny stories for apple silicon guidance
Andrej Karpathy
2026-03-08 23:25:53 +00:00
f16ece488f
clarification to baseline run instruction, there was some language from a previous version that wasn't fully cleaned up
Andrej Karpathy
2026-03-08 23:16:30 +00:00
9264224a3c
add notable fork mlx
Andrej Karpathy
2026-03-08 17:06:29 +00:00
6c087cb5e2
revert interrupted softcap experiment (restore softcap=15)
[branch: exp/H100/mar8]
autoresearch
2026-03-08 16:27:48 +00:00
fedfef398b
add experiment results log (125 experiments, best val_bpb=0.969686)
autoresearch
2026-03-08 16:22:01 +00:00
216eeb8d6e
softcap 15 to 17
autoresearch
2026-03-08 15:50:16 +00:00
438a26e2c3
warmdown 0.7 to 0.75
autoresearch
2026-03-08 15:03:36 +00:00
b1d50048d9
embedding LR 0.8 to 0.9 (re-test with WD)
autoresearch
2026-03-08 14:23:02 +00:00
637f82f215
VE WD 0.002 to 0.003
autoresearch
2026-03-08 13:54:07 +00:00
73c77cad75
VE WD 0.001 to 0.002
autoresearch
2026-03-08 13:48:25 +00:00
1a85362b26
tiny VE WD 0.001
autoresearch
2026-03-08 13:42:34 +00:00
ece91016aa
tiny embedding WD 0.001
autoresearch
2026-03-08 13:30:56 +00:00
97dda8577f
init scale 0.7 to 0.68
autoresearch
2026-03-08 12:44:03 +00:00
f5979a7464
init scale 0.7x
autoresearch
2026-03-08 10:14:06 +00:00
41d50a8539
reduce transformer init scale by 0.8x
autoresearch
2026-03-08 10:02:16 +00:00
264a05b48f
add small WD 0.01 to lm_head (AdamW)
autoresearch
2026-03-08 07:23:33 +00:00
7f63c17076
unembedding LR 0.006 to 0.005
autoresearch
2026-03-08 07:05:48 +00:00
a7aa309dd4
muon momentum warmup 300 to 200 steps
autoresearch
2026-03-08 05:29:56 +00:00
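The Muon momentum warmup shortened above (300 to 200 steps) can be sketched as a linear ramp; the start and end momentum values here (0.85 to 0.95) are illustrative assumptions, not taken from the log.

```python
def muon_momentum(step: int, warmup_steps: int = 200,
                  start: float = 0.85, end: float = 0.95) -> float:
    """Linearly warm up Muon's momentum over the first warmup_steps.

    Hypothetical sketch: ramps from start to end, then holds at end.
    """
    t = min(1.0, step / warmup_steps)
    return start + (end - start) * t
```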
aa8f408f39
unembedding LR 0.004 to 0.006
autoresearch
2026-03-08 04:48:14 +00:00
772dada6cc
FINAL_LR_FRAC 0.0 to 0.05 (small LR floor)
autoresearch
2026-03-08 04:36:38 +00:00
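The FINAL_LR_FRAC change above (0.0 to 0.05) adds a small floor to the learning-rate decay, and the warmdown commits below tune how much of training is spent cooling down. A sketch of such a schedule follows; the parameter names echo the commit messages, but the exact schedule shape used in the code is an assumption.

```python
def lr_scale(step: int, num_steps: int,
             warmdown_frac: float = 0.7, final_lr_frac: float = 0.05) -> float:
    """LR multiplier: constant, then a linear warmdown to a floor.

    Hypothetical sketch: with final_lr_frac=0.0 the LR decays to zero;
    0.05 leaves a small LR floor at the end of training.
    """
    warmdown_start = num_steps * (1.0 - warmdown_frac)
    if step < warmdown_start:
        return 1.0
    progress = (step - warmdown_start) / (num_steps - warmdown_start)
    return 1.0 - (1.0 - final_lr_frac) * progress
```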
0640555a1f
x0_lambda init 0.1 to 0.05
autoresearch
2026-03-08 04:30:50 +00:00
7d047e42f4
embedding LR 0.6 to 0.8
autoresearch
2026-03-08 04:19:15 +00:00
59e9dd9aab
RoPE base frequency 10K to 200K
autoresearch
2026-03-08 04:13:30 +00:00
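The RoPE base change above (10K to 200K) affects the standard rotation-frequency schedule `base**(-2i/head_dim)`: a larger base slows the lowest frequencies, a common way to stretch usable context. The exact usage in the repo (and the head_dim) is an assumption; this is just the standard formulation.

```python
def rope_freqs(head_dim: int, base: float = 200_000.0) -> list:
    """Per-pair rotation frequencies for RoPE: base**(-2i/head_dim).

    Sketch: one frequency per pair of channels, decaying from 1.0 down
    toward base**-1 as i grows.
    """
    return [base ** (-2.0 * i / head_dim) for i in range(head_dim // 2)]
```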
7da0b673a1
short window 1/8 context (256 tokens instead of 1024)
autoresearch
2026-03-08 04:07:44 +00:00
8363d52e8d
SSSSL window pattern (5:1 short:long ratio)
autoresearch
2026-03-08 04:01:58 +00:00
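The two window experiments above (the SSSSL short:long pattern and the short window at 1/8 of context, 256 tokens) can be sketched as assigning per-layer attention window sizes from a repeating pattern. The helper itself is hypothetical, and the 2048-token context is an assumption chosen so that 1/8 of it matches the 256-token short window in the commit message.

```python
def window_sizes(num_layers: int, context: int = 2048,
                 short_frac: int = 8, pattern: str = "SSSSL") -> list:
    """Assign each layer a short or long attention window.

    Hypothetical sketch: layers cycle through the pattern string, with
    'S' layers attending over context // short_frac tokens and the rest
    over the full context.
    """
    short = context // short_frac
    return [short if pattern[i % len(pattern)] == "S" else context
            for i in range(num_layers)]
```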
4e6697f68d
warmdown 0.5 to 0.7 (more cooldown)
autoresearch
2026-03-08 03:56:11 +00:00
7f2a65c9a5
depth 9 aspect_ratio 57 (extra layer, dim ~512)
autoresearch
2026-03-08 03:44:34 +00:00
bea057bc08
halve batch size 524K to 262K for more steps in 5 min
autoresearch
2026-03-08 03:38:47 +00:00
500114a035
Honor --download-workers instead of hardcoding 8 download workers
Andrej
2026-03-07 14:17:45 -08:00
7043095a18
add macos fork
Andrej Karpathy
2026-03-07 22:15:52 +00:00
777e443790
fix(prepare): honor --download-workers
Dipesh Babu
2026-03-07 15:39:17 -05:00
6fdefa7265
instruct the agent to also read README, should be good context
Andrej Karpathy
2026-03-07 20:09:51 +00:00
b0d047425f
clarify note on platforms
Andrej Karpathy
2026-03-07 19:46:27 +00:00
8a5c4869bd
bunch of small changes to docs and files, and a teaser figure with a blooper :)
Andrej Karpathy
2026-03-07 19:00:04 +00:00
032d203695
minor tweaks, pin val shard
Andrej Karpathy
2026-03-07 17:59:52 +00:00
47ec1ade0a
tweaks to docs for both humans and agents
Andrej Karpathy
2026-03-07 17:02:43 +00:00
ada84e5247
soften the language just a bit
Andrej Karpathy
2026-03-07 16:29:59 +00:00
bd75534494
Fix agent crash blindspot by forcing it to read traceback
Andrej
2026-03-07 08:23:51 -08:00
bdf0c0d520
Allow agent to diagnose crashes by reading the python stack trace
dumko2001
2026-03-07 14:46:43 +05:30
bb54287479
Merge pull request #2 from marcinbogdanski/fix/fa3-non-hopper-fallback
Andrej
2026-03-06 21:59:49 -08:00
17b480aa65
add fallback FA3 kernel for non-Hopper GPUs
Marcin Bogdanski
2026-03-07 01:31:48 +00:00
9c383a8c94
add analysis notebook for convenience
Andrej Karpathy
2026-03-06 22:36:37 +00:00
69eb7f9b99
cleanup more references to spawn.sh
Andrej Karpathy
2026-03-06 22:36:20 +00:00
ae81d55904
remove spawn.sh reference from README (file was deleted)
Andrej Karpathy
2026-03-06 22:08:45 +00:00
4ab35a919b
also ref twitter
Andrej Karpathy
2026-03-06 22:06:12 +00:00
1e207aaf21
dam, erase experimental file from before that snuck through in my purge
Andrej Karpathy
2026-03-06 22:03:27 +00:00
2a70301b10
small tweak readme
Andrej Karpathy
2026-03-06 22:02:44 +00:00
b11d6f283f
initial commit
Andrej Karpathy
2026-03-06 21:58:52 +00:00