README.md: 14 additions & 1 deletion
```diff
@@ -79,8 +79,21 @@ Happy training!
 | Run | Score | Author | Summary | Date | Info |
 |-----|------:|--------|---------|------|------|
-| 1 Bit Quantization | 1.1239 | Ciprian-Florin Ifrim | 106M params quantized to 1 bit + misc arch changes + 2hr training | 2026-03-24 | [info](records/track_non_record_16mb/2026-03-24_106M_Binary_Asymmetric_UNet_FP8_15L_8192BPE_YaRN_NeoMuon_Smear/README.md) |
+| 1 Bit Quantization | 1.1239 | CiprianFlorin-Ifrim | 106M params quantized to 1 bit + misc arch changes + 2hr training | 2026-03-24 | [info](records/track_non_record_16mb/2026-03-24_106M_Binary_Asymmetric_UNet_FP8_15L_8192BPE_YaRN_NeoMuon_Smear/README.md) |
+| MDLM Text Diffusion | 1.1465 | agalimova | On PR #1106: masked diffusion LM with absorbing-mask ELBO-style eval, validated on 2xH100 as a discrete diffusion language-model entry | 2026-03-29 | [info](records/track_non_record_16mb/2026-03-29_LLaDA_MDLM_Diffusion/README.md) |
+| Hymba-8L + Sliding Attention at 32K | 1.1467 | mkenney2 | On PR #1245: hybrid Mamba SSM + sliding-window attention with 32K context, 3-seed under-16MB result, and score-first TTT | 2026-04-01 | [info](https://github.com/openai/parameter-golf/pull/1245) |
+| Mamba-3 Hybrid SSM + SP8192 + Legal TTT | 1.1473 | mradassaad | On PR #1644: 7-layer Mamba-3 hybrid with 5 SSM blocks, 2 attention layers, SP8192, AR GPTQ, chunk score-first TTT, and stateful-overlap eval | 2026-04-15 | [info](records/track_non_record_16mb/2026-04-15_Mamba3Hybrid_SP8192_GPTQ_TTT/README.md) |
+| Differential-Gated Attention | 1.1898 | ddavidgao | On PR #542: alternative attention mechanism carrying novelty/differential payloads in deep layers, with analysis of depth-dependent redundancy | 2026-03-23 | [info](records/track_non_record_16mb/2026-03-23_DGAttention_DavidGao/README.md) |
+| Learned Adapters on Random Linear Maps | 1.1971 | pranavxiyer | On PR #2058: random-seeded adapter MLPs with rank-160 learned LoRA-style adapters, 12 layers, 3x MLP, mixed int6/int8 compression, and 3-seed 10-minute evidence | 2026-04-30 | [info](records/track_non_record_16mb/2026-04-30_Random_Linear_Adapter/README.md) |
+| JEPA + Mamba2 LeWorldModel | 1.2064 | CiprianFlorin-Ifrim | On PR #903: ambitious SSM + JEPA latent-prediction submission with long-compute evidence and 10-minute logs | 2026-03-26 | [info](records/track_non_record_16mb/2026-03-26_37M_LeWM_Jepa_Mamba2_10L_UNet_INT4FP8QAT_Brotli/README.md) |
 | 4-Hour Baseline | 1.2074 | Will DePue | Testing unlimited compute, 4 hours on 8xH100 | 2026-03-18 | [info](records/track_non_record_16mb/2026-03-18_Quasi10Bfrom50B_SP1024_9x512_KV4_4h_pgut3/README.md) |
+| Universal Transformer with Iteration Embeddings | 1.2249 | gowtham0992 | On PR #1110: 3 unique blocks looped 4 times with iteration embeddings, giving 12 effective layers in a 4.95MB artifact | 2026-03-30 | [info](https://github.com/openai/parameter-golf/pull/1110) |
+| LegendreGPT Depth Parameterization | 1.2266 | sergimichi | On PR #1337: generates transformer weights as smooth Legendre-polynomial functions of layer depth, producing 24 virtual layers in 15.7MB after post-hoc quantization | 2026-04-04 | [info](records/track_non_record_16mb/2026-03-31_LegendreGPT/README.md) |
+| ByteJEPA | 1.3496 | hardik-bhadani-git | On PR #1443: byte-level JEPA with latent prediction, SIGReg anti-collapse, no tokenizer, and an auxiliary CE head for BPB evaluation | 2026-04-07 | [info](https://github.com/openai/parameter-golf/pull/1443) |
+| Byte-Level H-Net Dynamic Chunking | 1.3595 | DariusFeher | On PR #1104: systematic H-Net byte-vs-subword study with learned whitespace/word-like boundaries, plus 10-minute and 4-hour evidence | 2026-03-30 | [info](records/track_non_record_16mb/2026-03-29_HNet_ByteVsSubword_Study/README.md) |
+| Orthogonal Random Maps + LoRA Adapters | 1.3705 | gowtham0992 | On PR #1113: random orthogonal attention/MLP weights regenerated from seeds with rank-32 LoRA adapters, 30M effective params, and a 5.19MB artifact | 2026-03-30 | [info](https://github.com/openai/parameter-golf/pull/1113) |
+| Olmo Hybrid GDN Long-Context Study | 1.4709 | aarjunsrinivasan | On PR #1371: GDN + attention long-context study with 8K/16K/32K crossover evidence showing SSM-style hybrids help at longer contexts | 2026-04-04 | [info](records/track_non_record_16mb/2026-04-04_GDN_Hybrid_LongContext/README.md) |
+| XNOR-Net Binary Activation Study | 1.5390 | CiprianFlorin-Ifrim | On PR #1388: broad 1-bit/XNOR research package with binary weights/activations, Triton XNOR-popcount kernels, 10-minute logs, and long-run evidence | 2026-04-05 | [info](records/track_non_record_16mb/2026-04-05_118M_XNOR-Net_FP8_1024D_10L/README.md) |
```
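Several entries above rely on 1-bit (binary) weight quantization. As a rough illustration of the idea, here is a minimal sketch of XNOR-Net-style sign quantization with a per-output-channel scale; all names and shapes are illustrative assumptions, not taken from any listed submission:

```python
import numpy as np

def binarize(w: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Quantize weights to {-1, +1} with a per-row scale alpha.

    alpha = mean(|w|) per output channel minimizes ||w - alpha * sign(w)||^2,
    so each row of w is approximated by alpha * b at 1 bit per weight.
    """
    alpha = np.abs(w).mean(axis=1, keepdims=True)  # (out, 1) scales
    b = np.where(w >= 0.0, 1.0, -1.0)              # (out, in), values in {-1, +1}
    return b, alpha

def binary_matmul(x: np.ndarray, b: np.ndarray, alpha: np.ndarray) -> np.ndarray:
    """Dequantized forward pass: x @ (alpha * b).T approximates x @ w.T."""
    return x @ (alpha * b).T

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 8))        # toy weight matrix
b, alpha = binarize(w)
x = rng.normal(size=(2, 8))        # toy activations
y = binary_matmul(x, b, alpha)     # (2, 4), 1-bit approximation of x @ w.T
```

In a real submission the sign tensor would be bit-packed (8 weights per byte) for storage, which is where the 16x size reduction over fp16 comes from; the scales stay in higher precision.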
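The "random maps + LoRA adapter" entries exploit the fact that a frozen random base matrix can be regenerated from its seed, so only the small low-rank factors need to be stored in the artifact. A hypothetical sketch under that assumption (dimensions, init, and function names are invented for illustration):

```python
import numpy as np

def seeded_base(seed: int, out_dim: int, in_dim: int) -> np.ndarray:
    """Frozen random base weight, reproducible from the seed alone
    (costs only a few bytes in the checkpoint)."""
    rng = np.random.default_rng(seed)
    return rng.normal(0.0, in_dim ** -0.5, size=(out_dim, in_dim))

def adapted_forward(x: np.ndarray, seed: int, A: np.ndarray, B: np.ndarray) -> np.ndarray:
    """y = x @ (W0 + B @ A).T, where only A (rank, in) and B (out, rank)
    are learned and stored; W0 is regenerated on the fly."""
    w0 = seeded_base(seed, B.shape[0], A.shape[1])
    return x @ (w0 + B @ A).T

rank, d_in, d_out = 4, 64, 64
rng = np.random.default_rng(1)
A = rng.normal(0.0, 0.01, size=(rank, d_in))  # small random init
B = np.zeros((d_out, rank))                   # standard LoRA init: B = 0
x = rng.normal(size=(2, d_in))
y = adapted_forward(x, seed=42, A=A, B=B)
# with B = 0 the adapter is a no-op, so y equals x @ W0.T exactly
```

Storage per layer is then rank * (d_in + d_out) adapter parameters plus one seed, instead of d_in * d_out base weights, which is the mechanism behind the ~5MB artifacts in those rows.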