Commit e159eec

docs: add notable non-record submissions
1 parent 0016fca commit e159eec

1 file changed: README.md (14 additions, 1 deletion)
@@ -79,8 +79,21 @@ Happy training!
 
 | Run | Score | Author | Summary | Date | Info |
 |-----|------:|--------|---------|------|------|
-| 1 Bit Quantization | 1.1239 | Ciprian-Florin Ifrim | 106M params quantized to 1 bit + misc arch changes + 2hr training | 2026-03-24 | [info](records/track_non_record_16mb/2026-03-24_106M_Binary_Asymmetric_UNet_FP8_15L_8192BPE_YaRN_NeoMuon_Smear/README.md) |
+| 1 Bit Quantization | 1.1239 | CiprianFlorin-Ifrim | 106M params quantized to 1 bit + misc arch changes + 2hr training | 2026-03-24 | [info](records/track_non_record_16mb/2026-03-24_106M_Binary_Asymmetric_UNet_FP8_15L_8192BPE_YaRN_NeoMuon_Smear/README.md) |
+| MDLM Text Diffusion | 1.1465 | agalimova | On PR #1106: masked diffusion LM with absorbing-mask ELBO-style eval, validated on 2xH100 as a discrete diffusion language-model entry | 2026-03-29 | [info](records/track_non_record_16mb/2026-03-29_LLaDA_MDLM_Diffusion/README.md) |
+| Hymba-8L + Sliding Attention at 32K | 1.1467 | mkenney2 | On PR #1245: hybrid Mamba SSM + sliding-window attention with 32K context, 3-seed under-16MB result, and score-first TTT | 2026-04-01 | [info](https://github.com/openai/parameter-golf/pull/1245) |
+| Mamba-3 Hybrid SSM + SP8192 + Legal TTT | 1.1473 | mradassaad | On PR #1644: 7-layer Mamba-3 hybrid with 5 SSM blocks, 2 attention layers, SP8192, AR GPTQ, chunk score-first TTT, and stateful-overlap eval | 2026-04-15 | [info](records/track_non_record_16mb/2026-04-15_Mamba3Hybrid_SP8192_GPTQ_TTT/README.md) |
+| Differential-Gated Attention | 1.1898 | ddavidgao | On PR #542: alternative attention mechanism carrying novelty/differential payloads in deep layers, with analysis of depth-dependent redundancy | 2026-03-23 | [info](records/track_non_record_16mb/2026-03-23_DGAttention_DavidGao/README.md) |
+| Learned Adapters on Random Linear Maps | 1.1971 | pranavxiyer | On PR #2058: random-seeded adapter MLPs with rank-160 learned LoRA-style adapters, 12 layers, 3x MLP, mixed int6/int8 compression, and 3-seed 10-minute evidence | 2026-04-30 | [info](records/track_non_record_16mb/2026-04-30_Random_Linear_Adapter/README.md) |
+| JEPA + Mamba2 LeWorldModel | 1.2064 | CiprianFlorin-Ifrim | On PR #903: ambitious SSM + JEPA latent-prediction submission with long-compute evidence and 10-minute logs | 2026-03-26 | [info](records/track_non_record_16mb/2026-03-26_37M_LeWM_Jepa_Mamba2_10L_UNet_INT4FP8QAT_Brotli/README.md) |
 | 4-Hour Baseline | 1.2074 | Will DePue | Testing unlimited compute, 4 hours on 8xH100 | 2026-03-18 | [info](records/track_non_record_16mb/2026-03-18_Quasi10Bfrom50B_SP1024_9x512_KV4_4h_pgut3/README.md) |
+| Universal Transformer with Iteration Embeddings | 1.2249 | gowtham0992 | On PR #1110: 3 unique blocks looped 4 times with iteration embeddings, giving 12 effective layers in a 4.95MB artifact | 2026-03-30 | [info](https://github.com/openai/parameter-golf/pull/1110) |
+| LegendreGPT Depth Parameterization | 1.2266 | sergimichi | On PR #1337: generates transformer weights as smooth Legendre-polynomial functions of layer depth, producing 24 virtual layers in 15.7MB after post-hoc quantization | 2026-04-04 | [info](records/track_non_record_16mb/2026-03-31_LegendreGPT/README.md) |
+| ByteJEPA | 1.3496 | hardik-bhadani-git | On PR #1443: byte-level JEPA with latent prediction, SIGReg anti-collapse, no tokenizer, and an auxiliary CE head for BPB evaluation | 2026-04-07 | [info](https://github.com/openai/parameter-golf/pull/1443) |
+| Byte-Level H-Net Dynamic Chunking | 1.3595 | DariusFeher | On PR #1104: systematic H-Net byte-vs-subword study with learned whitespace/word-like boundaries, plus 10-minute and 4-hour evidence | 2026-03-30 | [info](records/track_non_record_16mb/2026-03-29_HNet_ByteVsSubword_Study/README.md) |
+| Orthogonal Random Maps + LoRA Adapters | 1.3705 | gowtham0992 | On PR #1113: random orthogonal attention/MLP weights regenerated from seeds with rank-32 LoRA adapters, 30M effective params, and a 5.19MB artifact | 2026-03-30 | [info](https://github.com/openai/parameter-golf/pull/1113) |
+| Olmo Hybrid GDN Long-Context Study | 1.4709 | aarjunsrinivasan | On PR #1371: GDN + attention long-context study with 8K/16K/32K crossover evidence showing SSM-style hybrids help at longer contexts | 2026-04-04 | [info](records/track_non_record_16mb/2026-04-04_GDN_Hybrid_LongContext/README.md) |
+| XNOR-Net Binary Activation Study | 1.5390 | CiprianFlorin-Ifrim | On PR #1388: broad 1-bit/XNOR research package with binary weights/activations, Triton XNOR-popcount kernels, 10-minute logs, and long-run evidence | 2026-04-05 | [info](records/track_non_record_16mb/2026-04-05_118M_XNOR-Net_FP8_1024D_10L/README.md) |
 
 #### Requests for PRs
 
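For readers unfamiliar with 1-bit quantization (the "1 Bit Quantization" and "XNOR-Net" entries above), a minimal NumPy sketch of the classic sign-plus-scale scheme. The function names and the per-tensor mean-absolute-value scale are illustrative assumptions, not code from these submissions:

```python
import numpy as np

def binarize_weights(w: np.ndarray) -> tuple[np.ndarray, float]:
    """Quantize a weight tensor to 1 bit per parameter.

    XNOR-Net-style approximation: keep sign(w) plus a single fp scale
    alpha (the mean absolute value), so w is approximated by alpha * sign(w).
    Storage drops from 32 bits to 1 bit per weight, plus one scale per tensor.
    """
    alpha = float(np.abs(w).mean())        # per-tensor scale (assumption: mean |w|)
    w_bin = np.sign(w).astype(np.int8)     # {-1, 0, +1}
    w_bin[w_bin == 0] = 1                  # break ties so every entry is exactly +/-1
    return w_bin, alpha

def dequantize(w_bin: np.ndarray, alpha: float) -> np.ndarray:
    """Rebuild the dense approximation for use in a matmul."""
    return alpha * w_bin.astype(np.float32)

rng = np.random.default_rng(0)
w = rng.normal(size=(256, 256)).astype(np.float32)
w_bin, alpha = binarize_weights(w)
w_hat = dequantize(w_bin, alpha)
```

In a real XNOR-Net, the binarized matmul is computed with XNOR-popcount kernels rather than dequantizing, which is what the Triton kernels in the last row refer to.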
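The two adapter entries ("Learned Adapters on Random Linear Maps", "Orthogonal Random Maps + LoRA Adapters") share one trick: ship only a seed plus small learned low-rank factors, and regenerate the large frozen weight at load time. A sketch under assumed details (QR of a seeded Gaussian for the orthogonal map, zero-initialized LoRA factors):

```python
import numpy as np

def seeded_orthogonal(seed: int, n: int) -> np.ndarray:
    """Deterministically regenerate a random orthogonal matrix from a seed.

    Only the integer seed needs to live in the artifact; the dense n x n
    weight is rebuilt at load time via QR of a seeded Gaussian matrix.
    """
    rng = np.random.default_rng(seed)
    q, r = np.linalg.qr(rng.normal(size=(n, n)))
    return q * np.sign(np.diag(r))        # fix QR sign ambiguity for determinism

def adapted_forward(x: np.ndarray, seed: int,
                    a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """y = x @ W_frozen + (x @ A) @ B.

    W_frozen comes from the seed and is never stored or trained; only the
    low-rank LoRA-style factors A (n x r) and B (r x n) are learned.
    """
    w = seeded_orthogonal(seed, x.shape[-1])
    return x @ w + (x @ a) @ b

n, rank = 64, 8
a = np.zeros((n, rank))                   # LoRA convention: zero-init update
b = np.zeros((rank, n))
x = np.ones((2, n))
y = adapted_forward(x, seed=42, a=a, b=b)
```

The artifact-size win is that an n x n float matrix costs O(n^2) bytes while the seed plus rank-r factors cost O(n * r), which is how a 30M-effective-parameter model fits in a 5.19MB file.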
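The "Universal Transformer with Iteration Embeddings" row compresses depth by weight tying: a few unique blocks are applied repeatedly, with a learned per-iteration embedding added each pass so the network can tell the loops apart. A toy sketch (plain callables stand in for transformer blocks, and the zero iteration embeddings are placeholders for learned ones):

```python
import numpy as np

def looped_forward(x: np.ndarray, blocks, n_loops: int,
                   iter_emb: np.ndarray) -> np.ndarray:
    """Apply the same block stack n_loops times.

    Effective depth is len(blocks) * n_loops, but only len(blocks)
    blocks' worth of parameters needs to be stored. iter_emb[i] is
    added before pass i so iterations are distinguishable.
    """
    for i in range(n_loops):
        x = x + iter_emb[i]               # iteration embedding: which pass is this?
        for block in blocks:
            x = block(x)
    return x

# 3 unique blocks looped 4 times -> 12 effective layers (as in the PR #1110 row).
d = 8
blocks = [lambda t, s=s: np.tanh(t) + s for s in (0.1, 0.2, 0.3)]
iter_emb = np.zeros((4, d))               # placeholder for learned embeddings
out = looped_forward(np.zeros(d), blocks, n_loops=4, iter_emb=iter_emb)
```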