Commit c6a05ee

research(2026-05-03): post-competition day 3 — audit PR openai#2146, AsymLogit Rescale, 7th BPB bug
- logs/daily_research.md: new May 3 entry; DRAFT PR openai#2146 grace-policy audit adds 4 records (pending SOTA 1.05651 via PR openai#2135); AsymLogit Rescale documented (~5 lines, zero legality risk); PR openai#2124 seed/config inconsistency; PR openai#2138 BPB bug #7 confirmed; data overlap hazard in PR openai#2130 flagged; no new high-relevance papers beyond prior scan.
- CLAUDE.md: Competition Strategy updated to reflect closed competition, pending audit status, and key post-competition findings (AsymLogit Rescale, GPTQ calibration batches, data overlap isolation requirement).

https://claude.ai/code/session_013Q2rFE4xRHRRYaSPfzCiip
1 parent 1f2972a commit c6a05ee

2 files changed

Lines changed: 104 additions & 1 deletion

CLAUDE.md

Lines changed: 16 additions & 1 deletion
@@ -112,7 +112,22 @@ torchrun --standalone --nproc_per_node=8 train_gpt.py
112112

113113
## Competition Strategy
114114

115-
**Merged leaderboard SOTA**: **1.0611 val_bpb** (codemath3000, PR #1855, 2026-04-27) — UPDATED Apr 30. Organizer pending branches fully merged. 12 new records now in main. Previous SOTA was 1.0810 (PR #1493). **New target: ≤1.0561** (beat by 0.005 nats).
115+
**COMPETITION CLOSED April 30, 2026. Post-competition audit in progress (May 3).**
116+
117+
**Final Merged SOTA**: **1.0611 val_bpb** (codemath3000, PR #1855) — stable since Apr 29. No upstream/main commits since then.
118+
119+
**Pending Audit (DRAFT PR #2146, grace policy)**: Organizers reviewing 4 post-deadline entries where code was filed pre-cutoff. If merged, effective SOTA drops to **1.05651** (PR #2135: PR#2130 base + GPTQ_CALIBRATION_BATCHES=32). Stack: CaseOps + LQER Asym + SparseAttnGate + SmearGate BOS-fix + AsymLogit Rescale + token-only n-gram tilt + phased LoRA TTT.
120+
121+
**Our status**: PR #771 REJECTED (train-then-score TTT violation). No submission.
122+
123+
**Key post-competition findings (May 1–3)**:
124+
- **AsymLogit Rescale** (PR #1923/#2130): Two trainable scalars replace the fixed logit_softcap. ~5 lines. Appears in V22 stack (PR #1945, 1.05877). Zero legality risk. First technique to add in any future competition.
125+
- **GPTQ calibration batches**: 16→32 gives ~0.001 bpb. Free win at submission time.
126+
- **Data overlap bug**: PR #2130 (1.05670) excluded by audit because docs 10,000–49,999 appear in both the train and validation sets. Verify validation isolation explicitly before filing any future submission.
127+
- **PR #2138 BPB bug**: 7th BPB bug in competition. Divided by CaseOps-transformed bytes instead of raw sidecar. Corrected 0.979 → 1.067. Always verify denominator against raw-text bytes.
128+
- **PPM-D (Issue #1872)**: No organizer ruling as of May 3. Competition ended unresolved.
129+
130+
**Previous target**: ≤1.0561 (beat by 0.005 nats). Now moot — competition closed.
116131

117132
Top merged records (Apr 30 confirmed):
118133
1. 1.0611 — codemath3000 (PR #1855): SP8192 + LQER Asym + SparseAttnGate + BOS-Fixed SmearGate + 9-hparam greedy + lrzip

logs/daily_research.md

Lines changed: 88 additions & 0 deletions
@@ -1,3 +1,91 @@
1+
# Parameter Golf Daily Research - 2026-05-03 (POST-COMPETITION DAY 3)
2+
3+
## PR #771 STATUS: CLOSED (REJECTED 2026-03-27) — Final
4+
5+
No change. Train-then-score TTT violation per @valerio-oai. No appeal path.
6+
7+
## N-GRAM PR STATUS (Final)
8+
- **PR #727**: CLOSED — hash key includes target token (eval leakage). Final.
9+
- **PR #731**: OPEN, dormant — seeds 1337/2024 never filed. Competition closed. Dead.
10+
- **PR #758**: OPEN, dead — same XOR target-token violation as #727.
11+
12+
## Leaderboard
13+
14+
### Current Merged (upstream/main)
15+
| Rank | Score | Author | PR | Key Stack |
16+
|------|-------|--------|----|-----------|
17+
| 1 | **1.0611** | codemath3000 | #1855 | BOS-Fixed SmearGate + LQER Asym + SparseAttnGate + 9-hparam + lrzip |
18+
| 2 | 1.0614 | aquariouseworkman | #1851/#1868 | SmearGate BOS Fix + PR#1787 + LQER Asym + Phased TTT |
19+
| 3 | 1.0634 | nprime06 | #1787 | CaseOps + Polar Express NS + MIN_LR + SparseAttnGate + FusedCE + Warm-A TTT |
20+
| 4 | 1.0645 | dexhunter | #1769 | CaseOps + MLPClip12 + SmearGate + LoRA-TTT |
21+
| 5 | 1.0655 | dexhunter | #1736 | CaseOps + GatedAttn + QuantGate + PhasedTTT |
22+
23+
No upstream/main commits since Apr 29. Leaderboard frozen at SOTA 1.0611.
24+
25+
### Pending Audit (Draft PR #2146 — NOT merged yet)
26+
Organizer grace policy: code filed pre-cutoff, results filed post-deadline. Four rows pending:
27+
| PR | Score | Techniques | Note |
28+
|----|-------|------------|------|
29+
| #1945 (V22) | 1.05877–1.05943 | AWQ-lite mixed-precision + AsymLogit Rescale + no_qv TTT masking + seq_len=2816 | 3-seed, all <600s |
30+
| #1953 | 1.05855 | PR#1945 base + delta unknown | Under audit |
31+
| #2014 | 1.05759 | PR#1953 base + delta unknown | Under audit |
32+
| **#2135** | **1.05651** | PR#2130 base + GPTQ_CALIBRATION_BATCHES 16→32 | New top if merged |
33+
34+
If PR #2146 merges, effective SOTA drops to **1.05651** and new target becomes **≤1.05151**.
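
The two targets quoted in these notes follow from subtracting the 0.005 record margin from the relevant SOTA. A minimal sketch of that arithmetic is below; the function name `record_target` is hypothetical, and `Decimal` is used only to avoid floating-point rounding noise in the fifth decimal place.

```python
from decimal import Decimal

# Margin a new record must beat the current SOTA by, as quoted in these notes.
RECORD_MARGIN = Decimal("0.005")


def record_target(current_sota: str) -> Decimal:
    """Return the highest val_bpb that would still count as a new record."""
    return Decimal(current_sota) - RECORD_MARGIN


# Target against the merged leaderboard SOTA (PR #1855).
merged_target = record_target("1.0611")

# Target if the pending audit (PR #2146) merges and PR #2135 becomes SOTA.
pending_target = record_target("1.05651")

print(f"merged target:  <= {merged_target}")
print(f"pending target: <= {pending_target}")
```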
35+
36+
## What Changed (May 2–3, 2026)
37+
38+
### New Open PRs
39+
| PR | Author | Score | Technique | Legality |
40+
|----|--------|-------|-----------|----------|
41+
| #2149 | YaseenHQ | unknown | SP8192 + RandProj384 tied embeddings + Pairwise-QK Muon | Non-record filing, May 3 |
42+
| #2130 | TanishGudise | **1.05670** | Token-only n-gram tilt + AsymLogit Rescale + 3 hyperparams (MATRIX_LR=0.028, LQER_ASYM_GROUP=32, TTT_LORA_LR=8e-5) + NUM_PHASES=1 | ⚠️ Reviewer flagged train/val data overlap (docs 10,000–49,999). Excluded by audit. |
43+
| #2124 | vaibhavmishra1 | **1.05933** | CaseOps + Gated XSA + NgramTilt + LQER g32/top4 + Phased TTT | ⚠️ 3-seed config inconsistency: headline uses third seed from different config. "Not record-ready as submitted." |
44+
| #2138 | anmarhindi | ~~0.979556~~ **1.067219** | Lock-In Byte Mixer (PPM-D gate, λ activates only at PPM_conf≥0.9999) | **CONFIRMED BPB BUG** (7th in competition): divides by CaseOps-transformed bytes instead of raw-text sidecar bytes. Corrected score 1.067219 is worse than the merged SOTA (1.0611). Do NOT track. |
45+
46+
### Key Technique: AsymLogit Rescale (PR #1923 / #2130)
47+
- Replace single `logit_softcap=30.0` with two trainable scalars `softcap_pos`, `softcap_neg`
48+
- Parameters adapt via TTT global prefix pass
49+
- Implementation: ~5 lines, zero legality risk
50+
- Used in V22 stack (PR #1945) and post-deadline leader PR #2135
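
The bullets above can be sketched in plain Python. This is an illustration only, not the code from PR #1923 or #2130. It assumes the baseline cap has the common `cap * tanh(logit / cap)` form and that the asymmetric variant simply selects a cap by the sign of the logit. The function names are hypothetical; `softcap_pos` and `softcap_neg` are the names used in these notes. In a real training script they would be learnable parameters (for example `torch.nn.Parameter`) rather than plain floats.

```python
import math


def softcap(logit: float, cap: float = 30.0) -> float:
    """Symmetric soft cap (assumed baseline): squashes a logit into [-cap, cap]."""
    return cap * math.tanh(logit / cap)


def asym_softcap(logit: float, softcap_pos: float, softcap_neg: float) -> float:
    """Asymmetric soft cap (assumed form): separate caps for each sign.

    Positive logits saturate at +softcap_pos, negative logits at -softcap_neg.
    Both branches pass through 0 at logit == 0, so the function is continuous.
    """
    cap = softcap_pos if logit >= 0.0 else softcap_neg
    return cap * math.tanh(logit / cap)


# With both scalars initialised to the old fixed value, behaviour is unchanged.
for x in (-50.0, -5.0, 0.0, 5.0, 50.0):
    same = asym_softcap(x, 30.0, 30.0) == softcap(x, 30.0)
    print(x, asym_softcap(x, 30.0, 20.0), same)
```

Initialising both scalars to the previous fixed cap means the change is a no-op at step zero, which is one plausible reason such a swap would be low-risk to add.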
51+
52+
### BPB Bug Tally: 7 confirmed this competition
53+
Bugs in: PR #1545, #1576, #1687, #1698, #1848 (risk), #1858 (partial data), #2138.
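
The PR #2138 failure mode (dividing by transformed bytes rather than raw-text bytes) can be illustrated with a small sketch. Everything here is hypothetical: the toy case-marking transform, the sample text, and the loss value are stand-ins chosen only to show that a transform which adds bytes deflates the reported score when its byte count is used as the denominator.

```python
import math


def bits_per_byte(total_nll_nats: float, num_bytes: int) -> float:
    """Convert a summed negative log-likelihood (in nats) to bits per byte."""
    if num_bytes <= 0:
        raise ValueError("num_bytes must be positive")
    return total_nll_nats / (math.log(2) * num_bytes)


def toy_case_transform(text: str) -> str:
    """Hypothetical stand-in for a case transform: lowercase plus a '^' marker."""
    return "".join("^" + ch.lower() if ch.isupper() else ch for ch in text)


raw_text = "Hello World"                      # hypothetical validation text
transformed_text = toy_case_transform(raw_text)

raw_bytes = len(raw_text.encode("utf-8"))
transformed_bytes = len(transformed_text.encode("utf-8"))

total_nll_nats = 8.0                          # hypothetical summed model loss

correct_bpb = bits_per_byte(total_nll_nats, raw_bytes)
buggy_bpb = bits_per_byte(total_nll_nats, transformed_bytes)

print(f"raw bytes={raw_bytes}, transformed bytes={transformed_bytes}")
print(f"correct bpb={correct_bpb:.4f}, buggy bpb={buggy_bpb:.4f}")
```

A cheap guard is to assert that the denominator equals the byte length of the untransformed validation text before reporting any score.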
54+
55+
## New Research Papers (May 3 scan)
56+
57+
No new highly relevant papers since May 2 scan. Prior high-priority items still pending:
58+
59+
| Paper | arXiv | Priority |
60+
|-------|-------|----------|
61+
| In-Place TTT (NTP-aligned loss) | 2604.06169 | High — read before next competition TTT design |
62+
| Bell Box Quantization (BBQ) | 2603.01599 | High — ITO quantization; could replace GPTQ/LQER |
63+
| EntroLLM entropy coding | 2505.02380 | High — additive to lrzip artifact compression |
64+
| Decoupling Tokenization Effects | 2604.27263 | Medium — theoretical backing for CaseOps BPB debate |
65+
66+
**No new May 2026 competition-relevant papers found in this scan.**
67+
68+
## Status Summary
69+
70+
| Item | Status |
71+
|------|--------|
72+
| Competition | **CLOSED** (April 30, 2026) |
73+
| Final Merged SOTA | **1.0611** (codemath3000, PR #1855) |
74+
| Pending Audit SOTA | **1.05651** (PR #2135, DRAFT PR #2146, not merged) |
75+
| Our submission | **REJECTED** (PR #771, train-then-score violation) |
76+
| Upstream commits since close | 5 — all non-record/notable submissions |
77+
| Issue #1872 (PPM-D legality) | No ruling — competition ended unresolved |
78+
79+
## Recommended Action
80+
81+
Competition is over. Three actionable items:
82+
83+
1. **Monitor PR #2146**: if the grace-policy audit merges, it would confirm that (a) the V22 lineage (AWQ-lite + AsymLogit Rescale) is the actual winning stack; (b) AsymLogit Rescale delivers ~0.003 bpb standalone; (c) GPTQ calibration batch count matters at the margin (~0.001 bpb).
84+
2. **Read arXiv:2604.06169** (In-Place NTP-aligned TTT) — directly applicable to future competition legal TTT design.
85+
3. **Document lesson**: Data overlap audit (docs 10,000–49,999 train/val overlap) invalidated PR #2130 despite otherwise clean technique. Any future competition needs explicit validation-set isolation check before filing.
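
One way to make the isolation check in item 3 concrete is to fingerprint every document and look for validation documents that also occur in the training split. The sketch below is an assumption about how such a check could be written, not the audit procedure used on PR #2130: it detects exact-text duplicates only, and the function names and sample documents are hypothetical.

```python
import hashlib
from typing import Iterable, List


def doc_fingerprint(text: str) -> str:
    """Stable fingerprint of a document's exact text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def overlapping_val_docs(train_docs: Iterable[str], val_docs: Iterable[str]) -> List[int]:
    """Return indices of validation docs whose exact text also appears in training."""
    train_hashes = {doc_fingerprint(doc) for doc in train_docs}
    return [
        index
        for index, doc in enumerate(val_docs)
        if doc_fingerprint(doc) in train_hashes
    ]


# Hypothetical toy splits: the second validation doc leaks from training.
train = ["alpha document", "beta document", "gamma document"]
val = ["delta document", "beta document"]

leaks = overlapping_val_docs(train, val)
if leaks:
    print(f"validation docs also present in training: {leaks}")
else:
    print("no exact-text overlap found")
```

If splits are defined by document index ranges rather than by content, comparing the index ranges directly is an even simpler first check.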
86+
87+
---
88+
189
# Parameter Golf Daily Research - 2026-05-02 (POST-COMPETITION DAY 2)
290

391
## PR #771 STATUS: CLOSED (REJECTED 2026-03-27) — Final
