
Record: SP8192 + LQER + Sparse Attn Gate + BOS-Fixed SmearGate + 9-Hparam Greedy Stack — val_bpb 1.06108 (3-seed mean) #1855

Merged
cocohearts merged 4 commits into openai:main from codemath3000:submission/sp8192-lqer-bos-smear-fix-9hp-stack
Apr 29, 2026

Conversation

@codemath3000
Contributor

@codemath3000 codemath3000 commented Apr 27, 2026

Summary

11L 512d 8H/4KV transformer with U-Net skips, combining:

  • parallel residuals, partial RoPE, and Polar-Express Newton-Schulz Muon
  • LQER asymmetric int4 rank-4 quantization correction
  • sparse attention head-output gate
  • SmearGate with a cross-document leak fix at BOS positions (audit response)
  • fused LeakyReLU-square MLP and a fused softcapped-CE Triton kernel
  • GPTQ int6 weights, int7 embeddings, and per-row int8 attn-gate
  • per-group lrzip + brotli compression pipeline (added in this submission — PR #1797's base only ships lzma / brotli)
  • phased TTT eval (3 cumulative phases at doc boundaries 833 / 1666 / 2500, max prefix = 2500 docs)
  • 9 hyperparameter overrides validated by greedy forward-selection on 8×H100 at real fixed-step

3-seed mean: 1.06108 BPB (std 0.00090) on 8×H100 SXM, all artifacts under the 16 MB cap.

| seed | post-TTT val_bpb | artifact bytes | eval_time |
| --- | --- | --- | --- |
| 42 | 1.05989 | 15,897,259 | 508.8 s |
| 0 | 1.06125 | 15,900,947 | 455.1 s |
| 1234 | 1.06209 | 15,907,550 | 470.0 s |
| mean | 1.06108 | 15,901,919 | 478.0 s |

vs current leaderboard (1.0810 BPB): −0.01992 BPB / −0.04359 nats.
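The phased TTT schedule above (3 cumulative phases, max prefix 2500 docs) implies evenly spaced doc-boundary cutoffs. A minimal sketch of that arithmetic, with an illustrative helper name (not the submission's actual API):

```python
# Toy sketch of the phased-TTT schedule: cumulative doc-boundary
# cutoffs that partition the max prefix evenly across phases.
# phase_boundaries is an illustrative name, not from train_gpt.py.
def phase_boundaries(max_prefix_docs=2500, num_phases=3):
    """Cumulative cutoff (in docs) at which each TTT phase ends."""
    return [max_prefix_docs * (k + 1) // num_phases for k in range(num_phases)]

print(phase_boundaries())  # [833, 1666, 2500] — matches the boundaries above
```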

SmearGate cross-document leak fix

SmearGate's per-token forward-1 mixing (x[:, 1:] + g * x[:, :-1]) leaks the last token of doc N into the BOS embedding of doc N+1 in a packed validation stream. The fix masks the prev-token term wherever the current token is BOS:

```python
not_bos = (input_ids[:, 1:] != BOS_ID).to(x.dtype).unsqueeze(-1)
x = torch.cat([x[:, :1], x[:, 1:] + g * x[:, :-1] * not_bos], dim=1)
```

Applied symmetrically in _forward_hidden and forward_ttt so training and TTT eval are leak-free.
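The masking logic can be illustrated with a framework-free toy (scalar "embeddings" and an illustrative BOS_ID and gate value — not the model's real shapes or code):

```python
# Toy sketch (framework-free) of the SmearGate BOS mask described above.
# The real torch code is vectorized; here we walk one packed sequence
# token by token. BOS_ID and g are illustrative stand-ins.
BOS_ID = 0
g = 0.5  # smear gate value (learned per-token in the model; scalar here)

def smear_forward1(embeds, token_ids, bos_fix=True):
    """Mix each position with its predecessor: x[t] + g * x[t-1],
    skipping the prev-token term at BOS positions when bos_fix is set."""
    out = [embeds[0]]  # position 0 has no predecessor
    for t in range(1, len(embeds)):
        if bos_fix and token_ids[t] == BOS_ID:
            out.append(embeds[t])            # masked: no cross-doc leak
        else:
            out.append(embeds[t] + g * embeds[t - 1])
    return out

# Packed stream: doc A = [5, 7], then BOS, then doc B = [9]
ids = [5, 7, BOS_ID, 9]
emb = [1.0, 2.0, 3.0, 4.0]  # stand-in 1-d "embeddings"

print(smear_forward1(emb, ids, bos_fix=False))  # BOS mixes in doc A's last token
print(smear_forward1(emb, ids, bos_fix=True))   # BOS position left untouched
```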

Per-group compression pipeline

PR #1797's base only exposes lzma / brotli compressors. This submission adds a per-group serializer (COMPRESSOR=pergroup):

  1. Buckets the int6 GPTQ tensors by role (qo_bank, kv_bank, mlp_up_bank, mlp_down_bank, etc.) so similarly-distributed weights compress together.
  2. For "hot" 2D groups (_tok_emb, attn.c_q, mlp.fc), runs an L1 nearest-neighbour similarity sort on rows before transposing — adjacent rows in the serialized stream are now numerically close, giving the entropy coder longer runs of small deltas. Permutation indices are stored as uint16 and brotli-compressed.
  3. Compresses each group blob with lrzip -z -L 9 (ZPAQ context-mixing back-end). lrzip's long-range deduplication catches cross-tensor repetition that brotli's 24-bit window misses.
  4. Falls back to brotli for the remainder (state-dict scaffolding, scales, LQER factors, gate tensors) and the code wrapper.

Net effect on this stack: ~280 KB smaller artifact than COMPRESSOR=brotli, at the cost of ~75 s of additional serialize time. The lrzip binary must be present on the system before the training script runs (e.g. install with apt-get install lrzip during instance setup). The script itself does not run apt-get; the Python subprocess.run wrapper just shells out to the already-installed lrzip binary.
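Step 2 (the similarity sort) is the least obvious part of the pipeline. A minimal pure-Python sketch of a greedy L1 nearest-neighbour row ordering, with an illustrative function name (the real serializer operates on int6 GPTQ tensors and stores the resulting permutation as uint16):

```python
# Illustrative sketch of step 2 above: greedily chain each row to its
# L1-nearest unvisited neighbour so that consecutive rows in the
# serialized stream are numerically close. l1_row_sort is an
# illustrative name, not the submission's actual API.
def l1_row_sort(rows):
    """Return a row permutation built by greedy L1 nearest-neighbour
    chaining, starting from row 0."""
    n = len(rows)
    perm = [0]
    remaining = set(range(1, n))
    while remaining:
        last = rows[perm[-1]]
        nxt = min(remaining,
                  key=lambda i: sum(abs(a - b) for a, b in zip(rows[i], last)))
        perm.append(nxt)
        remaining.discard(nxt)
    return perm

rows = [[0, 0], [9, 9], [1, 1], [8, 8]]
print(l1_row_sort(rows))  # [0, 2, 3, 1] — similar rows end up adjacent
```

After this reordering, the entropy coder sees consecutive rows that differ by small deltas, which is what lets the subsequent lrzip/brotli pass compress tighter.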

Hyperparameter stack

9 greedy-validated overrides:

| hparam | value | default |
| --- | --- | --- |
| MLP_CLIP_SIGMAS | 11.5 | 10.0 |
| EMBED_CLIP_SIGMAS | 14.0 | 20.0 |
| WARMDOWN_FRAC | 0.85 | 0.75 |
| BETA2 | 0.99 | 0.95 |
| TTT_BETA2 | 0.99 | 0.999 |
| TTT_WEIGHT_DECAY | 0.5 | 1.0 |
| TTT_LORA_RANK | 80 | 96 |
| SPARSE_ATTN_GATE_SCALE | 0.5 | 1.0 |
| PHASED_TTT_PREFIX_DOCS | 2500 | 2000 |

Each individually accepted on a strict greedy-keep rule (mean improvement vs current best stack) at fixed-step.
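The greedy-keep rule can be sketched as follows. `run_3seed_mean` stands in for the real fixed-step 8×H100 evaluation, and none of these names come from the submission's code:

```python
# Hedged sketch of the greedy forward-selection loop described above:
# each candidate override is accepted only if it improves the mean
# val_bpb of the current best stack. greedy_select and run_3seed_mean
# are illustrative stand-ins, not the submission's actual code.
def greedy_select(candidates, run_3seed_mean, base_config):
    best_cfg = dict(base_config)
    best_bpb = run_3seed_mean(best_cfg)
    for name, value in candidates:
        trial = dict(best_cfg)
        trial[name] = value
        bpb = run_3seed_mean(trial)
        if bpb < best_bpb:          # strict greedy-keep rule
            best_cfg, best_bpb = trial, bpb
    return best_cfg, best_bpb

# Toy stand-in evaluator: pretend only BETA2=0.99 helps.
def fake_eval(cfg):
    return 1.070 - (0.002 if cfg.get("BETA2") == 0.99 else 0.0)

cfg, bpb = greedy_select([("BETA2", 0.99), ("TTT_LORA_RANK", 80)],
                         fake_eval, {"BETA2": 0.95})
print(cfg, bpb)  # keeps BETA2=0.99, rejects the rank change (no improvement)
```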

See records/track_10min_16mb/2026-04-27_SP8192_LQER_SparseGate_BOSSmearFix_9HpStack_1.0611/README.md for full architecture lineage and credits.

Test plan

  • Trains within 600s wallclock on 8×H100 80GB SXM (4917-4945 steps achieved, ~121.7 ms/step mean)
  • All 3 artifacts under 16 MB cap (max 15,907,550 B; min 15,897,259 B)
  • TTT eval completes within 600s eval cap (max 508.8 s)
  • 3-seed mean reproduced; per-seed numbers verified in attached logs
  • SmearGate fix verified by code-diff audit; applied to both _forward_hidden and forward_ttt

🤖 Generated with Claude Code

codemath3000 and others added 2 commits April 27, 2026 03:45
…aram Greedy Stack — val_bpb 1.06108 (3-seed mean)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…or internal hp trials only; submission used 600s wallclock
@codemath3000
Contributor Author

codemath3000 commented Apr 27, 2026

Quick note: the initial commit, 612a1a9, was the only one with any code changes; the other two were README-only changes to clarify things and fix some AI hallucinations. In terms of evaluating when this PR was submitted relative to other PRs, the timing of the initial commit, not that of the most recent commit, should be used, since the initial commit contains all the code, logs, etc., and the other commits are just README changes. Thank you so much!

leon2k2k2k added a commit to leon2k2k2k/parameter-golf that referenced this pull request Apr 27, 2026
…ate BOS-fix

Lands openai#1797's training stack (PolarNS, MIN_LR, Sparse Attn Gate, Fused CE,
Smear Gate, LQER asym) verbatim into a new record dir, with the BOS-fix
patch from openai#1855 applied at both _forward_hidden and forward_ttt sites.
Per CLAUDE.md baseline-migration exception, lands directly on research
(not exp/<slug>).

Spec: research/specs/050-baseline-1797-bos-fix.md
Code: records/track_10min_16mb/2026-04-27_050_PR1797_Base_BOS_Fix/

Expected: post-TTT ~1.061 (matches openai#1797's 1.06157 ± noise).
Skipped from openai#1855: 9-hparam bundle and lrzip serializer (deferred for
clean attribution of subsequent levers).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
leon2k2k2k added a commit to leon2k2k2k/parameter-golf that referenced this pull request Apr 27, 2026
…penai#1855

Changes train_gpt.py defaults for two of openai#1855's 9 greedy-validated hparams:
- BETA2 0.95 -> 0.99 (smoother optim variance estimate, generic win)
- SPARSE_ATTN_GATE_SCALE 1.0 -> 0.5 (softer gating early; only affects openai#1787's
  sparse attn-output gate path, no coupling with our 047 family)

Both still env-var-overridable for ablation. WARMDOWN_FRAC=0.85 deferred
because it interacts with loop-activation timing.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
sunnypatneedi pushed a commit to sunnypatneedi/parameter-golf that referenced this pull request Apr 27, 2026
… required; PR openai#1848 BPB risk; Day 18 plateau; Session 23

- Merged SOTA still 1.0810 (Day 18, no change since Apr 9)
- PPM-D byte mixture confirmed by dexhunter at 1.0322 (PR openai#1857, self-closed)
- SmearGate BOS bug documented: prev-token leaks at document boundaries; fix required
- PR openai#1848 (newjordan, 0.87980) flagged BPB risk: sibling PR openai#1846 closed same day
- PR openai#1858 (0.9946) only covers 8M/40.5M tokens — not leaderboard-comparable
- PR openai#1855 (codemath3000, 1.06108) and openai#1851 (aquariouseworkman, 1.06128) both clean
- PPM-D wave: PRs openai#1850, openai#1854, openai#1835 await organizer ruling
- Added Session 23 lessons to CLAUDE.md
- 3 days to deadline (Apr 30) — final GPU run window

https://claude.ai/code/session_01RmJtLYUmKNzDgDVTnWoKzU
Fija pushed a commit to Fija/parameter-golf that referenced this pull request Apr 28, 2026
- Adds 2-line BOS mask in both forward_logits and forward_ttt SmearGate
  paths. Before fix, the last token of doc N smeared into the BOS of doc
  N+1 — model-quality bug, not a C1 issue. Identical fix to PR openai#1851
  @aquariouseworkman, audit by @cocohearts.

- runpod/phase_g_3seed.sh: full 3-seed driver. Sets PR openai#1797 stack env
  vars + the PR openai#1855 9-hparam greedy stack delta:
    MLP_CLIP_SIGMAS=11.5 EMBED_CLIP_SIGMAS=14.0 WARMDOWN_FRAC=0.85
    BETA2=0.99 TTT_BETA2=0.99 TTT_WEIGHT_DECAY=0.5 TTT_LORA_RANK=80
    SPARSE_ATTN_GATE_SCALE=0.5 PHASED_TTT_PREFIX_DOCS=2500
  Mixers (NGRAM/TEMP) stay OFF — pure neural baseline + bug fix +
  hparam stack. Auto-runs Welch t-test vs PR openai#1797 (1.06157±0.00066).

- TTT 4-epoch (PR openai#1812) explicitly NOT adopted: that scheme targets the
  PR openai#1493 SGD-on-whole-model TTT path, not the PR openai#1797 LoRA-phased
  per-doc-reset path we're on. No clean mapping.

Legality: all 16/16 unit tests still pass. BOS fix preserves causality
(it only zeroes a gate at positions where current token is BOS, never
references future tokens).
GodlyDonuts added a commit to GodlyDonuts/parameter-golf that referenced this pull request Apr 28, 2026
…olar Express NS + MIN_LR + LQER)

Triage of 5 new PRs the user surfaced (1858, 1852, 1855, 1874, 1877):
- openai#1852: hard rule violation (pre-quant TTT on validation data).
- openai#1858: eval subset (8M of 40.5M tokens), reviewer caught and author admitted.
- openai#1877: broken normalization (byte PPM × token NN doesn't sum to 1 over
  token alphabet), reviewer @sharpobject caught.
- openai#1855: techniques mostly legit but apt-get install lrzip violates Issue
  openai#1017 Rule 3 (artifact must be self-contained).
- openai#1874: LEGITIMATE - 3-seed mean 1.06766, std 0.00076, three orthogonal
  training-time techniques citing prior validated PRs. If it merges,
  our submission threshold shifts from 1.0760 to ~1.0627.

PR openai#1874's three techniques:
1. Polar Express NS coefficients (PR openai#1344) - 5 minimax-tuned tuples
   replace the fixed (3.4445, -4.775, 2.0315) at MUON_BACKEND_STEPS=5.
2. MIN_LR=0.10 warmdown floor (PR openai#1787) - LR floors at 10% of max
   instead of decaying to 0. Already wired in our v1+; just env-var
   opt-in.
3. LQER asymmetric int4 rank-4 quantization correction (PR openai#1797) -
   SVD on top-K=3 highest-error GPTQ residuals, packed as int4
   per-group-64 asymmetric. ~200-400 LOC; deferred to v4.

train_gpt_v3.py implements (1) and exposes (2):
- POLAR_EXPRESS_NS=0 default (byte-for-byte SOTA when off).
- _PE_COEFFS module-level constant + _POLAR_EXPRESS_NS flag read at
  import time so torch.compile sees them as constants.
- zeropower_via_newtonschulz5 branches on _POLAR_EXPRESS_NS to use
  per-iteration coefficients instead of fixed.
- MIN_LR was already an env var; setting MIN_LR=0.10 at runtime opts in.

Sizes: v3 raw 54,977 lzma 15,128 (+272 vs v2, +1,880 vs SOTA). Worst-
seed artifact slack: ~4,888 bytes under cap. Tight but workable.

AST-validated on Python 3.13 (macOS) and 3.12 (Vultr Linux).

Stacking projection (single-seed):
- Phase 0 baseline:       1.08038
- + LR=0.010 (Stage 2):   1.08021
- + Polar Express NS:     1.0787-1.0797
- + MIN_LR=0.10:          1.0777-1.0794
- + ConfTTT (PR openai#1879):   1.0772-1.0793
- + LQER (v4 work):       1.0742-1.0783
- + Phase 2 architecture: 1.0712-1.0773
- + Newton-Muon Stage E:  1.066-1.075

Path B (absorb-and-stack) recommended over Path A (race-to-merge-with-
current-stack) since current stack alone doesn't clear 1.0760.

Race awareness: openai#1874, openai#1855 (lrzip-stripped), and openai#1797 are all open.
Whichever merges first becomes new SOTA and our threshold tightens.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@aquariouseworkman
Contributor

apt-get install lrzip is required at runtime, thus this breaks rule 3

@okezue

okezue commented Apr 28, 2026

Independent reproduction (3 seeds)

Re-ran the stack on a fresh 8×H100 SXM pod (cu129/torch 2.9.1/FA3 cu129_torch291) with the env vars from this PR's hparam table plus the gates (`SMEAR_GATE_ENABLED=1 SPARSE_ATTN_GATE_ENABLED=1 EMBED_BITS=7 MIN_LR=0.1 GPTQ_RESERVE_SECONDS=0.5 PHASED_TTT_NUM_PHASES=3` etc.). 600 s wallclock, otherwise identical to this PR.

| seed | pre-quant | quantized | post-TTT |
| --- | --- | --- | --- |
| 42 | 1.06383 | 1.07237 | 1.05965 |
| 314 | 1.06450 | 1.07316 | 1.06041 |
| 999 | 1.06545 | 1.07398 | 1.06124 |
| mean | | | 1.06043 |

3-seed mean 1.06043 vs this PR's reported 1.06108 — within 1σ. Independent reproduction confirms the stack.

These three runs used `COMPRESSOR=brotli` and produced 16,112,007-byte artifacts (over the 16 MB cap; the BPB numbers are unaffected by compressor choice but the artifacts are technically non-compliant). One additional pergroup re-run with seed 42 produced a compliant 15,902,285-byte artifact at val_bpb 1.06052, matching seed 42's brotli value within run-to-run noise. (Couldn't get 3 pergroup seeds due to a string of RunPod capacity / image-pull failures over a 4-hour window — the locked volume on the original machine still has all the brotli logs and the s42 pergroup artifact saved.)

Net: the stack reproduces. The −0.019 BPB jump over PR #1493's 1.0810 is real on independent hardware.

@codemath3000
Contributor Author

codemath3000 commented Apr 28, 2026

apt-get install lrzip is required at runtime, thus this breaks rule 3

@aquariouseworkman Thanks for flagging — you're right that the README/PR wording was imprecise about when lrzip is needed, and I've fixed that. To clarify the substantive question:

The training/eval script never runs apt-get itself. The lrzip binary is installed once during instance setup and the script just shells out to the already-installed binary via subprocess.run. No apt-get, no network calls, and no external downloads occur during the 600 s training window or the 600s eval window.

The official FAQ explicitly authorizes external dependencies handled this way: "Yes, you're free to import any package or library you want… Just include a requirements.txt in your records folder and mention setup instructions in your README.md." Both are present (requirements.txt documents lrzip; the README has the install command).

For precedent: the current leaderboard SOTA (PR for the 2026-04-09 SP8192 record at 1.0810 BPB) installs FlashAttention 3 from a custom third-party wheel host (pip install flash_attn_3 --no-deps --find-links https://windreamer.github.io/...) — also an external download performed before training begins. lrzip via apt-get is actually a more conservative dependency than that: it pulls from the official Debian/Ubuntu package repos rather than from a single contributor's GitHub Pages site. If the standard FA3 setup is acceptable, this should be acceptable a fortiori.

Also worth noting that "rule 3" as numbered in the field guide is the author's paraphrase, not the official rule text. The literal official rule (FAQ) is the one I quoted above.

…quisite (not auto-installed by the training script)
@aquariouseworkman
Contributor

aquariouseworkman commented Apr 28, 2026

> @aquariouseworkman Thanks for flagging — you're right that the README/PR wording was imprecise about when lrzip is needed, and I've fixed that. […] The literal official rule (FAQ) is the one I quoted above.

Strange — from what I can see in your current code, you would have to do one of the following to be valid (based on code review, not on your AI-generated README/PR text):

a) re-run all three seeds with COMPRESSOR=brotli and report those numbers;
b) replace the lrzip subprocess call with a pure-Python ZPAQ implementation that lives inside the artifact; or
c) have the maintainers add lrzip to the base eval image.

Evidence:

  1. Hard calls to lrzip at train_gpt.py:2381-2390.
  2. Hard router to lrzip at train_gpt.py:2700.
  3. requirements.txt:13: "# System dep (apt): lrzip (used by per-group compressor)". pip can't install this since it isn't a Python package (it's an apt package), so listing it in requirements.txt is not a fix — which is why the AI-written README has to call out apt-get install lrzip as a separate setup step. That is exactly what makes it a side dependency.

@codemath3000
Contributor Author

codemath3000 commented Apr 28, 2026

Hi @aquariouseworkman, thanks for the detailed read. Walking through the three options:

(a) Re-run with COMPRESSOR=brotli — this would actually disqualify the submission on a different rule. @okezue's independent reproduction earlier in this thread used COMPRESSOR=brotli and got 16,112,007-byte artifacts — about 112 KB over the 16 MB cap. Pergroup is what brings this submission's artifacts under the cap (15.9 MB max). Re-running with brotli isn't a fix; it produces a non-compliant submission on artifact size.

(b) Pure-Python ZPAQ embedded in the artifact — no rule requires this. The official rule (FAQ on artifact size) prohibits "external downloads, training dataset access, or network calls during evaluation" — not subprocess shell-outs to OS utilities. lrzip is invoked only via subprocess.run against an already-installed binary; no network call and no download at runtime.

(c) Maintainers add lrzip to the base eval image — this is the standard workflow for the challenge, explicitly authorized by the FAQ:

"Yes, you're free to import any package or library you want… Just include a requirements.txt in your records folder and mention setup instructions in your README.md."

The FAQ uses "package or library" (not "pip package") and treats requirements.txt and "setup instructions in your README.md" as separate, both-acceptable declaration mechanisms. Pip-installability isn't a rule criterion. We declare lrzip in both places: requirements.txt documents it as a comment ("System dep (apt): lrzip"), and the README spells out the install command.

A point worth being explicit about: the rule's verb is "include a requirements.txt." That means have one, present it as a reference — not "every dependency must resolve automatically via pip install -r requirements.txt." The official README itself reinforces this on line 176, describing requirements.txt as "provided as a reference if you want to self-setup" — a reference document, not a self-executing manifest. Submission contents rule (line 227) similarly says "any other dependencies," with no restriction to pip-installable packages. So a requirements.txt that mentions lrzip in a comment alongside the README's install instructions satisfies the rule as written.

For precedent: the current leaderboard SOTA (2026-04-09, val_bpb 1.0810 by @bigbag) installs FlashAttention 3 from windreamer.github.io, a single contributor's GitHub Pages site, via a custom --find-links URL that does not resolve from PyPI. By the "if it isn't a normal pip-resolvable package, it's a side dependency" reading, that submission would also fail. It doesn't, because the FAQ allows external deps via README setup instructions — the exact same mechanism we're using for lrzip. lrzip from official Debian/Ubuntu apt repos is structurally a more conservative external-dep pattern than installing custom-built CUDA wheels from a GitHub Pages site.

The literal self-containment rule (FAQ on artifact size) is about runtime behavior during evaluation: "No external downloads, training dataset access, or network calls are allowed during evaluation." lrzip is invoked locally against an already-installed binary — no network, no download, no training-data access at runtime.

So to be explicit on the conclusion: the submission is valid as-is. The artifact is under 16 MB, the script makes no network calls or external downloads during evaluation, and the lrzip dependency is declared exactly the way the official FAQ asks external dependencies to be declared (requirements.txt + README setup instructions) — the same way every recent record has handled FA3, including the current leaderboard SOTA. None of options (a)–(c) is required; (a) would in fact create a new compliance violation.

@aquariouseworkman
Contributor

> Hi @aquariouseworkman, thanks for the detailed read. Walking through the three options: […] None of options (a)–(c) is required; (a) would in fact create a new compliance violation.

Yes… my bad, this does appear valid. :) You're now in the #1 spot.

leon2k2k2k added a commit to leon2k2k2k/parameter-golf that referenced this pull request Apr 28, 2026
PR openai#1902 (cocohearts) accepted openai#1851/openai#1868 over openai#1736 and excluded openai#1855
only on significance grounds (p=0.325). Our prior 050 line built on openai#1797
which is under validity-cloud per cocohearts. Re-anchor research baseline
on openai#1855's accepted chain.

Pure port — zero modifications. Files copied verbatim from
codemath3000/parameter-golf:submission/sp8192-lqer-bos-smear-fix-9hp-stack
@ 1e43966 into records/track_10min_16mb/2026-04-29_PR1855_Port_Baseline/.

Spec 060B+ will fork exp/060B-* etc. to stack quant-repair / deploy-time
levers (046B-tight SDClip, 046L deploy-time repair, 046G-tighter, etc.)
on this baseline.
leon2k2k2k added a commit to leon2k2k2k/parameter-golf that referenced this pull request Apr 28, 2026
- pinned SHA da50cd6 (spec 060A baseline)
- TTT_ENABLED=1, PHASED_TTT_ENABLED=3 (per memory: =3 not =0)
- All openai#1855 defaults made explicit in env (BETA2=0.99,
  SPARSE_ATTN_GATE_SCALE=0.5, MLP_CLIP_SIGMAS=11.5, EMBED_CLIP_SIGMAS=14.0,
  WARMDOWN_FRAC=0.85, PHASED_TTT_PREFIX_DOCS=2500, TTT_BETA2=0.99,
  TTT_WEIGHT_DECAY=0.5, TTT_LORA_RANK=80)
- apt install lrzip (required by openai#1855's _lrzip_compress)
- Both final_model.pt and final_model.int6.ptz verified after run;
  fail-loud (exit 2) on missing artifacts; chmod a-w on success
leon2k2k2k added a commit to leon2k2k2k/parameter-golf that referenced this pull request Apr 28, 2026
Six follow-on specs to spec 060A (openai#1855 port):

- 060B: SDClip ATTN tightening (config-only, eval via RESUME_FROM_CKPT)
- 060C: 046L deploy-time quant repair (~150 lines code port from
  exp/046-quant-repair @ fcb816f); eval-side, free
- 060D: 046G-tighter SDClip (config-only, fits within openai#1855 lrzip headroom)
- 060E: full stack (060B + 060C combined)
- 060F: LQER bumps (RANK=5, TOP_K=4, ASYM_GROUP=32; config-only)
- 060G: Partial SpinQuant from PR openai#1898 (~100 lines code port)

Plus tmp_exec/launch_060_eval.sh: shared eval-only launcher for
RESUME_FROM_CKPT mode, used by 060B/D/E/F. Loads 060A's final_model.pt,
re-quantizes + re-evals with overridden env vars. ~-3 per arm vs
~ for full retrain.

All specs reference 060A's checkpoint at runs/060A-1855-port/seed_42/
final_model.pt as their hotstart.
leon2k2k2k added a commit to leon2k2k2k/parameter-golf that referenced this pull request Apr 28, 2026
openai#1855's per-group lrzip compressor saves ~280 KB vs default brotli.
Without this, 060A's artifact went over the 16,000,000 byte cap and
required post-hoc repacking (per execution session feedback).

Switching default to pergroup ensures all 060 family runs (A through G)
fit within the cap by default; no separate repack step needed.

Affects: launch_060A_run.sh (full training run), launch_060_eval.sh
(eval-only via RESUME_FROM_CKPT for 060B/D/E/F).
Fija pushed a commit to Fija/parameter-golf that referenced this pull request Apr 28, 2026
Phase M seed-42 hit val_bpb 1.05891 (record-clearing) but artifact 17.25 MB
(over by 1.25 MB) because lzma compression made things WORSE on quantized
weights — Phase G with brotli was 16.14 MB, lzma made it 17.25.

Lesson: brotli > lzma on this data.

Phase N strategy: same Phase G config (9-hparam stack on PR openai#1797 V2 base
with BOS fix, brotli compression) but revert MLP_CLIP_SIGMAS from 11.5 to
10.0 (PR openai#1797 default). Tighter MLP weight clip → narrower magnitude →
brotli compresses tighter → expected ~100-200 KB saved. Phase G's 16,144,312
bytes should drop to <16,000,000.

BPB cost of reverting MLP_CLIP: small (one of 9 hparams; PR openai#1855 reported
mean delta -0.00049 across all 9; reverting one ~adds 0.00006 BPB). Phase G's
mean 1.05969 should shift to ~1.0598 — still well below the 1.05963 record
bar (PR openai#1797 = 1.06157 - 0.00194 = 1.05963).

Auto-stop pod (trap EXIT + hard wallclock kill 100min) and HF result push
after each seed (so abort is recoverable from outside the pod).
leon2k2k2k added a commit to leon2k2k2k/parameter-golf that referenced this pull request Apr 28, 2026
leon2k2k2k added a commit to leon2k2k2k/parameter-golf that referenced this pull request Apr 28, 2026
Four post-training specs to stack on 060A's openai#1855 port:

- 060I: port PR openai#1908's activation-aware mixed-bit GPTQ (3-seed validated
  −0.000265 BPB on openai#1855 itself). 4 env vars + ~100 LOC port.
- 060J: PHASED_TTT_NUM_PHASES 3→4 (low confidence; openai#1727 measured noise on
  weaker base, never tested with 2500 prefix).
- 060L: PHASED_TTT_PREFIX_DOCS 2500→3000 (high confidence; codemath3000
  greedy-validated 2000→2500 on this exact stack in openai#1855).
- 060M: TTT_EPOCHS 3→4 (highest predicted Δ; PR openai#1812 reported −0.008 on
  weaker base; never tested on phased+SmearGate stack like openai#1855).

All eval-only via RESUME_FROM_CKPT on 060A's seed_42_4h pt. No code change
for 060J/L/M. 060K (rank-up) deleted — rowed against openai#1855's own greedy
direction (which decreased rank 96→80).

Idea files: research/ideas/{1908-awq-lite-mixed-bit-gptq,ttt-budget-reinvestment}.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
izlley added a commit to izlley/parameter-golf that referenced this pull request May 1, 2026
…oundary

This submission extends PR openai#1855's record candidate (LQER + SparseAttnGate +
BOS-fixed SmearGate + Polar-Express Muon + phased TTT eval + 9-hparam stack;
3-seed mean 1.06108) with two additions:

1. MP3 marker-pair fusion (vocab surgery): the three 2-grams [SPACE, TITLE]/
   [SPACE, ALLCAPS]/[SPACE, CAPNEXT] are fused into single alias donor tokens
   (donors 8/9/10 from byte-fallback IDs that occur 0x in the CaseOps corpus).
   Word X is preserved (no full-fusion d=1 collapse). Token saving 8.47%.

2. Alias smear boundary: SmearGate's previous-position contribution is fully
   disabled at positions immediately following an alias token
   (ALIAS_PREV_SMEAR_SCALE=0.0). Regular non-alias positions are unchanged.
   Conceptually: alias tokens act as smear boundaries.

1-seed reference (8xH100, 600s wallclock, on author's DGX H100 box):
  val_bpb (phased TTT) : 1.06042
  size                  : 16.74 MB on DGX (over budget); the same
                          PR openai#1855 codebase unmodified also produces
                          16.75 MB on the same DGX box, so the ~840 KB
                          delta vs the runpod 15.90 MB number is
                          environmental (likely lrzip ZPAQ version /
                          numerical state). The 3-seed runpod
                          verification is the authoritative size
                          measurement.

Submission contents:
  - train_gpt.py        : PR openai#1855 train_gpt.py (~3.8k lines) + 5-hunk MP3 patch
  - prepare_caseops_data.py : CaseOps tokeniser (multiprocess)
  - prepare_marker_pair_v3.py : MP3 vocab surgery
  - download_docs.py    : HF docs_selected.jsonl downloader
  - lossless_caps.py    : CaseOps infra
  - tokenizers/...model : SentencePiece model
  - alias_map.json      : MP3 alias map
  - requirements.txt    : Python deps + lrzip note
  - run_3seed.sh        : 3-seed runner (SEEDS=42 0 1234)
  - README.md

Pipeline (skip 1a/1b/2 if MP3 dataset is already prepared):
  1a. python3 download_docs.py
  1b. python3 prepare_caseops_data.py --docs ... --out ./data --sp tokenizers/...
  2.  python3 prepare_marker_pair_v3.py
  3.  bash run_3seed.sh

3-seed runpod verification pending.
leon2k2k2k added a commit to leon2k2k2k/parameter-golf that referenced this pull request May 1, 2026
Audits every CaseOps-lineage record-track PR (merged + unmerged) since
2026-04-18 for whether val docs are also in the training set.

Working set: 34 PRs (31 from chronological seed list + 3 discovered ancestors:
openai#1908, openai#1923, openai#2007). Boundary nodes openai#1493 / openai#1626 (pre-CaseOps).

Verdicts:
  - CLEAN (8): openai#1729, openai#1851, openai#1868, openai#1908, openai#2019, openai#2027, openai#2031, openai#2068
  - LEAK (25): openai#1736 (our research baseline) → openai#1769 → openai#1787 → openai#1797 → openai#1855 → V21 family (openai#1945, openai#1923, openai#1953, openai#1967) → openai#2018 → openai#2118
    (current claimed frontier 1.04350), plus siblings.
  - INHERIT (1): openai#2050 (eval-only on frozen openai#1915)

Code-level evidence (not README claims):
  - Every shipped prepare_caseops_data.py is byte-identical:
    SHARD_TOKENS=10_000_000, default=10_000 for --val-docs
  - NO PR overrides --val-docs (searched all .sh files in all 34 PRs)
  - cached_challenge_fineweb.py downloads from romeerp/parameter-golf-caseops-v1
    HF dataset whose manifest pins docs_val=50000, docs_train=8181945,
    sums match → CLEAN by construction
  - PR openai#2018's DATASET_AUDIT.md is gold-standard explicit leak description
  - PR openai#2118's submission.json admits "--val-docs=10000 train shards + 50k val eval"
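The `--val-docs` override search above is a plain grep over the checked-out PR worktrees. A sketch of the check (the /tmp path and file contents are illustrative stand-ins, not the actual 34-PR working set):

```shell
# Build a tiny stand-in worktree and run the audit's code-level check:
# does any shipped .sh file pass a --val-docs override?
mkdir -p /tmp/audit_demo/pr1855
cat > /tmp/audit_demo/pr1855/run_3seed.sh <<'EOF'
python3 prepare_caseops_data.py --docs docs_selected.jsonl --out ./data
EOF
grep -rln --include='*.sh' -e '--val-docs' /tmp/audit_demo \
  || echo "no --val-docs override found"
```

An empty grep result means every script falls through to the prepare_caseops_data.py default (--val-docs=10_000), which is the condition the audit flags.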

Three signposts:
  - Leak introduced: PR openai#1736 by @dexhunter (Apr 19) — first prepare_caseops_data.py
    default invocation
  - Leak fixed: PR openai#1851 by @aquariouseworkman (Apr 27) — switched to HF dataset
  - Leak re-introduced: PR openai#1855 by @codemath3000 (same day) — rebuilt locally

The merged-leaderboard SOTA (openai#1851/openai#1868 at 1.06128/1.06141) is CLEAN.
The unmerged frontier (openai#2118 at 1.04350) is LEAK. The 0.018 bpb gap is
inflated by val memorization; spec 301 was designed to measure how much
remains under clean data.

Files:
  caseops-memory-leakage/README.md       — overview, methodology, takeaways
  caseops-memory-leakage/verdicts.md     — 34-row master table with evidence
  caseops-memory-leakage/family-tree.md  — ASCII trees with [C]/[L] annotations
leon2k2k2k added a commit to leon2k2k2k/parameter-golf that referenced this pull request May 1, 2026
User pushed back on openai#2014's LEAK call as too inference-based. Verified directly:
- README says "uses same shards as PR openai#1855. If you don't have them, prepare
  with included prepare_caseops_data.py" — phrasing implies inheritance from
  openai#1855 (LEAK) but doesn't explicitly invoke prep
- No setup.sh, no shell script invoking prep
- No HF download script
- Path /dev/shm/pgolf_caseops_data_80_l17_final is custom flat RAM-disk dir
  (not triple-nested local-prep signature)
- Could be either HF-flattened download OR local-prep copy

Demoted openai#2014 from LEAK to AMBIGUOUS (leaning LEAK based on the "same
shards as openai#1855" wording, but not iron-clad).

Updated tally: CLEAN 9, LEAK 20 (was 21), AMBIGUOUS 4 (was 3), INHERIT 1.
@codemath3000
Contributor Author

codemath3000 commented May 1, 2026

Following up on the val_docs=10_000 default question with a more detailed answer.

This submission's training and evaluation data come from the published romeerp/parameter-golf-caseops-v1 HuggingFace dataset, downloaded directly via:

MATCHED_FINEWEB_REPO_ID=romeerp/parameter-golf-caseops-v1 \
python3 cached_challenge_fineweb.py \
  --variant sp8192_lossless_caps_caseops_v1_reserved --train-shards 80

The script used is the cached_challenge_fineweb.py bundled with PR #1729 (record path), which extends dataset_dir_for_variant with a 2-line fallback so non-numeric variant names like sp8192_lossless_caps_caseops_v1_reserved resolve. The canonical data/cached_challenge_fineweb.py on main only handles byte260 and sp<VOCAB_SIZE> numeric variants. The patch is purely a variant-name accommodation and doesn't change the download logic — the data still pulls from romeerp/parameter-golf-caseops-v1 directly.

The shipped prepare_caseops_data.py in our record folder is a reference data-prep script and was not run for this submission.

The romeerp dataset's manifest.json authoritatively documents the partition:

"stats": {
    "docs_val": 50000,
    "docs_train": 8181945,
    "files_val": 1,
    "files_train": 80,
    "tokens_val": 47853344,
    "tokens_train": 8000000058
}

This is the canonical 50K val docs / 8B train tokens setup. Our train_seed42.log reports train_shards: 80 and val_tokens: 47,851,520 — matching the manifest's files_train: 80 and tokens_val: 47,853,344 (the latter truncates to 47,851,520 via the standard (numel-1)//eval_seq_len*eval_seq_len whole-window truncation with eval_seq_len=2048).

The val partition this submission evaluated on is the canonical 50K val docs from romeerp's dataset.
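The truncation arithmetic can be checked directly. A sketch, with the whole-window truncation formula inferred from the numbers quoted above:

```python
# manifest tokens_val vs. logged val_tokens: drop the final partial
# eval window, keeping only whole eval_seq_len blocks of predictions
tokens_val_manifest = 47_853_344
eval_seq_len = 2048

val_tokens = (tokens_val_manifest - 1) // eval_seq_len * eval_seq_len
print(val_tokens)  # 47851520, matching train_seed42.log
```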

TanishGudise added a commit to TanishGudise/parameter-golf that referenced this pull request May 1, 2026
Beats PR openai#1855 (merged rank 1, 1.06108) by 0.00438 BPB.
Beats PR openai#2014 (best open, 1.05759) by 0.00089 BPB.
Beats PR openai#2060 (1.05792) by 0.00122 BPB.

Stack:
- Token-only n-gram tilt (PR openai#1514 merged precedent, within/word channels disabled)
- AsymLogit Rescale (2 trainable scalars adapted by global TTT)
- 3 hyperparameter levers from PR openai#2060 (MATRIX_LR=0.028, LQER_ASYM_GROUP=32, TTT_LORA_LR=8e-5)
- PHASED_TTT_NUM_PHASES=1 (matches PR openai#2014)
- NGRAM_HINT_PRECOMPUTE_OUTSIDE=0 (precompute INSIDE eval timer per PR openai#1514)

Compliance:
- All seeds eval ≤533.1s (cap 600s, 67-80s margin)
- All artifacts ≤15.95MB (cap 16MB)
- Token-only n-gram channel (within_gate=0, word_gate=0)
- Score-first TTT (per PR openai#402)
izlley added a commit to izlley/parameter-golf that referenced this pull request May 1, 2026
izlley added a commit to izlley/parameter-golf that referenced this pull request May 1, 2026
anmarhindi added a commit to anmarhindi/parameter-golf that referenced this pull request May 2, 2026
…0.979556)

The cond-PPM mixer used SP-piece UTF-8 bytes (incl. CaseOps sentinel
overhead, 164,594,398 per seed) as the BPB denominator instead of the
canonical raw-text sidecar (151,074,309 per seed) used by every other
CaseOps-lineage record per PR openai#1729 convention. Reported by @codemath3000
on PR openai#2138; thank you.

Per-token NLL is invariant under denominator change, so the correction
is algebraic — no re-eval required, original artifact and logs preserved
as forensic record. New per-seed BPB = old × 164594398 / 151074309 =
old × 1.089493:

  seed 42:   0.97949078 -> 1.067148
  seed 1337: 0.97954725 -> 1.067210
  seed 314:  0.97962885 -> 1.067299
  mean:      0.979556   -> 1.067219  (std ~7.6e-05)
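Since per-token NLL is unchanged, the correction is a pure rescale by the ratio of the two byte counts. A sketch reproducing the table above from the quoted figures:

```python
# corrected BPB = old BPB x (buggy denominator / canonical denominator),
# because bpb = total_nll_bits / byte_count and the NLL term is invariant
BUGGY_BYTES = 164_594_398      # SP-piece UTF-8 bytes incl. sentinel overhead
CANONICAL_BYTES = 151_074_309  # canonical raw-text sidecar bytes

ratio = BUGGY_BYTES / CANONICAL_BYTES  # ~1.089493

per_seed = {42: 0.97949078, 1337: 0.97954725, 314: 0.97962885}
corrected = {seed: bpb * ratio for seed, bpb in per_seed.items()}
print({seed: round(bpb, 6) for seed, bpb in corrected.items()})
# {42: 1.067148, 1337: 1.06721, 314: 1.067299}
```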

On the canonical denominator the submission is +0.006 BPB worse than
PR openai#1855 SOTA (1.06108), so this is no longer a SOTA-claim. LBM still
gives a real -0.034 BPB improvement over sliding-window-alone (1.101347)
on the canonical denominator; the C2-correctness story is unchanged.

This commit only patches interpretation:
  - README.md: prepend Errata section, corrected 3-seed table, source-
    line citations, algebraic derivation; reposition writeup as
    not-SOTA. Original technique writeup retained below.
  - submission.json: corrected val_bpb / val_bpb_per_seed / std /
    eval_canonical_byte_count_per_seed / headline_metric_description;
    add errata{} object with summary, original values, inflation ratio,
    credit, fix-branch pointer.

Forensic items deliberately untouched: train_gpt.py (wrapped, contains
buggy denominator), final_model.int6.ptz, train_seed*.log (each shows
both the buggy 'cond_ppm bytes=164594398' line and the canonical-
correct 'quantized_sliding_window val_bpb' line — the sidecar count
151,074,309 is reverse-solvable from the latter).

Fix lives on cond-ppm-stack of github.com/anmarhindi/parameter-golf-a.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>