Record: SmearGate BOS Fix 3-Seed Compliance Re-run — val_bpb 1.06141 (3-seed mean)#1868
Conversation
3-seed reproduction of PR openai#1851 (SmearGate BOS document boundary fix). Code is byte-identical to openai#1851 by @aquariouseworkman.

Results (post-TTT BPB):
- Seed 42: 1.06128 (original openai#1851 author)
- Seed 314: 1.06087 (this submission)
- Seed 1234: 1.06220 (this submission)
- Mean: 1.06145 ± 0.00068

All artifacts < 16,000,000 bytes. All runs < 600s.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
You are amazing!!!
PR openai#1902 (cocohearts) accepted openai#1851/openai#1868 over openai#1736 and excluded openai#1855 only on significance grounds (p=0.325). Our prior 050 line built on openai#1797, which is under a validity cloud per cocohearts, so we re-anchor the research baseline on openai#1855's accepted chain.

This is a pure port with zero modifications: files are copied verbatim from codemath3000/parameter-golf:submission/sp8192-lqer-bos-smear-fix-9hp-stack @ 1e43966 into records/track_10min_16mb/2026-04-29_PR1855_Port_Baseline/. Spec 060B+ will fork exp/060B-* etc. to stack quant-repair / deploy-time levers (046B-tight SDClip, 046L deploy-time repair, 046G-tighter, etc.) on this baseline.
…an 1.06141

Re-ran all 3 seeds (42, 314, 1234) with GPTQ_RESERVE_SECONDS=8.0 (was 0.5) to ensure GPTQ hessian collection completes within the 600s training budget.

Code changes:
- Serialize artifact immediately after training (before diagnostic eval)
- Added timing instrumentation (serialize_wallclock, GPTQ sub-timings)

Results (all seeds fresh re-run on RunPod 8×H100 SXM):
- Seed 42: post-TTT BPB = 1.06083, artifact = 15,949,701 bytes, eval = 525.5s
- Seed 314: post-TTT BPB = 1.06091, artifact = 15,951,777 bytes, eval = 429.5s
- Seed 1234: post-TTT BPB = 1.06249, artifact = 15,951,968 bytes, eval = 481.2s
- 3-seed mean: 1.06141 ± 0.00093

Compliance: training loop ends at ~592s, GPTQ hessians end at ~595.5s (<600s). RunPod cost: ~$31.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
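The serialize-before-diagnostic ordering and the reserve-seconds margin described above can be sketched as follows. This is a minimal sketch, not the submission's code: the driver function, callable names, and parameterized budgets are hypothetical; only the 600s limit and 8.0s reserve come from the record.

```python
import time

def run_within_budget(train_step, collect_hessians, serialize_artifact,
                      run_diagnostic_eval, budget_s=600.0, reserve_s=8.0):
    """Hypothetical driver for the ordering above: stop training reserve_s
    before the budget so GPTQ hessian collection finishes inside it, then
    serialize the artifact *before* the (unbudgeted) diagnostic eval."""
    start = time.monotonic()
    # Train only until (budget - reserve), leaving room for hessians.
    while time.monotonic() - start < budget_s - reserve_s:
        train_step()
    collect_hessians()                      # must end before the budget mark
    gptq_end_s = time.monotonic() - start   # GPTQ sub-timing instrumentation
    t0 = time.monotonic()
    serialize_artifact()                    # artifact saved before eval
    serialize_wallclock = time.monotonic() - t0
    run_diagnostic_eval()                   # diagnostic eval is not budgeted
    return gptq_end_s, serialize_wallclock
```

With the original 0.5s reserve, a slow hessian pass could overrun the 600s mark; widening the reserve trades a few training steps for a guaranteed-compliant finish.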
cocohearts
left a comment
Thanks, this is useful supporting evidence for #1851. Before merge, please make it a standalone support/record package: include all three seed logs in this directory, including the seed 42 log currently referenced from #1851, and keep the README/submission.json phrased as a 3-seed compliance reproduction/support package rather than a separate technique claim. No ML change needed.
Duplicate requested-changes review submitted by automation; keeping the earlier requested-changes review active.
…1851
- Add original seed 42 log from PR openai#1851 (@aquariouseworkman)
- Add original seeds 314 and 1234 from independent reproduction
- Rename compliance re-run logs to *_rerun_gptq8s.log for clarity
- Rewrite README as support/compliance package (not a new technique claim)
- Rewrite submission.json with headline val_bpb=1.06145 (original 3-seed mean)
- Document compliance re-run as supplementary evidence (no stat-sig difference)
Updated, including both the original 3 seeds and my independent compliance re-run. Thanks!
Audits every CaseOps-lineage record-track PR (merged + unmerged) since 2026-04-18 for whether val docs are also in the training set.

Working set: 34 PRs (31 from the chronological seed list + 3 discovered ancestors: openai#1908, openai#1923, openai#2007). Boundary nodes: openai#1493 / openai#1626 (pre-CaseOps).

Verdicts:
- CLEAN (8): openai#1729, openai#1851, openai#1868, openai#1908, openai#2019, openai#2027, openai#2031, openai#2068
- LEAK (25): openai#1736 (our research baseline) → openai#1769 → openai#1787 → openai#1797 → openai#1855 → V21 family (openai#1945, openai#1923, openai#1953, openai#1967) → openai#2018 → openai#2118 (current claimed frontier, 1.04350), plus siblings
- INHERIT (1): openai#2050 (eval-only on frozen openai#1915)

Code-level evidence (not README claims):
- Every shipped prepare_caseops_data.py is byte-identical: SHARD_TOKENS=10_000_000, default=10_000 for --val-docs
- No PR overrides --val-docs (searched all .sh files in all 34 PRs)
- cached_challenge_fineweb.py downloads from the romeerp/parameter-golf-caseops-v1 HF dataset, whose manifest pins docs_val=50000 and docs_train=8181945; the sums match, so it is CLEAN by construction
- PR openai#2018's DATASET_AUDIT.md is the gold-standard explicit leak description
- PR openai#2118's submission.json admits "--val-docs=10000 train shards + 50k val eval"

Three signposts:
- Leak introduced: PR openai#1736 by @dexhunter (Apr 19), the first prepare_caseops_data.py default invocation
- Leak fixed: PR openai#1851 by @aquariouseworkman (Apr 27), which switched to the HF dataset
- Leak re-introduced: PR openai#1855 by @codemath3000 (same day), which rebuilt the dataset locally

The merged-leaderboard SOTA (openai#1851/openai#1868 at 1.06128/1.06141) is CLEAN. The unmerged frontier (openai#2118 at 1.04350) is LEAK. The 0.018 bpb gap is therefore inflated by val memorization; spec 301 was designed to measure how much of it remains under clean data.
Files:
- caseops-memory-leakage/README.md: overview, methodology, takeaways
- caseops-memory-leakage/verdicts.md: 34-row master table with evidence
- caseops-memory-leakage/family-tree.md: ASCII trees with [C]/[L] annotations
Record Support: 3-Seed Compliance Reproduction for PR #1851
val_bpb = 1.06145 ± 0.00068 (3-seed mean) | ~15.95 MB | 8×H100 SXM 80GB
Summary
This PR is now positioned as a standalone support / record package for PR #1851, not as a separate technique claim.
It does three things:
1. Includes the original 3-seed support logs, including the seed 42 log previously referenced from PR #1851.
2. Includes the later compliance re-run logs as supplementary evidence.
3. Rephrases the README and submission.json as a 3-seed compliance reproduction/support package rather than a separate technique claim.

No ML change is claimed here. The technique is still the PR #1851 stack.
Original 3-seed support result
These are the headline results this PR should be judged on.
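For reference, the headline mean and spread follow directly from the three per-seed numbers, using the sample standard deviation (n-1):

```python
import statistics

# Original 3-seed support results (post-TTT BPB)
seed_bpb = {42: 1.06128, 314: 1.06087, 1234: 1.06220}

mean = statistics.mean(seed_bpb.values())   # 3-seed mean
std = statistics.stdev(seed_bpb.values())   # sample std, n-1 denominator
print(f"{mean:.5f} ± {std:.5f}")            # prints 1.06145 ± 0.00068
```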
All three artifacts are under 16,000,000 bytes and all eval times are under 600s.
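The two compliance gates quoted throughout this record reduce to a one-line check. The limits (16,000,000 bytes, 600s) are from the track rules; the helper name is mine, and the per-seed figures below are the compliance re-run numbers reported earlier in this thread:

```python
ARTIFACT_LIMIT_BYTES = 16_000_000
EVAL_LIMIT_SECONDS = 600.0

def is_compliant(artifact_bytes: int, eval_seconds: float) -> bool:
    """Track gate: artifact strictly under 16,000,000 bytes, eval under 600s."""
    return (artifact_bytes < ARTIFACT_LIMIT_BYTES
            and eval_seconds < EVAL_LIMIT_SECONDS)

# Compliance re-run figures: seed -> (artifact bytes, eval seconds)
rerun = {42: (15_949_701, 525.5), 314: (15_951_777, 429.5), 1234: (15_951_968, 481.2)}
assert all(is_compliant(b, t) for b, t in rerun.values())
```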
Logs now included in this submission directory
Original support logs:
- train_seed42_pr1851_original.log
- train_seed314_original.log
- train_seed1234_original.log

Later compliance re-run logs:
- train_seed42_rerun_gptq8s.log
- train_seed314_rerun_gptq8s.log
- train_seed1234_rerun_gptq8s.log

Later compliance re-run (supplementary evidence only)
The original runs used GPTQ_RESERVE_SECONDS=0.5, which left too little margin for GPTQ hessian collection. To confirm compliance, I re-ran all 3 seeds later with GPTQ_RESERVE_SECONDS=8.0 and serialize-before-diagnostic ordering so GPTQ hessians complete within the 600s training-data-access budget.

Comparison to the original 3-seed support package:
That delta is well within ordinary seed noise, so the compliance fix does not materially change model quality.
Technique / attribution
This remains the PR #1851 technique stack:
What changed in this PR update
GitHub link: #1851