# Record: 3-Seed Compliance Reproduction — Support for PR #1851

**val_bpb = 1.06145** (3-seed mean ± 0.00068 std) | **~15.95 MB** | 8×H100 SXM 80GB

## Summary

This is a **3-seed compliance reproduction and support package** for [PR #1851](https://github.com/openai/parameter-golf/pull/1851) by @aquariouseworkman (SmearGate BOS Fix + PR #1787 Base + LQER Asymmetric + Phased TTT).

The purpose of this package is to:
1. Provide statistical-significance evidence (3 seeds) for the PR #1851 result.
2. Confirm, as an independent party, that the result is reproducible across seeds.
3. Document a compliance re-run demonstrating that GPTQ fits within the 600s training budget.

**No new ML technique is introduced.** This package reproduces the exact code and configuration from PR #1851.

## 3-Seed Results (Original Runs)

These are the originally committed results. Seed 42 is from @aquariouseworkman's PR #1851 submission; seeds 314 and 1234 were run by @Christopher-Lee-McClendon as independent reproductions using the same code and environment variables.

| Seed | Post-TTT BPB | Artifact (bytes) | Eval Time | Source |
|------|--------------|------------------|-----------|--------|
| 42 | **1.06128183** | 15,952,086 | 519.5s | PR #1851 (@aquariouseworkman) |
| 314 | **1.06086831** | 15,952,419 | 525.6s | Reproduction (@Christopher-Lee-McClendon) |
| 1234 | **1.06220261** | 15,952,690 | 479.6s | Reproduction (@Christopher-Lee-McClendon) |
| **Mean ± Std** | **1.06145 ± 0.00068** | | | |

All artifacts < 16,000,000 bytes ✓
All eval times < 600s ✓
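The headline numbers can be checked directly from the per-seed rows above; a minimal sketch, with the values copied from the table:

```python
from statistics import mean, stdev

# (artifact bytes, eval seconds, post-TTT bpb) per seed, from the table above
runs = {
    42:   (15_952_086, 519.5, 1.06128183),
    314:  (15_952_419, 525.6, 1.06086831),
    1234: (15_952_690, 479.6, 1.06220261),
}

# Compliance: every artifact under 16,000,000 bytes, every eval under 600s
assert all(size < 16_000_000 and secs < 600.0 for size, secs, _ in runs.values())

# Headline statistic: sample mean and (n-1) standard deviation over seeds
bpb = [b for _, _, b in runs.values()]
print(f"{mean(bpb):.5f} ± {stdev(bpb):.5f}")  # 1.06145 ± 0.00068
```

Note that `statistics.stdev` is the sample (n−1) standard deviation, which is what reproduces the ± 0.00068 figure.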

### Log Files (Original)

- `train_seed42_pr1851_original.log` — Seed 42 from PR #1851 by @aquariouseworkman
- `train_seed314_original.log` — Seed 314 reproduction by @Christopher-Lee-McClendon
- `train_seed1234_original.log` — Seed 1234 reproduction by @Christopher-Lee-McClendon

## Compliance Re-run Evidence (GPTQ Within 600s)

The original runs used `GPTQ_RESERVE_SECONDS=0.5`, which resulted in the training loop running until ~599.6s. GPTQ Hessian collection (which accesses training data) adds ~3.5s, potentially extending past the 600s budget.

To confirm compliance, all 3 seeds were re-run with `GPTQ_RESERVE_SECONDS=8.0`, ensuring the training loop ends at ~592s and GPTQ Hessian collection completes by ~595.5s, well within 600s. The only code change is the timing margin — no ML change.
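The margin arithmetic behind the re-run can be sketched as follows (the function and constant names are illustrative, not the actual identifiers in `train_gpt.py`):

```python
def training_deadline(budget_s: float, gptq_reserve_s: float) -> float:
    """Wall-clock seconds the training loop may consume before it must
    stop and hand off to GPTQ Hessian collection."""
    return budget_s - gptq_reserve_s

HESSIAN_COLLECTION_S = 3.5  # observed cost of Hessian collection (~3.5s)

# Original runs: the loop may run to 599.5s; +3.5s of Hessian collection
# risks crossing the 600s training-data-access budget.
assert training_deadline(600.0, 0.5) + HESSIAN_COLLECTION_S > 600.0

# Re-runs: the loop exits by 592s; Hessians complete by ~595.5s, in budget.
assert training_deadline(600.0, 8.0) + HESSIAN_COLLECTION_S < 600.0
```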

| Seed | Post-TTT BPB (re-run) | Train Time | GPTQ Ends By | Artifact (bytes) |
|------|-----------------------|------------|--------------|------------------|
| 42 | 1.06083288 | 592.1s | ~595.5s ✅ | 15,949,701 |
| 314 | 1.06090748 | 592.0s | ~595.5s ✅ | 15,951,777 |
| 1234 | 1.06248776 | 592.1s | ~595.5s ✅ | 15,951,968 |
| **Mean ± Std** | **1.06141 ± 0.00093** | | | |

**No statistically significant difference:** original mean 1.06145 vs. re-run mean 1.06141 (delta = −0.00004, well within 1-sigma noise). This confirms the GPTQ reserve setting has negligible impact on model quality.
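The 1-sigma claim can be verified directly from the two tables; a quick check, with the per-seed values copied from above:

```python
from statistics import mean, stdev

original = [1.06128183, 1.06086831, 1.06220261]  # seeds 42, 314, 1234
rerun    = [1.06083288, 1.06090748, 1.06248776]

delta = mean(rerun) - mean(original)             # ≈ -0.00004
sigma = max(stdev(original), stdev(rerun))       # ≈ 0.00093
assert abs(delta) < sigma                        # within 1-sigma noise
```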

### Re-run Log Files

- `train_seed42_rerun_gptq8s.log`
- `train_seed314_rerun_gptq8s.log`
- `train_seed1234_rerun_gptq8s.log`

### What Changed in the Re-run

1. **`GPTQ_RESERVE_SECONDS` 0.5 → 8.0** — the training loop ends ~8s early to leave GPTQ headroom.
2. **Serialize-before-diagnostic reordering** — the artifact is written immediately after GPTQ, before the pre-quant diagnostic eval.
3. **Timing instrumentation** — `serialize_wallclock` and `artifact_production_wallclock` are logged for transparency.
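A minimal sketch of how such wall-clock instrumentation can be wired in. The context manager is illustrative; only the two metric names come from the actual logs:

```python
import time
from contextlib import contextmanager

metrics: dict[str, float] = {}

@contextmanager
def wallclock(label: str):
    """Record the wall-clock duration of the enclosed block under `label`."""
    t0 = time.time()
    try:
        yield
    finally:
        metrics[label] = time.time() - t0

# Re-run ordering: write the artifact right after GPTQ, then run the
# diagnostic eval, so serialization time is measured and logged first.
with wallclock("serialize_wallclock"):
    pass  # artifact written to disk here
with wallclock("artifact_production_wallclock"):
    pass  # end-to-end artifact-production phase measured here
```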

### GPTQ Timing Breakdown (Re-run)

| Phase | Time | Accesses Training Data? |
|-------|------|-------------------------|
| Training loop (with 8s reserve) | ~592s | ✅ Yes |
| Hessian collection | ~3.5s | ✅ Yes |
| **Total training-data-access time** | **~595.5s** | **< 600s ✅** |
| Quantization | ~10.1s | ❌ No (uses cached Hessians) |
| Brotli compression | ~65–67s | ❌ No (pure I/O) |

## Technique Stack

All techniques are inherited from PR #1851 and its lineage. No new techniques are introduced.

| Technique | Source |
|-----------|--------|
| Base architecture (11L, MLP 4×, MuonEq-R) | PR #1787 (@nprime06) |
| SmearGate attention + BOS fix | PR #1797 (@dexhunter) + PR #1851 (@aquariouseworkman) |
| LQER Asymmetric quantization | PR #1797 (@dexhunter) |
| CaseOps SP8192 | PR #1729 (@romeerp) |
| GPTQ + SP8192 | PR #1394 (@clarkkev) |
| Score-first TTT (3 phases) | PR #549 (@abaybektursun) |
| BOS bug identification | @cocohearts |

## Architecture

11L × 512d × 8H/4KV, MLP 4×, LeakyReLU(0.5)², partial RoPE (16/64 dims), layerwise LN scale, tied embeddings, logit softcap = 30.0. Depth recurrence: layers 3–5 looped ×2 (activated at frac = 0.35). Parallel residuals from layer 8. XSA on all 11 layers. SmearGate window = 12.
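For readability, the hyperparameters above can be collected into a single config sketch. All field names are hypothetical; the actual code configures these through the environment variables used in the reproduction command:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ArchConfig:
    # Field names are illustrative, not the repo's actual identifiers.
    n_layers: int = 11
    d_model: int = 512
    n_heads: int = 8
    n_kv_heads: int = 4             # grouped 8H/4KV attention
    head_dim: int = 64
    mlp_mult: int = 4
    rope_dims: int = 16             # partial RoPE: 16 of 64 head dims rotated
    logit_softcap: float = 30.0
    loop_layers: tuple = (3, 4, 5)  # depth recurrence: layers 3-5 looped ×2
    loop_count: int = 2
    recurrence_frac: float = 0.35   # recurrence activated at this frac
    parallel_residual_from: int = 8
    smear_gate_window: int = 12
    tied_embeddings: bool = True

cfg = ArchConfig()
assert cfg.d_model // cfg.n_heads == cfg.head_dim  # 512 / 8 = 64
```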

## Reproduction

```bash
# Install dependencies
pip install brotli python-minifier

# Prepare CaseOps SP8192 data
python3 prepare_caseops_data.py  # downloads from romeerp/parameter-golf-caseops-v1

# Run training (replace SEED with 42, 314, or 1234)
SEED=42 \
CASEOPS_ENABLED=1 \
EMBED_BITS=7 \
SMEAR_GATE_ENABLED=1 \
SPARSE_ATTN_GATE_ENABLED=1 \
MIN_LR=0.1 \
EMBED_CLIP_SIGMAS=15.0 \
MLP_CLIP_SIGMAS=12.0 \
GPTQ_RESERVE_SECONDS=8.0 \
PHASED_TTT_NUM_PHASES=3 \
torchrun --standalone --nproc_per_node=8 train_gpt.py
```

**Hardware:** 8×H100 SXM 80GB (RunPod)

## Credits

- **@aquariouseworkman** — PR #1851 author (SmearGate BOS fix, original seed 42 result)
- **@nprime06** — PR #1787 (base architecture)
- **@romeerp** — PR #1729 (CaseOps)
- **@dexhunter** — PR #1797 (SmearGate + LQER asymmetric quantization)
- **@cocohearts** — BOS document-boundary bug identification
- **@abaybektursun** — PR #549 (score-first TTT)
- **@clarkkev** — PR #1394 (GPTQ + SP8192)
- **@Christopher-Lee-McClendon** — seeds 314/1234 reproduction and compliance re-run