Commit fdde8dc

Merge PR #1868: Record: SmearGate BOS Fix 3-Seed Compliance Re-run — val_bpb 1.06141 (3-seed mean)
Merge accepted Parameter Golf record/support submission #1868 after requested format cleanup.
2 parents afc90a1 + c5fd1c7 commit fdde8dc

14 files changed

Lines changed: 31478 additions & 0 deletions
Lines changed: 125 additions & 0 deletions
# Record: 3-Seed Compliance Reproduction — Support for PR #1851

**val_bpb = 1.06145** (3-seed mean ± 0.00068, original runs; compliance re-runs: 1.06141 ± 0.00093) | **~15.95 MB** | 8×H100 SXM 80GB

## Summary

This is a **3-seed compliance reproduction and support package** for [PR #1851](https://github.com/openai/parameter-golf/pull/1851) by @aquariouseworkman (SmearGate BOS Fix + PR #1787 Base + LQER Asymmetric + Phased TTT).

The purpose of this package is to:
1. Provide 3-seed evidence of statistical significance for the PR #1851 result.
2. Confirm, via an independent party, that the results reproduce across seeds.
3. Document a compliance re-run demonstrating that GPTQ Hessian collection fits within the 600s training budget.

**No new ML technique is introduced.** This package reproduces the code and configuration from PR #1851; the compliance re-run changes only the timing margin, artifact write order, and logging (see "What Changed in Re-run").

## 3-Seed Results (Original Runs)

These are the originally committed results. Seed 42 is from @aquariouseworkman's PR #1851 submission; seeds 314 and 1234 were run by @Christopher-Lee-McClendon as independent reproductions using the same code and environment variables.

| Seed | Post-TTT BPB | Artifact (bytes) | Eval Time | Source |
|------|-------------|------------------|-----------|--------|
| 42 | **1.06128183** | 15,952,086 | 519.5s | PR #1851 (@aquariouseworkman) |
| 314 | **1.06086831** | 15,952,419 | 525.6s | Reproduction (@Christopher-Lee-McClendon) |
| 1234 | **1.06220261** | 15,952,690 | 479.6s | Reproduction (@Christopher-Lee-McClendon) |
| **Mean ± Std** | **1.06145 ± 0.00068** | | | |

- All artifacts < 16,000,000 bytes ✓
- All eval times < 600s ✓
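
The summary row and the two checks above can be re-derived from the table. The following is a minimal sketch, not part of the submission: the dictionary layout and constant names are illustrative, and it assumes the reported "Std" is the sample standard deviation (n − 1), which is the convention that reproduces 0.00068 from these three values.

```python
from statistics import mean, stdev

# Values copied from the results table above.
runs = {
    42: {"bpb": 1.06128183, "artifact_bytes": 15_952_086, "eval_s": 519.5},
    314: {"bpb": 1.06086831, "artifact_bytes": 15_952_419, "eval_s": 525.6},
    1234: {"bpb": 1.06220261, "artifact_bytes": 15_952_690, "eval_s": 479.6},
}

MAX_ARTIFACT_BYTES = 16_000_000  # artifact size limit stated above
MAX_EVAL_SECONDS = 600.0         # eval time limit stated above

bpbs = [r["bpb"] for r in runs.values()]
bpb_mean = mean(bpbs)
bpb_std = stdev(bpbs)  # sample std (n - 1); assumed convention, see lead-in

artifacts_ok = all(r["artifact_bytes"] < MAX_ARTIFACT_BYTES for r in runs.values())
evals_ok = all(r["eval_s"] < MAX_EVAL_SECONDS for r in runs.values())

print(f"val_bpb = {bpb_mean:.5f} ± {bpb_std:.5f}")
print(f"artifacts under limit: {artifacts_ok}, eval times under limit: {evals_ok}")
```

With only three seeds the standard deviation is itself a rough estimate, so it is best read as an indication of seed-to-seed spread rather than a tight error bar.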

### Log Files (Original)

- `train_seed42_pr1851_original.log` — Seed 42 from PR #1851 by @aquariouseworkman
- `train_seed314_original.log` — Seed 314 reproduction by @Christopher-Lee-McClendon
- `train_seed1234_original.log` — Seed 1234 reproduction by @Christopher-Lee-McClendon

## Compliance Re-run Evidence (GPTQ Within 600s)

The original runs used `GPTQ_RESERVE_SECONDS=0.5`, so the training loop ran until ~599.6s. GPTQ Hessian collection, which accesses training data, adds ~3.5s and could therefore push training-data access to ~603.1s, past the 600s budget.

To confirm compliance, all 3 seeds were re-run with `GPTQ_RESERVE_SECONDS=8.0`, so that the training loop ends at ~592s and GPTQ Hessian collection completes by ~595.5s (well within 600s). The re-run changes only the timing margin, artifact write order, and logging; there is no ML change.

| Seed | Post-TTT BPB (re-run) | Train Time | GPTQ Ends By | Artifact (bytes) |
|------|----------------------|------------|--------------|------------------|
| 42 | 1.06083288 | 592.1s | ~595.5s ✅ | 15,949,701 |
| 314 | 1.06090748 | 592.0s | ~595.5s ✅ | 15,951,777 |
| 1234 | 1.06248776 | 592.1s | ~595.5s ✅ | 15,951,968 |
| **Mean ± Std** | **1.06141 ± 0.00093** | | | |

**No statistically significant difference:** original mean 1.06145 vs re-run mean 1.06141 (delta = −0.00004, far smaller than the seed-to-seed standard deviations of 0.00068 and 0.00093). This suggests that the GPTQ reserve setting has negligible impact on model quality.
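
To make the comparison above concrete, here is a small sketch that recomputes the delta between the two run sets and a Welch-style t statistic from the per-seed values in both tables. The choice of Welch's statistic is an assumption made for illustration; the package itself only states that the delta is within noise. With n = 3 per group this is a rough sanity check, not a formal test.

```python
from math import sqrt
from statistics import mean, variance

# Per-seed post-TTT BPB, copied from the two results tables (seeds 42, 314, 1234).
original = [1.06128183, 1.06086831, 1.06220261]
rerun = [1.06083288, 1.06090748, 1.06248776]

delta = mean(rerun) - mean(original)

# Welch's t statistic: difference of means divided by the combined standard error.
# variance() is the sample variance (n - 1).
standard_error = sqrt(variance(original) / len(original) + variance(rerun) / len(rerun))
t_stat = delta / standard_error

print(f"delta = {delta:+.5f}")
print(f"standard error = {standard_error:.5f}")
print(f"t = {t_stat:+.3f}")
```

A |t| value far below 1 means the shift between run sets is much smaller than the seed-to-seed scatter, which is consistent with the conclusion drawn above.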

### Re-run Log Files

- `train_seed42_rerun_gptq8s.log`
- `train_seed314_rerun_gptq8s.log`
- `train_seed1234_rerun_gptq8s.log`

### What Changed in Re-run

1. **`GPTQ_RESERVE_SECONDS` 0.5 → 8.0** — Training loop ends ~8s early for GPTQ headroom.
2. **Serialize-before-diagnostic reordering** — Artifact written immediately after GPTQ, before pre-quant diagnostic eval.
3. **Timing instrumentation** — `serialize_wallclock` and `artifact_production_wallclock` are logged for transparency.
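
The effect of the first change can be illustrated with a wall-clock-bounded loop. This is a hypothetical sketch, not the code in `train_gpt.py`: the function names, the injectable `clock`, and the one-second fake steps are all assumptions chosen so the example runs instantly.

```python
import time
from typing import Callable, Tuple


def train_with_reserve(
    step_fn: Callable[[], None],
    budget_s: float = 600.0,
    reserve_s: float = 8.0,
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[int, float]:
    """Call step_fn until (budget_s - reserve_s) seconds have elapsed.

    Returns (steps_run, elapsed_seconds). The deadline is checked before each
    step, so the loop can overshoot it by at most one step's duration.
    """
    start = clock()
    deadline = budget_s - reserve_s
    steps = 0
    while clock() - start < deadline:
        step_fn()
        steps += 1
    return steps, clock() - start


def simulate(reserve_s: float, step_s: float = 1.0) -> Tuple[int, float]:
    """Drive the loop with a fake clock so the example finishes instantly."""
    now = [0.0]

    def fake_clock() -> float:
        return now[0]

    def fake_step() -> None:
        now[0] += step_s  # pretend one training step took step_s seconds

    return train_with_reserve(fake_step, reserve_s=reserve_s, clock=fake_clock)


print(simulate(reserve_s=8.0))  # loop stops once 592 simulated seconds have elapsed
```

A larger reserve trades a few training steps for headroom, which is the trade-off the re-run evaluates.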

### GPTQ Timing Breakdown (Re-run)

| Phase | Time | Accesses Training Data? |
|-------|------|------------------------|
| Training loop (with 8s reserve) | ~592s | ✅ Yes |
| Hessian collection | ~3.5s | ✅ Yes |
| **Total training-data-access time** | **~595.5s** | **< 600s ✅** |
| Quantization | ~10.1s | ❌ No (uses cached Hessians) |
| Brotli compression | ~65–67s | ❌ No (compression and I/O only) |
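
The budget arithmetic behind the table is short enough to spell out. This sketch uses the approximate figures quoted in this document (~599.6s and ~592s loop end times, ~3.5s for Hessian collection); the function and constant names are illustrative only.

```python
BUDGET_S = 600.0   # training budget
HESSIAN_S = 3.5    # approximate Hessian collection time quoted above


def data_access_seconds(train_loop_s: float, hessian_s: float = HESSIAN_S) -> float:
    """Total time spent in phases that read training data."""
    return train_loop_s + hessian_s


original_total = data_access_seconds(599.6)  # GPTQ_RESERVE_SECONDS=0.5
rerun_total = data_access_seconds(592.0)     # GPTQ_RESERVE_SECONDS=8.0

for label, total in (("original", original_total), ("re-run", rerun_total)):
    status = "within budget" if total < BUDGET_S else "over budget"
    print(f"{label}: ~{total:.1f}s ({status})")
```

Only the first two phases count toward the 600s limit here, because quantization and compression do not touch training data.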

## Technique Stack

All techniques inherited from PR #1851 and its lineage. No new techniques introduced.

| Technique | Source |
|-----------|--------|
| Base architecture (11L, MLP 4×, MuonEq-R) | PR #1787 (@nprime06) |
| SmearGate attention + BOS fix | PR #1797 (@dexhunter) + PR #1851 (@aquariouseworkman) |
| LQER Asymmetric quantization | PR #1797 (@dexhunter) |
| CaseOps SP8192 | PR #1729 (@romeerp) |
| GPTQ + SP8192 | PR #1394 (@clarkkev) |
| Score-first TTT (3 phases) | PR #549 (@abaybektursun) |
| BOS bug identification | @cocohearts |

## Architecture

11L × 512d × 8H/4KV, MLP 4×, LeakyReLU(0.5)², Partial RoPE (16/64 dims), layerwise LN scale, tied embeddings, logit softcap=30.0. Depth recurrence: layers 3–5 looped ×2 (activated at frac=0.35). Parallel residuals from layer 8. XSA on all 11 layers. SmearGate window=12.
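
The shorthand above implies a few derived sizes, sketched below. Everything here is an interpretation, not the submission's code: the class and function names are hypothetical, `LeakyReLU(0.5)²` is read as a leaky ReLU with negative slope 0.5 followed by squaring, and the softcap is assumed to be the common `cap * tanh(x / cap)` form.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ArchConfig:
    """Hypothetical container for the numbers in the architecture summary."""
    n_layers: int = 11
    d_model: int = 512
    n_heads: int = 8
    n_kv_heads: int = 4
    mlp_mult: int = 4
    rope_dims: int = 16
    logit_softcap: float = 30.0

    @property
    def head_dim(self) -> int:
        return self.d_model // self.n_heads  # 512 / 8, matches "16/64 dims"

    @property
    def queries_per_kv(self) -> int:
        return self.n_heads // self.n_kv_heads  # query heads sharing each KV head

    @property
    def mlp_hidden(self) -> int:
        return self.mlp_mult * self.d_model


def leaky_relu_squared(x: float, negative_slope: float = 0.5) -> float:
    """Assumed reading of LeakyReLU(0.5)^2: leaky ReLU, then square."""
    y = x if x >= 0 else negative_slope * x
    return y * y


def softcap(logit: float, cap: float = 30.0) -> float:
    """Assumed tanh softcap: squashes logits smoothly into (-cap, cap)."""
    return cap * math.tanh(logit / cap)


cfg = ArchConfig()
print(cfg.head_dim, cfg.queries_per_kv, cfg.mlp_hidden)
print(leaky_relu_squared(-2.0), softcap(1000.0))
```

Under these assumptions, RoPE is applied to 16 of the 64 dimensions in each head and the remaining 48 are left unrotated.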

## Reproduction

```bash
# Install dependencies
pip install brotli python-minifier

# Prepare CaseOps SP8192 data
python3 prepare_caseops_data.py  # downloads from romeerp/parameter-golf-caseops-v1

# Run training (replace SEED with 42, 314, or 1234)
SEED=42 \
CASEOPS_ENABLED=1 \
EMBED_BITS=7 \
SMEAR_GATE_ENABLED=1 \
SPARSE_ATTN_GATE_ENABLED=1 \
MIN_LR=0.1 \
EMBED_CLIP_SIGMAS=15.0 \
MLP_CLIP_SIGMAS=12.0 \
GPTQ_RESERVE_SECONDS=8.0 \
PHASED_TTT_NUM_PHASES=3 \
torchrun --standalone --nproc_per_node=8 train_gpt.py
```
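
To run all three seeds back to back, the command above can be wrapped in a small driver. This is a convenience sketch rather than part of the package: `build_run` and `run_all` are hypothetical helpers, and the environment variables and `torchrun` arguments are copied verbatim from the block above.

```python
import os
import subprocess
from typing import Dict, List, Sequence, Tuple

SEEDS = (42, 314, 1234)

# Environment variables copied from the reproduction command.
BASE_ENV: Dict[str, str] = {
    "CASEOPS_ENABLED": "1",
    "EMBED_BITS": "7",
    "SMEAR_GATE_ENABLED": "1",
    "SPARSE_ATTN_GATE_ENABLED": "1",
    "MIN_LR": "0.1",
    "EMBED_CLIP_SIGMAS": "15.0",
    "MLP_CLIP_SIGMAS": "12.0",
    "GPTQ_RESERVE_SECONDS": "8.0",
    "PHASED_TTT_NUM_PHASES": "3",
}

TRAIN_CMD: List[str] = ["torchrun", "--standalone", "--nproc_per_node=8", "train_gpt.py"]


def build_run(seed: int) -> Tuple[Dict[str, str], List[str]]:
    """Return the full environment and command line for one seed."""
    env = dict(os.environ)   # inherit PATH, CUDA settings, etc.
    env.update(BASE_ENV)
    env["SEED"] = str(seed)
    return env, list(TRAIN_CMD)


def run_all(seeds: Sequence[int] = SEEDS) -> None:
    """Launch the seeds one after another; stop at the first failure."""
    for seed in seeds:
        env, cmd = build_run(seed)
        subprocess.run(cmd, env=env, check=True)  # raises CalledProcessError on failure
```

Calling `run_all()` from the directory containing `train_gpt.py` launches the sweep; nothing is executed on import.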

**Hardware:** 8×H100 SXM 80GB (RunPod)

## Credits

- **@aquariouseworkman** — PR #1851 author (SmearGate BOS fix, original seed 42 result)
- **@nprime06** — PR #1787 (base architecture)
- **@romeerp** — PR #1729 (CaseOps)
- **@dexhunter** — PR #1797 (SmearGate + LQER asymmetric quantization)
- **@cocohearts** — BOS document boundary bug identification
- **@abaybektursun** — PR #549 (score-first TTT)
- **@clarkkev** — PR #1394 (GPTQ + SP8192)
- **@Christopher-Lee-McClendon** — Seeds 314/1234 reproduction and compliance re-run
