# Spec 060C — 046L deploy-time quant repair on 060A baseline

**Date:** 2026-04-29
**Branch:** `exp/060C-deploy-repair` (forked from research; code from `exp/046-quant-repair` @ `fcb816f`)
**Parent:** 060A checkpoint + 046L code (already exists; just needs a cherry-pick into 060A's `train_gpt.py`).

## Hypothesis

046L's deploy-time quant repair runs a passthrough fp16 fit at eval time, using AR-self-generated calibration data. It costs zero checkpoint bytes (it uses the spare 100-180 s of eval budget) and bypasses the 16 MB cap entirely. On 045-armD it was specced but never fully measured; predicted gain is ~−0.001 to −0.005 BPB if it acts even partially like TTT.
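The fit itself lives in 046L's code; as a standalone sketch of the underlying idea (all names here are hypothetical, and a closed-form scale stands in for 046L's iterative fp16 passthrough fit): quantize a weight, then repair it against the model's own outputs, so no held-out data is consumed.

```python
import numpy as np

def fake_quantize(w, n_bits=4):
    # symmetric uniform quantization onto a 2^(n_bits-1)-1 grid
    qmax = 2 ** (n_bits - 1) - 1
    scale = np.abs(w).max() / qmax
    return np.round(w / scale) * scale

def repair_scale(w_q, x_calib, y_target):
    # closed-form least-squares fit of one passthrough scale s,
    # minimizing ||s * (x @ w_q) - y_target||^2
    y_q = x_calib @ w_q
    return float((y_q * y_target).sum() / (y_q * y_q).sum())

rng = np.random.default_rng(0)
w = rng.normal(size=(16, 16))
x = rng.normal(size=(64, 16))
y_ref = x @ w                      # stand-in for self-generated calibration targets
w_q = fake_quantize(w)
s = repair_scale(w_q, x, y_ref)
err_before = float(np.mean((x @ w_q - y_ref) ** 2))
err_after = float(np.mean((s * (x @ w_q) - y_ref) ** 2))
assert err_after <= err_before    # least-squares optimum can't be worse than s=1
```

The least-squares scale is guaranteed not to increase the calibration error; the real repair presumably fits many passthrough parameters over several gradient iterations rather than one closed-form scalar.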

## Baseline

060A.

## Expected Δ

**−0.001 to −0.005 BPB**, low confidence (never cleanly validated end-to-end; only specced).

## Accept criteria

- post-quant + post-TTT val_bpb ≤ (060A − 0.0005)
- eval_time still ≤ 600 s (deploy-time repair runs in ~60 s, so it should fit)
- no NaNs, no instability

## Config diff vs 060A

```
DEPLOY_TIME_REPAIR_ENABLED=1
DEPLOY_TIME_REPAIR_BATCHES=8
DEPLOY_TIME_REPAIR_SEQ_LEN=512
DEPLOY_TIME_REPAIR_LR=1e-3
DEPLOY_TIME_REPAIR_ITERS=5
```

## Code changes

Cherry-pick three blocks from `exp/046-quant-repair` @ `fcb816f` into 060A's `train_gpt.py`:

1. `fit_passthrough_to_self_consistency()` — the passthrough-param fit function (~50 lines).
2. `ARSelfGenCalibLoader` class — generates AR samples without a val-data leak (~30 lines).
3. Hook in `train_and_eval()` after `deserialize()` (~10 lines):

   ```python
   eval_model = deserialize(h, device)
   if h.num_loops > 0:
       eval_model.looping_active = True
   if h.deploy_time_repair_enabled:
       repair_calib = generate_ar_calib(
           eval_model, h,
           n_batches=h.deploy_time_repair_batches,
           seq_len=h.deploy_time_repair_seq_len,
       )
       fit_passthrough_to_self_consistency(eval_model, repair_calib, h)
   ```
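The shape of item 2's loader can be sketched in miniature (hypothetical names; `sample_next` stands in for the model's next-token sampler). The point is that the model rolls forward on its own outputs, so no validation tokens ever enter the calibration set:

```python
import random

def generate_ar_calib_sketch(sample_next, n_batches, seq_len, bos=0, seed=42):
    # Roll the model forward on its own samples: calibration data is
    # self-generated, so the val set is never touched.
    rng = random.Random(seed)
    batches = []
    for _ in range(n_batches):
        seq = [bos]
        while len(seq) < seq_len:
            seq.append(sample_next(seq, rng))
        batches.append(seq)
    return batches

# toy stand-in for the model's sampler over a 256-token vocab
toy_sampler = lambda seq, rng: rng.randrange(256)
calib = generate_ar_calib_sketch(toy_sampler, n_batches=8, seq_len=512)
assert len(calib) == 8 and all(len(s) == 512 for s in calib)
```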

Plus 5 new env-var-driven `Hyperparameters` fields.
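Those five fields could be wired up roughly like this (a sketch only; the actual `Hyperparameters` plumbing in `train_gpt.py` may differ, and the class name here is hypothetical), with defaults matching the config diff above:

```python
import os
from dataclasses import dataclass, field

def env_field(name, cast, default):
    # read an env var at construction time, falling back to the config-diff default
    return field(default_factory=lambda: cast(os.environ.get(name, default)))

@dataclass
class RepairHparamsSketch:
    deploy_time_repair_enabled: int = env_field("DEPLOY_TIME_REPAIR_ENABLED", int, "0")
    deploy_time_repair_batches: int = env_field("DEPLOY_TIME_REPAIR_BATCHES", int, "8")
    deploy_time_repair_seq_len: int = env_field("DEPLOY_TIME_REPAIR_SEQ_LEN", int, "512")
    deploy_time_repair_lr: float = env_field("DEPLOY_TIME_REPAIR_LR", float, "1e-3")
    deploy_time_repair_iters: int = env_field("DEPLOY_TIME_REPAIR_ITERS", int, "5")

os.environ["DEPLOY_TIME_REPAIR_ENABLED"] = "1"
h = RepairHparamsSketch()
assert h.deploy_time_repair_enabled == 1 and h.deploy_time_repair_lr == 1e-3
```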

## Hardware ladder

- 4×H100, RESUME_FROM_CKPT mode (no re-train; load 060A's `.pt`, repair, eval).
- ~10-15 min wall, ~$3 cost.

## Seed plan

1 seed: 42.

## Inputs

- Hotstart: `/workspace/runs/060A-1855-port/seed_42/final_model.pt`
- AR calib generated at eval time (no external data needed beyond what's in the model).

## Stop-early criteria

- Repair fit diverges (loss > initial × 2 after 3 iters) → skip repair, run vanilla eval
- Post-repair val_bpb > 1.075 → kill (repair broke the model)
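The first rule amounts to a blow-up guard around the fit loop; a minimal sketch (hypothetical names, with `step_fn` standing in for one repair iteration):

```python
def fit_with_divergence_guard(step_fn, init_loss, iters=5, blowup=2.0, check_after=3):
    # Abort and signal "skip repair" if loss exceeds init_loss * blowup
    # once at least `check_after` iterations have run.
    loss = init_loss
    for i in range(1, iters + 1):
        loss = step_fn(loss)
        if i >= check_after and loss > init_loss * blowup:
            return None  # caller falls back to vanilla eval
    return loss

# diverging toy fit (loss doubles each step): guard fires
assert fit_with_divergence_guard(lambda l: 2.0 * l, init_loss=1.0) is None
# converging toy fit (loss halves each step): repair is accepted
assert fit_with_divergence_guard(lambda l: 0.5 * l, init_loss=1.0) == 0.03125
```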

## Cost estimate

~$3, ~15 min wall.

## Open questions

1. Cherry-picking onto 060A's #1855-derived `train_gpt.py` — verify that the LR schedule + LoRA-A path doesn't conflict with the 046L hooks.
2. The AR-calib generator may need adjustment for #1855's slightly different forward signature.