Commit ff90522

Align README and submission.json with PR #2014/#2130 conventions
1 parent f086a9f commit ff90522

2 files changed

Lines changed: 35 additions & 32 deletions

File tree

records/track_10min_16mb/2026-05-01_SP8192_PR2130Base_Calib32/README.md

Lines changed: 12 additions & 12 deletions
@@ -1,24 +1,24 @@
-# Record candidate: PR #1797 base + Token-only n-gram tilt + AsymLogit Rescale + #2060 levers + NUM_PHASES=1 + GPTQ_CALIBRATION_BATCHES=32 — val_bpb 1.05651 (3-seed mean)
+# Record candidate: PR #1797 base + Token-only n-gram tilt + AsymLogit Rescale + #2060 levers + NUM_PHASES=1 + GPTQ_CALIBRATION_BATCHES=32
 
-**val_bpb: 1.05651** (3-seed mean, sample std 3.58e-04) | **15.95 MB max** | 8xH100 SXM | 600s train / 600s eval
+**val_bpb: 1.05651** (3-seed mean, std 0.00036) | **val_loss: 2.31203 nats** (std 0.00078) | **15.95 MB max** | 8xH100 SXM | 600s train / 600s eval
 
-**Improvement over comparison baseline PR #2130 (1.05670 BPB):** -0.00019 BPB
-**Improvement over merged PR #1855 leaderboard record (1.06108 BPB):** -0.00457 BPB
+**Improvement over merged PR #1855 leaderboard record (1.06107587 BPB):**
+**-0.00457 BPB / -0.01000 nats**
 
 This submission keeps the PR #2130 architecture/training stack identical and only changes: **GPTQ_CALIBRATION_BATCHES=32**. Every other hyperparameter, env var, and code path is byte-for-byte the PR #2130 reproduction command.
 
 ## Results
 
-| Seed | Pre-quant BPB | Quant BPB | Post-TTT BPB | TTT eval s | Artifact bytes |
-|------|---------------|-----------|--------------|------------|----------------|
-| 0 | 1.06105556 | 1.06939370 | **1.05679341** | 567.1 | 15,942,822 |
-| 42 | 1.06026908 | 1.06867913 | **1.05610947** | 540.1 | 15,947,490 |
-| 314 | 1.06091124 | 1.06921334 | **1.05662016** | 567.1 | 15,945,305 |
-| **Mean** | **1.06074529** | **1.06909539** | **1.05650768** | **558.1** | **15,945,206** |
+| Seed | Steps | ms/step | Train ms | Pre-quant BPB | Quant BPB | **Post-TTT BPB** | TTT eval s | Artifact bytes |
+|-----:|------:|--------:|---------:|--------------:|----------:|-----------------:|-----------:|---------------:|
+| 0 | 4,997 | 120.0 | 599,608 | 1.06105556 | 1.06939370 | **1.05679341** | 567.1 | 15,942,822 |
+| 42 | 5,001 | 119.9 | 599,672 | 1.06026908 | 1.06867913 | **1.05610947** | 540.1 | 15,947,490 |
+| 314 | 4,983 | 120.3 | 599,593 | 1.06091124 | 1.06921334 | **1.05662016** | 567.1 | 15,945,305 |
+| **Mean** | **4,994** | **120.1** | **599,624** | **1.06074529** | **1.06909539** | **1.05650768** | **558.1** | **15,945,206** |
 
-3-seed sample std (n−1): 3.58e-04 BPB / 7.78e-04 nats.
+3-seed sample std: **0.00035573 BPB / 0.00077803 nats**.
 
-All seeds under the 16,000,000-byte artifact cap and 600s train/eval budgets.
+All seeds under the 16,000,000-byte artifact cap and 600s train/eval budgets. Maximum artifact is **15,947,490 bytes** and the maximum eval pass is **567.1s**.
 
 ## Full validation coverage

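As a sanity check on the headline numbers in the updated README, the 3-seed mean and sample standard deviation can be recomputed from the per-seed Post-TTT BPB column. A quick sketch (the reported std of 0.00035573 was presumably computed from higher-precision per-seed values, so the recomputation agrees only to about four significant figures):

```python
import math
import statistics

# Post-TTT val_bpb per seed, copied from the Results table above.
post_ttt_bpb = {0: 1.05679341, 42: 1.05610947, 314: 1.05662016}

# Mean matches the reported val_bpb 1.05650768 to 8 decimal places; the
# sample std (n-1 denominator) lands within rounding of the reported 0.00035573.
mean_bpb = statistics.mean(post_ttt_bpb.values())
std_bpb = statistics.stdev(post_ttt_bpb.values())
print(f"mean={mean_bpb:.8f} std={std_bpb:.8f}")

# BPB and nats differ only by a fixed bytes/token scale, so the ratio of the
# headline val_bpb to val_loss should match the ratio of the two reported
# deltas against the PR #1855 baseline.
assert math.isclose(1.05650768 / 2.31203043, 0.00456819 / 0.00999689, rel_tol=1e-4)
```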
records/track_10min_16mb/2026-05-01_SP8192_PR2130Base_Calib32/submission.json

Lines changed: 23 additions & 20 deletions
@@ -2,23 +2,20 @@
   "author": "Benjamin Hadad",
   "github_id": "codemath3000",
   "name": "PR #1797 base + n-gram tilt + AsymLogit + #2060 levers + NUM_PHASES=1 + GPTQ_CALIBRATION_BATCHES=32",
-  "blurb": "Single-knob refinement on top of PR #2130: GPTQ_CALIBRATION_BATCHES=32. Every other hyperparameter, env var, and code path is byte-for-byte the PR #2130 reproduction command. 3-seed mean val_bpb 1.05651 (sample std 3.58e-04), beats PR #2130's 1.05670 by 0.00019 BPB. Full validation target coverage preserved (val_tokens == target_tokens == 47851520).",
+  "blurb": "11L 512d SP8192 CaseOps stack with LQER asymmetric rank-4 correction, SparseAttnGate, BOS-fixed SmearGate, AsymLogit Rescale, token-only n-gram tilt (per merged PR #1514), per-group compression, single-phase quantized phased score-first TTT (LoRA RANK=80, K-off), with GPTQ calibration batches doubled from 16 to 32. Full validation target coverage preserved: val_tokens == target_tokens == 47,851,520 in all included logs.",
   "date": "2026-05-01",
   "track": "10min_16mb",
   "val_loss": 2.31203043,
   "val_bpb": 1.05650768,
   "val_loss_std": 0.00077803,
   "val_bpb_std": 0.00035573,
-  "seeds": [
-    314,
-    42,
-    0
-  ],
+  "seeds": [314, 42, 0],
   "seed_results": {
     "314": {
       "val_loss": 2.31227656,
       "val_bpb": 1.05662016,
       "artifact_bytes": 15945305,
+      "steps": 4983,
       "step_avg_ms": 120.32,
       "train_wallclock_ms": 599593,
       "eval_time_s": 567.1,
@@ -31,6 +28,7 @@
       "val_loss": 2.31115900,
       "val_bpb": 1.05610947,
       "artifact_bytes": 15947490,
+      "steps": 5001,
       "step_avg_ms": 119.91,
       "train_wallclock_ms": 599672,
       "eval_time_s": 540.1,
@@ -43,6 +41,7 @@
       "val_loss": 2.31265572,
       "val_bpb": 1.05679341,
       "artifact_bytes": 15942822,
+      "steps": 4997,
       "step_avg_ms": 119.99,
       "train_wallclock_ms": 599608,
       "eval_time_s": 567.1,
@@ -52,32 +51,36 @@
       "quantized_val_bpb": 1.06939370
     }
   },
-  "comparison_baseline": "PR #2130",
-  "comparison_baseline_bpb": 1.0566953,
-  "delta_vs_baseline_bpb": -0.00018762,
-  "merged_leaderboard_baseline": "PR #1855",
-  "merged_leaderboard_baseline_bpb": 1.06107587,
+  "comparison_baseline": "PR #1855 merged leaderboard record",
+  "comparison_baseline_bpb": 1.06107587,
+  "comparison_baseline_loss": 2.32202732,
   "delta_vs_leaderboard_bpb": -0.00456819,
+  "delta_vs_leaderboard_nats": -0.00999689,
+  "artifact_bytes_mean": 15945206,
   "artifact_bytes_max": 15947490,
+  "bytes_total": 15947490,
+  "train_steps_mean": 4993.67,
+  "train_wallclock_ms_mean": 599624,
   "train_wallclock_ms_max": 599672,
   "step_avg_ms_mean": 120.07,
   "eval_time_s_mean": 558.1,
   "eval_time_s_max": 567.1,
   "hardware": "8xH100 80GB SXM",
   "pytorch_version": "2.11.0+cu130",
-  "lever_changes": [
-    {
-      "name": "GPTQ_CALIBRATION_BATCHES",
-      "from": 16,
-      "to": 32
-    }
-  ],
+  "technique_summary": "PR #2130 stack (token-only n-gram tilt + AsymLogit Rescale + #2060 hp levers + NUM_PHASES=1 on PR #1797 / PR #1855 lineage) with a single hyperparameter change: GPTQ_CALIBRATION_BATCHES doubled from 16 to 32. All other code paths and env vars are byte-for-byte identical to the PR #2130 reproduction.",
   "compliance": {
     "artifact_under_16mb": true,
-    "train_under_600s": true,
+    "train_under_600s_strict": true,
     "eval_under_600s": true,
-    "n_seeds": 3,
+    "three_seeds": true,
     "full_validation_targets": true,
+    "score_before_update": "Quantized phased TTT scores each validation chunk before the LoRA update; inherited unchanged from PR #2130. Within-word and word-start n-gram channels are explicitly disabled (logs confirm within_gate=0 word_gate=0 agree2plus=0).",
+    "no_validation_training_leak": true,
     "val_tokens_equal_target_tokens": true
+  },
+  "logs": {
+    "seed314": "train_seed314.log",
+    "seed42": "train_seed42.log",
+    "seed0": "train_seed0.log"
   }
 }
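The reshaped `compliance` block lends itself to a mechanical re-check. A minimal sketch (a hypothetical helper, not part of the repo) that re-derives the per-seed budget claims directly from the `seed_results` fields, using the 16,000,000-byte artifact cap and 600 s train/eval budgets quoted in the README:

```python
# Hypothetical checker: re-derives the "compliance" budget claims from the
# raw per-seed fields. Caps are the track rules quoted in the README.
CAPS = {
    "artifact_bytes": 16_000_000,   # 16 MB artifact cap
    "train_wallclock_ms": 600_000,  # 600 s train budget
    "eval_time_s": 600.0,           # 600 s eval budget
}

def check_budgets(seed_results: dict) -> list[str]:
    """Return a list of violations; an empty list means every seed is compliant."""
    violations = []
    for seed, result in seed_results.items():
        for field, cap in CAPS.items():
            if result[field] >= cap:  # budgets are strict upper bounds
                violations.append(f"seed {seed}: {field}={result[field]} >= {cap}")
    return violations

# Per-seed values copied from the submission.json diff above.
seed_results = {
    "314": {"artifact_bytes": 15945305, "train_wallclock_ms": 599593, "eval_time_s": 567.1},
    "42":  {"artifact_bytes": 15947490, "train_wallclock_ms": 599672, "eval_time_s": 540.1},
    "0":   {"artifact_bytes": 15942822, "train_wallclock_ms": 599608, "eval_time_s": 567.1},
}
assert not check_budgets(seed_results)  # all three seeds within budget
```

The strict `>=` comparison mirrors the renamed `train_under_600s_strict` flag: hitting a budget exactly counts as a violation.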
