A 110-LOC pure addition to train_gpt.py, fully env-gated by
BIGRAM_HASH_ENABLED=0/1. Default-off invariant: with the env var unset,
the forward pass, state_dict, and optimizer param list are byte-identical
to baseline.
Components:
- BigramHashEmbedding(nn.Module): embed(buckets, dim) + CastedLinear
proj(dim, model_dim). proj._zero_init=True -> the branch outputs zero
at step 0, so the merged forward starts identical to baseline.
Hash: ((prime_a * curr) ^ (prime_b * prev)) % buckets. Position-0
fallback: prev = curr (self-bigram). Cross-doc leakage is not
special-cased, matching openai#1736's SmearGate convention. (Module
sketch after this list.)
- GPT.__init__: creates self.bigram_embed when enabled else None.
- forward_logits + forward_ttt: additive merge of bigram(input_ids)
into tok_emb(input_ids) before SmearGate. Attr-guarded.
- Optimizers: embed.weight -> AdamW optimizer_tok (embed_wd), proj.weight
-> Muon matrix_params.
- GPTQ hessian hooks: bigram_embed.embed output -> (dim, dim) hessian;
bigram_embed.proj input -> (dim, dim) hessian. proj is <=65536 numel,
so it takes the fp16 passthrough path and the hook is harmless. (Hook
sketch after this list.)
- Startup log line echoing config.
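For orientation, a minimal PyTorch sketch of the branch as described
above. nn.Linear stands in for train_gpt.py's CastedLinear, defaults
mirror the env vars below, and model_dim=512 is taken from the sizing
note; this is a sketch, not the committed code:

```python
import torch
import torch.nn as nn

class BigramHashEmbedding(nn.Module):
    """Sketch: hashed bigram embedding behind a zero-initialized projection.
    nn.Linear stands in for the repo's CastedLinear."""
    def __init__(self, buckets=16384, dim=32, model_dim=512,
                 prime_a=36313, prime_b=27191):
        super().__init__()
        self.buckets, self.prime_a, self.prime_b = buckets, prime_a, prime_b
        self.embed = nn.Embedding(buckets, dim)
        self.proj = nn.Linear(dim, model_dim, bias=False)
        self.proj._zero_init = True        # flag consumed by the repo's init pass
        nn.init.zeros_(self.proj.weight)   # branch contributes 0 at step 0

    def forward(self, input_ids):          # input_ids: (B, T) int64
        prev = torch.roll(input_ids, shifts=1, dims=-1)
        prev[..., 0] = input_ids[..., 0]   # position-0 fallback: self-bigram
        # no cross-doc special casing: a bigram may span a document boundary
        idx = ((self.prime_a * input_ids) ^ (self.prime_b * prev)) % self.buckets
        return self.proj(self.embed(idx))

# Merge point (attr-guarded), as in forward_logits / forward_ttt:
#   x = tok_emb(input_ids)
#   if getattr(self, "bigram_embed", None) is not None:
#       x = x + self.bigram_embed(input_ids)  # additive, before SmearGate
```

Zero-initializing proj means an enabled run starts from exactly the
baseline function, so any divergence is attributable to learned bigram
features rather than initialization noise.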
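For the GPTQ side, a hedged sketch of the (dim, dim) second-moment
accumulation the hooks perform. The repo's actual hook plumbing is
assumed, not shown, and bigram_embed here is a hypothetical module
instance:

```python
import torch

class HessianHook:
    """GPTQ-style Hessian accumulation: H += x^T x over all tokens.
    Sketch only; the repo's hook wiring may differ."""
    def __init__(self, dim):
        self.dim, self.H = dim, None

    def accumulate(self, x):
        x = x.detach().reshape(-1, self.dim).float()  # (tokens, dim)
        if self.H is None:
            self.H = torch.zeros(self.dim, self.dim, device=x.device)
        self.H += x.T @ x

# embed *output* feeds one hessian; proj *input* feeds the other.
embed_h, proj_h = HessianHook(32), HessianHook(32)
bigram_embed.embed.register_forward_hook(
    lambda mod, args, out: embed_h.accumulate(out))
bigram_embed.proj.register_forward_pre_hook(
    lambda mod, args: proj_h.accumulate(args[0]))
# proj is <=65536 numel, so GPTQ's fp16 passthrough path applies and
# this hook only records stats (harmless).
```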
Sizing: embed is 16384*32 = 524288 int6 params ~= 393KB; proj is
512*32 = 16384 fp16 params = 32KB. Total ~425KB added to the artifact;
a budget dry-run is needed before launch.
Env vars (defaults): BIGRAM_HASH_ENABLED=0, BIGRAM_HASH_BUCKETS=16384,
BIGRAM_HASH_DIM=32, BIGRAM_HASH_PRIME_A=36313, BIGRAM_HASH_PRIME_B=27191.
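A sketch of the env gating with those defaults, assuming module-scope
os.environ parsing:

```python
import os

BIGRAM_HASH_ENABLED = os.environ.get("BIGRAM_HASH_ENABLED", "0") == "1"
BIGRAM_HASH_BUCKETS = int(os.environ.get("BIGRAM_HASH_BUCKETS", "16384"))
BIGRAM_HASH_DIM = int(os.environ.get("BIGRAM_HASH_DIM", "32"))
BIGRAM_HASH_PRIME_A = int(os.environ.get("BIGRAM_HASH_PRIME_A", "36313"))
BIGRAM_HASH_PRIME_B = int(os.environ.get("BIGRAM_HASH_PRIME_B", "27191"))

# In GPT.__init__: with the flag unset nothing is constructed, so the
# state_dict and optimizer param list stay byte-identical to baseline.
# self.bigram_embed = BigramHashEmbedding(...) if BIGRAM_HASH_ENABLED else None
```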
Bug lesson learned from exp/training-bundle commit 8d54854: when Edit's
old_string captures only part of a for-loop body, the trailing loop
statements get pushed outside the loop and can be absorbed by nearby
conditional blocks. This patch is pure prepend/append (no splits of
existing blocks), so that failure mode is avoided.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>