Non-record: ByteJEPA — True Byte-Level JEPA (val_bpb 1.3496)#1443
Merged
valerio-oai merged 1 commit into openai:main on May 3, 2026
Conversation
Pure byte-level Joint-Embedding Predictive Architecture with no tokenizer (vocab=256). Three-stage training: JEPA pretraining -> bridge -> CE+SWA. Addresses the open JEPA bounty in the Requests for PRs section. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
taka6745 pushed a commit to taka6745/parameter-golf that referenced this pull request on Apr 7, 2026
…ramLite reversal, new directions

Subagent re-verified the 3 still-novel patches (TabHash, GatedAttention, MTP) against the latest 25 open PRs. Zero hits — they remain uncontested, even though only MTP shows marginal training-loss benefit at our scale. EngramLite (Patch 22) verdict SOFT-REVERSED: EL2 cycle-2 = 3.2742, only +0.0008 above champion. Tied within noise, not falsified. Spend ~$1.40 / $36 (6% utilization). Pod healthy. New comp directions worth considering for next research fire: Per-Sample SLOT (legal variant of suspicious PR openai#1430), Codebook VQ compression (PR openai#1433), ByteJEPA (PR openai#1443 — non-competitive but novel category). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Community Review — Non-record: ByteJEPA — True Byte-Level JEPA (val_bpb 1.3496)
Compliance: LOOKS CLEAN — pure-neural submission, no TTT/SLOT/n-gram-cache
Analysis PR #1443 (
valerio-oai approved these changes May 3, 2026

Contributor valerio-oai left a comment
Selected for the notable non-record submissions section.
Summary
(vocab_size=256)

Results
Why this matters
This is the first submission to successfully change the core learning objective from token-prediction to representation-prediction (JEPA). Byte-level models have an inherent context disadvantage (roughly 4.7x less effective context than SP1024, since each position holds one byte rather than one multi-byte token), so the BPB won't match tokenized submissions, but the approach demonstrates that latent-space prediction can drive meaningful language-model training.
Test plan