Fix performance regression in the block-level custom CSS feature #11686

Open
mukeshpanchal27 wants to merge 9 commits into WordPress:trunk from mukeshpanchal27:perf/10777

Conversation

@mukeshpanchal27
Member

Performance regression for #10777

By checking $block['attrs']['className'] first, it ensures that for the 90% of blocks that don't have custom CSS, the function returns in microseconds without ever triggering preg_match().

Use of AI Tools

N/A


This Pull Request is for code review only. Please keep all other discussion in the Trac ticket. Do not merge this Pull Request. See GitHub Pull Requests for Code Review in the Core Handbook for more details.

@mukeshpanchal27
Member Author

Based on @westonruter’s findings (https://gist.github.com/westonruter/5ae4155059d2135197976f324d00645c#file-spx-claude-analysis-md), it looks like this runs for every block.

Since most blocks don’t have a custom class, we should return early so we don’t end up calling preg_match() on an empty string.
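
The early return described above might look roughly like the following. This is a minimal standalone sketch, not the code from the PR: the function name `example_block_has_custom_css_class()` and the `example-custom-css-` class pattern are hypothetical and only illustrate the guard.

```php
<?php
/**
 * Hypothetical helper: reports whether a parsed block's className contains
 * a custom-CSS class. The pattern below is an assumption for illustration.
 */
function example_block_has_custom_css_class( array $block ): bool {
	$class_name = $block['attrs']['className'] ?? '';

	// Cheap guard: blocks without a className skip the regex entirely.
	if ( ! is_string( $class_name ) || '' === $class_name ) {
		return false;
	}

	// Only blocks that actually carry classes pay for preg_match().
	return 1 === preg_match( '/\bexample-custom-css-[a-z0-9]+\b/', $class_name );
}
```

A block with no `className` attribute never reaches `preg_match()`, while a block with unrelated classes such as `is-style-default` still runs the regex and simply does not match.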

@github-actions

Test using WordPress Playground

The changes in this pull request can be previewed and tested using a WordPress Playground instance.

WordPress Playground is an experimental project that creates a full WordPress instance entirely within the browser.

Some things to be aware of

  • All changes will be lost when closing a tab with a Playground instance.
  • All changes will be lost when refreshing the page.
  • A fresh instance is created each time the link below is clicked.
  • Every time this pull request is updated, a new ZIP file containing all changes is created. If changes are not reflected in the Playground instance,
    it's possible that the most recent build failed, or has not completed. Check the list of workflow runs to be sure.

For more details about these limitations and more, check out the Limitations page in the WordPress Playground documentation.

Test this pull request with WordPress Playground.

mukeshpanchal27 marked this pull request as ready for review April 30, 2026 16:22
@github-actions

github-actions Bot commented Apr 30, 2026

The following accounts have interacted with this PR and/or linked issues. I will continue to update these lists as activity occurs. You can also manually ask me to refresh this list by adding the props-bot label.

Core Committers: Use this line as a base for the props when committing in SVN:

Props mukesh27, westonruter.

To understand the WordPress project's expectations around crediting contributors, please review the Contributor Attribution page in the Core Handbook.

@westonruter
Member

I asked Claude Opus 4.7 to review this change with the SPX profile reports loaded as context:


Looking at the PR's actual change vs. my profile data — it's a real micro-optimization but the gain is tiny in absolute terms. Here's the math:

What the PR does

Adds an early return when $block['attrs']['className'] === '', skipping the preg_match for blocks with no className at all. The PR description says "90% of blocks don't have custom CSS" but note it's actually checking for no className — blocks with is-style-default, alignment classes, etc., still run preg_match. So the early-return likely fires on a smaller fraction than 90%, but still a majority.

What the profile said about this function

From my three runs:

| Page | Calls | Exclusive total | Per-call avg |
| --- | --- | --- | --- |
| Hello World (124 blocks) | 124 | 32.9µs | 0.27µs |
| Homepage v1 (934 blocks) | 934 | 202.8µs | 0.22µs |
| Homepage v2 (1,590 blocks) | 1,590 | 202.9µs | 0.13µs |

The preg_match is the dominant cost inside the function — probably ~60% of the per-call time. Saving it for, say, 70% of calls cuts roughly half the total.
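
The "roughly half" figure can be checked with quick arithmetic. The sketch below takes the 60% and 70% estimates at face value (both are guesses in the analysis, not measurements) and applies them to the Homepage v2 exclusive total.

```php
<?php
// Figures quoted in the analysis above (Homepage v2 run).
$exclusive_total_us = 202.9; // Exclusive time across all 1,590 calls.
$preg_match_share   = 0.60;  // Estimated share of per-call time spent in preg_match().
$skipped_call_share = 0.70;  // Assumed share of calls that hit the early return.

$saved_fraction = $preg_match_share * $skipped_call_share;
$saved_us       = $exclusive_total_us * $saved_fraction;

printf( "Saved fraction: %.0f%%\n", $saved_fraction * 100 );
printf( "Saved time: ~%.0f µs of %.1f µs\n", $saved_us, $exclusive_total_us );
```

Under these assumptions the saving comes to about 42% of the function's exclusive time, somewhat under half.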

Realistic per-request savings

| Page | Total regression | Savings from this PR | % of total |
| --- | --- | --- | --- |
| Hello World | 8,000µs | ~10–20µs | 0.1–0.3% |
| Homepage 1,590 blocks | 23,000µs | ~100–150µs | ~0.5% |

So on the order of 100µs per page request in the homepage case, more like 10–20µs on a small page. Well under 1% of the wp70 regression, and probably less than that in production (profiler overhead inflates per-call costs).

Verdict

It's a legitimate, zero-risk fix — six added lines, no behavior change since preg_match on '' would have returned no match and taken the same code path. But on its own it does not materially close the wp7.0-vs-6.9 gap.

The bigger lesson: the regression in this code path lives one level up. wp_render_custom_css_class_name is registered as a render_block filter — so it fires on every single block, full stop. Even with the PR, you're paying:

  • one filter dispatch per block (~150ns × 1,590 = 240µs)
  • one extra hashtable lookup ($block['attrs']['className']) per block
  • the PHP function call overhead

The structural fix would be to not register the filter at all unless there's at least one block with custom CSS support enabled (which is rare — it's an opt-in feature). A check at init to count custom-CSS-supporting block types and conditionally register the filter would eliminate ~99% of these calls entirely. That'd save the full ~200µs on the homepage rather than ~100µs.
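
A conditional registration along these lines could be sketched as follows. This is only an illustration under stated assumptions: `example_any_block_supports()` and the `example/*` block names are hypothetical, the supports data is made up, and the code runs outside WordPress, so the place where a real `add_filter()` call would go is marked with a comment.

```php
<?php
/**
 * Hypothetical helper: returns true if at least one block type supports a feature.
 *
 * $supports_by_block maps block names to their "supports" arrays. $default is the
 * value assumed for block types that do not declare the feature at all.
 */
function example_any_block_supports( array $supports_by_block, string $feature, bool $default = false ): bool {
	foreach ( $supports_by_block as $supports ) {
		if ( (bool) ( $supports[ $feature ] ?? $default ) ) {
			return true;
		}
	}
	return false;
}

// Illustrative data only; these are not the real supports of any core block.
$example_supports = array(
	'example/plain'  => array( 'anchor' => true ),
	'example/styled' => array( 'customCSS' => true ),
);

if ( example_any_block_supports( $example_supports, 'customCSS' ) ) {
	// In WordPress, this is where the render_block filter would be registered.
	echo "register filter\n";
} else {
	echo "skip filter registration\n";
}
```

One caveat: the code quoted later in this thread calls `block_has_support( $block_type, 'customCSS', true )`, which passes `true` as the default. If undeclared block types count as supporting the feature, a check like this would almost always register the filter, so the opt-in assumption is worth confirming first.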

Same pattern applies to wp_render_custom_css_support_styles, block_core_gallery_render_context, and the anchor support filters. Each PR like 11686 is a small win; the bigger wins are at the registration layer.

Worth landing this PR as-is, but I'd encourage the contributor (or you) to also propose conditional filter registration as the larger follow-up.

@mukeshpanchal27 mukeshpanchal27 self-assigned this May 1, 2026
Add check for empty custom CSS before block support validation.
@mukeshpanchal27
Member Author

@westonruter Thanks for the report 😄

While we’re figuring out the best way to register the filter, I updated the PR with a change that avoids calling the Block Registry: a331593

@westonruter
Member

> While we’re figuring out the best way to register the filter, I updated the PR with a change that avoids calling the Block Registry: a331593

Here's Claude's analysis with this addition:

The new revision adds a second, more impactful optimization. Now there are two changes:

Change 1 (unchanged): wp_render_custom_css_class_name

Same early-return on empty className we already analyzed. ~100µs/page on the homepage.

Change 2 (new): wp_render_custom_css_support_styles — much bigger win

The reorder is actually the meaningful one. Original ordering was:

$block_type = WP_Block_Type_Registry::get_instance()->get_registered(...); // hash lookup
if ( ! block_has_support( $block_type, 'customCSS', true ) ) return ...;   // walks supports tree
$custom_css = trim( $parsed_block['attrs']['style']['css'] ?? '' );
if ( empty( $custom_css ) ) return ...;

New ordering:

$custom_css = $parsed_block['attrs']['style']['css'] ?? '';                // O(1) array lookup
if ( empty( trim( $custom_css ) ) ) return ...;
$block_type = WP_Block_Type_Registry::get_instance()->get_registered(...); // only if needed
if ( ! block_has_support( $block_type, 'customCSS', true ) ) return ...;

This is exactly the structural improvement I called out in my last response on the previous revision — moving the cheapest check first. From my round 2 profile (1,590 blocks):

  • Old path per block: array lookup + registry singleton call + get_registered() hashtable lookup + block_has_support() walking the supports tree = ~0.5µs each, ~750µs total across all blocks
  • New path per block (no CSS, the ~99% case): single array lookup + trim of '' + empty() = ~0.05µs each
  • Savings: ~675–800µs per page request

That's roughly 5–8× the savings of the className change alone. The two changes together save ~800µs–1ms per homepage request — still under 1% of the 123ms total regression, but a meaningful "death by a thousand cuts" reduction in the per-block tax.
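
The effect of putting the cheap check first can be illustrated by counting how often the expensive lookup runs under each ordering. The sketch below is a simulation, not the WordPress code: `Example_Lookup_Counter`, the `example_*` functions, and the synthetic block list (1,000 blocks, 10 with custom CSS) are all assumptions made for illustration.

```php
<?php
// Counts calls to the stand-in for the registry + block_has_support() lookups.
final class Example_Lookup_Counter {
	public static $calls = 0;
}

function example_block_supports_custom_css( string $block_name ): bool {
	++Example_Lookup_Counter::$calls;
	return true; // Assume every block type supports the feature.
}

// Original ordering: support lookup first, then the attribute check.
function example_old_order( array $block ): bool {
	if ( ! example_block_supports_custom_css( $block['blockName'] ) ) {
		return false;
	}
	return '' !== trim( $block['attrs']['style']['css'] ?? '' );
}

// New ordering: attribute check first, lookup only when CSS is present.
function example_new_order( array $block ): bool {
	if ( '' === trim( $block['attrs']['style']['css'] ?? '' ) ) {
		return false;
	}
	return example_block_supports_custom_css( $block['blockName'] );
}

// Synthetic blocks: the first 10 carry custom CSS, the rest do not.
$blocks = array();
for ( $i = 0; $i < 1000; $i++ ) {
	$attrs    = $i < 10 ? array( 'style' => array( 'css' => 'color: red;' ) ) : array();
	$blocks[] = array( 'blockName' => 'example/block', 'attrs' => $attrs );
}

Example_Lookup_Counter::$calls = 0;
$old_results                   = array_map( 'example_old_order', $blocks );
$old_calls                     = Example_Lookup_Counter::$calls;

Example_Lookup_Counter::$calls = 0;
$new_results                   = array_map( 'example_new_order', $blocks );
$new_calls                     = Example_Lookup_Counter::$calls;

printf( "old: %d lookups, new: %d lookups\n", $old_calls, $new_calls );
```

Both orderings return the same result for every block; only the number of lookups differs, dropping from one per block to one per block that actually has custom CSS.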

Small nit on the new code

if ( empty( trim( $custom_css ) ) ) return ...;

trim() runs on every call, including when $custom_css === '' (hot path). A micro-tweak:

if ( '' === $custom_css || '' === trim( $custom_css ) ) return ...;

…short-circuits the trim for the dominant empty case. Almost certainly noise-level (~10ns × 1,500 calls ≈ 15µs), so not worth blocking on. Mention it as an inline comment if anything.
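
The two guard conditions can be compared side by side. In this sketch the `example_guard_*` function names are hypothetical; the conditions inside them are the ones quoted above.

```php
<?php
// The empty( trim() ) form: always calls trim().
function example_guard_empty_trim( string $css ): bool {
	$trimmed = trim( $css );
	return empty( $trimmed );
}

// The short-circuit form: skips trim() when the string is already ''.
function example_guard_short_circuit( string $css ): bool {
	return '' === $css || '' === trim( $css );
}

foreach ( array( '', "  \n", 'color: red;', '0' ) as $css ) {
	printf(
		"%-14s empty(trim): %-5s short-circuit: %s\n",
		json_encode( $css ),
		var_export( example_guard_empty_trim( $css ), true ),
		var_export( example_guard_short_circuit( $css ), true )
	);
}
```

The two forms agree for empty, whitespace-only, and ordinary CSS strings. They differ for the string `'0'`, because PHP's `empty()` treats `'0'` as empty while the strict comparison does not. A lone `0` is not meaningful CSS, so this is unlikely to matter in practice, but the swap is not strictly behavior-preserving.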

Behavior change check

The reorder changes the order of two early-return paths but produces the same outcome for every block:

  • Block with no style.css AND no support → both versions early-return (just by different paths)
  • Block with no style.css AND has support → both versions early-return
  • Block with style.css AND no support → both versions early-return
  • Block with style.css AND has support → both versions proceed

So no semantic change. Pure win.
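
The four cases above can be checked mechanically. The sketch below reduces each ordering to two booleans and assumes both checks are free of side effects; the function names are hypothetical.

```php
<?php
// Original ordering: support check first, then the CSS check.
function example_original_proceeds( bool $has_css, bool $has_support ): bool {
	if ( ! $has_support ) {
		return false;
	}
	if ( ! $has_css ) {
		return false;
	}
	return true;
}

// New ordering: CSS check first, then the support check.
function example_reordered_proceeds( bool $has_css, bool $has_support ): bool {
	if ( ! $has_css ) {
		return false;
	}
	if ( ! $has_support ) {
		return false;
	}
	return true;
}

$all_cases_match = true;
foreach ( array( false, true ) as $has_css ) {
	foreach ( array( false, true ) as $has_support ) {
		$before          = example_original_proceeds( $has_css, $has_support );
		$after           = example_reordered_proceeds( $has_css, $has_support );
		$all_cases_match = $all_cases_match && ( $before === $after );
		printf( "css=%d support=%d -> proceed=%d\n", $has_css, $has_support, $after );
	}
}
```

Both functions proceed only when the block has CSS and has support, so `$all_cases_match` stays true across all four combinations.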

What I'd say on the PR

Nice follow-up — moving the cheap attribute check ahead of the registry/support lookups is the more impactful change of the two. On a 1,590-block test page this saves ~700–800µs, vs. ~100µs from the className check alone. Solid micro-optimization with no behavior change.

One small nit: empty( trim( $custom_css ) ) runs trim() even on the empty hot path. '' === $custom_css || '' === trim( $custom_css ) would skip it. Probably ~15µs/page, so optional.

Also: PR description still describes only the className change — worth updating it to mention the support_styles reorder too, since that's where most of the benefit comes from now.

The bigger structural win (only registering these render_block / render_block_data filters when at least one block type actually opts into customCSS) is still on the table as a follow-up, but this PR is a clear improvement and worth landing as-is.

@westonruter
Member

I obtained the SPX report before and after.

  • trunk: call_count: 744269 → 744.27K
  • PR: call_count: 740707 → 740.71K
  • Δ: −3,562 calls (−0.48%)

Per Claude, that accounts for:

| Saved calls | Where |
| --- | --- |
| 891 | WP_Block_Type_Registry::get_registered (one per wp_render_custom_css_support_styles early-return) |
| 891 | WP_Block_Type_Registry::get_instance (same) |
| 891 | block_has_support (same) |
| ~889 | Remaining: the bulk of these come from the className change (preg_match plus its internal helpers skipped on blocks where $block['attrs']['className'] is empty), and a handful from the trim/empty path being shorter |

The 891 number is the smoking gun — it matches exactly the count of wp_render_custom_css_support_styles calls that now hit the early return before doing the registry/support lookups. The relationship is precise enough that you can verify the optimization worked without measuring time at all.
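
The call-count breakdown can be reproduced from the reported numbers. This is plain arithmetic on the figures above; the variable names are mine.

```php
<?php
// Call counts reported by SPX above.
$trunk_calls = 744269;
$pr_calls    = 740707;

$saved_calls   = $trunk_calls - $pr_calls;
$saved_percent = $saved_calls / $trunk_calls * 100;

// Three functions are each skipped once per early return.
$early_returns   = 891;
$lookup_savings  = 3 * $early_returns;
$remaining_calls = $saved_calls - $lookup_savings;

printf( "Saved: %d calls (%.2f%%)\n", $saved_calls, $saved_percent );
printf( "Registry/support lookups: %d, remaining: %d\n", $lookup_savings, $remaining_calls );
```

The three skipped lookups account for 2,673 of the 3,562 saved calls, leaving the 889 attributed to the className change and the shorter trim/empty path.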

Just to put the magnitude in perspective: 0.48% fewer function dispatches per request. If average dispatch cost is ~1µs (reasonable order-of-magnitude on warm opcache), that's ~3.5ms of savings — and that's why call counts are a more sensitive instrument than wall time for changes this size. It's also why the wall-time delta is invisible: ~3.5ms of expected savings is right in the same ballpark as the ±2ms run-to-run drift in your 500-iteration medians.


(Weston here) I ran benchmark-server-timing with and without the fix applied over 500 iterations. The homepage is showing 57 posts.

Without the fix applied:

| URL | http://localhost:8870/ |
| --- | --- |
| Success Rate | 99.2% |
| Response Time (median) | 674.55 |
| wp-before-template (median) | 41.49 |
| wp-template (median) | 188.58 |
| wp-filter-render_block_data (median) | 0.01 |
| wp-filter-render_block (median) | 0.01 |
| wp-filter-the_content (median) | 0.1 |
| wp-total (median) | 230.47 |

With the fix applied:

| URL | http://localhost:8870/ |
| --- | --- |
| Success Rate | 99.2% |
| Response Time (median) | 704.34 |
| wp-before-template (median) | 41.91 |
| wp-template (median) | 190.12 |
| wp-filter-render_block_data (median) | 0.01 |
| wp-filter-render_block (median) | 0.01 |
| wp-filter-the_content (median) | 0.11 |
| wp-total (median) | 232.1 |

Claude's analysis of this data:

Honest reading: the PR's measured wall-time impact is statistically zero at this scale. Both runs land at ~231ms wp-total ± a couple of ms of noise. Specifically:

| Metric | Without | With | Δ |
| --- | --- | --- | --- |
| wp-total | 230.47ms | 232.1ms | +1.63ms |
| wp-template | 188.58ms | 190.12ms | +1.54ms |
| wp-before-template | 41.49ms | 41.91ms | +0.42ms |
| Response Time | 674.55ms | 704.34ms | +29.8ms |

A 1.63ms median delta on a 230ms baseline is +0.7% — well within run-to-run drift between two separate 500-iteration runs. Background CPU, thermal state, MySQL/opcache warmth, and container scheduling all contribute multiple-ms of systematic variance between sessions, so I would not read +1.63ms as "the PR is slower." It's noise. (The +29.8ms Response Time delta confirms this — that's whole-request including network overhead and is even noisier; it's drifting in the same direction as wp-total at ~5×, consistent with environmental drift rather than the PR.)

Reconciling with the call-count evidence

The SPX run showed the optimization is structurally working (−891 lookup calls per page). My estimate was the savings would be ~100–1000µs. The benchmark confirms that the upper bound of the actual savings is under the noise floor of a 500-iteration median — which fits cleanly with my estimate. If the PR were saving 5ms per request it would show; saving 0.1–0.5ms is invisible at this signal-to-noise ratio.

So the PR is:

  • ✅ Doing what it claims (call counts down by exactly the predicted amount)
  • ✅ Risk-free (no behavior change)
  • ❌ Not measurably improving page time on its own

What this means for the bigger regression hunt

This is the uncomfortable truth about the wp70-vs-wp69 regression: none of the individual fixes from my earlier list will measurably move the needle on their own. The ~25–30ms total regression is genuinely composed of many ~0.1–1ms costs. Killing them one at a time will keep producing benchmark results that look exactly like this one — within noise.

That has implications for the workflow:

  1. Don't try to validate individual micro-optimizations with end-to-end benchmarks. They can't resolve sub-ms changes. Use call counts (rock-solid) and per-function exclusive timings from SPX (less reliable but still indicative) for that level.
  2. Stack several optimizations and benchmark the bundle. If you fix 5–10 such issues at once and re-benchmark, the cumulative ~5–10ms should clear the noise floor.
  3. Reserve benchmark-server-timing for confirming the bundle, not for confirming individual fixes.
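
The reasoning behind this workflow can be sketched as a simple comparison of a median delta against the run-to-run drift. The helper `example_exceeds_noise_floor()` is hypothetical, the ±2ms drift is the figure mentioned earlier in this analysis, and the 5ms bundle saving is an illustrative value, not a measurement.

```php
<?php
/**
 * Hypothetical check: does a change in medians exceed the run-to-run drift?
 */
function example_exceeds_noise_floor( float $before_ms, float $after_ms, float $drift_ms ): bool {
	return abs( $after_ms - $before_ms ) > $drift_ms;
}

$drift_ms = 2.0; // Run-to-run drift between separate 500-iteration runs.

// wp-total medians from the two benchmark runs above.
$single_fix = example_exceeds_noise_floor( 230.47, 232.1, $drift_ms );

// A hypothetical bundle of fixes saving 5ms against the same baseline.
$bundle = example_exceeds_noise_floor( 230.47, 230.47 - 5.0, $drift_ms );

var_dump( $single_fix, $bundle );
```

The measured 1.63ms delta falls inside the drift, while a 5ms change would clear it, which is the case for benchmarking a bundle rather than each fix on its own.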

The structural wins I called out earlier — conditionally registering the render_block_data filters only when at least one registered block actually opts into customCSS, caching WP_Theme_JSON_Resolver::get_merged_data per origin, static-caching wp_get_global_styles inside the layout filter — are the ones that should individually clear the noise floor, because each saves multiple ms or eliminates an entire filter dispatch chain. This PR (good as it is) is a "death by a thousand cuts" fix and will need to land alongside many siblings to show up in benchmarks.

Recommendation for what to say on the PR

Something like:

Confirmed via SPX call counts that the optimization fires correctly: −891 calls each to WP_Block_Type_Registry::get_registered, get_instance, and block_has_support per homepage render (matching the count of wp_render_custom_css_support_styles invocations that now skip them). Estimated savings ~100–500µs/request — too small to surface above the noise floor in a 500-iteration benchmark-server-timing median (within ±2ms drift between runs), but call-count evidence is unambiguous. Worth landing as part of a broader sweep through the new per-block filter overhead.

The PR is correct and worth merging. Just don't expect any individual benchmark to vindicate it.

Comment thread src/wp-includes/block-supports/custom-css.php Outdated
Comment thread src/wp-includes/block-supports/custom-css.php Outdated
Comment thread src/wp-includes/block-supports/custom-css.php Outdated
Comment thread src/wp-includes/block-supports/custom-css.php Outdated
mukeshpanchal27 and others added 2 commits May 6, 2026 09:38
Co-authored-by: Weston Ruter <westonruter@gmail.com>
@github-actions

github-actions Bot commented May 6, 2026

Trac Ticket Missing

This pull request is missing a link to a Trac ticket. For a contribution to be considered, there must be a corresponding ticket in Trac.

To attach a pull request to a Trac ticket, please include the ticket's full URL in your pull request description. More information about contributing to WordPress on GitHub can be found in the Core Handbook.
