Summary
A Polars DataFrame with a deeply nested Struct/List column renders fine when displayed directly in a marimo cell. However, if the same dataframe is assigned to a normal top-level variable like x, marimo becomes progressively slower as nesting depth increases. If the dataframe is assigned to an underscore-prefixed variable like _x, the slowdown does not occur.
This suggests the issue is in marimo's handling of exported variables / dataset registration rather than Polars display itself.
Related warning / preview behavior
Once the dataframe is registered as a dataset, previewing the nested payload column also attempts chart generation and logs warnings like:
[W ... preview_column:113] Failed to get chart for column payload in table x
ValueError: Unexpected DtypeKind: Struct(...)
This looks like a second issue:
- nested / unknown column types should probably skip chart generation early
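A minimal sketch of such an early-exit guard, assuming a string dtype representation like the one in the warning above (the function and constant names are hypothetical, not marimo's actual API):

```python
# Hypothetical guard: skip chart generation for nested/unknown dtypes up front,
# instead of raising ValueError deep inside the chart builder.
NESTED_DTYPE_PREFIXES = ("Struct", "List", "Array", "Object")

def can_chart_column(dtype_repr: str) -> bool:
    """Return False for nested/unknown dtypes so chart generation is skipped early."""
    return not dtype_repr.startswith(NESTED_DTYPE_PREFIXES)
```

The exact predicate would depend on where marimo normalizes dtypes; the point is only that the check happens before any chart code runs.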
Suspected root cause
The expensive path appears to be recursive sample-value serialization for dataset metadata.
get_sample_values() recursively stringifies nested Python list/dict values without a depth cap, which becomes pathological for recursive struct/list payloads.
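The shape of a depth-capped serializer, as an illustrative sketch only (the cap value and function name are assumptions, not marimo's actual `get_sample_values()`):

```python
from typing import Any

MAX_SAMPLE_DEPTH = 3  # hypothetical cap; the real limit is a design choice

def serialize_sample(value: Any, depth: int = 0) -> Any:
    # Truncate instead of recursing into arbitrarily deep struct/list payloads.
    if depth >= MAX_SAMPLE_DEPTH:
        return "..."
    if isinstance(value, dict):
        return {k: serialize_sample(v, depth + 1) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [serialize_sample(v, depth + 1) for v in value]
    return str(value)
```

With a cap like this, sample serialization cost is bounded by the cap rather than by the payload's nesting depth.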
Notes
I validated a local patch that:
- caps nested sample serialization depth
- skips chart generation for unknown nested column types
That removes the pathological slowdown in my local tests, but I’m filing this issue first
because the performance regression itself seems worth discussing independently of the
exact fix.
Will you submit a PR?
Environment
{
"marimo": "0.23.3",
"editable": true,
"location": "/Users/.../marimo",
"OS": "Darwin",
"OS Version": "23.5.0",
"Processor": "arm",
"Python Version": "3.13.12",
"Locale": "C/en_US",
"Binaries": {
"Browser": "147.0.7727.116",
"Node": "v24.13.1",
"uv": "0.10.3 (c75a0c625 2026-02-16)"
},
"Dependencies": {
"click": "8.2.1",
"docutils": "0.22.4",
"itsdangerous": "2.2.0",
"jedi": "0.19.2",
"markdown": "3.10.2",
"narwhals": "2.20.0",
"packaging": "26.2",
"psutil": "7.2.2",
"pygments": "2.20.0",
"pymdown-extensions": "10.21.2",
"pyyaml": "6.0.3",
"starlette": "1.0.0",
"tomlkit": "0.14.0",
"typing-extensions": "4.15.0",
"uvicorn": "0.46.0",
"websockets": "16.0"
},
"Optional Dependencies": {
"altair": "6.1.0",
"anywidget": "0.9.21",
"basedpyright": "1.39.3",
"duckdb": "1.5.2",
"ibis-framework": "12.0.0",
"loro": "1.10.3",
"mcp": "1.27.0",
"nbformat": "5.10.4",
"openai": "2.32.0",
"pandas": "3.0.2",
"polars": "1.40.1",
"pyarrow": "24.0.0",
"pytest": "9.0.3",
"python-lsp-server": "1.14.0",
"ruff": "0.15.12",
"sqlglot": "30.6.0",
"vegafusion": "2.0.3"
},
"Experimental Flags": {
"multi_column": true,
"cache_panel": true,
"isolate_apps": true
}
}
Code to reproduce
import marimo

__generated_with = "0.23.1"
app = marimo.App(width="medium")


@app.cell
def _():
    import polars as pl
    return (pl,)


@app.cell
def _(pl):
    def make_dummy_df(nesting_depth: int = 1, rows: int = 3) -> pl.DataFrame:
        def build_payload(row_idx: int, depth: int):
            base = {
                "kind": chr(65 + (row_idx % 26)),
                "scores": [row_idx + 1, row_idx + 2, row_idx + 3],
                "meta": {
                    "city": ["Zurich", "Bern", "Geneva", "Basel"][row_idx % 4],
                    "active": row_idx % 2 == 0,
                },
            }
            if depth == 0:
                return base
            return {
                "level": depth,
                "items": [
                    base,
                    {"branch": row_idx, "child": build_payload(row_idx, depth - 1)},
                ],
                "summary": {"row": row_idx, "depth": depth},
            }

        return pl.DataFrame(
            {
                "row_id": list(range(1, rows + 1)),
                "name": [f"row_{i}" for i in range(1, rows + 1)],
                "value": [round(10.0 + i * 1.25, 2) for i in range(rows)],
                "payload": [build_payload(i, nesting_depth) for i in range(rows)],
            }
        )
    return (make_dummy_df,)


@app.cell
def _(make_dummy_df):
    x = make_dummy_df(nesting_depth=7, rows=3)
    x
    return


if __name__ == "__main__":
    app.run()
Control cases
These are fast:
make_dummy_df(nesting_depth=7, rows=3)
_x = make_dummy_df(nesting_depth=7, rows=3)
_x
This becomes very slow:
x = make_dummy_df(nesting_depth=7, rows=3)
x
Observed behavior
With my repro:
- depth 1-5: fine
- depth 6: noticeably slow
- depth 7: very slow
- depth 8+: effectively hangs / UI becomes unusable
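Worth noting: the raw Python payload grows only linearly with depth, so the superlinear timings are not explained by input size alone. A self-contained leaf counter over a re-derivation of the repro's build_payload (with a fixed city to stay standalone) shows this:

```python
# Re-derived from the repro above (simplified: fixed city string) to show that
# the raw Python structure adds a constant number of leaves per nesting level.
def build_payload(row_idx: int, depth: int):
    base = {
        "kind": chr(65 + (row_idx % 26)),
        "scores": [row_idx + 1, row_idx + 2, row_idx + 3],
        "meta": {"city": "Zurich", "active": row_idx % 2 == 0},
    }
    if depth == 0:
        return base
    return {
        "level": depth,
        "items": [base, {"branch": row_idx, "child": build_payload(row_idx, depth - 1)}],
        "summary": {"row": row_idx, "depth": depth},
    }

def count_leaves(obj) -> int:
    if isinstance(obj, dict):
        return sum(count_leaves(v) for v in obj.values())
    if isinstance(obj, list):
        return sum(count_leaves(v) for v in obj)
    return 1

count_leaves(build_payload(0, 8))  # -> 86 (6 + 10 * depth, i.e. linear growth)
```

So the depth-exponential timings below must come from repeated work per level during serialization/materialization, not from the payload simply being large.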
Internal timings
I benchmarked the relevant internal paths directly. create_variable_value("x", df) stays flat. The slowdown is in dataset registration, specifically get_datasets_from_variables() and then NarwhalsTableManager.get_sample_values(). Observed timings for get_datasets_from_variables([("x", df)]):
- depth 5: 0.0105s
- depth 6: 0.0818s
- depth 7: 0.6633s
- depth 8: 5.5272s
Breaking that down further, the hotspot is get_sample_values() on the nested payload
column:
- depth 5: 0.0107s
- depth 6: 0.0791s
- depth 7: 0.6322s
- depth 8: 5.3794s
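A wrapper of roughly this shape can reproduce the measurements (illustrative; the marimo-internal callables and their import paths are assumptions about a dev checkout, not a public API):

```python
import time

def time_call(fn, *args, **kwargs):
    """Time a single call; returns (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start

# Usage sketch (assumes the internal function is importable in a dev checkout):
# _, elapsed = time_call(get_datasets_from_variables, [("x", df)])
```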