Named Polars DataFrames with deeply nested struct columns become extremely slow when registered as datasets #9378

## Summary

A Polars `DataFrame` with a deeply nested `Struct`/`List` column renders fine when displayed directly in a marimo cell. However, if the same dataframe is assigned to a normal top-level variable like `x`, marimo becomes progressively slower as nesting depth increases. If the dataframe is assigned to an underscore-prefixed variable like `_x`, the slowdown does not occur.

This suggests the issue is in marimo's handling of exported variables / dataset registration rather than Polars display itself.

## Related warning / preview behavior

Once the dataframe is registered as a dataset, previewing the nested payload column also attempts chart generation and logs warnings like:

```
[W ... preview_column:113] Failed to get chart for column payload in table x
ValueError: Unexpected DtypeKind: Struct(...)
```

This looks like a second issue:

- nested or unknown column types should probably skip chart generation early instead of failing with `ValueError`
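A minimal sketch of such an early guard, assuming the column's dtype can be reduced to a simple kind string. `NESTED_KINDS`, `should_attempt_chart`, and `preview_chart` are hypothetical names used only for illustration; they are not marimo's actual API.

```python
from typing import Callable, Optional

# Hypothetical set of dtype kinds with no sensible chart representation.
NESTED_KINDS = frozenset({"struct", "list", "array", "object", "unknown"})


def should_attempt_chart(dtype_kind: str) -> bool:
    """Return False for nested/unknown kinds so chart generation is skipped."""
    return dtype_kind.lower() not in NESTED_KINDS


def preview_chart(
    column: str, dtype_kind: str, build_chart: Callable[[str], dict]
) -> Optional[dict]:
    # Bail out before calling the chart builder, instead of letting it raise
    # and logging a warning on every preview of the column.
    if not should_attempt_chart(dtype_kind):
        return None
    return build_chart(column)
```

With this shape, previewing `payload` would return no chart and log nothing, while scalar columns such as `value` still go through the normal chart path.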

## Suspected root cause

The expensive path appears to be recursive sample-value serialization for dataset metadata.

`get_sample_values()` recursively stringifies nested Python list/dict values without a depth cap, which becomes pathological for recursive struct/list payloads.
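A minimal sketch of depth-capped sample serialization, assuming sample values arrive as plain Python lists/dicts. `stringify_sample` and `MAX_SAMPLE_DEPTH` are hypothetical; the report does not state which cap the local patch uses.

```python
from typing import Any

# Hypothetical cap; the actual value used by the local patch is not given.
MAX_SAMPLE_DEPTH = 3
TRUNCATED = "..."


def stringify_sample(value: Any, max_depth: int = MAX_SAMPLE_DEPTH, _depth: int = 0) -> str:
    """Render a nested list/dict sample value, truncating containers at max_depth."""
    if isinstance(value, (list, dict)) and _depth >= max_depth:
        return TRUNCATED  # stop descending instead of rendering the whole subtree
    if isinstance(value, dict):
        items = ", ".join(
            f"{k!r}: {stringify_sample(v, max_depth, _depth + 1)}" for k, v in value.items()
        )
        return "{" + items + "}"
    if isinstance(value, list):
        return "[" + ", ".join(stringify_sample(v, max_depth, _depth + 1) for v in value) + "]"
    return repr(value)  # scalars are rendered as-is
```

Because containers below the cap are replaced by a placeholder, the work per sample value is bounded by the cap rather than by the nesting depth of the payload.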

## Notes

I validated a local patch that:

- caps nested sample serialization depth
- skips chart generation for unknown nested column types

That removes the pathological slowdown in my local tests, but I’m filing this issue first because the performance regression itself seems worth discussing independently of the exact fix.

### Will you submit a PR?

- [x] Yes

### Environment

{
  "marimo": "0.23.3",
  "editable": true,
  "location": "/Users/.../marimo",
  "OS": "Darwin",
  "OS Version": "23.5.0",
  "Processor": "arm",
  "Python Version": "3.13.12",
  "Locale": "C/en_US",
  "Binaries": {
    "Browser": "147.0.7727.116",
    "Node": "v24.13.1",
    "uv": "0.10.3 (c75a0c625 2026-02-16)"
  },
  "Dependencies": {
    "click": "8.2.1",
    "docutils": "0.22.4",
    "itsdangerous": "2.2.0",
    "jedi": "0.19.2",
    "markdown": "3.10.2",
    "narwhals": "2.20.0",
    "packaging": "26.2",
    "psutil": "7.2.2",
    "pygments": "2.20.0",
    "pymdown-extensions": "10.21.2",
    "pyyaml": "6.0.3",
    "starlette": "1.0.0",
    "tomlkit": "0.14.0",
    "typing-extensions": "4.15.0",
    "uvicorn": "0.46.0",
    "websockets": "16.0"
  },
  "Optional Dependencies": {
    "altair": "6.1.0",
    "anywidget": "0.9.21",
    "basedpyright": "1.39.3",
    "duckdb": "1.5.2",
    "ibis-framework": "12.0.0",
    "loro": "1.10.3",
    "mcp": "1.27.0",
    "nbformat": "5.10.4",
    "openai": "2.32.0",
    "pandas": "3.0.2",
    "polars": "1.40.1",
    "pyarrow": "24.0.0",
    "pytest": "9.0.3",
    "python-lsp-server": "1.14.0",
    "ruff": "0.15.12",
    "sqlglot": "30.6.0",
    "vegafusion": "2.0.3"
  },
  "Experimental Flags": {
    "multi_column": true,
    "cache_panel": true,
    "isolate_apps": true
  }
}

### Code to reproduce

```python
import marimo
__generated_with = "0.23.1"
app = marimo.App(width="medium")

@app.cell
def _():
    import polars as pl
    return (pl,)

@app.cell
def _(pl):
    def make_dummy_df(nesting_depth: int = 1, rows: int = 3) -> pl.DataFrame:
        def build_payload(row_idx: int, depth: int):
            base = {
                "kind": chr(65 + (row_idx % 26)),
                "scores": [row_idx + 1, row_idx + 2, row_idx + 3],
                "meta": {
                    "city": ["Zurich", "Bern", "Geneva", "Basel"][row_idx % 4],
                    "active": row_idx % 2 == 0,
                },
            }
            if depth == 0:
                return base
            return {
                "level": depth,
                "items": [
                    base,
                    {"branch": row_idx, "child": build_payload(row_idx, depth - 1)},
                ],
                "summary": {"row": row_idx, "depth": depth},
            }
        return pl.DataFrame(
            {
                "row_id": list(range(1, rows + 1)),
                "name": [f"row_{i}" for i in range(1, rows + 1)],
                "value": [round(10.0 + i * 1.25, 2) for i in range(rows)],
                "payload": [build_payload(i, nesting_depth) for i in range(rows)],
            }
        )
    return (make_dummy_df,)

@app.cell
def _(make_dummy_df):
    x = make_dummy_df(nesting_depth=7, rows=3)
    x
    return

if __name__ == "__main__":
    app.run()
```

## Control cases

These are fast:

```
make_dummy_df(nesting_depth=7, rows=3)
```

```
_x = make_dummy_df(nesting_depth=7, rows=3)
_x
```

This becomes very slow:

```
x = make_dummy_df(nesting_depth=7, rows=3)
x
```
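marimo treats underscore-prefixed names as local to their cell, which would explain why `_x` stays fast. The sketch below assumes, without confirmation from marimo's source, that dataset registration only ever sees exported (non-underscore) names; `exported_candidates` is a hypothetical helper.

```python
from typing import Any, Iterable


def exported_candidates(cell_vars: Iterable[tuple[str, Any]]) -> list[tuple[str, Any]]:
    """Keep only names that would be exported from a cell.

    Hypothetical helper: names starting with "_" are cell-local in marimo,
    so under this assumption they never reach dataset registration.
    """
    return [(name, value) for name, value in cell_vars if not name.startswith("_")]
```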

## Observed behavior

With my repro:
- depth 1-5: fine
- depth 6: noticeably slow
- depth 7: very slow
- depth 8+: effectively hangs and the UI becomes unusable

## Internal timings

I benchmarked the relevant internal paths directly. `create_variable_value("x", df)` stays flat as depth increases. The slowdown is in dataset registration, specifically `get_datasets_from_variables()` and then `NarwhalsTableManager.get_sample_values()`. Observed timings for `get_datasets_from_variables([("x", df)])`:

- depth 5: 0.0105s
- depth 6: 0.0818s
- depth 7: 0.6633s
- depth 8: 5.5272s
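A generic timing harness along these lines could reproduce the per-depth measurements. It is a sketch under assumptions: `time_by_depth` is a hypothetical helper, and marimo internals are not imported because their module paths are not given in this report.

```python
import time
from typing import Any, Callable, Iterable


def time_by_depth(
    make_input: Callable[[int], Any],
    fn: Callable[[Any], Any],
    depths: Iterable[int],
    repeats: int = 3,
) -> dict[int, float]:
    """Best-of-`repeats` wall time of fn(input) per depth; input building is not timed."""
    results: dict[int, float] = {}
    for depth in depths:
        data = make_input(depth)
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            fn(data)
            best = min(best, time.perf_counter() - start)
        results[depth] = best
    return results
```

In the setup described here, `make_input` would be something like `lambda d: make_dummy_df(nesting_depth=d, rows=3)` and `fn` would wrap `get_datasets_from_variables([("x", df)])`.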

Breaking that down further, the hotspot is `get_sample_values()` on the nested payload column:

- depth 5: 0.0107s
- depth 6: 0.0791s
- depth 7: 0.6322s
- depth 8: 5.3794s
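The per-level growth factor can be computed directly from the timings above (values copied from this report, nothing new measured):

```python
# Timings in seconds, copied from the report above.
datasets = {5: 0.0105, 6: 0.0818, 7: 0.6633, 8: 5.5272}
sample_values = {5: 0.0107, 6: 0.0791, 7: 0.6322, 8: 5.3794}


def growth_factors(timings: dict[int, float]) -> dict[int, float]:
    """Ratio of each depth's time to the previous depth's time."""
    depths = sorted(timings)
    return {d: timings[d] / timings[prev] for prev, d in zip(depths, depths[1:])}


for name, t in (("get_datasets_from_variables", datasets), ("get_sample_values", sample_values)):
    print(name, {d: round(r, 1) for d, r in growth_factors(t).items()})

# Share of registration time spent in get_sample_values at depth 8.
print(round(sample_values[8] / datasets[8], 3))
```

Each extra nesting level multiplies the cost by roughly 7.4x to 8.5x, i.e. exponential growth in depth, and `get_sample_values()` accounts for about 97% of registration time at depth 8.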