feat(server): /load backend=polars, /load_expr config parity, per-client search isolation (#851) #882

Open
paddymul wants to merge 2 commits into main from feat/server-load-polars-and-search-isolation

@paddymul (Collaborator)

What this does

Bundles a set of standalone-server /load enhancements that were sitting uncommitted on feat/load-column-config-overrides. They extend the server so a POST /load (and /load_expr) call can reach the same configuration surface a notebook BuckarooInfiniteWidget already has, add a polars loading path, and fix a multi-client live-search bug.

1. backend='polars' for POST /load

New module buckaroo/server/data_loading_polars.py:

  • load_file_polars / get_metadata_polars — eager parquet/csv/tsv/json read into a pl.DataFrame.
  • PolarsServerDataflow — polars analogue of ServerDataflow (polars analysis / autocleaning / stats / sampling), capped at pre_limit=1_000_000 rows for the stats pipeline.
  • handle_infinite_request_buckaroo_polars — polars row-fetch handler; applies the live search_string as a literal substring match on String columns, matching the pandas search_df_str semantics.

handlers.py validates backend (pandas default, or polars; polars only valid with mode='buckaroo') and routes the load/build path. websocket_handler.py routes the infinite row-fetch to the polars handler when session.backend == 'polars'. polars stays an optional dependency — it's imported lazily, only when a request actually asks for backend='polars'.
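
The validation and lazy-import rules above can be sketched roughly as follows. This is a minimal illustration, not the code in handlers.py: the function names validate_backend and lazy_import_polars are hypothetical, and treating a missing mode as invalid for polars is an assumption.

```python
import importlib

VALID_BACKENDS = ("pandas", "polars")


def validate_backend(body: dict) -> str:
    """Return the requested backend, or raise ValueError for a bad request.

    Hypothetical helper: the real handlers.py may structure this differently.
    """
    backend = body.get("backend", "pandas")  # pandas is the default
    mode = body.get("mode")
    if backend not in VALID_BACKENDS:
        raise ValueError(
            f"unknown backend {backend!r}; expected one of {VALID_BACKENDS}"
        )
    # Assumption: polars is rejected unless mode is exactly 'buckaroo'.
    if backend == "polars" and mode != "buckaroo":
        raise ValueError("backend='polars' is only valid with mode='buckaroo'")
    return backend


def lazy_import_polars():
    """Import polars only when a request needs it, so it stays optional."""
    try:
        return importlib.import_module("polars")
    except ImportError as exc:
        raise RuntimeError(
            "backend='polars' was requested but polars is not installed"
        ) from exc
```

Keeping the import inside a function means a pandas-only install never touches polars at module load time.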

2. Per-client search_string (fixes #851)

search_string was stored on the shared SessionState, so two browser clients on the same session clobbered each other's live-typed filter (and each other's highlight). It now lives on the DataStreamHandler instance:

  • The row-fetch dispatch reads self.search_string instead of session.search_string.
  • A new _send_highlight_overlay sends an initial_state with highlight_phrase injected into string-column displayer_args to only the typing client — never broadcast — and is skipped when a dataflow rebuild is already going to broadcast the highlight anyway.
  • The term is stripped before snapshotting buckaroo_state onto the session, and reset to "" on every new /load / /load_expr push.
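
A simplified sketch of the isolation idea, assuming one handler instance per websocket connection. The classes below are stand-ins that only borrow the names SessionState and DataStreamHandler; the "search_string" key inside the snapshotted state is a hypothetical placeholder for wherever the term actually lives.

```python
from dataclasses import dataclass, field


@dataclass
class SessionState:
    """Shared by every client attached to the same session (stand-in)."""
    buckaroo_state: dict = field(default_factory=dict)


class DataStreamHandler:
    """One instance per connection (stand-in for the real handler)."""

    def __init__(self, session: SessionState):
        self.session = session
        self.search_string = ""  # per-client, not stored on the session

    def on_search(self, term: str) -> None:
        self.search_string = term

    def snapshot_state(self, client_state: dict) -> None:
        # Strip the live search term before persisting shared state.
        # The key name "search_string" is an assumption for illustration.
        self.session.buckaroo_state = {
            k: v for k, v in client_state.items() if k != "search_string"
        }

    def fetch_rows(self, rows: list) -> list:
        # Literal substring filter using this client's own term only.
        if not self.search_string:
            return rows
        return [r for r in rows if self.search_string in r]
```

With two handlers on one session, a term typed into the first leaves the second handler's row fetches unfiltered.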

3. /load_expr config-override parity

LoadExprHandler now forwards column_config_overrides, extra_grid_config, and init_sd to XorqServerDataflow, mirroring the /load kwargs added in 6bf4c12b.
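
As a rough sketch of the forwarding, the three optional fields can be pulled from the request body in one place and passed through as keyword arguments. The helper name config_kwargs_from_body is hypothetical, and how XorqServerDataflow actually receives these values is not shown here.

```python
CONFIG_KEYS = ("column_config_overrides", "extra_grid_config", "init_sd")


def config_kwargs_from_body(body: dict) -> dict:
    """Pick the optional config fields out of a request body.

    Missing fields come back as None, mirroring body.get(...).
    """
    return {key: body.get(key) for key in CONFIG_KEYS}
```

A request that sets only extra_grid_config therefore yields None for the other two keys, so existing callers are unaffected.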

4. init_sd vs. highlight ordering (styling.py)

init_sd displayer_args are now merged before lowcode-op highlight metadata is injected, so a column whose pandas type is obj but which init_sd promotes to displayer: 'string' still picks up highlight_phrase. The highlight injection skips any key the caller already set, so an explicit init_sd highlight wins.
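
The ordering can be illustrated with plain dicts. This is a sketch under assumptions: build_displayer_args is a hypothetical helper, and the real styling.py works on richer column-config structures.

```python
def build_displayer_args(base: dict, init_sd_args: dict, highlight_phrase: str) -> dict:
    """Merge init_sd first, then inject the highlight without overwriting."""
    # Step 1: caller-supplied init_sd displayer_args override the defaults.
    merged = {**base, **init_sd_args}
    # Step 2: inject the highlight only for string displayers, and only
    # when the caller has not already set highlight_phrase.
    if highlight_phrase and merged.get("displayer") == "string":
        merged.setdefault("highlight_phrase", highlight_phrase)
    return merged
```

Because the merge runs first, a column promoted to displayer: 'string' by init_sd is already a string displayer when the highlight step looks at it, and setdefault leaves an explicit highlight untouched.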

Tangential (also uncommitted, rode along)

  • pyproject.toml / uv.lock: add matplotlib (used by buckaroo/customizations/histogram.py).
  • packages/buckaroo-js-core/src/stories/DFViewerDirect.stories.tsx: a Storybook story demonstrating the direct <DFViewer> consumer pattern.

Base note

This PR is based on main but includes commit 6bf4c12b (/load accepts column_config_overrides, extra_grid_config, init_sd) in addition to the new work, because the polars/load_expr changes build directly on that plumbing and that commit is not yet in main. If 6bf4c12b lands separately first, rebasing will collapse this PR's diff to just the new commit.

Testing

  • pytest tests/unit/server/: 103 passed, 1 skipped (the one test that did not pass, test_mcp_uvx_install, is a network uvx install that timed out; unrelated).
  • pytest tests/unit/.../customizable_dataflow_test.py tests/unit/server/test_server.py -k "highlight or init_sd or string" — 6 passed.
  • ruff check + paddy-format --check (full tree) — clean.
  • No dedicated tests yet for the polars backend or the #851 per-client behaviour; worth a follow-up.

🤖 Generated with Claude Code

paddymul and others added 2 commits May 24, 2026 14:44
…g, init_sd

The headless ServerDataflow that powers mode="buckaroo" sessions
already supports these kwargs (it inherits CustomizableDataflow); they
just weren't exposed through the POST /load body. Now you can match a
notebook BuckarooInfiniteWidget invocation server-side:

    POST /load { "session": "demo", "path": "...", "mode": "buckaroo",
                 "column_config_overrides": {...},
                 "extra_grid_config": {"rowHeight": 70},
                 "init_sd": {...} }

All three are optional. Existing /load calls (which don't include the
new fields) are unchanged — body.get(...) returns None and the
dataflow handles None just like the widget does.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Captures work-in-progress standalone-server /load enhancements that were
sitting uncommitted on feat/load-column-config-overrides.

* backend='polars' for POST /load — new data_loading_polars.py
  (PolarsServerDataflow, load_file_polars, get_metadata_polars,
  handle_infinite_request_buckaroo_polars). mode='buckaroo' can now build
  a polars-backed dataflow. handlers.py validates/routes the backend;
  websocket_handler.py routes the infinite row-fetch to the polars
  handler. polars stays optional — imported lazily only when requested.

* Per-client search_string (#851): search_string moves off the shared
  SessionState onto each DataStreamHandler. Two clients sharing a session
  were clobbering each other's live-typed filter. A targeted highlight
  overlay (_send_highlight_overlay) goes to the typing client only; the
  term is stripped before snapshotting onto the session and reset to ""
  on every new /load.

* /load_expr config parity: LoadExprHandler forwards
  column_config_overrides / extra_grid_config / init_sd to
  XorqServerDataflow (mirrors the /load kwargs added in 6bf4c12).

* styling: merge init_sd displayer_args before injecting lowcode-op
  highlight metadata, so an init_sd-promoted string column still picks
  up highlight_phrase and an explicit init_sd highlight wins.

* deps: add matplotlib (used by customizations/histogram.py).

* js: DFViewerDirect storybook story (direct <DFViewer> consumer demo).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@chatgpt-codex-connector (Bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9fe25809d8

    elif ext in (".parquet", ".parq"):
        return pl.read_parquet(path)
    elif ext == ".json":
        return pl.read_ndjson(path)

[P2] Load regular JSON files with the polars backend

When POST /load is called with backend='polars' for a .json file, this uses the NDJSON reader, so ordinary JSON documents that the existing pandas /load path accepts via pd.read_json (for example a records array like [{"a":1}]) fail even though they have the same .json extension. This makes the new backend unexpectedly reject valid JSON inputs; use the regular JSON reader here or reserve read_ndjson for an NDJSON-specific extension/option.
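
To make the format difference concrete, here is a stdlib-only sketch that accepts either a regular JSON document or newline-delimited JSON. It is an illustration rather than the loader in data_loading_polars.py, and parse_json_or_ndjson is a hypothetical name; in polars the corresponding readers are pl.read_json and pl.read_ndjson.

```python
import json


def parse_json_or_ndjson(text: str) -> list:
    """Parse a records array, a single object, or newline-delimited JSON."""
    try:
        doc = json.loads(text)
    except json.JSONDecodeError:
        # Not a single JSON document: treat each non-empty line as a record.
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    # A whole-document parse succeeded: normalise to a list of records.
    return doc if isinstance(doc, list) else [doc]
```

A records array such as [{"a":1}] is a single JSON document, while NDJSON holds one object per line, so a reader built for one format cannot be assumed to accept the other.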
