Skip to content

feat(cubestore): error when a query plan node materializes too many rows#10997

Open
waralexrom wants to merge 3 commits into
masterfrom
cubestore-materialized-rows-limit
Open

feat(cubestore): error when a query plan node materializes too many rows#10997
waralexrom wants to merge 3 commits into
masterfrom
cubestore-materialized-rows-limit

Conversation

@waralexrom
Copy link
Copy Markdown
Member

@waralexrom waralexrom commented Jun 2, 2026

Summary

CubeStore now returns an error when a single physical plan node materializes more rows than the configured limit, telling the user to consider creating a pre-aggregation for that stage. Guards against queries that accumulate huge intermediate results in memory instead of failing late or OOMing.

Changes

  • New MaterializedRowsLimitExec wrapper node (modeled after CheckMemoryExec): a pass-through stream with a shared atomic row counter across all partitions of the wrapped node; exceeding the limit fails the query with a user-facing error naming the stage.
  • Wrappers are inserted as the last step of finalize_physical_plan only at points where rows actually accumulate in memory (streaming nodes are not counted):
    • SortExec input (skipped when TopK fetch <= limit — the buffer holds min(input, fetch) rows)
    • HashJoinExec build side and CrossJoinExec left side
    • AggregateExec output (hash groups; sorted/inline streaming aggregates excluded)
    • WindowAggExec input
    • WorkerExec input (worker result collected before sending to the router)
    • plan root on the router (final query result collect)
  • New config option CUBESTORE_MATERIALIZED_ROWS_LIMIT, default 500000, 0 disables the check. Plumbed ConfigObjQueryExecutorImplCubeQueryPlanner.
  • Pretty printer hides the new node behind a dedicated show_materialized_rows_limit_nodes flag, so existing plan assertions are unaffected.

Only the query execution path (QueryExecutor) is affected — ingestion, compaction and cachestore paths are untouched.

Testing

  • Unit tests for MaterializedRowsLimitExec (under/over limit, error message includes stage and pre-aggregation suggestion).
  • E2E test sql::tests::materialized_rows_limit with limit 4: a 3-row query succeeds, a 6-row query fails with the pre-aggregation hint.
  • Full cubestore --lib suite (206), cubestore-sql-tests in-process (168) and cluster (167) suites all green with the default 500k limit enabled.

Add MaterializedRowsLimitExec that counts rows at the points of the plan
where they accumulate in memory: sort and window inputs, join build
sides, aggregation outputs, worker results and the final query result.
When a single node materializes more than the limit, the query fails
with a suggestion to create a pre-aggregation for that stage.

Configured via CUBESTORE_MATERIALIZED_ROWS_LIMIT (default 500000, 0
disables the check).
@waralexrom waralexrom requested a review from a team as a code owner June 2, 2026 17:52
@github-actions github-actions Bot added cube store Issues relating to Cube Store rust Pull requests that update Rust code labels Jun 2, 2026
@claude
Copy link
Copy Markdown
Contributor

claude Bot commented Jun 2, 2026

Claude finished @waralexrom's task in 2m 45s —— View job


Claude PR Review

Verdict: LGTM overall — solid design, clean implementation. 4 low-severity findings, all posted inline.

Review details

Completed checklist

  • Read PR context and diff
  • Read all 8 changed files
  • Checked for stale Claude review threads (none)
  • Posted inline findings
  • Final summary

Summary of findings (all low severity)

# File Concern
1 optimizations/materialized_rows_limit.rs:26 HashJoinExec wraps children()[0] assuming left = build side — fine today, fragile against DF upgrades. Worth a pinning comment/assert.
2 materialized_rows_limit.rs:107 rows counter is shared across execute() calls on the same plan instance and never reset — defensive concern, not a live bug.
3 materialized_rows_limit.rs:132 Error message omits the observed row count, the limit value, and the env var name — adding them makes the error actionable from logs.
4 sql/mod.rs:2747-2794 E2E test only covers router root "query result" stage. Stage labels (hash join build, agg, sort, worker) are the UX promise; one small test per stage would lock the labels against optimizer reshuffles.

Positives

  • MaterializedRowsLimitExec mirrors CheckMemoryExec cleanly; shared atomic counter across partitions is correctly preserved through with_new_children.
  • Pretty printer change is properly gated behind a dedicated flag so existing plan-snapshot tests are untouched.
  • Bottom-up rewrite_physical_plan traversal plus the wrap_children "already wrapped" guard correctly prevents double-counting when a stage is the child of another stage.
  • Skip rule for SortExec when fetch <= limit and for sorted-input AggregateExec is well reasoned and commented.
  • 0 to disable the limit is a useful escape hatch; defaults at 500k seem sensible.
  • Plan plumbing (ConfigObjQueryExecutorImplCubeQueryPlanner, both router and worker constructors) is consistent.
  • Router root wrap is correctly guarded by is_router && !already wrapped so it doesn't double-wrap when the root is already a WorkerExec/AggregateExec etc.

Nits (not posted inline)

  • In optimizations/materialized_rows_limit.rs, AggregateExec uses Ok(wrap(p, ...)) while the others use return wrap_children(...). Cosmetic — picking one style would read better.
  • The comment on total > self.limit could note the off-by-one explicitly (exactly limit rows passes, limit+1 fails) — currently you have to read the code to confirm it matches the doc string ("more than limit rows").
· branch: `cubestore-materialized-rows-limit`

Comment thread rust/cubestore/cubestore/src/sql/mod.rs
Include the observed row count, the limit and the env var name in the
error message. Cover stage labels with e2e and plan-shape tests, pin
the hash join build-side and counter lifetime invariants in comments.
@claude
Copy link
Copy Markdown
Contributor

claude Bot commented Jun 2, 2026

Claude finished @waralexrom's task in 46s —— View job


Claude PR Review (re-run after ed1529c)

Verdict: LGTM — the new commit addresses all 4 prior findings. No new issues.

Review details

Completed checklist

  • Diffed new commit ed1529c against prior review
  • Re-checked the 4 previously-flagged sites
  • Resolved the 4 stale review threads I owned
  • No new findings to post

How each prior finding was addressed

# Prior concern Resolution in ed1529c
1 HashJoinExec left-as-build invariant unpinned Pinning comment added at optimizations/materialized_rows_limit.rs:26 (// HashJoinExec always builds the hash table from its left input.)
2 Atomic counter not reset across execute() calls Doc comment added on the rows field documenting the invariant (Never reset: plans are built per query and executed once.). Behavior unchanged but the contract is now explicit — acceptable.
3 Error message omitted observed count, limit, env var Now reads materialized at least {total} rows which exceeds the limit of {limit} rows … adjust the CUBESTORE_MATERIALIZED_ROWS_LIMIT environment variable. Unit test asserts each substring.
4 E2E test only covered "query result" stage materialized_rows_limit now also asserts 'aggregation groups' and 'worker result' labels. New materialized_rows_limit_plan test uses the pretty printer to assert all four stage labels (sort input, query result, aggregation groups, worker result) appear in the planned router/worker plans — locks the labels against optimizer reshuffles, which is what I wanted.

Other observations

  • Pretty printer gains a MaterializedRowsLimitExec case so show_materialized_rows_limit_nodes=true produces stable MaterializedRowsLimit, stage: … output — exactly what the new plan test needs.
  • Off-by-one: total > limit ⇒ exactly limit rows pass, limit+1 fails. Error message says "at least {total}" which matches.
  • Hash join / cross join stage labels are still untested at the E2E layer (only worker result + aggregation groups exercise the error path). Minor; the plan test covers the label surface for the common stages.
· branch: `cubestore-materialized-rows-limit`

@codecov
Copy link
Copy Markdown

codecov Bot commented Jun 2, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 58.50%. Comparing base (a2c65aa) to head (ed1529c).
⚠️ Report is 1 commits behind head on master.

❗ There is a different number of reports uploaded between BASE (a2c65aa) and HEAD (ed1529c). Click for more details.

HEAD has 1 upload less than BASE
Flag BASE (a2c65aa) HEAD (ed1529c)
cubesql 1 0
Additional details and impacted files
@@             Coverage Diff             @@
##           master   #10997       +/-   ##
===========================================
- Coverage   83.32%   58.50%   -24.82%     
===========================================
  Files         255      216       -39     
  Lines       77177    17151    -60026     
  Branches        0     3494     +3494     
===========================================
- Hits        64304    10035    -54269     
+ Misses      12873     6607     -6266     
- Partials        0      509      +509     
Flag Coverage Δ
cube-backend 58.50% <ø> (?)
cubesql ?

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

…hreshold

A single plan node should not materialize more than one partition's
worth of rows, so CUBESTORE_MATERIALIZED_ROWS_LIMIT now defaults to the
configured CUBESTORE_PARTITION_SPLIT_THRESHOLD (2M rows by default)
instead of a flat 500k. Drop the env var mention from the error
message.
@claude
Copy link
Copy Markdown
Contributor

claude Bot commented Jun 3, 2026

Claude finished @waralexrom's task in 37s —— View job


Claude PR Review (re-run after b50385e)

Verdict: LGTM — small follow-up commit, no new issues.

Review details

What changed in b50385e

  • CUBESTORE_MATERIALIZED_ROWS_LIMIT default switched from a flat 500_000 to the resolved CUBESTORE_PARTITION_SPLIT_THRESHOLD value (default 2M). Threshold is now computed once and reused for both fields.
  • Error message no longer mentions CUBESTORE_MATERIALIZED_ROWS_LIMIT; the env-var assertion is dropped from the unit test accordingly.

Assessment

  • The "one node should not materialize more than one partition's worth" framing is a good anchor for the default — it ties the operator-facing knob to an existing, already-tuned threshold instead of a magic number, and a user who raises the partition threshold automatically gets a matching materialization budget.
  • partition_split_threshold as usize (u64 → usize) is fine on the 64-bit targets cubestore actually runs on; not worth a checked cast.
  • Removing the env-var hint from the error trades a bit of operator self-service for not leaking a tunable internal — reasonable; the stage name + "consider a pre-aggregation" is still the primary action.
  • Prior round's 4 findings remain addressed (HashJoin invariant comment, counter-doc, row counts in error, stage-label plan test). No regressions in this commit.

Nit (not posted inline)

  • Default 1048576 * 2 is now duplicated between the local let and the in-place env_parse for partition_split_threshold would have read the same — minor, current form is clearer.

· branch: cubestore-materialized-rows-limit ·

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cube store Issues relating to Cube Store rust Pull requests that update Rust code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants