feat(cubestore): error when a query plan node materializes too many rows by waralexrom · Pull Request #10997 · cube-js/cube

waralexrom · 2026-06-02T17:52:32Z

Summary

CubeStore now returns an error when a single physical plan node materializes more rows than the configured limit, telling the user to consider creating a pre-aggregation for that stage. Guards against queries that accumulate huge intermediate results in memory instead of failing late or OOMing.

Changes

New MaterializedRowsLimitExec wrapper node (modeled after CheckMemoryExec): a pass-through stream with a shared atomic row counter across all partitions of the wrapped node; exceeding the limit fails the query with a user-facing error naming the stage.
Wrappers are inserted as the last step of finalize_physical_plan only at points where rows actually accumulate in memory (streaming nodes are not counted):
- SortExec input (skipped when TopK fetch <= limit — the buffer holds min(input, fetch) rows)
- HashJoinExec build side and CrossJoinExec left side
- AggregateExec output (hash groups; sorted/inline streaming aggregates excluded)
- WindowAggExec input
- WorkerExec input (worker result collected before sending to the router)
- plan root on the router (final query result collect)
New config option CUBESTORE_MATERIALIZED_ROWS_LIMIT, default 500000, 0 disables the check. Plumbed ConfigObj → QueryExecutorImpl → CubeQueryPlanner.
Pretty printer hides the new node behind a dedicated show_materialized_rows_limit_nodes flag, so existing plan assertions are unaffected.

Only the query execution path (QueryExecutor) is affected — ingestion, compaction and cachestore paths are untouched.

Testing

Unit tests for MaterializedRowsLimitExec (under/over limit, error message includes stage and pre-aggregation suggestion).
E2E test sql::tests::materialized_rows_limit with limit 4: a 3-row query succeeds, a 6-row query fails with the pre-aggregation hint.
Full cubestore --lib suite (206), cubestore-sql-tests in-process (168) and cluster (167) suites all green with the default 500k limit enabled.

Add MaterializedRowsLimitExec that counts rows at the points of the plan where they accumulate in memory: sort and window inputs, join build sides, aggregation outputs, worker results and the final query result. When a single node materializes more than the limit, the query fails with a suggestion to create a pre-aggregation for that stage. Configured via CUBESTORE_MATERIALIZED_ROWS_LIMIT (default 500000, 0 disables the check).

claude · 2026-06-02T17:52:57Z

Claude finished @waralexrom's task in 2m 45s —— View job

Claude PR Review

Verdict: LGTM overall — solid design, clean implementation. 4 low-severity findings, all posted inline.

Review details

Completed checklist

Summary of findings (all low severity)

#	File	Concern
1	`optimizations/materialized_rows_limit.rs:26`	`HashJoinExec` wraps `children()[0]` assuming left = build side — fine today, fragile against DF upgrades. Worth a pinning comment/assert.
2	`materialized_rows_limit.rs:107`	`rows` counter is shared across `execute()` calls on the same plan instance and never reset — defensive concern, not a live bug.
3	`materialized_rows_limit.rs:132`	Error message omits the observed row count, the limit value, and the env var name — adding them makes the error actionable from logs.
4	`sql/mod.rs:2747-2794`	E2E test only covers router root "query result" stage. Stage labels (hash join build, agg, sort, worker) are the UX promise; one small test per stage would lock the labels against optimizer reshuffles.

Positives

MaterializedRowsLimitExec mirrors CheckMemoryExec cleanly; shared atomic counter across partitions is correctly preserved through with_new_children.
Pretty printer change is properly gated behind a dedicated flag so existing plan-snapshot tests are untouched.
Bottom-up rewrite_physical_plan traversal plus the wrap_children "already wrapped" guard correctly prevents double-counting when a stage is the child of another stage.
Skip rule for SortExec when fetch <= limit and for sorted-input AggregateExec is well reasoned and commented.
0 to disable the limit is a useful escape hatch; defaults at 500k seem sensible.
Plan plumbing (ConfigObj → QueryExecutorImpl → CubeQueryPlanner, both router and worker constructors) is consistent.
Router root wrap is correctly guarded by is_router && !already wrapped so it doesn't double-wrap when the root is already a WorkerExec/AggregateExec etc.

Nits (not posted inline)

In optimizations/materialized_rows_limit.rs, AggregateExec uses Ok(wrap(p, ...)) while the others use return wrap_children(...). Cosmetic — picking one style would read better.
The comment on total > self.limit could note the off-by-one explicitly (exactly limit rows passes, limit+1 fails) — currently you have to read the code to confirm it matches the doc string ("more than limit rows").

· branch: `cubestore-materialized-rows-limit`

Include the observed row count, the limit and the env var name in the error message. Cover stage labels with e2e and plan-shape tests, pin the hash join build-side and counter lifetime invariants in comments.

claude · 2026-06-02T18:14:18Z

Claude finished @waralexrom's task in 46s —— View job

Claude PR Review (re-run after `ed1529c`)

Verdict: LGTM — the new commit addresses all 4 prior findings. No new issues.

Review details

Completed checklist

Diffed new commit ed1529c against prior review
Re-checked the 4 previously-flagged sites
Resolved the 4 stale review threads I owned
No new findings to post

How each prior finding was addressed

#	Prior concern	Resolution in `ed1529c`
1	`HashJoinExec` left-as-build invariant unpinned	Pinning comment added at `optimizations/materialized_rows_limit.rs:26` (`// HashJoinExec always builds the hash table from its left input.`)
2	Atomic counter not reset across `execute()` calls	Doc comment added on the `rows` field documenting the invariant (`Never reset: plans are built per query and executed once.`). Behavior unchanged but the contract is now explicit — acceptable.
3	Error message omitted observed count, limit, env var	Now reads `materialized at least {total} rows which exceeds the limit of {limit} rows … adjust the CUBESTORE_MATERIALIZED_ROWS_LIMIT environment variable.` Unit test asserts each substring.
4	E2E test only covered "query result" stage	`materialized_rows_limit` now also asserts `'aggregation groups'` and `'worker result'` labels. New `materialized_rows_limit_plan` test uses the pretty printer to assert all four stage labels (`sort input`, `query result`, `aggregation groups`, `worker result`) appear in the planned router/worker plans — locks the labels against optimizer reshuffles, which is what I wanted.

Other observations

Pretty printer gains a MaterializedRowsLimitExec case so show_materialized_rows_limit_nodes=true produces stable MaterializedRowsLimit, stage: … output — exactly what the new plan test needs.
Off-by-one: total > limit ⇒ exactly limit rows pass, limit+1 fails. Error message says "at least {total}" which matches.
Hash join / cross join stage labels are still untested at the E2E layer (only worker result + aggregation groups exercise the error path). Minor; the plan test covers the label surface for the common stages.

· branch: `cubestore-materialized-rows-limit`

codecov · 2026-06-02T18:32:43Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 58.50%. Comparing base (a2c65aa) to head (ed1529c).
⚠️ Report is 1 commits behind head on master.

❗ There is a different number of reports uploaded between BASE (a2c65aa) and HEAD (ed1529c). Click for more details.

HEAD has 1 upload less than BASE

Flag BASE (a2c65aa) HEAD (ed1529c)

cubesql 1 0

Additional details and impacted files

@@             Coverage Diff             @@
##           master   #10997       +/-   ##
===========================================
- Coverage   83.32%   58.50%   -24.82%     
===========================================
  Files         255      216       -39     
  Lines       77177    17151    -60026     
  Branches        0     3494     +3494     
===========================================
- Hits        64304    10035    -54269     
+ Misses      12873     6607     -6266     
- Partials        0      509      +509

Flag	Coverage Δ
cube-backend	`58.50% <ø> (?)`
cubesql	`?`

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

…hreshold A single plan node should not materialize more than one partition's worth of rows, so CUBESTORE_MATERIALIZED_ROWS_LIMIT now defaults to the configured CUBESTORE_PARTITION_SPLIT_THRESHOLD (2M rows by default) instead of a flat 500k. Drop the env var mention from the error message.

claude · 2026-06-03T08:03:48Z

Claude finished @waralexrom's task in 37s —— View job

Claude PR Review (re-run after `b50385e`)

Verdict: LGTM — small follow-up commit, no new issues.

Review details

What changed in `b50385e`

CUBESTORE_MATERIALIZED_ROWS_LIMIT default switched from a flat 500_000 to the resolved CUBESTORE_PARTITION_SPLIT_THRESHOLD value (default 2M). Threshold is now computed once and reused for both fields.
Error message no longer mentions CUBESTORE_MATERIALIZED_ROWS_LIMIT; the env-var assertion is dropped from the unit test accordingly.

Assessment

The "one node should not materialize more than one partition's worth" framing is a good anchor for the default — it ties the operator-facing knob to an existing, already-tuned threshold instead of a magic number, and a user who raises the partition threshold automatically gets a matching materialization budget.
partition_split_threshold as usize (u64 → usize) is fine on the 64-bit targets cubestore actually runs on; not worth a checked cast.
Removing the env-var hint from the error trades a bit of operator self-service for not leaking a tunable internal — reasonable; the stage name + "consider a pre-aggregation" is still the primary action.
Prior round's 4 findings remain addressed (HashJoin invariant comment, counter-doc, row counts in error, stage-label plan test). No regressions in this commit.

Nit (not posted inline)

Default 1048576 * 2 is now duplicated between the local let and the in-place env_parse for partition_split_threshold would have read the same — minor, current form is clearer.

· branch: cubestore-materialized-rows-limit ·

waralexrom requested a review from a team as a code owner June 2, 2026 17:52

github-actions Bot added cube store Issues relating to Cube Store rust Pull requests that update Rust code labels Jun 2, 2026

vercel Bot deployed to Preview June 2, 2026 17:54 View deployment

claude Bot reviewed Jun 2, 2026

View reviewed changes

Comment thread rust/cubestore/cubestore/src/queryplanner/optimizations/materialized_rows_limit.rs

claude Bot reviewed Jun 2, 2026

View reviewed changes

Comment thread rust/cubestore/cubestore/src/queryplanner/materialized_rows_limit.rs

claude Bot reviewed Jun 2, 2026

View reviewed changes

Comment thread rust/cubestore/cubestore/src/queryplanner/materialized_rows_limit.rs

claude Bot reviewed Jun 2, 2026

View reviewed changes

Comment thread rust/cubestore/cubestore/src/sql/mod.rs

feat(cubestore): report row counts in materialized rows limit error

ed1529c

Include the observed row count, the limit and the env var name in the error message. Cover stage labels with e2e and plan-shape tests, pin the hash join build-side and counter lifetime invariants in comments.

paveltiunov approved these changes Jun 3, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(cubestore): error when a query plan node materializes too many rows#10997

feat(cubestore): error when a query plan node materializes too many rows#10997
waralexrom wants to merge 3 commits into
masterfrom
cubestore-materialized-rows-limit

waralexrom commented Jun 2, 2026 •

edited

Loading

Uh oh!

claude Bot commented Jun 2, 2026 •

edited

Loading

Completed checklist

Summary of findings (all low severity)

Positives

Nits (not posted inline)

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

claude Bot commented Jun 2, 2026 •

edited

Loading

Completed checklist

How each prior finding was addressed

Other observations

Uh oh!

codecov Bot commented Jun 2, 2026

Uh oh!

claude Bot commented Jun 3, 2026 •

edited

Loading

What changed in `b50385e`

Assessment

Nit (not posted inline)

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

waralexrom commented Jun 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Changes

Testing

Uh oh!

claude Bot commented Jun 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Claude PR Review

Completed checklist

Summary of findings (all low severity)

Positives

Nits (not posted inline)

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

claude Bot commented Jun 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Claude PR Review (re-run after ed1529c)

Completed checklist

How each prior finding was addressed

Other observations

Uh oh!

codecov Bot commented Jun 2, 2026

Codecov Report

Uh oh!

claude Bot commented Jun 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Claude PR Review (re-run after b50385e)

What changed in b50385e

Assessment

Nit (not posted inline)

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

waralexrom commented Jun 2, 2026 •

edited

Loading

claude Bot commented Jun 2, 2026 •

edited

Loading

claude Bot commented Jun 2, 2026 •

edited

Loading

Claude PR Review (re-run after `ed1529c`)

claude Bot commented Jun 3, 2026 •

edited

Loading

Claude PR Review (re-run after `b50385e`)

What changed in `b50385e`