feat(tesseract): native domain model representation behind env flag #10986

Open: waralexrom wants to merge 5 commits into master from tesseract-native-model-data

@waralexrom

Member

Summary

Introduces the Tesseract native domain model — a Rust-side representation of the schema and the bridge to populate it from JS — staged behind the off-by-default CUBEJS_TESSERACT_NATIVE_MODEL flag. The model is built and held but not yet consumed for SQL: the planner stays on the existing per-request path, so this PR is a no-op in production and a foundation for the follow-up that routes planning through the model.

Changes

  • Domain model (cubesqlplanner/src/model/*): cubes, measures, dimensions, segments, joins, pre-aggregations, access policies, and view resolution, plus the cube_bridge traits and SchemaModelBuilder that populate it from the JS schema.
  • Native endpoints + JS wrapper: prepareModel builds the model and hands JS a JsBox handle wrapped in a TesseractModel; CubeEvaluator builds it at the end of compile() only when CUBEJS_TESSERACT_NATIVE_MODEL is enabled. BaseQuery is unchanged (still nativeBuildSqlAndParams).
  • Measure types: MeasureType now supports the multi-stage-only rank / numberAgg types so cubes with rank measures build; build_multi_stage_spec maps rank to a filtering stage.
  • Scope trim: hierarchies are deliberately excluded — they're presentation-only metadata (BI drill-down, /meta) and never participate in SQL generation. Pre-aggregation build/refresh metadata (refresh_key, indexes, build_range, …) is collected but not yet read, kept on purpose for the upcoming index/refresh-key SQL work.
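
A minimal sketch of the rank-to-filtering mapping named in the measure-types bullet. The variant list, `StageKind`, and `stage_kind` are illustrative stand-ins rather than the PR's actual definitions; the fallback to an aggregating stage follows the review's description of build_multi_stage_spec.

```rust
// Hypothetical subset of measure types; the real enum has more variants.
#[derive(Debug, Clone, Copy, PartialEq)]
enum MeasureType {
    Sum,
    Count,
    Rank,
    NumberAgg,
}

// Hypothetical name for the kind of multi-stage step a measure produces.
#[derive(Debug, PartialEq)]
enum StageKind {
    Filtering,
    Aggregating,
}

// Rank is the only type mapped to a filtering stage; everything else
// (including numberAgg) falls through to an aggregating stage.
fn stage_kind(measure_type: MeasureType) -> StageKind {
    match measure_type {
        MeasureType::Rank => StageKind::Filtering,
        _ => StageKind::Aggregating,
    }
}
```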

Testing

  • cargo check + cargo clippy --tests clean; 975 cubesqlplanner lib tests pass (incl. YAML-fixture model build with rank/numberAgg).
  • yarn test:bridge → 205/205 native bridge tests, including the new model-roundtrip suite and the extended object-bridges-coverage for the new cube/measure/dimension/segment/granularity/pre-agg getters.
  • tsc passes on @cubejs-backend/shared, @cubejs-backend/native, @cubejs-backend/schema-compiler.
  • Flag off by default → no production behavior change.

@waralexrom waralexrom requested review from a team as code owners June 1, 2026 17:17
@github-actions github-actions Bot added the rust and javascript labels Jun 1, 2026
@claude

claude Bot commented Jun 1, 2026

Contributor

Claude finished @waralexrom's task in 3m 22s —— View job


Code review — verdict: approve with notes (0 high, 0 medium, 4 low)

The PR cleanly stages the native model behind CUBEJS_TESSERACT_NATIVE_MODEL (default off), routes through tryPrepareTesseractModel with errors degraded to warnings, and never touches the planner — so the production behavior promise (no-op when flag is off) holds. The Rust side is well-organized, the bridge surface is mirrored by the JS contract in bridge-fixtures.ts, and the new model-roundtrip test exercises real prepareModel → handle → describe roundtrips. Posted 4 low-severity inline notes; none block merge.

Details

Inline findings

  • model/builder.rs:375-389: build_case_label swallows MemberSql::sql() errors and substitutes an empty CaseLabel::String(""). Since this is the one-shot build phase (and tryPrepareTesseractModel already has a warning channel), the natural fix is to return Result<CaseLabel, CubeError> and let a malformed case label fail fast at build time.
  • model/model.rs:69-71: ModelBuilder::add_cube silently overwrites on duplicate CubeName. build already returns Result; surfacing a "duplicate cube" error would catch a class of schema-compiler bugs that otherwise vanish silently.
  • model/path.rs:53-65: MemberPath::parse is intentionally 2-segment-only. None of the current call sites feed join-hinted paths through it, but the restriction is load-bearing and quietly violatable; worth either a clearer doc-comment or growing the type to model join hints explicitly.
  • packages/cubejs-schema-compiler/src/compiler/SchemaSource.ts:40-46: wrapDimension / wrapPreAggregation mutate gran.name/idx.name on the underlying objects, leaking a name field back onto the cached EvaluatedCube. Spreading into a fresh object would keep this wrapper read-only against the evaluator.

Notes on what looks right

  • CubeEvaluator.tryPrepareTesseractModel correctly degrades native errors into warnings, and compile() still proceeds — schema compilation cannot fail just because the model layer hiccups.
  • Lifetime story (NativeRustHandle held by TesseractModel, released via JS GC when the evaluator is replaced) is sound and called out in comments.
  • MeasureType::Rank / NumberAgg only become valid when the multi-stage flag is set; build_multi_stage_spec maps Rank to Filtering and everything else to Aggregating, matching the JS validator contract.
  • build_included_member correctly drops "hierarchies" entries via Result<Option<_>> + transpose() + filter_map, so a view that includes a hierarchy contributes no SQL member.
  • The bridge-test harness uses a drift guard (invoke_bridge_inner compares invoked vs. expected field sets) so a new trait method without a matching invoke entry fails loudly — good defense against silent coverage loss.
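
The `Result<Option<_>>` + `transpose()` + `filter_map` pattern credited to build_included_member can be sketched as follows. This is an illustration under assumptions: the function signatures, the string-typed member kinds, and the `String` error type are hypothetical stand-ins for the real bridge types.

```rust
// Returns Ok(None) for kinds that contribute no SQL member, Ok(Some(..)) for
// supported kinds, and Err for anything unrecognised.
fn build_included_member(kind: &str, path: &str) -> Result<Option<String>, String> {
    match kind {
        "hierarchies" => Ok(None),
        "measures" | "dimensions" | "segments" => Ok(Some(path.to_string())),
        other => Err(format!("Unknown member kind: {other}")),
    }
}

// transpose() turns Result<Option<T>, E> into Option<Result<T, E>>, so
// filter_map drops the Ok(None) entries while errors still reach collect(),
// which stops at the first Err.
fn build_included_members(raw: &[(&str, &str)]) -> Result<Vec<String>, String> {
    raw.iter()
        .filter_map(|(kind, path)| build_included_member(kind, path).transpose())
        .collect()
}
```

With this shape a skipped hierarchy and a genuine failure stay distinguishable: the former disappears from the output, the latter aborts the build.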

Security / perf / docs

  • No new auth or input-validation surface; prepareModel consumes already-validated schema-compiler output.
  • Model build is per-schema-compile, not per-request, so cost is one-time.
  • Env flag isn't documented user-facing yet — fine for an off-by-default feature, but worth a docs follow-up once it flips on.
· [Branch](https://github.com/cube-js/cube/tree/tesseract-native-model-data)

Comment on lines +375 to +389
fn build_case_label(label: StringOrSql) -> CaseLabel {
    match label {
        StringOrSql::String(s) => CaseLabel::String(s),
        StringOrSql::MemberSql(member) => {
            // StructWithSqlMember holds a `sql` callable behind another
            // trait — surface it as Expression. This swallows the
            // `sql()` Result; we'd rather fail at build time, but the
            // bridge surface returns it eagerly.
            match member.sql() {
                Ok(sql) => CaseLabel::Sql(Expression::new(sql)),
                Err(_) => CaseLabel::String(String::new()),
            }
        }
    }
}

build_case_label silently swallows a MemberSql::sql() error and substitutes an empty CaseLabel::String(""). That converts a real bridge failure (e.g. a malformed JS callable returned for a case label) into a built model with a broken/empty label that won't surface until a query actually hits this case. The comment acknowledges this is a shortcut, but since this is the model-build phase (one-shot, errors are already meant to be reported via tryPrepareTesseractModel's warning channel), the natural fix is to make build_case_label return Result<CaseLabel, CubeError> and propagate the error up through build_case_variant. That way a broken label fails fast at build time rather than producing wrong SQL later.
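
A rough sketch of the suggested shape, assuming simplified stand-ins for the real types: `CubeError`, `Expression`, `CaseLabel`, `StringOrSql`, and `MemberSql` below are reduced to the minimum needed to show the error propagation and do not mirror the actual cubesqlplanner definitions.

```rust
// Hypothetical stand-ins for the real cubesqlplanner / bridge types.
#[derive(Debug, PartialEq)]
struct CubeError(String);

#[derive(Debug, PartialEq)]
struct Expression(String);

#[derive(Debug, PartialEq)]
enum CaseLabel {
    String(String),
    Sql(Expression),
}

// Stand-in for the bridge object whose `sql()` call may fail.
struct MemberSql {
    result: Result<String, String>,
}

impl MemberSql {
    fn sql(&self) -> Result<String, CubeError> {
        self.result.clone().map_err(CubeError)
    }
}

enum StringOrSql {
    String(String),
    MemberSql(MemberSql),
}

fn build_case_label(label: StringOrSql) -> Result<CaseLabel, CubeError> {
    match label {
        StringOrSql::String(s) => Ok(CaseLabel::String(s)),
        // `?` forwards the bridge error instead of replacing it with "".
        StringOrSql::MemberSql(member) => Ok(CaseLabel::Sql(Expression(member.sql()?))),
    }
}
```

Callers such as build_case_variant would then apply `?` to the returned value so the failure reaches the model-build result.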

Comment on lines +69 to +71
pub fn add_cube(&mut self, cube: Rc<Cube>) {
    let name = cube.name.clone();
    self.cubes.insert(name, cube);

add_cube silently overwrites on duplicate CubeName. If two cube definitions ever share a name (e.g. a bug in schema-compiler, a stale view) the second one wins with no diagnostic — the model loses a cube and nothing tells you. Consider returning an error when the key is already present, since SchemaModelBuilder::build already returns Result<Model, CubeError> and could surface a clear "duplicate cube" message at model-build time.
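
One way to surface the duplicate, sketched with the standard `HashMap` entry API. The `Cube`, `CubeError`, and `ModelBuilder` definitions here are simplified stand-ins (a plain `String` key instead of `CubeName`), and the error wording is illustrative.

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;
use std::rc::Rc;

// Hypothetical, simplified versions of the real types.
#[derive(Debug, PartialEq)]
struct CubeError(String);

struct Cube {
    name: String,
}

#[derive(Default)]
struct ModelBuilder {
    cubes: HashMap<String, Rc<Cube>>,
}

impl ModelBuilder {
    // Rejects a second cube with the same name instead of overwriting the first.
    fn add_cube(&mut self, cube: Rc<Cube>) -> Result<(), CubeError> {
        match self.cubes.entry(cube.name.clone()) {
            Entry::Occupied(existing) => {
                Err(CubeError(format!("Duplicate cube: {}", existing.key())))
            }
            Entry::Vacant(slot) => {
                slot.insert(cube);
                Ok(())
            }
        }
    }
}
```

The entry API does the lookup and the insert in a single hash probe, so the check adds no second lookup to the build.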

Comment on lines +53 to +65
/// Parses a `Cube.member` reference. Returns an error for paths
/// that do not split into exactly two segments (we'll grow this
/// to support view-style join chains later).
pub fn parse(path: &str) -> Result<Self, cubenativeutils::CubeError> {
    match path.split_once('.') {
        Some((cube, name)) if !cube.is_empty() && !name.is_empty() => {
            Ok(MemberPath::new(CubeName::new(cube), name.to_string()))
        }
        _ => Err(cubenativeutils::CubeError::user(format!(
            "Invalid member path: {path}"
        ))),
    }
}

MemberPath::parse is restricted to exactly two segments, but it's already being called on inputs that the JS layer may hand over with a join-hint prefix (e.g. evaluatePreAggregationReferences collects with { collectJoinHints: true }, producing View.Cube.member). In the current PR none of the call sites in builder.rs go through those join-hinted paths (access-policy resolves to cube.member, alias members come from pathFromArray of a 2-element array, view includedMember.memberPath is cube.member, etc.), so this is fine for now — but the restriction is load-bearing and easy to violate as soon as something starts feeding multi-segment paths through. Worth either: (a) leaving a clearer doc-comment that this is intentionally 2-segment-only and any join-hinted input needs different handling, or (b) growing the type now to model Vec<JoinHint> + member so callers can't accidentally bypass it.
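
Option (b) might look roughly like the sketch below. The field layout, the `parse_member_path` name, and the `String` error type are assumptions for illustration; in particular, treating every leading segment as a join hint is one possible reading of `View.Cube.member`, not something the PR confirms. One detail worth noting: `split_once('.')` splits at the first dot only, so an input such as `View.Cube.member` would currently parse as cube `View` with member name `Cube.member` rather than returning an error.

```rust
// Hypothetical shape for a member path that can carry join hints.
#[derive(Debug, PartialEq)]
struct MemberPath {
    // Segments traversed before the owning cube; empty for `Cube.member`.
    join_hints: Vec<String>,
    cube: String,
    name: String,
}

fn parse_member_path(path: &str) -> Result<MemberPath, String> {
    let mut segments: Vec<&str> = path.split('.').collect();
    // Require at least `Cube.member` and reject empty segments such as `a..b`.
    if segments.len() < 2 || segments.iter().any(|s| s.is_empty()) {
        return Err(format!("Invalid member path: {path}"));
    }
    // The length check above guarantees both pops succeed.
    let name = segments.pop().unwrap().to_string();
    let cube = segments.pop().unwrap().to_string();
    Ok(MemberPath {
        join_hints: segments.into_iter().map(str::to_string).collect(),
        cube,
        name,
    })
}
```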

Comment on lines +40 to +46
const wrapped = Object.create(dim);
wrapped.granularities = Object.entries(dim.granularities).map(([name, gran]: [string, any]) => {
  if (gran.name === undefined) {
    gran.name = name;
  }
  return gran;
});

wrapDimension / wrapPreAggregation mutate the underlying gran / idx objects in place (gran.name = name). These objects live on cube.dimensions[*].granularities / cube.preAggregations[*].indexes in the evaluator, so the mutation leaks back into the cached EvaluatedCube state. It's idempotent on repeated calls, but it also means any downstream code that introspects the evaluator will start seeing a name field stamped on every granularity / index — surprising for a wrapper that's meant to be read-only against the source of truth. Cleaner: spread { name, ...gran } into a fresh object rather than mutating the original.

@codecov

codecov Bot commented Jun 1, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 83.33%. Comparing base (bf13b69) to head (77805a3).
⚠️ Report is 2 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master   #10986      +/-   ##
==========================================
+ Coverage   78.78%   83.33%   +4.54%     
==========================================
  Files         471      255     -216     
  Lines       94509    77261   -17248     
  Branches     3520        0    -3520     
==========================================
- Hits        74457    64382   -10075     
+ Misses      19544    12879    -6665     
+ Partials      508        0     -508     
Flag Coverage Δ
cube-backend ?
cubesql 83.33% <ø> (+<0.01%) ⬆️


@claude

claude Bot commented Jun 8, 2026

Contributor

Claude finished @waralexrom's task in 1m 7s —— View job


Code review — re-review after push: no new findings

Latest commit (97f3e9f) is an eslint-only cleanup: import reordering in CubeEvaluator.ts, quote-escape style in the maskLiteral template, and reordering of fixture exports in bridge-fixtures.ts. No substantive logic changed since the prior review on 96675df.

The 4 prior low-severity inline threads still apply to the current head and remain unresolved — none of the referenced code was touched. Not re-posting to avoid duplicates:

  • rust/cube/cubesqlplanner/cubesqlplanner/src/model/builder.rs:385: build_case_label still swallows MemberSql::sql() errors into an empty CaseLabel::String("").
  • rust/cube/cubesqlplanner/cubesqlplanner/src/model/model.rs:71: add_cube still silently overwrites on duplicate CubeName.
  • rust/cube/cubesqlplanner/cubesqlplanner/src/model/path.rs:65: MemberPath::parse still hard-2-segment; load-bearing but undocumented.
  • packages/cubejs-schema-compiler/src/compiler/SchemaSource.ts:46: wrapDimension/wrapPreAggregation still mutate gran.name/idx.name on the underlying objects, leaking into the cached EvaluatedCube.

Verdict unchanged: approve with notes (0 high, 0 medium, 4 low — all carried over). The off-by-default CUBEJS_TESSERACT_NATIVE_MODEL gating and warning-channel degradation in tryPrepareTesseractModel keep production behavior intact.

Tasks
  • Diff 97f3e9f against prior-reviewed 96675df
  • List existing review threads and check applicability against current head
  • Confirm prior findings still apply at the same paths/lines
  • Skip duplicate inline comments; summarize re-affirmation here
· [Branch](https://github.com/cube-js/cube/tree/tesseract-native-model-data)

…ehind CUBEJS_TESSERACT_NATIVE_MODEL

Introduces the Tesseract domain Model (cubesqlplanner/src/model/*) and the
cube_bridge traits to populate it from the JS schema, plus the native
endpoints (prepareModel / modelBuildSqlAndParams) and the TesseractModel
JS wrapper.

Population is gated behind the new CUBEJS_TESSERACT_NATIVE_MODEL flag
(off by default): CubeEvaluator builds the model at the end of compile()
only when the flag is on. The planner is NOT routed through the model yet
— BaseQuery stays on the per-request nativeBuildSqlAndParams path, so the
model is built and held but not consumed for SQL.

MeasureType supports the multi-stage-only rank / numberAgg types so cubes
with rank measures build; build_multi_stage_spec maps rank to a filtering
stage on the parsed enum.

Tests: native bridge model-roundtrip + extended object-bridges-coverage
for the new cube/dimension/measure/segment/granularity/pre-agg getters;
Rust YAML-fixture model build incl. rank/numberAgg.

Hierarchies are presentation-only metadata (BI drill-down, exposed via
/meta) and never participate in SQL generation, so they don't need to be
collected into the Tesseract domain model that the planner will read.

Removes model/hierarchy.rs and cube_bridge/hierarchy_definition.rs,
Cube.hierarchies, ViewSpec.evaluated_hierarchies, IncludedMemberKind::
Hierarchy, the CubeDefinition.hierarchies() getter, and the harness /
fixture references. View included members of kind "hierarchies" are now
skipped during model build (they contribute no SQL member).

RollingWindowKind matched the non-existent "time"/"row" types and
rejected the real Cube schema values, so a to_date rolling window failed
model build. Map the actual schema kinds: fixed, to_date, year_to_date,
quarter_to_date, month_to_date.
@waralexrom waralexrom force-pushed the tesseract-native-model-data branch from 97f3e9f to b119529 Compare June 8, 2026 08:23
@claude

claude Bot commented Jun 8, 2026

Contributor

Claude finished @waralexrom's task in 1m 12s —— View job


Code review — re-review after push: no new findings

Latest commit (b119529) replaces the placeholder RollingWindowKind::Time / ::Row mapping (which matched non-existent schema strings) with the five real Cube rolling-window kinds: fixed, to_date, year_to_date, quarter_to_date, month_to_date. Verified against CubeValidator.ts:451-511 — strings match exactly. Touches only model/builder.rs:275-285 and the RollingWindowKind enum in model/measure.rs:113-118. The fix is correct, self-contained, and unblocks model build for to_date-style rolling windows.
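
For illustration, the string-to-kind mapping could be expressed through `FromStr` as below. Only the enum name and the five schema strings come from the commit; the variant names, the `String` error type, and the use of `FromStr` are assumptions.

```rust
use std::str::FromStr;

// Variant names are hypothetical; the accepted strings are the five schema
// kinds listed in the commit message.
#[derive(Debug, PartialEq)]
enum RollingWindowKind {
    Fixed,
    ToDate,
    YearToDate,
    QuarterToDate,
    MonthToDate,
}

impl FromStr for RollingWindowKind {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "fixed" => Ok(Self::Fixed),
            "to_date" => Ok(Self::ToDate),
            "year_to_date" => Ok(Self::YearToDate),
            "quarter_to_date" => Ok(Self::QuarterToDate),
            "month_to_date" => Ok(Self::MonthToDate),
            // The old placeholder strings "time" and "row" land here.
            other => Err(format!("Unknown rolling window type: {other}")),
        }
    }
}
```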

The 4 prior low-severity inline threads still apply to the current head and remain unresolved — none of the referenced code was touched by this commit. Not re-posting to avoid duplicates:

  • rust/cube/cubesqlplanner/cubesqlplanner/src/model/builder.rs:388: build_case_label still swallows MemberSql::sql() errors into an empty CaseLabel::String("").
  • rust/cube/cubesqlplanner/cubesqlplanner/src/model/model.rs:71: add_cube still silently overwrites on duplicate CubeName.
  • rust/cube/cubesqlplanner/cubesqlplanner/src/model/path.rs:65: MemberPath::parse still hard-2-segment; load-bearing but undocumented.
  • packages/cubejs-schema-compiler/src/compiler/SchemaSource.ts:46: wrapDimension/wrapPreAggregation still mutate gran.name/idx.name on the underlying objects, leaking into the cached EvaluatedCube.

Verdict unchanged: approve with notes (0 high, 0 medium, 4 low — all carried over). Off-by-default CUBEJS_TESSERACT_NATIVE_MODEL and warning-channel degradation in tryPrepareTesseractModel keep production behavior intact.

Tasks
  • Diff b119529 against prior-reviewed 97f3e9f
  • Verify rolling-window kind strings match CubeValidator.ts
  • List existing review threads and confirm prior findings still apply at same paths/lines
  • Skip duplicate inline comments; summarize re-affirmation here

· [Branch](https://github.com/cube-js/cube/tree/tesseract-native-model-data)

- build_case_label propagates MemberSql::sql() errors instead of
  swallowing them into an empty label, so a malformed case label
  fails at model-build time
- SchemaSource wrappers build fresh granularity/index objects instead
  of mutating the underlying evaluated cube, keeping the wrapper
  read-only against the evaluator
@claude

claude Bot commented Jun 8, 2026

Contributor

Claude finished @waralexrom's task in 48s —— View job


Code review — re-review after push: 2 of 4 prior notes addressed

Latest commit (77805a3) addresses two of the four prior low-severity threads:

  • Fixed: build_case_label now returns Result<CaseLabel, CubeError> and propagates MemberSql::sql() errors up through build_case_variant instead of swallowing them into an empty CaseLabel::String(""). Malformed case labels now fail fast at model-build time as intended. (model/builder.rs:378-388)
  • Fixed: wrapDimension / wrapPreAggregation build fresh granularity/index objects via { name, ...gran } instead of mutating gran.name/idx.name on the underlying evaluated cube. Wrapper is now read-only against the evaluator. (SchemaSource.ts:40-50)

Two prior threads still apply at the same paths/lines — not re-posting to avoid duplicates:

  • rust/cube/cubesqlplanner/cubesqlplanner/src/model/model.rs:71: add_cube still silently overwrites on duplicate CubeName. Surfacing a "duplicate cube" error from SchemaModelBuilder::build would catch a class of schema-compiler bugs that otherwise vanish silently.
  • rust/cube/cubesqlplanner/cubesqlplanner/src/model/path.rs:65: MemberPath::parse still hard-2-segment, load-bearing but undocumented; worth either a clearer doc-comment or growing the type to model join hints explicitly.

Verdict: approve with notes (0 high, 0 medium, 2 low — down from 4). Off-by-default CUBEJS_TESSERACT_NATIVE_MODEL gating and warning-channel degradation in tryPrepareTesseractModel continue to keep production behavior intact.

Tasks
  • Diff 77805a3 against prior-reviewed b119529
  • Confirm build_case_label and SchemaSource wrapper fixes are correct
  • Confirm remaining model.rs / path.rs findings still apply at same paths/lines
  • Skip duplicate inline comments; summarize re-affirmation here
· [Branch](https://github.com/cube-js/cube/tree/tesseract-native-model-data)

Labels

javascript, rust

2 participants