[SPARK-56618][SQL][TESTS] Add DSv2 join refresh tests for incrementally constructed queries #55463
Open
longvu-db wants to merge 29 commits into
55b2561 to
b5cac5f
Compare
longvu-db
added a commit
to longvu-db/spark
that referenced
this pull request
Apr 24, 2026
…rios Add 6 specific join tests with checkAnswer that mirror the classic DataSourceV2DataFrameSuite join scenarios from PR apache#55463. These tests verify Connect-specific behavior where both sides re-analyze on every action, so operations that fail in classic mode (DROP COLUMN, drop/recreate table, type change) succeed in Connect. Co-authored-by: Isaac
longvu-db
added a commit
to longvu-db/spark
that referenced
this pull request
Apr 30, 2026
…rios Add 6 specific join tests with checkAnswer that mirror the classic DataSourceV2DataFrameSuite join scenarios from PR apache#55463. These tests verify Connect-specific behavior where both sides re-analyze on every action, so operations that fail in classic mode (DROP COLUMN, drop/recreate table, type change) succeed in Connect. Co-authored-by: Isaac
37a2cc0 to
0d6385a
Compare
Co-authored-by: Isaac
Remove separate test suites, test catalogs, and production code changes. Add 3 join tests for design doc scenarios 4-6 (drop+recreate table, drop+re-add column same type, drop+re-add column different type) directly in DataSourceV2DataFrameSuite. Co-authored-by: Isaac
…s 1-3 Group all 6 design doc join scenarios together at the bottom of the file, following the design doc order (scenarios 1-6). Co-authored-by: Isaac
…n drop+re-add Co-authored-by: Isaac
Co-authored-by: Isaac
…bles Co-authored-by: Isaac
Replace direct SQL statements (ALTER TABLE, DROP TABLE, INSERT) with catalog API calls to simulate truly external changes between df1 and df2 creation. This matches the pattern used in other DSv2 test sections. Co-authored-by: Isaac
Co-authored-by: Isaac
…ated)" This reverts commit 20871dd.
e6ac2d2 to
2791d3b
Compare
…pr3-joins # Conflicts: # sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2DataFrameSuite.scala
Co-authored-by: Isaac
Co-authored-by: Isaac
Use named arguments for externalAppend calls Co-authored-by: Isaac
longvu-db
added a commit
to longvu-db/spark
that referenced
this pull request
May 8, 2026
…rios Add 6 specific join tests with checkAnswer that mirror the classic DataSourceV2DataFrameSuite join scenarios from PR apache#55463. These tests verify Connect-specific behavior where both sides re-analyze on every action, so operations that fail in classic mode (DROP COLUMN, drop/recreate table, type change) succeed in Connect. Co-authored-by: Isaac
Contributor
andreaschat-db
left a comment
Thanks @longvu-db. Left a few comments regarding test coverage.
Add two new join test variants: - Scenario 4 variant: drop+recreate table detected via column IDs when table ID is null (uses nullidcat catalog) - Scenario 5 variant: drop+re-add column detected via column IDs when two separate alterTable calls assign a fresh column ID Co-authored-by: Isaac
…t names Rename existing tests to include "external" and add session-based variants that use SQL instead of catalog API for the mutations: - drop/recreate table with null table ID catalog - drop+re-add column Co-authored-by: Isaac
Rename existing tests to include "external" and add session-based variants using SQL for scenarios 1-3: - insert (session INSERT INTO) - ADD COLUMN (session ALTER TABLE ADD COLUMN) - DROP COLUMN (session ALTER TABLE DROP COLUMN) Co-authored-by: Isaac
Drop/recreate changes both id and salary column IDs, producing a multiline errors string. Use (?s).* to match across newlines. Co-authored-by: Isaac
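The (?s) flag mentioned in this commit can be illustrated with a minimal, self-contained sketch. The two-line error text below is hypothetical (the real message format is not shown in this PR); the only point is that `.` does not cross a newline unless DOTALL is enabled.

```scala
object DotAllSketch {
  // Hypothetical two-line error text; NOT the actual Spark error message.
  val errors: String = "column id changed: id\ncolumn id changed: salary"

  // Without (?s), '.' stops at the line terminator, so a full match
  // that has to span both lines fails.
  val withoutDotAll: Boolean = errors.matches(".*id.*salary.*")

  // With (?s) (DOTALL), '.' also matches '\n', so the pattern spans both lines.
  val withDotAll: Boolean = errors.matches("(?s).*id.*salary.*")

  def main(args: Array[String]): Unit = {
    println(s"without (?s): $withoutDotAll")
    println(s"with (?s): $withDotAll")
  }
}
```

The same consideration would apply to any regex-based assertion on an error message that lists more than one changed column.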
Co-authored-by: Isaac
Co-authored-by: Isaac
Co-authored-by: Isaac
Co-authored-by: Isaac
…support) Shows that table ID still detects drop/recreate when column IDs are null (using nullcolidcat catalog). Co-authored-by: Isaac
…labels Rename test name suffixes to clearly indicate catalog capabilities: - testcat: (table with both table and column ID support) - nullidcat: (table without table ID support, but with column ID support) - nullcolidcat: (table with table ID support, but without column ID support) Add sub-labels (4a-4d, 5a-5c) for scenario 4 and 5 variants. Co-authored-by: Isaac
…ble test Add "external" or "same-session" to all test names that were ambiguous. Remove Scenario 5a (single alterTable preserving column ID) since it tests an implementation detail of batched changes. Renumber 5b/5c to 5a/5b. Co-authored-by: Isaac
…-IDs catalog Remove same-session variants (4c, 4d, 4e, 5b, 5c) and replace with external variants using NullTableIdAndNullColumnIdInMemoryTableCatalog: - 4c: external drop/recreate with no IDs (undetected, join succeeds) - 5b: external drop+re-add column with no IDs (undetected, join succeeds) Add NullTableIdAndNullColumnIdInMemoryTableCatalog that strips both table and column IDs to simulate connectors with no identity tracking. Co-authored-by: Isaac
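The catalog variants in these commits differ in which identifiers they expose. A rough model of why a drop/recreate goes undetected when both IDs are missing might look like the sketch below. `Snapshot` and `detect` are hypothetical names, not Spark types, and checking the table ID before the column IDs is an assumption made for illustration only.

```scala
object IdCheckSketch {
  // Hypothetical snapshot of the identifiers a catalog reports; not a Spark type.
  final case class Snapshot(
      tableId: Option[String],
      columnIds: Map[String, Option[Long]])

  // Returns a mismatch label, or None when nothing can be detected.
  // Assumption: the table ID is compared before the column IDs.
  def detect(analyzed: Snapshot, current: Snapshot): Option[String] = {
    // A table ID comparison is only possible when both sides report one.
    val tableChanged = (analyzed.tableId, current.tableId) match {
      case (Some(a), Some(b)) => a != b
      case _ => false
    }
    // A column counts as changed only when both its old and new IDs are known.
    val columnChanged = analyzed.columnIds.exists { case (name, oldId) =>
      current.columnIds.get(name) match {
        case Some(Some(newId)) => oldId.exists(_ != newId)
        case _ => false
      }
    }
    if (tableChanged) Some("TABLE_ID_MISMATCH")
    else if (columnChanged) Some("COLUMN_ID_MISMATCH")
    else None
  }

  // Helpers that simulate catalogs without table IDs or without column IDs.
  def withoutTableId(s: Snapshot): Snapshot = s.copy(tableId = None)
  def withoutColumnIds(s: Snapshot): Snapshot =
    s.copy(columnIds = s.columnIds.map { case (k, _) => (k, Option.empty[Long]) })

  def main(args: Array[String]): Unit = {
    val before = Snapshot(Some("t-1"), Map("id" -> Some(1L), "salary" -> Some(2L)))
    val recreated = Snapshot(Some("t-2"), Map("id" -> Some(3L), "salary" -> Some(4L)))
    println(detect(before, recreated))
    println(detect(withoutTableId(before), withoutTableId(recreated)))
    println(detect(
      withoutColumnIds(withoutTableId(before)),
      withoutColumnIds(withoutTableId(recreated))))
  }
}
```

In this model the no-IDs case returns None, which corresponds to the "undetected, join succeeds" outcome described for scenarios 4c and 5b.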
longvu-db
added a commit
to longvu-db/spark
that referenced
this pull request
May 18, 2026
…g ID suffixes Rewrite Connect join tests to mirror the classic DataSourceV2DataFrameSuite structure from PR apache#55463: - Add catalog ID support suffixes to all test names - Use external-only variants for scenarios 4 and 5 - Add nullidcat (4b) and nullbothidscat (4c, 5b) catalog variants - Scenario 6: use single alterTable call aligned with classic test - Remove same-session variants for scenarios 4, 5, 6 - Rename "session" to "same-session" for clarity Co-authored-by: Isaac
longvu-db
added a commit
to longvu-db/spark
that referenced
this pull request
May 18, 2026
Copy the test catalog from the classic PR (apache#55463) so the Connect tests for scenarios 4c and 5b can compile. Co-authored-by: Isaac
Add setVersionAndValidatedVersionFrom helper to InMemoryBaseTable to carry forward version metadata when wrapping one table into another. Without this, V2TableRefreshUtil cannot detect changes and refresh stale table references at execution time. Applied to all test catalogs that create new table instances via alterTableWithData, and refactored InMemoryTable.copy() to use the same helper. Also simplify Scenario 4c: remove externalAppend after drop/recreate, expect empty result since both sides refresh to the empty new table. Co-authored-by: Isaac
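To illustrate why carrying version metadata forward matters, here is a simplified model. `SketchTable`, `wrap`, and `isStale` are hypothetical; the helper below borrows only the name from the commit message, and the real semantics of the version and validated-version fields in InMemoryBaseTable are not shown in this PR.

```scala
object VersionCarrySketch {
  // Hypothetical stand-in for a test table; NOT Spark's InMemoryBaseTable.
  final class SketchTable(val rows: Vector[Int]) {
    var version: Int = 0
    var validatedVersion: Int = 0

    // Modeled on the helper's name only: copy both counters from `other`.
    def setVersionAndValidatedVersionFrom(other: SketchTable): SketchTable = {
      version = other.version
      validatedVersion = other.validatedVersion
      this
    }
  }

  // Hypothetical change check: a captured reference is stale when versions differ.
  def isStale(capturedVersion: Int, current: SketchTable): Boolean =
    capturedVersion != current.version

  // Wrap one table into a new instance, optionally carrying the counters over.
  def wrap(source: SketchTable, carryForward: Boolean): SketchTable = {
    val wrapped = new SketchTable(source.rows)
    if (carryForward) wrapped.setVersionAndValidatedVersionFrom(source) else wrapped
  }

  def main(args: Array[String]): Unit = {
    val source = new SketchTable(Vector(1, 2))
    source.version = 1            // the table changed once after analysis
    source.validatedVersion = 1
    val captured = 0              // version seen when the DataFrame was analyzed
    println(isStale(captured, wrap(source, carryForward = false)))
    println(isStale(captured, wrap(source, carryForward = true)))
  }
}
```

In this model, a wrapper that resets the counters looks identical to the version captured at analysis time, so the change is missed; copying the counters makes the difference visible.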
What changes were proposed in this pull request?
Add join-specific tests for incrementally constructed queries to DataSourceV2DataFrameSuite, along with a new NullTableIdAndNullColumnIdInMemoryTableCatalog test catalog.
When DataFrames are analyzed at different times and then joined, the refresh phase in QueryExecution must align all table references to the same version. These tests verify 6 core scenarios with catalog ID support variants:
COLUMNS_MISMATCH. TABLE_ID_MISMATCH. COLUMN_ID_MISMATCH. COLUMN_ID_MISMATCH. COLUMNS_MISMATCH.
Why are the changes needed?
The existing tests covered scenarios 1-3 within larger multi-step tests but did not cover:
These scenarios are important to verify the correctness of DSv2 table refresh behavior for incrementally constructed queries.
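The alignment requirement stated in the description (both sides of a join must see the same table version after refresh) can be sketched with a toy model. `TableRef` and `refresh` are hypothetical names and do not reflect how QueryExecution is actually implemented.

```scala
object JoinRefreshSketch {
  // Hypothetical table reference captured when a DataFrame was analyzed.
  final case class TableRef(name: String, version: Int)

  // Re-resolve every reference against the catalog's current version,
  // leaving references to unknown tables unchanged.
  def refresh(refs: Seq[TableRef], currentVersion: Map[String, Int]): Seq[TableRef] =
    refs.map(ref => ref.copy(version = currentVersion.getOrElse(ref.name, ref.version)))

  def main(args: Array[String]): Unit = {
    val left = TableRef("t", version = 1)   // df1 analyzed before a change
    val right = TableRef("t", version = 2)  // df2 analyzed after the change
    val aligned = refresh(Seq(left, right), Map("t" -> 2))
    println(aligned.map(_.version).distinct)
  }
}
```

After the refresh step in this model, both references carry the same version, which is the property the join tests check from the outside via query results and error conditions.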
Does this PR introduce any user-facing change?
No. This PR only adds tests and a test-only catalog.
How was this patch tested?
New tests in DataSourceV2DataFrameSuite and a new NullTableIdAndNullColumnIdInMemoryTableCatalog test catalog.
Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claude Code (claude-opus-4-6)