[SPARK-56618][CONNECT][TESTS] Add DSv2 join refresh tests for incrementally constructed queries in Spark Connect#55457
Open
longvu-db wants to merge 43 commits into
Open
[SPARK-56618][CONNECT][TESTS] Add DSv2 join refresh tests for incrementally constructed queries in Spark Connect#55457longvu-db wants to merge 43 commits into
longvu-db wants to merge 43 commits into
Conversation
263c8bf to
642b0bc
Compare
e058e6a to
6b7251d
Compare
Contributor
andreaschat-db
left a comment
There was a problem hiding this comment.
Thanks @longvu-db. Left a comment.
7081cff to
dd57944
Compare
Co-authored-by: Isaac
…rios Add 6 specific join tests with checkAnswer that mirror the classic DataSourceV2DataFrameSuite join scenarios from PR apache#55463. These tests verify Connect-specific behavior where both sides re-analyze on every action, so operations that fail in classic mode (DROP COLUMN, drop/recreate table, type change) succeed in Connect. Co-authored-by: Isaac
Co-authored-by: Isaac
These files belong in the classic test infrastructure PRs, not this Connect-only test PR. Co-authored-by: Isaac
Co-authored-by: Isaac
Co-authored-by: Isaac
Replace the client-jvm test with a server-side test using SparkConnectServerTest and withSession pattern, matching the style of DataSourceV2CacheConnectSuite. Co-authored-by: Isaac
Co-authored-by: Isaac
Co-authored-by: Isaac
Co-authored-by: Isaac
Co-authored-by: Isaac
Co-authored-by: Isaac
Co-authored-by: Isaac
Replace direct SQL statements (ALTER TABLE, DROP TABLE, INSERT) with catalog API calls to simulate truly external changes between df1 and df2 creation. This matches the pattern used in other DSv2 test sections. Co-authored-by: Isaac
Co-authored-by: Isaac
…alidation Remove external writer that adds rows after drop/recreate and drop/re-add column. Add schema assertions to verify the join result has the expected column names. Update expected results to reflect that Connect re-analyzes both sides against the new table state (empty after recreate, null salary after drop+re-add).
No longer needed after removing external writer from Scenario 6.
A single alterTable(ident, dropCol, addCol) applies both changes atomically, so the data migration sees salary in both old and new schemas and retains the old value. Splitting into two separate calls ensures the drop triggers data migration (nulling salary), matching the behavior of ALTER TABLE DROP/ADD COLUMN SQL statements.
Co-authored-by: Isaac
…eanup Co-authored-by: Isaac
Remove unnecessary ClassTag from serverCatalog, add externalAppend helper method, and use it to reduce boilerplate in test bodies. Co-authored-by: Isaac
Co-authored-by: Isaac
Scenario 5: use single alterTable call with both dropCol and addCol (matching classic test). Data is preserved since column name and type match. Add externalAppend to align test structure with classic. Scenario 6: keep two separate calls because InMemoryBaseTable preserves old INT data when names match in a single call, causing a type mismatch at read time. Document the difference in the test comment. Co-authored-by: Isaac
Rename existing tests to include "external" and add session-based variants that use SQL instead of catalog API for scenarios 4-6: - drop/recreate table - drop+re-add same-type column - drop+re-add different-type column Co-authored-by: Isaac
Rename existing tests to include "external" and add session-based variants using SQL for scenarios 1-3: - insert (session INSERT INTO) - ADD COLUMN (session ALTER TABLE ADD COLUMN) - DROP COLUMN (session ALTER TABLE DROP COLUMN) Co-authored-by: Isaac
Co-authored-by: Isaac
Co-authored-by: Isaac
…support) Co-authored-by: Isaac
Co-authored-by: Isaac
…g ID suffixes Rewrite Connect join tests to mirror the classic DataSourceV2DataFrameSuite structure from PR apache#55463: - Add catalog ID support suffixes to all test names - Use external-only variants for scenarios 4 and 5 - Add nullidcat (4b) and nullbothidscat (4c, 5b) catalog variants - Scenario 6: use single alterTable call aligned with classic test - Remove same-session variants for scenarios 4, 5, 6 - Rename "session" to "same-session" for clarity Co-authored-by: Isaac
…mments Extract common CREATE TABLE + INSERT + cleanup into withTable helper. Add "Datasets are re-analyzed" explanation to each scenario comment, with details on why classic fails (pinned plan) but Connect succeeds. Co-authored-by: Isaac
Move scenario labels (S1-S6) into test names for discoverability. Remove externalAppend from scenarios 4a/4b/4c since the recreated table should be empty, and expect Seq.empty in join results. Co-authored-by: Isaac
Copy the test catalog from the classic PR (apache#55463) so the Connect tests for scenarios 4c and 5b can compile. Co-authored-by: Isaac
…sionFrom Add setVersionAndValidatedVersionFrom helper to InMemoryBaseTable to carry forward version metadata when wrapping one table into another. Without this, V2TableRefreshUtil cannot detect changes and refresh stale table references at execution time. Applied to all test catalogs that create new table instances via alterTableWithData: NullTableIdAndNullColumnIdInMemoryTableCatalog, NullColumnIdInMemoryTableCatalog, MixedColumnIdTableCatalog, ComposedColumnIdTableCatalog, TypeChangeResetsColIdTableCatalog, InMemoryRowLevelOperationTableCatalog, and PartialSchemaEvolutionCatalog. Also refactored InMemoryTable.copy() to use the same helper. Co-authored-by: Isaac
…olumn Apply schemaAfterDrops fix to BasicInMemoryTableCatalog.alterTable so that combined DROP + ADD column changes properly drop old data. The re-added column has null values for existing rows, not the old value. Update test expectations from [1, 100] to [1, null] in 3 tests. Co-authored-by: Isaac
Co-authored-by: Isaac
Co-authored-by: Isaac
Co-authored-by: Isaac
A single combined alterTable call preserves the old column ID via assignMissingIds name matching, so COLUMN_ID_MISMATCH would not fire. Two separate calls ensure the first deletes the column (and its ID), so the second assigns a fresh ID. Co-authored-by: Isaac
Remove the schemaAfterDrops mechanism from InMemoryTableCatalog.alterTable that was causing ClassCastException in InsertIntoSchemaEvolutionTests. Restore alterTableWithData(table.data, schema) to match master behavior. Revert checkAnswer assertions for null-ID catalogs back to Row(1, 100). Fix scalafmt formatting in DataSourceV2JoinConnectSuite. Co-authored-by: Isaac
9d2f3df to
fd70f15
Compare
…rtions 1. Restore schemaAfterDrops in InMemoryTableCatalog.alterTable so that drop+re-add column in a single alterTable call properly clears the dropped column's data from existing rows. 2. Fix null/mixed column ID test assertions: with version tracking now propagated, the table is refreshed on re-read after drop+re-add, so re-added salary column has null values (Row(1, null) not Row(1, 100)). Co-authored-by: Isaac
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What changes were proposed in this pull request?
Add DSv2 join test coverage for Spark Connect in a new
DataSourceV2JoinConnectSuite, mirroring the classicDataSourceV2DataFrameSuitejoin scenarios from #55463.The tests use an in-process Connect server (
SparkConnectServerTest) so that a Connect client creates DataFrames, modifies the underlying table, and then executes joins. This verifies Connect's re-analysis behavior for incrementally constructed queries.In Connect, Datasets are re-analyzed on every action, so operations that fail in classic mode (where each side pins its schema at analysis time) succeed here because both sides always get a fresh plan with the latest schema and data.
COLUMNS_MISMATCHbecause df1's pinned plan still references the dropped column.TABLE_ID_MISMATCHbecause df1's pinned plan captured the original table ID.COLUMN_ID_MISMATCHbecause df1's pinned column IDs differ from the recreated table's.COLUMN_ID_MISMATCHbecause the re-added column gets a fresh ID.COLUMNS_MISMATCHbecause df1's pinned plan still expects INT.The Connect and classic tests are aligned so the only differences are the APIs (Connect session vs classic
sql()/spark.table(),assertRowsvscheckAnswer, server-sideserverCatalogaccess).Shared test infrastructure changes
Same test catalogs as #55463:
NullTableIdAndNullColumnIdInMemoryTableCatalog,TypeChangeResetsColIdTableCatalog,NullColumnIdInMemoryTableCatalog,MixedColumnIdTableCatalog,ComposedColumnIdTableCatalog, andInMemoryBaseTable/InMemoryTableaccessor methods.Why are the changes needed?
Connect has fundamentally different semantics from classic mode: every action re-analyzes the plan from scratch on the server, meaning there is no pinned
QueryExecutionand no stale schema detection. These tests document and verify that behavior across all catalog ID support variants.This is the Spark Connect counterpart of #55463.
Does this PR introduce any user-facing change?
No. This PR only adds tests and test-only catalogs.
How was this patch tested?
New tests in
DataSourceV2JoinConnectSuite.Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claude Code (claude-opus-4-6)