
[SPARK-56937][PYTHON] Raise error on wrong column count in Arrow grouped/cogrouped map UDF #55978

Open
Yicong-Huang wants to merge 1 commit into apache:master from Yicong-Huang:SPARK-56937

Yicong-Huang (Contributor) commented May 19, 2026

What changes were proposed in this pull request?

In verify_arrow_result (python/pyspark/worker.py), the positional branch zips expected and actual columns without a length check, silently truncating to the shorter list. This PR raises RESULT_COLUMN_SCHEMA_MISMATCH on length mismatch.
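The check can be sketched in plain Python. This is a minimal illustration, not the actual worker.py code: `verify_positional_columns` and `ResultColumnSchemaMismatch` are hypothetical names, and each column is modeled as a `(name, type)` pair instead of an Arrow field. It shows how `zip()` alone hides the mismatch and where the length check goes.

```python
from typing import List, Tuple

# (column name, type name): a simplified stand-in for an Arrow field.
Column = Tuple[str, str]


class ResultColumnSchemaMismatch(Exception):
    """Hypothetical stand-in for the RESULT_COLUMN_SCHEMA_MISMATCH error."""


def verify_positional_columns(expected: List[Column], actual: List[Column]) -> None:
    """Hypothetical positional check: compare column counts before pairing."""
    # zip() stops at the shorter list, so without this guard any surplus
    # or missing columns would never be reported.
    if len(expected) != len(actual):
        raise ResultColumnSchemaMismatch(
            f"Expected {len(expected)} columns, but the UDF returned {len(actual)}."
        )
    # In positional mode names are ignored; this sketch compares types only.
    for i, ((_, exp_type), (_, act_type)) in enumerate(zip(expected, actual)):
        if exp_type != act_type:
            raise ResultColumnSchemaMismatch(
                f"Column {i}: expected type {exp_type}, got {act_type}."
            )


expected = [("id", "int64"), ("v", "double")]
too_many = [("id", "int64"), ("v", "double"), ("extra", "string")]
too_few = [("id", "int64")]

# Plain zip() silently truncates: two pairs from three actual columns.
print(len(list(zip(expected, too_many))))  # 2

for result in (too_many, too_few):
    try:
        verify_positional_columns(expected, result)
    except ResultColumnSchemaMismatch as e:
        print(e)
```

On Python 3.10+, `zip(expected, actual, strict=True)` would also detect the mismatch, but it raises a generic `ValueError`; an explicit length check makes it possible to raise a specific, descriptive error instead.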

Why are the changes needed?

Latent since SPARK-40559. Under assignColumnsByName=false, a UDF that returns the wrong number of columns either has its extra columns silently dropped (too many columns) or triggers a JVM ArrayIndexOutOfBoundsException (too few columns). The name-based branch already raises a friendly error; the positional branch should behave symmetrically.

Affects SQL_GROUPED_MAP_ARROW_UDF, SQL_GROUPED_MAP_ARROW_ITER_UDF, SQL_COGROUPED_MAP_ARROW_UDF.
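The two failure modes can be illustrated with a toy model (not Spark code), assuming the consuming side reads a fixed number of fields from each row by position according to the declared schema. `consume_rows` is a hypothetical name, and Python's `IndexError` stands in for the JVM's ArrayIndexOutOfBoundsException.

```python
from typing import List, Sequence


def consume_rows(rows: Sequence[Sequence[object]], num_fields: int) -> List[tuple]:
    """Hypothetical reader: take exactly `num_fields` values from each row by position."""
    out = []
    for row in rows:
        # Values past num_fields are never read; a missing value raises IndexError.
        out.append(tuple(row[i] for i in range(num_fields)))
    return out


# The declared schema has 2 fields.
print(consume_rows([(1, 2.0, "extra")], 2))  # [(1, 2.0)]: "extra" is silently lost

try:
    consume_rows([(1,)], 2)  # one value short
except IndexError as e:
    print("too few columns:", e)
```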

Does this PR introduce any user-facing change?

Yes. Wrong column count under positional mode now raises RESULT_COLUMN_SCHEMA_MISMATCH instead of silent truncation or a JVM error.

How was this patch tested?

Added test_apply_in_arrow_returning_wrong_column_count_positional_assignment in test_arrow_grouped_map.py (which also covers the iterator variant via function_variations) and in test_arrow_cogrouped_map.py, exercising both the too-many and too-few column cases. The full grouped/cogrouped Arrow map suites pass.

Was this patch authored or co-authored using generative AI tooling?

No.

Yicong-Huang changed the title from "[SPARK-56937][PYTHON] Raise error on wrong column count in Arrow grouped/cogrouped map UDF (positional mode)" to "[SPARK-56937][PYTHON] Raise error on wrong column count in Arrow grouped/cogrouped map UDF" on May 19, 2026

zhengruifeng (Contributor) commented

Do these queries also fail before this PR?
Is this only a change in the error message?

Yicong-Huang (Contributor, Author) commented

> Do these queries also fail before this PR?
> Is this only a change in the error message?

No, this is not only an error-message change. Before this PR, these queries either silently drop data (too many columns) or surface a JVM ArrayIndexOutOfBoundsException (too few columns).
