[SPARK-56937][PYTHON] Raise error on wrong column count in Arrow grouped/cogrouped map UDF by Yicong-Huang · Pull Request #55978 · apache/spark

Yicong-Huang · 2026-05-19T08:01:16Z

What changes were proposed in this pull request?

In verify_arrow_result (python/pyspark/worker.py), the positional branch zips expected and actual columns without a length check, silently truncating to the shorter list. This PR raises RESULT_COLUMN_SCHEMA_MISMATCH on length mismatch.
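The check can be sketched in plain Python. This is a minimal sketch that assumes plain lists of column names stand in for the expected Spark schema and the returned Arrow table. `check_positional_columns` and `ResultColumnSchemaMismatch` are hypothetical names, not the actual worker.py code.

```python
class ResultColumnSchemaMismatch(Exception):
    """Hypothetical stand-in for PySpark's RESULT_COLUMN_SCHEMA_MISMATCH error."""


def check_positional_columns(expected, actual):
    # Reject a column-count mismatch up front. zip() alone would silently
    # truncate to the shorter list, which is the behavior being fixed.
    if len(expected) != len(actual):
        raise ResultColumnSchemaMismatch(
            "RESULT_COLUMN_SCHEMA_MISMATCH: expected "
            f"{len(expected)} column(s), got {len(actual)}"
        )
    # The counts agree, so positional pairing now covers every column.
    return list(zip(expected, actual))
```

An explicit length comparison is used here rather than Python 3.10's `zip(..., strict=True)`. That option also detects the mismatch, but it raises a generic `ValueError` instead of a descriptive, schema-specific error.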

Why are the changes needed?

Latent since SPARK-40559. Under assignColumnsByName=false, a UDF returning the wrong number of columns either silently drops data (too many) or surfaces a JVM ArrayIndexOutOfBoundsException (too few). The name-based branch already raises a friendly error; positional should be symmetric.
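The following is a small pure-Python illustration of the two pre-fix failure modes. It assumes plain lists of column names in place of the Spark schema and Arrow table. `pair_columns_unchecked` and `read_row` are hypothetical helpers, and Python's `IndexError` only stands in for the JVM `ArrayIndexOutOfBoundsException`.

```python
def pair_columns_unchecked(expected, actual):
    # Pre-fix positional pairing: zip() stops at the shorter list, so a
    # length mismatch passes verification without any error.
    return list(zip(expected, actual))


def read_row(expected, values):
    # Hypothetical downstream reader that fetches one value per *expected*
    # column, loosely standing in for the JVM reading the result against
    # the declared schema.
    return {name: values[i] for i, name in enumerate(expected)}


expected = ["id", "v"]

# Too many columns: "extra" is never paired, so its data is silently lost.
too_many_pairs = pair_columns_unchecked(expected, ["id", "v", "extra"])

# Too few columns: verification still passes...
too_few_pairs = pair_columns_unchecked(expected, ["id"])

# ...but a reader that trusts the declared schema indexes past the end.
too_few_error = None
try:
    read_row(expected, [1])
except IndexError as exc:
    too_few_error = exc
```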

Affects SQL_GROUPED_MAP_ARROW_UDF, SQL_GROUPED_MAP_ARROW_ITER_UDF, SQL_COGROUPED_MAP_ARROW_UDF.

Does this PR introduce any user-facing change?

Yes. Wrong column count under positional mode now raises RESULT_COLUMN_SCHEMA_MISMATCH instead of silent truncation or a JVM error.

How was this patch tested?

Added test_apply_in_arrow_returning_wrong_column_count_positional_assignment in test_arrow_grouped_map.py (covers iterator variant via function_variations) and test_arrow_cogrouped_map.py, exercising both too-many and too-few columns. Full grouped/cogrouped Arrow map suites pass.

Was this patch authored or co-authored using generative AI tooling?

No.


zhengruifeng · 2026-05-19T08:14:17Z

Do these queries also fail before this PR?
Is this only a change in the error message?

Yicong-Huang · 2026-05-19T08:50:51Z

Do these queries also fail before this PR?

Is this only a change in the error message?

No. They either silently drop data (too many columns) or surface a JVM ArrayIndexOutOfBoundsException (too few columns).

fix: raise on column count mismatch in verify_arrow_result positional branch (b728cf3)

Yicong-Huang changed the title ~~[SPARK-56937][PYTHON] Raise error on wrong column count in Arrow grouped/cogrouped map UDF (positional mode)~~ [SPARK-56937][PYTHON] Raise error on wrong column count in Arrow grouped/cogrouped map UDF May 19, 2026

Yicong-Huang wants to merge 1 commit into apache:master from Yicong-Huang:SPARK-56937