Antalya 26.3 Backport of #102628 - Fix LOGICAL_ERROR crash in Parquet reader for nullable columns with filter #1768
mkmkme wants to merge 1 commit
…-null-check Fix LOGICAL_ERROR crash in Parquet reader for nullable columns with filter
il9ue left a comment
LGTM, seems like a clean backport ✅
Diff matches upstream #102628. Two fixes in Reader.cpp:
- Gate use_filter_in_decoder on !column.need_null_map. The fast path processes all rows through processDefLevelsForInnermostColumn and applies the filter at encoded-value indices, which don't line up 1:1 with row indices when nulls are present. Falling back to the standard row-range path is correct.
- Inverted memchr: 0 → 1. ClickHouse convention is 1 = NULL, so the old check cleared the null_map whenever any non-null existed, which is exactly backwards. With all-NULL filtered rows into a non-Nullable column at null_as_default=0, this silently dropped the map instead of raising CANNOT_INSERT_NULL_IN_ORDINARY_COLUMN, then crashed downstream on the row-count mismatch.
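To illustrate the first point, here is a minimal standalone sketch of why a row-level filter cannot be applied at encoded-value indices once nulls are present. It is not the Reader.cpp code: the struct and function names (MiniColumn, filterByRow, filterByEncodedIndex) are hypothetical, and the only assumption carried over from the review is that encoded values exist for non-null rows only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical miniature of a nullable column chunk: one null flag per row,
// but encoded values are stored densely for the non-null rows only.
struct MiniColumn
{
    std::vector<uint8_t> is_null;  // 1 = NULL, 0 = value present
    std::vector<int> encoded;      // one entry per non-null row
};

struct Filtered
{
    std::vector<int> values;       // default 0 is written for NULL rows
    std::vector<uint8_t> null_map; // 1 = NULL
};

// Row-aligned filtering: the value cursor advances only on non-null rows,
// so the filter is always consulted at the row index.
Filtered filterByRow(const MiniColumn & col, const std::vector<uint8_t> & keep)
{
    assert(col.is_null.size() == keep.size());
    Filtered out;
    size_t value_idx = 0;
    for (size_t row = 0; row < keep.size(); ++row)
    {
        const bool is_null = col.is_null[row] != 0;
        if (keep[row])
        {
            out.values.push_back(is_null ? 0 : col.encoded[value_idx]);
            out.null_map.push_back(is_null ? 1 : 0);
        }
        if (!is_null)
            ++value_idx;
    }
    return out;
}

// Misaligned filtering: the row filter is consulted at the encoded-value
// index, which drifts from the row index after the first NULL.
std::vector<int> filterByEncodedIndex(const MiniColumn & col, const std::vector<uint8_t> & keep)
{
    std::vector<int> out;
    for (size_t i = 0; i < col.encoded.size() && i < keep.size(); ++i)
        if (keep[i])
            out.push_back(col.encoded[i]);
    return out;
}
```

With rows (10, NULL, 20, 30) and a filter that keeps the last two rows, the row-aligned version returns two values while the misaligned one returns a single value, which is the kind of row-count disagreement described above.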
Test covers all four meaningful cases (nullable output, null_as_default=1, the formerly-crashing path, and a no-nulls control). Correctly gated on input_format_parquet_use_native_reader_v3=1.
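The four cases can be read against a small sketch of the null-map decision. This is an illustration under assumptions, not the upstream code: hasAnyNull, hasAnyNonNull, resolveNullMap and the Outcome enum are hypothetical names, and the exception merely stands in for CANNOT_INSERT_NULL_IN_ORDINARY_COLUMN. The one fact taken from the review is the 1 = NULL convention, so "are there any NULLs" must search for byte 1, not byte 0.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

enum class Outcome { KeepNullMap, DropNullMap, FillDefaults };

// Searching for byte 1 answers "does any row hold NULL?".
bool hasAnyNull(const std::vector<uint8_t> & null_map)
{
    return !null_map.empty()
        && std::memchr(null_map.data(), 1, null_map.size()) != nullptr;
}

// Searching for byte 0 answers a different question:
// "does any row hold a real value?".
bool hasAnyNonNull(const std::vector<uint8_t> & null_map)
{
    return !null_map.empty()
        && std::memchr(null_map.data(), 0, null_map.size()) != nullptr;
}

// Hypothetical decision helper covering the four tested situations.
Outcome resolveNullMap(const std::vector<uint8_t> & null_map,
                       bool output_is_nullable,
                       bool null_as_default)
{
    if (output_is_nullable)
        return Outcome::KeepNullMap;   // nullable output keeps its map
    if (!hasAnyNull(null_map))
        return Outcome::DropNullMap;   // no NULLs: the map carries no information
    if (null_as_default)
        return Outcome::FillDefaults;  // NULLs become default values
    // Stand-in for CANNOT_INSERT_NULL_IN_ORDINARY_COLUMN.
    throw std::runtime_error("CANNOT_INSERT_NULL_IN_ORDINARY_COLUMN");
}
```

On an all-NULL map the two searches disagree: hasAnyNull is true and hasAnyNonNull is false, so a check built on the wrong byte takes the opposite branch for exactly the input that used to crash.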
CI selection appropriate (Parquet/Iceberg/S3 Export, ASAN kept). Approve once green.
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Fix LOGICAL_ERROR crash "Unexpected number of rows in column subchunk" in native Parquet V3 reader when reading nullable columns with a WHERE filter (ClickHouse#102628 by @groeneai).