
Antalya 26.3 Backport of #102628 - Fix LOGICAL_ERROR crash in Parquet reader for nullable columns with filter #1768

Open
mkmkme wants to merge 1 commit into antalya-26.3 from backports/antalya-26.3/102628

Conversation

Collaborator

@mkmkme mkmkme commented May 9, 2026

Fix LOGICAL_ERROR crash in Parquet reader for nullable columns with filter

Changelog category (leave one):

  • Bug Fix (user-visible misbehavior in an official stable release)

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Fix LOGICAL_ERROR crash "Unexpected number of rows in column subchunk" in the native Parquet V3 reader when reading nullable columns with a WHERE filter (ClickHouse#102628 by @groeneai).

Documentation entry for user-facing changes

...

CI/CD Options

Exclude tests:

  • Fast test
  • Integration Tests
  • Stateless tests
  • Stateful tests
  • Performance tests
  • All with ASAN
  • All with TSAN
  • All with MSAN
  • All with UBSAN
  • All with Coverage
  • All with Aarch64
  • All Regression
  • Disable CI Cache

Regression jobs to run:

  • Fast suites (mostly <1h)
  • Aggregate Functions (2h)
  • Alter (1.5h)
  • Benchmark (30m)
  • ClickHouse Keeper (1h)
  • Iceberg (2h)
  • LDAP (1h)
  • Parquet (1.5h)
  • RBAC (1.5h)
  • SSL Server (1h)
  • S3 (2h)
  • S3 Export (2h)
  • Swarms (30m)
  • Tiered Storage (2h)

…-null-check

Fix LOGICAL_ERROR crash in Parquet reader for nullable columns with filter

github-actions Bot commented May 9, 2026

Workflow [PR], commit [6d92950]


@il9ue il9ue left a comment


LGTM, seems like a clean backport ✅

Diff matches upstream #102628. Two fixes in Reader.cpp:

  1. Gate use_filter_in_decoder on !column.need_null_map. The fast path processes all rows through processDefLevelsForInnermostColumn and applies the filter at encoded-value indices, which don't line up 1:1 with row indices when nulls are present. Falling back to the standard row-range path is correct.

  2. Inverted memchr: 0 → 1. ClickHouse convention is 1 = NULL, so the old check cleared the null_map whenever any non-null existed, which is exactly backwards. When the filtered rows were all NULL and were read into a non-Nullable column with null_as_default=0, this silently dropped the map instead of raising CANNOT_INSERT_NULL_IN_ORDINARY_COLUMN, and the reader then crashed downstream on the row-count mismatch.
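To illustrate item 1, here is a minimal C++ sketch (not the actual Reader.cpp code) of why a row filter cannot be applied at encoded-value indices once nulls are present. It assumes a flat OPTIONAL column with maximum definition level 1, where only non-null rows have an entry in the values stream. NullableChunk, filterByRow and filterByValueIndex are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of a flat OPTIONAL (nullable) Parquet column chunk, max definition level 1.
// def_levels has one entry per row: 1 = value present, 0 = NULL.
// values holds entries for non-NULL rows only, so values.size() <= def_levels.size().
struct NullableChunk
{
    std::vector<uint8_t> def_levels;
    std::vector<int64_t> values;
};

// Row-range style: walk rows, advance the value cursor only on non-NULL rows,
// and apply the filter at row indices. Appends 1 = NULL / 0 = present to null_map for kept rows.
std::vector<int64_t> filterByRow(const NullableChunk & chunk, const std::vector<uint8_t> & row_filter,
                                 std::vector<uint8_t> & null_map)
{
    assert(row_filter.size() == chunk.def_levels.size());
    std::vector<int64_t> out;
    std::size_t value_idx = 0;
    for (std::size_t row = 0; row < chunk.def_levels.size(); ++row)
    {
        const bool is_null = chunk.def_levels[row] == 0;
        if (row_filter[row])
        {
            out.push_back(is_null ? 0 : chunk.values[value_idx]); // placeholder 0 under a NULL
            null_map.push_back(is_null ? 1 : 0);
        }
        if (!is_null)
            ++value_idx; // only non-NULL rows consume an encoded value
    }
    return out;
}

// Misaligned: applies the row filter directly at encoded-value indices,
// which is only valid when every row has a value (no NULLs).
std::vector<int64_t> filterByValueIndex(const NullableChunk & chunk, const std::vector<uint8_t> & row_filter)
{
    assert(row_filter.size() >= chunk.values.size());
    std::vector<int64_t> out;
    for (std::size_t i = 0; i < chunk.values.size(); ++i)
        if (row_filter[i])
            out.push_back(chunk.values[i]);
    return out;
}
```

For rows 10, NULL, 30, NULL, 50 (def_levels {1, 0, 1, 0, 1}, values {10, 30, 50}) and a filter keeping rows 2 and 4, filterByRow returns {30, 50}, while filterByValueIndex returns only {50}: one row fewer than the filter selected, a row-count mismatch of the kind the "Unexpected number of rows" error describes.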
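A minimal sketch of the null-map check in item 2, assuming the null map is a plain byte array using ClickHouse's 1 = NULL convention. canDropNullMap and canDropNullMapSwappedNeedle are hypothetical helpers, and the exact surrounding condition in Reader.cpp is not reproduced here; the second helper differs from the first only in the memchr needle (0 instead of 1), so it looks for non-NULL bytes rather than NULL bytes.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// ClickHouse null-map convention: one byte per row, 1 = NULL, 0 = value present.
using NullMap = std::vector<uint8_t>;

// Correct check: the null map can be dropped only when it contains no 1 bytes (no NULLs).
bool canDropNullMap(const NullMap & null_map)
{
    if (null_map.empty())
        return true; // avoid calling memchr on a possibly null data() pointer
    return std::memchr(null_map.data(), 1, null_map.size()) == nullptr;
}

// Hypothetical variant with only the needle swapped to 0: it now asks
// "are there no non-NULL rows?", so an all-NULL map is treated as droppable.
bool canDropNullMapSwappedNeedle(const NullMap & null_map)
{
    if (null_map.empty())
        return true;
    return std::memchr(null_map.data(), 0, null_map.size()) == nullptr;
}
```

With the needle swapped, an all-NULL map such as {1, 1, 1} is reported as droppable while a map with no NULLs is not, which mirrors the all-NULL case described in item 2.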

The test covers all four meaningful cases (nullable output, null_as_default=1, the formerly-crashing path, and a no-nulls control) and is correctly gated on input_format_parquet_use_native_reader_v3=1.
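The four test cases can be summarized as a small decision model. This is a hypothetical C++ sketch of the expected outcomes, not ClickHouse code: Outcome, resolveNulls and the CannotInsertNullInOrdinaryColumn exception (a stand-in for the CANNOT_INSERT_NULL_IN_ORDINARY_COLUMN error) are illustrative names, and the null_as_default parameter mirrors the setting mentioned in the review.

```cpp
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

// Illustrative outcomes for reading a Parquet column that may contain NULLs.
enum class Outcome
{
    KeepNullMap,  // target column is Nullable: NULLs are preserved
    FillDefaults, // non-Nullable target, null_as_default=1: NULLs become default values
    DropNullMap,  // non-Nullable target, no NULLs present: the null map is not needed
};

// Stand-in for ClickHouse's CANNOT_INSERT_NULL_IN_ORDINARY_COLUMN error.
struct CannotInsertNullInOrdinaryColumn : std::runtime_error
{
    CannotInsertNullInOrdinaryColumn()
        : std::runtime_error("CANNOT_INSERT_NULL_IN_ORDINARY_COLUMN") {}
};

// null_map uses the 1 = NULL convention, one byte per (already filtered) row.
Outcome resolveNulls(const std::vector<uint8_t> & null_map, bool target_is_nullable, bool null_as_default)
{
    const bool has_nulls = !null_map.empty()
        && std::memchr(null_map.data(), 1, null_map.size()) != nullptr;

    if (target_is_nullable)
        return Outcome::KeepNullMap;
    if (!has_nulls)
        return Outcome::DropNullMap;
    if (null_as_default)
        return Outcome::FillDefaults;
    // The formerly crashing path: NULLs headed for a non-Nullable column must be rejected.
    throw CannotInsertNullInOrdinaryColumn();
}
```

The ordering of the checks matters in this sketch: the error is raised only after confirming that NULLs are actually present and that no default-filling is allowed, so the no-nulls control never reaches the throw.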

CI selection appropriate (Parquet/Iceberg/S3 Export, ASAN kept). Approve once green.


3 participants