Skip to content

fix: declare the logical plan schema in DoGet to match GetFlightInfo#52

Open
jonasdedden wants to merge 1 commit into
datafusion-contrib:mainfrom
jonasdedden:fix-doget-schema-mismatch
Open

fix: declare the logical plan schema in DoGet to match GetFlightInfo#52
jonasdedden wants to merge 1 commit into
datafusion-contrib:mainfrom
jonasdedden:fix-doget-schema-mismatch

Conversation

@jonasdedden

Copy link
Copy Markdown

Which issue does this PR close?

Closes #51.

What changes are included in this PR?

do_get_fallback currently declares the physical execution stream schema (stream.schema()) on the wire, while get_flight_info_* advertises the logical plan schema (get_schema_for_plan(...)).

The two can disagree on field nullability — reproduced with aggregates over statistics-backed sources such as Parquet, where the AggregateStatistics optimizer rewrites MIN()/MAX() into non-nullable literals taken from file statistics while the advertised logical schema keeps them nullable.

Strict clients that validate the DoGet stream against the advertised FlightInfo schema — notably the Go ADBC Flight SQL driver and therefore Python's adbc-driver-flightsql — reject such results:

[FlightSQL] endpoint 0 returned inconsistent schema:
expected schema: ... lo: type=int64, nullable
but got schema:  ... lo: type=int64

This PR makes all three arms of do_get_fallback (CommandStatementQuery, CommandPreparedStatementQuery,
CommandStatementSubstraitPlan) derive the declared schema from the logical plan via the same get_schema_for_plan(...) call that GetFlightInfo uses, so the advertised and streamed schemas always match.

The encoded batches themselves are unchanged; declaring the (more permissive) logical schema is always valid for the data, while the previous behavior was the direction that breaks readers.

For CommandStatementQuery this replaces ctx.execute_sql(&query) with the equivalent sql_to_logical_plan + execute_logical_plan pair so the plan is available for schema derivation; the other two arms already had the plan in hand.

Are these changes tested?

Yes — a new integration test (test_do_get_schema_matches_advertised_flight_info_schema) writes a small Parquet file, executes an aggregate query against it, decodes the schema from FlightInfo, consumes the DoGet stream, and asserts the stream's declared schema equals the advertised one. The test fails against main:

assertion `left == right` failed: DoGet stream schema must match the schema advertised in FlightInfo
  left: Schema { fields: [Field { name: "lo", data_type: Int32, nullable: true }, ...] }
 right: Schema { fields: [Field { name: "lo", data_type: Int32 }, ...] }

and passes with this change. All existing tests pass unchanged.

Are there any user-facing changes?

Result schemas for queries whose logical/physical schemas differ in nullability are now reported with the logical (nullable) variant consistently in both GetFlightInfo and DoGet. Clients that previously failed with "inconsistent schema" errors (e.g. all ADBC Flight SQL consumers) now work.

do_get_fallback declared the physical execution stream schema on the
wire, while get_flight_info_* advertises the logical plan schema. The
two can disagree on field nullability (e.g. aggregates over
non-nullable columns), and strict clients that validate the DoGet
stream against the advertised FlightInfo schema -- notably the Go ADBC
Flight SQL driver and the Python adbc-driver-flightsql package built on
it -- reject such results with 'endpoint 0 returned inconsistent
schema' errors.

Derive the declared schema in all three do_get_fallback arms
(CommandStatementQuery, CommandPreparedStatementQuery,
CommandStatementSubstraitPlan) from the logical plan via the same
get_schema_for_plan() call that GetFlightInfo uses. The encoded batches
are unchanged; declaring the (more permissive) logical schema is always
valid for the data.

Adds a regression test asserting the DoGet stream schema equals the
FlightInfo schema for an aggregate query.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

DoGet stream schema can differ from the schema advertised in FlightInfo (rejected by ADBC clients)

1 participant