fix: declare the logical plan schema in DoGet to match GetFlightInfo#52
Open
jonasdedden wants to merge 1 commit into
Open
fix: declare the logical plan schema in DoGet to match GetFlightInfo#52jonasdedden wants to merge 1 commit into
jonasdedden wants to merge 1 commit into
Conversation
do_get_fallback declared the physical execution stream schema on the wire, while get_flight_info_* advertises the logical plan schema. The two can disagree on field nullability (e.g. aggregates over non-nullable columns), and strict clients that validate the DoGet stream against the advertised FlightInfo schema -- notably the Go ADBC Flight SQL driver and the Python adbc-driver-flightsql package built on it -- reject such results with 'endpoint 0 returned inconsistent schema' errors. Derive the declared schema in all three do_get_fallback arms (CommandStatementQuery, CommandPreparedStatementQuery, CommandStatementSubstraitPlan) from the logical plan via the same get_schema_for_plan() call that GetFlightInfo uses. The encoded batches are unchanged; declaring the (more permissive) logical schema is always valid for the data. Adds a regression test asserting the DoGet stream schema equals the FlightInfo schema for an aggregate query. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Which issue does this PR close?
Closes #51.
What changes are included in this PR?
do_get_fallbackcurrently declares the physical execution stream schema (stream.schema()) on the wire, whileget_flight_info_*advertises the logical plan schema (get_schema_for_plan(...)).The two can disagree on field nullability — reproduced with aggregates over statistics-backed sources such as Parquet, where the
AggregateStatisticsoptimizer rewritesMIN()/MAX()into non-nullable literals taken from file statistics while the advertised logical schema keeps them nullable.Strict clients that validate the
DoGetstream against the advertisedFlightInfoschema — notably the Go ADBC Flight SQL driver and therefore Python'sadbc-driver-flightsql— reject such results:This PR makes all three arms of
do_get_fallback(CommandStatementQuery,CommandPreparedStatementQuery,CommandStatementSubstraitPlan) derive the declared schema from the logical plan via the sameget_schema_for_plan(...)call thatGetFlightInfouses, so the advertised and streamed schemas always match.The encoded batches themselves are unchanged; declaring the (more permissive) logical schema is always valid for the data, while the previous behavior was the direction that breaks readers.
For
CommandStatementQuerythis replacesctx.execute_sql(&query)with the equivalentsql_to_logical_plan+execute_logical_planpair so the plan is available for schema derivation; the other two arms already had the plan in hand.Are these changes tested?
Yes — a new integration test (
test_do_get_schema_matches_advertised_flight_info_schema) writes a small Parquet file, executes an aggregate query against it, decodes the schema fromFlightInfo, consumes theDoGetstream, and asserts the stream's declared schema equals the advertised one. The test fails againstmain:and passes with this change. All existing tests pass unchanged.
Are there any user-facing changes?
Result schemas for queries whose logical/physical schemas differ in nullability are now reported with the logical (nullable) variant consistently in both
GetFlightInfoandDoGet. Clients that previously failed with "inconsistent schema" errors (e.g. all ADBC Flight SQL consumers) now work.