chore: upgrade DataFusion to 54 #453

Merged
JingsongLi merged 1 commit into apache:main from JingsongLi:codex/datafusion-54-sql-support
Jul 5, 2026

@JingsongLi JingsongLi commented Jul 5, 2026

Contributor

Summary

Upgrade the DataFusion integration to DataFusion 54.0.0 and refresh the generated Rust dependency metadata. This also updates the SQL documentation to reflect the current DataFusion-delegated SQL surface and the Paimon-specific SQL support implemented by SQLContext.

Changes

  • Bump workspace datafusion and datafusion-ffi dependencies to 54.0.0.
  • Sync the Python binding dev dependency to datafusion==54.0.0 so Python DataFusion FFI matches the Rust datafusion-ffi version.
  • Adapt DataFusion integration and Python binding code for DataFusion 54 API changes around downcasting and execution plan statistics.
  • Update INSERT OVERWRITE ... PARTITION explicit column handling for the DataFusion 54 SQL AST shape.
  • Refresh generated DEPENDENCIES.rust.tsv files.
  • Split the CI integration job into parallel matrix shards: rust, datafusion, lumina, python, and go.
  • Update docs/src/sql.md to cover DataFusion 54 SQL support, Paimon-specific DDL/DML support, positive LIKE pushdown, lateral vector_search joins, and async catalog registration examples, and fix a broken DataFusion docs anchor.
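
The version sync between the Rust datafusion-ffi dependency and the Python datafusion pin can be checked mechanically. A minimal sketch, assuming the Python side is an exact `==` requirement string and the Rust version is already available as a plain string; the helper name `pins_match` is hypothetical and not part of this repository:

```python
def pins_match(rust_version: str, python_requirement: str, package: str = "datafusion") -> bool:
    """Return True if an exact pin such as 'datafusion==54.0.0' names rust_version.

    Hypothetical helper: how the two version strings are read from the
    workspace manifest and pyproject.toml is left to the caller.
    """
    name, separator, pinned = python_requirement.partition("==")
    if not separator:
        # Not an exact pin (for example '>=54.0.0'), so the versions may drift.
        return False
    return name.strip() == package and pinned.strip() == rust_version.strip()
```

For example, `pins_match("54.0.0", "datafusion==54.0.0")` is true, while a range requirement such as `datafusion>=54.0.0` is rejected because it does not guarantee a matching FFI version.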

Testing

  • cargo fmt --all -- --check
  • mkdocs build -f docs/mkdocs.yml --strict
  • cargo test -p paimon-datafusion --test sql_context_tests test_show_tables_is_enabled
  • cargo test -p paimon-datafusion --test delete_tests test_delete_data_evolution_table_with_deletion_vectors
  • cargo test -p paimon-datafusion --test merge_into_tests test_when_matched_delete_with_deletion_vectors
  • cargo test -p paimon-datafusion --test read_tables -- --skip vector_search_tests::test_vector_search_top3 --skip vector_search_tests::test_vector_search_top6_returns_all
  • cargo test -p pypaimon_rust
  • uv lock --check in bindings/python
  • make install && uv run --no-sync pytest tests/test_datafusion.py::test_query_simple_table_via_catalog_provider in bindings/python
  • make test in bindings/python
  • Parsed .github/workflows/ci.yml locally and verified the integration matrix contains rust, datafusion, lumina, python, and go
  • git diff --check -- .github/workflows/ci.yml bindings/python/pyproject.toml bindings/python/uv.lock
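
The matrix check on .github/workflows/ci.yml can be sketched as follows. This is not the script used for this PR: the job name `integration` and the matrix key `shard` are assumptions about the workflow layout, and the function expects the workflow already parsed into a dict (for example with PyYAML's `yaml.safe_load`).

```python
EXPECTED_SHARDS = {"rust", "datafusion", "lumina", "python", "go"}


def missing_shards(workflow: dict, job: str = "integration", key: str = "shard") -> set:
    """Return the expected shard names absent from the job's strategy matrix.

    `job` and `key` are assumed names; adjust them to the real workflow.
    """
    matrix = (
        workflow.get("jobs", {})
        .get(job, {})
        .get("strategy", {})
        .get("matrix", {})
    )
    return EXPECTED_SHARDS - set(matrix.get(key, []))
```

An empty result means every shard listed above is present; a non-empty set names the shards that still need to be added.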

Notes

A local full cargo test -p paimon-datafusion run passes the rest of the suite but cannot complete two vector search tests without the Lumina native library:

  • vector_search_tests::test_vector_search_top3
  • vector_search_tests::test_vector_search_top6_returns_all

Both fail locally with Failed to load lumina library from 'liblumina_py.dylib'; liblumina_py.dylib was not present on this machine, and LUMINA_LIB_PATH is not set. CI installs lumina-data before the DataFusion and Lumina integration shards.
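
For local runs without the Lumina library, the skip flags from the Testing section can be assembled conditionally. A minimal sketch: the helper name `read_tables_command` is hypothetical, and deciding whether Lumina is available (for example by checking whether LUMINA_LIB_PATH is set) is left to the caller.

```python
LUMINA_TESTS = [
    "vector_search_tests::test_vector_search_top3",
    "vector_search_tests::test_vector_search_top6_returns_all",
]


def read_tables_command(lumina_available: bool) -> list[str]:
    """Build the cargo invocation for the read_tables integration test.

    When Lumina is unavailable, the two vector search tests are skipped
    with the same --skip flags listed in the Testing section.
    """
    command = ["cargo", "test", "-p", "paimon-datafusion", "--test", "read_tables"]
    if not lumina_available:
        command.append("--")  # everything after this goes to the test harness
        for name in LUMINA_TESTS:
            command += ["--skip", name]
    return command
```

The resulting list can be passed directly to `subprocess.run`.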

@JingsongLi JingsongLi force-pushed the codex/datafusion-54-sql-support branch from 5951160 to 685ce57 Compare July 5, 2026 01:39

@QuakeWang QuakeWang left a comment

Member

LGTM

@leaves12138 leaves12138 left a comment

Reviewed the DataFusion 54 upgrade. The code changes look like straightforward API migrations for provider/downcast hooks, ExecutionPlan::partition_statistics, SQL parser insert column handling, and the Python UDF wrapper.

Validation I ran locally:

  • cargo fmt --all -- --check
  • cargo clippy -p paimon-datafusion --all-targets -- -D warnings
  • cargo test -p paimon-datafusion --lib -- ... (186 passed; skipped 7 tests that require the external Spark-provisioned /tmp/paimon-warehouse fixture)
  • cargo test -p paimon-datafusion --test pk_tables test_pk_insert_overwrite_with_after_columns_reorder -- --nocapture
  • cargo test -p paimon-datafusion --test merge_into_tests test_merge_insert_reordered_columns -- --nocapture
  • PYO3_NO_PYTHON=1 PYO3_BUILD_EXTENSION_MODULE=1 cargo check -p pypaimon_rust --lib
  • PYO3_NO_PYTHON=1 PYO3_BUILD_EXTENSION_MODULE=1 cargo clippy -p pypaimon_rust --lib -- -D warnings

Notes:

  • A full cargo test -p paimon-datafusion --all-targets needs the pre-provisioned warehouse fixture locally; without it, the fixture-dependent scan tests fail with TableNotExist.
  • The Python Rust unit tests need a dynamic Python >= 3.10 for PyO3 auto-initialize; this environment only has static Python 3.10 and dynamic Python 3.8, so I validated the extension-module build and clippy path instead.
  • git diff --check reports trailing whitespace in generated DEPENDENCIES.rust.tsv files, which appears to come from trailing TSV columns.
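
To illustrate the last note, here is a small sketch of how an empty trailing TSV column produces the kind of trailing whitespace that git diff --check flags. The column names in the sample are made up and do not reflect the real DEPENDENCIES.rust.tsv layout.

```python
def trailing_whitespace_lines(text: str) -> list[int]:
    """Return 1-based numbers of lines that end in a space or tab."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if line != line.rstrip(" \t")
    ]


# Hypothetical rows: the second row has an empty last column, so joining
# its fields with tabs leaves a tab at the end of the line.
sample = (
    "\t".join(["crate", "version", "license"]) + "\n"
    + "\t".join(["foo", "1.0", ""]) + "\n"
)
flagged = trailing_whitespace_lines(sample)
```

Here only the second row is flagged, which matches the explanation that the whitespace comes from the generated data rather than from a hand edit.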

No blocking issues found.

@JingsongLi JingsongLi force-pushed the codex/datafusion-54-sql-support branch from 685ce57 to 5cb3a80 Compare July 5, 2026 02:36
@JingsongLi JingsongLi force-pushed the codex/datafusion-54-sql-support branch from 5cb3a80 to 8912db3 Compare July 5, 2026 02:50
@JingsongLi JingsongLi merged commit 8987dea into apache:main Jul 5, 2026
12 checks passed
3 participants