Search before asking
Motivation
PyPaimon has two read paths today:
- SQL (SQLContext.sql): already runs on the Rust DataFusion engine.
- DataFrame (ReadBuilder → Split → TableRead.to_arrow/to_pandas/to_ray): still pure Python, even though the Rust core already implements the same model in crates/paimon/src/table/read_builder.rs. It's just not exposed through bindings/python (PyTable only has identifier/location/schema).
Goal: expose the existing Rust read API to Python so the DataFrame read path can
optionally run on Rust. Initially this lands as a basic, opt-in path behind a
config flag, running alongside the pure-Python reader rather than replacing it,
so the Rust path can mature before it becomes a default. Write path is out of scope.
Scope (incremental PRs)
This can be implemented incrementally:
- PR 1 (scan planning): expose new_read_builder(), with_projection(), with_limit(), and new_scan().plan() returning serializable splits.
- PR 2 (filter pushdown): add with_filter() once the Python Predicate → Rust Predicate conversion layer is defined.
- PR 3 (split → Arrow read): expose new_read().read(splits), returning Arrow data backed by the Rust TableRead.
- PR 4 (in apache/paimon, [python]): wire PyPaimon's to_arrow / to_pandas / to_ray to the Rust reader as an opt-in, config-gated path, keeping the pure-Python reader as the default. Unsupported capabilities error out rather than silently falling back.
PR 1–3 land here; PR 4 lands in the main repo once bindings are released.
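To make the PR 4 semantics concrete (opt-in, Python reader stays the default, unsupported requests fail instead of falling back), here is a minimal sketch of the dispatch decision. Everything in it is hypothetical: the option key RUST_READER_FLAG, the ReadRequest shape, and the capability set are placeholders for illustration, not existing PyPaimon config keys or APIs.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional, Set, Tuple

# Hypothetical option key: the issue only says the Rust path is "config-gated".
RUST_READER_FLAG = "example.read.rust-reader.enabled"

# Capabilities assumed available on the Rust path at some intermediate stage
# (illustrative: filter pushdown and Ray output not wired yet).
RUST_CAPABILITIES: FrozenSet[str] = frozenset({"projection", "limit", "arrow", "pandas"})


class UnsupportedByRustReader(NotImplementedError):
    """Raised when the opt-in Rust path cannot serve a request."""


@dataclass(frozen=True)
class ReadRequest:
    table: str
    projection: Tuple[str, ...] = ()
    limit: Optional[int] = None
    has_filter: bool = False
    output: str = "arrow"  # "arrow", "pandas" or "ray"


def required_capabilities(req: ReadRequest) -> Set[str]:
    """Collect the features a request needs from whichever reader serves it."""
    caps = {req.output}
    if req.projection:
        caps.add("projection")
    if req.limit is not None:
        caps.add("limit")
    if req.has_filter:
        caps.add("filter")
    return caps


def choose_reader(options: Dict[str, str], req: ReadRequest) -> str:
    """Return "python" (the default) or "rust" (only when explicitly enabled)."""
    if options.get(RUST_READER_FLAG, "false").strip().lower() != "true":
        return "python"  # flag unset or false: pure-Python reader, unchanged
    missing = required_capabilities(req) - RUST_CAPABILITIES
    if missing:
        # Fail loudly instead of silently falling back to the Python reader.
        raise UnsupportedByRustReader(
            f"Rust reader does not support: {', '.join(sorted(missing))}"
        )
    return "rust"
```

With the flag unset, choose_reader({}, ReadRequest("db.t")) keeps the pure-Python reader; with the flag set to "true", a projected read goes to Rust, while a filtered read raises UnsupportedByRustReader. Raising rather than falling back keeps it obvious which engine actually served a read while the Rust path matures.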
Notes
with_filter() is separated from the initial scan-planning PR because it
requires a dedicated Python Predicate → Rust Predicate conversion layer. PR 1
focuses on establishing the Python binding shape and serializable splits.
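One possible shape for the conversion layer is to walk the Python predicate tree and emit a plain, serializable structure that the binding decodes into a Rust Predicate. The sketch below assumes this approach; the Leaf/Compound classes, the method names, and the output dict format are all hypothetical stand-ins, not PyPaimon's actual Predicate class or the Rust crate's types. Unknown operators raise instead of being dropped, in line with the "error out" rule above.

```python
from dataclasses import dataclass
from typing import Any, Dict, Tuple, Union


@dataclass(frozen=True)
class Leaf:
    method: str              # assumed Python-side name, e.g. "equal", "greaterThan"
    field: str
    literals: Tuple[Any, ...] = ()


@dataclass(frozen=True)
class Compound:
    method: str              # "and" or "or"
    children: Tuple["PredicateNode", ...]


PredicateNode = Union[Leaf, Compound]

# Hypothetical mapping to the op names a Rust-side decoder would accept.
_LEAF_OPS = {
    "equal": "eq", "notEqual": "ne",
    "lessThan": "lt", "lessOrEqual": "le",
    "greaterThan": "gt", "greaterOrEqual": "ge",
    "isNull": "is_null", "isNotNull": "is_not_null",
    "in": "in",
}


def to_transport(p: PredicateNode) -> Dict[str, Any]:
    """Convert a predicate tree into a nested dict of plain values."""
    if isinstance(p, Leaf):
        op = _LEAF_OPS.get(p.method)
        if op is None:
            # Never drop a filter silently: that could change query results.
            raise NotImplementedError(f"predicate method not supported: {p.method!r}")
        return {"op": op, "field": p.field, "literals": list(p.literals)}
    if isinstance(p, Compound):
        if p.method not in ("and", "or"):
            raise NotImplementedError(f"compound method not supported: {p.method!r}")
        if not p.children:
            raise ValueError("compound predicate needs at least one child")
        return {"op": p.method, "children": [to_transport(c) for c in p.children]}
    raise TypeError(f"not a predicate: {type(p).__name__}")
```

For example, to_transport(Compound("and", (Leaf("greaterThan", "id", (10,)), Leaf("isNotNull", "name")))) yields {"op": "and", "children": [{"op": "gt", "field": "id", "literals": [10]}, {"op": "is_not_null", "field": "name", "literals": []}]}. Keeping this layer separate from PR 1 lets the operator mapping be reviewed on its own.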
Design principle: in this model Rust both plans and reads.
new_read().read(splits) returns Arrow from the Rust TableRead, and splits
stay opaque on the Python side — a serializable transport token, not
something Python inspects or reads from. Exposing split internals would imply a
Rust-plans / Python-reads path, which is a different direction and out of scope
here.
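The "opaque transport token" idea can be sketched as a Python object that only carries bytes and pickles as those bytes, so splits planned on a driver can be shipped to workers (e.g. for to_ray) and handed back to Rust unchanged. This is a sketch under stated assumptions: Split and FakeRustCore are hypothetical stand-ins for the native module, the JSON payload encoding is arbitrary, and read() returns file names here where the real path would return Arrow data. In an actual binding the payload would live on the Rust side; the stand-in reads it directly only because it plays the Rust role.

```python
import json
import pickle
from typing import List


class Split:
    """Hypothetical opaque handle to a Rust-planned split.

    Python never inspects the payload; it only passes it back to Rust."""

    __slots__ = ("_payload",)

    def __init__(self, payload: bytes) -> None:
        if not isinstance(payload, (bytes, bytearray)):
            raise TypeError("payload must be bytes")
        self._payload = bytes(payload)

    def __reduce__(self):
        # Pickle as raw bytes so splits can cross process boundaries.
        return (Split, (self._payload,))

    def __repr__(self) -> str:
        return f"Split(<{len(self._payload)} opaque bytes>)"


class FakeRustCore:
    """Stand-in for the native module: it alone encodes and decodes payloads."""

    def plan(self, files: List[str]) -> List[Split]:
        return [Split(json.dumps({"file": f}).encode()) for f in files]

    def read(self, splits: List[Split]) -> List[str]:
        # A real reader would return Arrow data; file names keep the sketch small.
        return [json.loads(s._payload)["file"] for s in splits]


core = FakeRustCore()
planned = core.plan(["part-0.parquet", "part-1.parquet"])
shipped = pickle.loads(pickle.dumps(planned))  # e.g. sent to remote workers
print(core.read(shipped))
```

Because Python code only ever sees repr(Split) and the pickled bytes, the payload format can change on the Rust side without breaking the Python API, which is the point of not exposing split internals.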
Solution
No response
Anything else?
No response
Willingness to contribute