feat(datafusion): support SHOW CREATE TABLE via get_table_definition #444

Open
shyjsarah wants to merge 7 commits into apache:main from shyjsarah:show-create-table-ddl

Conversation

@shyjsarah

Summary

  • SHOW CREATE TABLE on a Paimon table returned no DDL because DataFusion 53 rewrites the statement into a query against information_schema.views, whose definition column is populated by TableProvider::get_table_definition(). PaimonTableProvider did not override this method, so the column came back empty.
  • Add a cached table_definition string on PaimonTableProvider, built once in try_new from the table's identifier and schema (fields, primary keys, partition keys, options).
  • Add data_type_to_sql covering all 22 DataType variants (recursive for Array/Map/Multiset/Row/Vector).
  • Override TableProvider::get_table_definition() to return the cached DDL.
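
A minimal sketch of the caching pattern described above, under stated assumptions: `TableDefinitionSource` is a local stand-in that mirrors only the `get_table_definition` method of DataFusion's `TableProvider`, and `CachedProvider` with its `try_new` signature is hypothetical, not the real `PaimonTableProvider`. Identifier quoting, keys, and options are left out for brevity.

```rust
/// Local stand-in for the single trait method this PR overrides.
/// (Hypothetical mirror; the real `TableProvider` trait has many more methods.)
trait TableDefinitionSource {
    fn get_table_definition(&self) -> Option<&str> {
        None // default: no DDL, so the `definition` column stays empty
    }
}

/// Hypothetical, stripped-down provider that holds the cached DDL string.
struct CachedProvider {
    table_definition: String,
}

impl CachedProvider {
    /// Build the DDL once, at construction time, from (name, type) pairs.
    fn try_new(database: &str, table: &str, columns: &[(&str, &str)]) -> Result<Self, String> {
        if columns.is_empty() {
            return Err("a table needs at least one column".to_string());
        }
        let cols: Vec<String> = columns.iter().map(|(n, t)| format!("{n} {t}")).collect();
        let table_definition = format!("CREATE TABLE {database}.{table} ({})", cols.join(", "));
        Ok(Self { table_definition })
    }
}

impl TableDefinitionSource for CachedProvider {
    /// Return the cached string; nothing is rebuilt per call.
    fn get_table_definition(&self) -> Option<&str> {
        Some(&self.table_definition)
    }
}
```

Building the string in the constructor keeps `get_table_definition` a cheap borrow, which matters because it returns `Option<&str>` and so cannot hand back a freshly allocated `String`.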

Test plan

  • cargo check -p paimon-datafusion
  • cargo clippy -p paimon-datafusion --lib --tests (zero warnings)
  • New tests in sql_context_tests.rs (4 cases): simple table, table with primary key, table with partition + options, table with various data types
  • Full sql_context_tests suite (39/39) — no regression
  • Manual: SHOW CREATE TABLE <db>.<table> returns the DDL string

🤖 Generated with Claude Code

shaoyijie and others added 2 commits July 3, 2026 12:28
DataFusion 53 rewrites `SHOW CREATE TABLE` into a query against
`information_schema.views`, whose `definition` column is populated
by `TableProvider::get_table_definition()`. PaimonTableProvider did
not override this method, so the column came back empty for Paimon
tables.

Add a cached DDL string built from the table's identifier, schema
(fields, primary keys, partition keys, options), and a recursive
`data_type_to_sql` renderer covering all DataType variants. Override
`get_table_definition()` to return it.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Comment thread crates/integrations/datafusion/src/table/mod.rs Outdated
Comment thread crates/integrations/datafusion/src/table/mod.rs
Comment thread crates/integrations/datafusion/src/table/mod.rs
Comment thread crates/integrations/datafusion/tests/sql_context_tests.rs
shaoyijie and others added 5 commits July 3, 2026 15:45
Previously data_type_to_sql rendered every type as nullable, so a column
declared `payload BLOB NOT NULL` came back as `payload BLOB`, and
replaying the DDL would silently widen the schema. Append ` NOT NULL`
when DataType::is_nullable() returns false.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
sqlparser's GenericDialect parses `MAP(k, v)` (ClickHouse style,
parentheses) into SqlType::Map, not `MAP<k, v>` (angle brackets). The
previous angle-bracket output was not parseable by paimon-rust's own
CREATE TABLE path, so map columns could not round-trip.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…LE output

sqlparser's GenericDialect parses `STRUCT<name type, ...>` (BigQuery
style, angle brackets with space-separated field name and type) into
SqlType::Struct, which paimon-rust maps to Paimon Row. The previous
`ROW<name: type, ...>` output was not parseable by paimon-rust's own
CREATE TABLE path, so row/struct columns could not round-trip.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…TABLE

`NOT NULL` is a column constraint, not a type modifier — it is only
valid at the top of a column definition, not nested inside `MAP`,
`ARRAY`, or `STRUCT` arguments. The previous rendering appended `NOT
NULL` to every non-nullable type, producing output like
`MAP(INT NOT NULL, VARCHAR)` that paimon-rust's own CREATE TABLE
parser rejects (`Expected: ,, found: NOT`).

Move the `NOT NULL` suffix out of `data_type_to_sql` (which is called
recursively for nested type arguments) and into `build_table_definition`
(which renders one column at a time).
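
The split described in this commit can be sketched roughly as follows. `Ty` is a hypothetical miniature of the Paimon type tree, `column_to_sql` stands in for the per-column step of `build_table_definition`, and the `ARRAY<...>` spelling is an assumption; only the `MAP(k, v)` and `STRUCT<name type>` forms are taken from the commit messages.

```rust
/// Hypothetical miniature type tree; every node carries its own nullability.
enum Ty {
    Int { nullable: bool },
    VarChar { len: u32, nullable: bool },
    Array { element: Box<Ty>, nullable: bool },
    Map { key: Box<Ty>, value: Box<Ty>, nullable: bool },
    Struct { fields: Vec<(String, Ty)>, nullable: bool },
}

impl Ty {
    fn is_nullable(&self) -> bool {
        match self {
            Ty::Int { nullable }
            | Ty::VarChar { nullable, .. }
            | Ty::Array { nullable, .. }
            | Ty::Map { nullable, .. }
            | Ty::Struct { nullable, .. } => *nullable,
        }
    }
}

/// Renders the type only. It never emits NOT NULL, so recursion is safe.
fn data_type_to_sql(ty: &Ty) -> String {
    match ty {
        Ty::Int { .. } => "INT".to_string(),
        Ty::VarChar { len, .. } => format!("VARCHAR({len})"),
        Ty::Array { element, .. } => format!("ARRAY<{}>", data_type_to_sql(element)),
        Ty::Map { key, value, .. } => {
            format!("MAP({}, {})", data_type_to_sql(key), data_type_to_sql(value))
        }
        Ty::Struct { fields, .. } => {
            let inner: Vec<String> = fields
                .iter()
                .map(|(name, t)| format!("{name} {}", data_type_to_sql(t)))
                .collect();
            format!("STRUCT<{}>", inner.join(", "))
        }
    }
}

/// Renders one top-level column; the only place NOT NULL is appended.
fn column_to_sql(name: &str, ty: &Ty) -> String {
    let mut out = format!("{name} {}", data_type_to_sql(ty));
    if !ty.is_nullable() {
        out.push_str(" NOT NULL");
    }
    out
}
```

In this sketch the nullability of nested key, value, and element types is dropped from the rendered text, which is the trade-off the commit accepts to keep the output parseable.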

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The DDL returned by `SHOW CREATE TABLE` must be executable by
paimon-rust's own `CREATE TABLE` parser and reproduce an equivalent
schema (fields, primary keys, partition keys). This guards against
regressions where the rendered DDL drifts away from what the parser
accepts (e.g. `ROW<name: type>` vs `STRUCT<name type>`,
`MAP<k, v>` vs `MAP(k, v)`, or `NOT NULL` leaking into nested type
arguments).

The test creates a table with NOT NULL, ARRAY, MAP, STRUCT, and
nested composite types, drops it, re-executes the rendered DDL, and
asserts schema equivalence via a new `assert_schema_equivalent`
helper. Options are intentionally not compared because the CREATE
TABLE path may inject catalog defaults (e.g. `bucket`) that the user
did not specify.
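
For illustration, an equivalence check that skips options could look like the sketch below. The `Field` and `Schema` structs and the `schema_equivalent` function are hypothetical simplifications, not the actual `assert_schema_equivalent` helper or paimon-rust's schema types.

```rust
use std::collections::HashMap;

/// Hypothetical flattened column description.
#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    sql_type: String,
    nullable: bool,
}

/// Hypothetical schema snapshot taken before the drop and after the replay.
#[derive(Debug, Clone)]
struct Schema {
    fields: Vec<Field>,
    primary_keys: Vec<String>,
    partition_keys: Vec<String>,
    options: HashMap<String, String>,
}

/// Compares fields, primary keys, and partition keys in order.
/// `options` is deliberately ignored, since the CREATE TABLE path
/// may inject defaults the user never wrote.
fn schema_equivalent(a: &Schema, b: &Schema) -> bool {
    a.fields == b.fields
        && a.primary_keys == b.primary_keys
        && a.partition_keys == b.partition_keys
}
```

A test helper would wrap this in an assertion and print both schemas on failure.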

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
let mut ddl = String::new();
let _ = write!(
    ddl,
    "CREATE TABLE {}.{} (",

SHOW CREATE TABLE should emit replayable SQL, but the table name is written without identifier quoting. A table created with quoted identifiers such as CREATE TABLE paimon.test_db."select" ("order" INT) would be rendered as CREATE TABLE test_db.select (order INT), which is invalid or changes the identifier semantics when re-executed. The same issue applies to column names, primary/partition keys, and nested struct field names below. Please quote identifiers when required and escape embedded quotes, then add a round-trip case with a reserved-word or otherwise quoted identifier.
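
One way to address this, sketched under assumptions: `quote_ident` and `qualified_table_name` are hypothetical helper names, and the sketch always quotes rather than quoting only when required, which is simpler but makes the output noisier.

```rust
/// Wraps an identifier in double quotes and doubles any embedded double quote,
/// following the standard SQL delimited-identifier rule.
fn quote_ident(ident: &str) -> String {
    format!("\"{}\"", ident.replace('"', "\"\""))
}

/// Renders `database.table` with each part quoted separately, so a dot
/// inside a name cannot be mistaken for the separator.
fn qualified_table_name(database: &str, table: &str) -> String {
    format!("{}.{}", quote_ident(database), quote_ident(table))
}
```

With this, the reviewer's example table would render as `"test_db"."select"` and its column as `"order"`. The same helper would need to be applied to primary keys, partition keys, and nested struct field names.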

if i > 0 {
    ddl.push_str(", ");
}
let _ = write!(ddl, "'{}' = '{}'", k, v);

This needs SQL string escaping before interpolation. Option values are arbitrary table metadata, so a value containing a single quote, for example WITH ('comment' = 'Bob's table'), makes the returned definition invalid when users copy or re-execute SHOW CREATE TABLE. Please render string literals with SQL escaping (doubling ' to '') for both keys and values, and add a round-trip test with an option value containing a quote.
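
A minimal sketch of the requested escaping; `sql_string_literal` and `render_options` are hypothetical names, and the `WITH (...)` layout follows the example in the comment above.

```rust
/// Renders a SQL string literal, doubling each embedded single quote.
fn sql_string_literal(s: &str) -> String {
    format!("'{}'", s.replace('\'', "''"))
}

/// Renders the options clause with both keys and values escaped.
fn render_options(options: &[(&str, &str)]) -> String {
    let pairs: Vec<String> = options
        .iter()
        .map(|(k, v)| format!("{} = {}", sql_string_literal(k), sql_string_literal(v)))
        .collect();
    format!("WITH ({})", pairs.join(", "))
}
```

For the value `Bob's table`, this yields `WITH ('comment' = 'Bob''s table')`, which a SQL parser reads back as the original string.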

DataType::Char(t) => format!("CHAR({})", t.length()),
DataType::VarChar(t) => format!("VARCHAR({})", t.length()),
DataType::Date(_) => "DATE".to_string(),
DataType::Time(t) => format!("TIME({})", t.precision()),

This arm (and the TIMESTAMP_LTZ, MULTISET, and VECTOR arms below) emits syntax that the current SQLContext cannot round-trip. sql_data_type_to_paimon_type has no SqlType::Time branch and falls through to Unsupported SQL data type; similarly TIMESTAMP_LTZ, MULTISET, and VECTOR are not accepted by that converter. A table loaded from existing Paimon metadata with any of these types will therefore return a SHOW CREATE TABLE definition that cannot be executed by paimon-rust. Please either add parser/converter support for the emitted syntax with round-trip tests, or avoid advertising these variants as replayable DDL.
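
The second option (not advertising unsupported variants as replayable) might be sketched as a fallible renderer. Everything here is hypothetical: `ColType` is a toy enum, `replayable_sql` is an invented name, and the `ARRAY<...>` spelling is an assumption. The real set of unsupported variants is the one listed in the comment above.

```rust
/// Toy type enum with two variants the converter is said not to accept.
#[derive(Debug)]
enum ColType {
    Int,
    Date,
    Time(u32),
    TimestampLtz(u32),
    Array(Box<ColType>),
}

/// Returns SQL only for types that can be parsed back; otherwise an error
/// the caller can surface instead of emitting non-replayable DDL.
fn replayable_sql(ty: &ColType) -> Result<String, String> {
    match ty {
        ColType::Int => Ok("INT".to_string()),
        ColType::Date => Ok("DATE".to_string()),
        // `?` propagates an unsupported element type out of the nesting.
        ColType::Array(inner) => Ok(format!("ARRAY<{}>", replayable_sql(inner)?)),
        ColType::Time(_) | ColType::TimestampLtz(_) => {
            Err(format!("{ty:?} has no round-trippable SQL syntax"))
        }
    }
}
```

Because the error propagates through nested types, a single unsupported leaf makes the whole column, and therefore the whole definition, report as non-replayable.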
