
Add a ClickHouse connector (efficient batch writes) #798

Description

@Leomrlin

What & why

ClickHouse is a popular OLAP database and often serves as the downstream
analytics layer for graph-computing results. It is technically possible to
connect today through the JDBC connector, but that connector has no
ClickHouse-tuned batch writes (buffering + columnar writes), so write
performance is poor.

The task

Create geaflow-dsl-connector-clickhouse. The sink should focus on efficient
batched writes (buffer + flush threshold); the source should support parallel
reads by partition.
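
One possible way to picture "parallel partitioned reads" is to split a numeric key range into contiguous sub-ranges, one per reader. The sketch below is only an illustration under that assumption: `RangeSplitter` and `split` are hypothetical names, not GeaFlow or ClickHouse APIs, and a real source might instead split by ClickHouse partition or by a hash of the key.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not a GeaFlow API): divides an inclusive key range
// into contiguous inclusive sub-ranges, one per parallel reader.
final class RangeSplitter {

    // Returns at most `parallelism` sub-ranges as {start, end} pairs.
    // Assumes the width (max - min + 1) fits in a long.
    static List<long[]> split(long min, long max, int parallelism) {
        if (parallelism <= 0 || min > max) {
            throw new IllegalArgumentException("need parallelism > 0 and min <= max");
        }
        long total = max - min + 1;
        long parts = Math.min(parallelism, total); // never create empty ranges
        long base = total / parts;
        long remainder = total % parts;

        List<long[]> ranges = new ArrayList<>();
        long start = min;
        for (long i = 0; i < parts; i++) {
            // The first `remainder` ranges take one extra key each.
            long size = base + (i < remainder ? 1 : 0);
            long end = start + size - 1;
            ranges.add(new long[] {start, end});
            start = end + 1;
        }
        return ranges;
    }
}
```

Each reader could then issue its own query bounded by its `{start, end}` pair, for example with a `WHERE key BETWEEN ? AND ?` clause on an assumed numeric key column.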

Where to look / how to do it

  1. Use geaflow-dsl-connector-jdbc as a baseline and spot the differences
    between "generic JDBC" and "ClickHouse-specific".
  2. Sink batching: write() goes into a buffer; commit in bulk on threshold or flush().
  3. SPI registration + parent/aggregation pom + docs + tests.
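
The sink batching in step 2 could be sketched roughly as follows. This is a minimal illustration, not the connector's actual design: `BatchingBuffer` and its `bulkWriter` callback are hypothetical names, and the real sink would plug into whatever sink interface the JDBC connector already implements.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper (not a GeaFlow API): collects rows in memory and hands
// them to a bulk writer when the threshold is reached or flush() is called.
final class BatchingBuffer<T> {
    private final int flushThreshold;
    private final Consumer<List<T>> bulkWriter;
    private List<T> buffer;

    BatchingBuffer(int flushThreshold, Consumer<List<T>> bulkWriter) {
        if (flushThreshold <= 0) {
            throw new IllegalArgumentException("flushThreshold must be positive");
        }
        this.flushThreshold = flushThreshold;
        this.bulkWriter = bulkWriter;
        this.buffer = new ArrayList<>(flushThreshold);
    }

    // Buffers one row; triggers a bulk commit once the threshold is hit.
    void write(T row) {
        buffer.add(row);
        if (buffer.size() >= flushThreshold) {
            flush();
        }
    }

    // Commits whatever is buffered; a no-op when the buffer is empty.
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        List<T> batch = buffer;
        buffer = new ArrayList<>(flushThreshold); // fresh buffer for later writes
        bulkWriter.accept(batch);
    }
}
```

In a JDBC-based sink, the `bulkWriter` callback might call `PreparedStatement.addBatch()` for each row and then `executeBatch()` once per batch, so that each flush becomes a single multi-row insert rather than one round trip per row.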

Done when

  • Sink throughput is clearly better than row-by-row JDBC (attach a simple benchmark in the PR)
  • Source supports parallel partitioned reads
  • Testcontainers integration test
  • CN + EN docs + examples
  • checkstyle / RAT pass
