feat: clustered segment benchmark and fix #19623

Merged: clintropolis merged 2 commits into apache:master from clintropolis:clustered-segment-benchmark on Jun 24, 2026

@clintropolis (Member) commented Jun 24, 2026

Description

This PR adds a benchmark for clustered segments and fixes a bug with clustering columns that is similar in nature to the problem fixed in #19599 for delegate columns. The bug was caused by the constant selectors of clustering columns advertising an incorrect cardinality (1, since each group has its own constant selector).
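
As a rough illustration of this kind of mismatch (not the actual Druid code), here is a minimal sketch using hypothetical types: `ValueSelector`, `ConstantSelector`, and `ConcatenatingSelector` are invented for this example. Each per-group constant selector accurately reports a cardinality of 1, but a selector that spans several groups can produce several distinct values, so forwarding one group's cardinality would mislead callers that rely on it. The sketch assumes that reporting "unknown" (`-1` here) is an acceptable answer in the multi-group case; how the real fix computes the value is not shown in this description.

```java
import java.util.List;

/** Hypothetical, simplified selector: a current value plus an advertised value cardinality. */
interface ValueSelector {
  int CARDINALITY_UNKNOWN = -1;

  String getValue();

  int getValueCardinality();
}

/** Within one cluster group the clustering column is constant, so cardinality 1 is accurate. */
final class ConstantSelector implements ValueSelector {
  private final String value;

  ConstantSelector(String value) {
    this.value = value;
  }

  @Override
  public String getValue() {
    return value;
  }

  @Override
  public int getValueCardinality() {
    return 1;
  }
}

/** Walks several groups in turn; the value changes whenever the current group changes. */
final class ConcatenatingSelector implements ValueSelector {
  private final List<ValueSelector> groups;
  private int current = 0;

  ConcatenatingSelector(List<ValueSelector> groups) {
    this.groups = groups;
  }

  void advanceGroup() {
    current++;
  }

  @Override
  public String getValue() {
    return groups.get(current).getValue();
  }

  @Override
  public int getValueCardinality() {
    // A buggy variant would forward groups.get(current).getValueCardinality(), which is always 1.
    // With a single group, delegating is accurate; otherwise do not claim a cardinality of 1.
    return groups.size() == 1 ? groups.get(0).getValueCardinality() : CARDINALITY_UNKNOWN;
  }
}
```

With two groups holding the constants `'1'` and `'3'`, the concatenating selector yields both values over its lifetime, which is why advertising a cardinality of 1 would be wrong there.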

This PR also adds an optimized path for the single-group case that delegates directly to the underlying group, which is correct to do when only one group is involved.
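
The shape of that shortcut can be sketched as follows. This is a simplified model under stated assumptions, not the PR's implementation: `GroupCursors` and `makeCursor` are hypothetical names, and plain `Iterator`s stand in for cursors. When there is exactly one group, the group's own iterator is returned with no wrapper; otherwise a naive concatenating iterator walks the groups in order.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical helper: builds a row iterator over one or more cluster groups. */
final class GroupCursors {
  private GroupCursors() {}

  static <T> Iterator<T> makeCursor(List<? extends Iterable<T>> groups) {
    if (groups.size() == 1) {
      // Single group: skip the concatenation wrapper and hand back the group's own iterator.
      return groups.get(0).iterator();
    }

    // Multiple (or zero) groups: a deliberately naive concatenating iterator.
    final Iterator<? extends Iterable<T>> outer = groups.iterator();
    return new Iterator<T>() {
      private Iterator<T> inner = null;

      @Override
      public boolean hasNext() {
        // Move to the next non-empty group when the current one is exhausted.
        while ((inner == null || !inner.hasNext()) && outer.hasNext()) {
          inner = outer.next().iterator();
        }
        return inner != null && inner.hasNext();
      }

      @Override
      public T next() {
        if (!hasNext()) {
          throw new NoSuchElementException();
        }
        return inner.next();
      }
    };
  }
}
```

Because the single-group branch returns the underlying iterator unchanged, anything that iterator already does well is preserved without an extra layer of indirection.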

The added benchmarks measure segments with 3 layouts:
- CLUSTERED: clustered segment
- UNCLUSTERED: unclustered v10 segment that is NOT ordered by time and so has effectively the same ordering as the clustered segment, giving an apples-to-apples comparison of clustered vs. not
- TIME_ORDERED: v10 segment ordered by __time, then clustering columns, then non-clustering columns, to provide a comparison with more typical segments
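
To make the ordering difference concrete, here is a small sketch that models only row order, not how Druid stores cluster groups. The `Row` class, the comparators, and `runsMatching` are all hypothetical. It counts how many separate contiguous runs of rows match a given clustering key under each ordering: with the clustering column leading, matching rows sit together, whereas with `__time` leading they can be scattered.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class LayoutOrderingSketch {
  /** Hypothetical row: a timestamp, one clustering column, one non-clustering column. */
  static final class Row {
    final long time;
    final String clusterKey;
    final String dim;

    Row(long time, String clusterKey, String dim) {
      this.time = time;
      this.clusterKey = clusterKey;
      this.dim = dim;
    }
  }

  // Ordering shared by CLUSTERED and UNCLUSTERED in this model: clustering column leads, not __time.
  static final Comparator<Row> BY_CLUSTER_KEY =
      Comparator.comparing((Row r) -> r.clusterKey).thenComparing(r -> r.dim);

  // TIME_ORDERED: __time, then the clustering column, then the non-clustering column.
  static final Comparator<Row> BY_TIME =
      Comparator.comparingLong((Row r) -> r.time)
                .thenComparing(r -> r.clusterKey)
                .thenComparing(r -> r.dim);

  /** Counts contiguous runs of rows whose clustering column equals the given key. */
  static int runsMatching(List<Row> rows, Comparator<Row> order, String key) {
    List<Row> sorted = new ArrayList<>(rows);
    sorted.sort(order);
    int runs = 0;
    boolean inRun = false;
    for (Row r : sorted) {
      boolean match = r.clusterKey.equals(key);
      if (match && !inRun) {
        runs++;
      }
      inRun = match;
    }
    return runs;
  }
}
```

Under `BY_CLUSTER_KEY` the rows for any one key always form a single run; under `BY_TIME` the number of runs depends on how the keys are interleaved over time.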

Benchmark queries and results:

      // 0: equality filter on the clustering column (clustered groups can prune non-matching groups)
      "SELECT SUM(valueLong) FROM %s WHERE clusterKey1 = '3'",
      // 1: group by the clustering column.
      "SELECT clusterKey1, SUM(valueLong) FROM %s GROUP BY 1 ORDER BY 2",
      // 2: filter on the clustering column + group by a secondary (higher cardinality) column
      "SELECT dimSecondary, SUM(valueLong) FROM %s WHERE clusterKey1 = '3' GROUP BY 1 ORDER BY 2",
      // 3: no-filter full aggregate (full scan, no pruning) as an overhead baseline
      "SELECT SUM(valueLong) FROM %s"
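
Each query is a format template whose `%s` placeholder stands for the datasource being measured. A minimal sketch of how such templates can be rendered is below; the class name `QueryTemplates` and the datasource name used in the usage note are assumptions for illustration, not names taken from the benchmark.

```java
import java.util.List;

final class QueryTemplates {
  // The four benchmark queries, indexed the same way as the (query) parameter.
  static final List<String> TEMPLATES = List.of(
      // 0: equality filter on the clustering column
      "SELECT SUM(valueLong) FROM %s WHERE clusterKey1 = '3'",
      // 1: group by the clustering column
      "SELECT clusterKey1, SUM(valueLong) FROM %s GROUP BY 1 ORDER BY 2",
      // 2: filter on the clustering column + group by a secondary column
      "SELECT dimSecondary, SUM(valueLong) FROM %s WHERE clusterKey1 = '3' GROUP BY 1 ORDER BY 2",
      // 3: no-filter full aggregate
      "SELECT SUM(valueLong) FROM %s"
  );

  private QueryTemplates() {}

  /** Substitutes the datasource name into the template at the given index. */
  static String render(int queryIndex, String datasource) {
    return String.format(TEMPLATES.get(queryIndex), datasource);
  }
}
```

For example, `QueryTemplates.render(3, "clustered")` produces `SELECT SUM(valueLong) FROM clustered`, where `clustered` is a made-up datasource name.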

Benchmark                               (clusteringCardinality)  (complexCompression)  (jsonObjectStorageEncoding)  (numClusteringColumns)  (query)  (rowsPerSegment)  (schemaType)  (segmentLayout)  (storageType)  (stringEncoding)  (vectorize)  Mode  Cnt   Score   Error  Units
SqlClusteredSegmentsBenchmark.querySql                       40                  none                        SMILE                       1        0           1500000      explicit        CLUSTERED           MMAP              UTF8        false  avgt    5   1.288 ± 0.080  ms/op
SqlClusteredSegmentsBenchmark.querySql                       40                  none                        SMILE                       1        0           1500000      explicit        CLUSTERED           MMAP              UTF8        force  avgt    5   0.706 ± 0.009  ms/op
SqlClusteredSegmentsBenchmark.querySql                       40                  none                        SMILE                       1        0           1500000      explicit      UNCLUSTERED           MMAP              UTF8        false  avgt    5   1.713 ± 0.191  ms/op
SqlClusteredSegmentsBenchmark.querySql                       40                  none                        SMILE                       1        0           1500000      explicit      UNCLUSTERED           MMAP              UTF8        force  avgt    5   0.826 ± 0.064  ms/op
SqlClusteredSegmentsBenchmark.querySql                       40                  none                        SMILE                       1        0           1500000      explicit     TIME_ORDERED           MMAP              UTF8        false  avgt    5   4.368 ± 0.205  ms/op
SqlClusteredSegmentsBenchmark.querySql                       40                  none                        SMILE                       1        0           1500000      explicit     TIME_ORDERED           MMAP              UTF8        force  avgt    5   3.602 ± 0.266  ms/op
SqlClusteredSegmentsBenchmark.querySql                       40                  none                        SMILE                       1        1           1500000      explicit        CLUSTERED           MMAP              UTF8        false  avgt    5  87.361 ± 1.396  ms/op
SqlClusteredSegmentsBenchmark.querySql                       40                  none                        SMILE                       1        1           1500000      explicit        CLUSTERED           MMAP              UTF8        force  avgt    5  34.209 ± 1.648  ms/op
SqlClusteredSegmentsBenchmark.querySql                       40                  none                        SMILE                       1        1           1500000      explicit      UNCLUSTERED           MMAP              UTF8        false  avgt    5  64.144 ± 2.320  ms/op
SqlClusteredSegmentsBenchmark.querySql                       40                  none                        SMILE                       1        1           1500000      explicit      UNCLUSTERED           MMAP              UTF8        force  avgt    5  19.577 ± 1.139  ms/op
SqlClusteredSegmentsBenchmark.querySql                       40                  none                        SMILE                       1        1           1500000      explicit     TIME_ORDERED           MMAP              UTF8        false  avgt    5  49.424 ± 1.254  ms/op
SqlClusteredSegmentsBenchmark.querySql                       40                  none                        SMILE                       1        1           1500000      explicit     TIME_ORDERED           MMAP              UTF8        force  avgt    5  10.193 ± 0.463  ms/op
SqlClusteredSegmentsBenchmark.querySql                       40                  none                        SMILE                       1        2           1500000      explicit        CLUSTERED           MMAP              UTF8        false  avgt    5   3.887 ± 0.425  ms/op
SqlClusteredSegmentsBenchmark.querySql                       40                  none                        SMILE                       1        2           1500000      explicit        CLUSTERED           MMAP              UTF8        force  avgt    5   2.557 ± 0.268  ms/op
SqlClusteredSegmentsBenchmark.querySql                       40                  none                        SMILE                       1        2           1500000      explicit      UNCLUSTERED           MMAP              UTF8        false  avgt    5   4.258 ± 0.306  ms/op
SqlClusteredSegmentsBenchmark.querySql                       40                  none                        SMILE                       1        2           1500000      explicit      UNCLUSTERED           MMAP              UTF8        force  avgt    5   2.774 ± 0.385  ms/op
SqlClusteredSegmentsBenchmark.querySql                       40                  none                        SMILE                       1        2           1500000      explicit     TIME_ORDERED           MMAP              UTF8        false  avgt    5   7.848 ± 0.542  ms/op
SqlClusteredSegmentsBenchmark.querySql                       40                  none                        SMILE                       1        2           1500000      explicit     TIME_ORDERED           MMAP              UTF8        force  avgt    5   6.667 ± 0.676  ms/op
SqlClusteredSegmentsBenchmark.querySql                       40                  none                        SMILE                       1        3           1500000      explicit        CLUSTERED           MMAP              UTF8        false  avgt    5  30.165 ± 0.952  ms/op
SqlClusteredSegmentsBenchmark.querySql                       40                  none                        SMILE                       1        3           1500000      explicit        CLUSTERED           MMAP              UTF8        force  avgt    5   5.671 ± 0.205  ms/op
SqlClusteredSegmentsBenchmark.querySql                       40                  none                        SMILE                       1        3           1500000      explicit      UNCLUSTERED           MMAP              UTF8        false  avgt    5  33.635 ± 0.957  ms/op
SqlClusteredSegmentsBenchmark.querySql                       40                  none                        SMILE                       1        3           1500000      explicit      UNCLUSTERED           MMAP              UTF8        force  avgt    5   9.464 ± 0.401  ms/op
SqlClusteredSegmentsBenchmark.querySql                       40                  none                        SMILE                       1        3           1500000      explicit     TIME_ORDERED           MMAP              UTF8        false  avgt    5  20.001 ± 0.918  ms/op
SqlClusteredSegmentsBenchmark.querySql                       40                  none                        SMILE                       1        3           1500000      explicit     TIME_ORDERED           MMAP              UTF8        force  avgt    5   3.986 ± 0.288  ms/op

Results are in line with expectations: queries 0 and 2 (which touch a single cluster group) are somewhat faster with clustered segments, especially compared to __time ordered segments (for example, 0.706 ms/op vs. 3.602 ms/op for vectorized query 0), while cross-cluster-group queries (1 and 3) are slower.
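
For readers who want the relative differences rather than raw timings, the ratios can be computed directly from the table above. The sketch below uses only the vectorized (`vectorize=force`) averages copied from that table and ignores the reported error margins; `LayoutRatios` is a hypothetical helper name.

```java
final class LayoutRatios {
  private LayoutRatios() {}

  /** Ratio of two average times; a value above 1 means the candidate is slower than the baseline. */
  static double ratio(double candidateMs, double baselineMs) {
    return candidateMs / baselineMs;
  }

  public static void main(String[] args) {
    // Single-group queries: TIME_ORDERED relative to CLUSTERED.
    System.out.printf("query 0 TIME_ORDERED / CLUSTERED: %.2fx%n", ratio(3.602, 0.706));
    System.out.printf("query 2 TIME_ORDERED / CLUSTERED: %.2fx%n", ratio(6.667, 2.557));

    // Cross-group queries: CLUSTERED relative to the other layouts.
    System.out.printf("query 1 CLUSTERED / UNCLUSTERED:  %.2fx%n", ratio(34.209, 19.577));
    System.out.printf("query 1 CLUSTERED / TIME_ORDERED: %.2fx%n", ratio(34.209, 10.193));
    System.out.printf("query 3 CLUSTERED / TIME_ORDERED: %.2fx%n", ratio(5.671, 3.986));
  }
}
```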

As a follow-up PR, I'll be working on improving queries 1 and 3 so that we can still use dictionary encoding when using the concatenating cursors. I also plan to follow up with further optimizations for the single-group case, such as being able to treat it as __time ordered as long as __time is the next ordering column after the clustering columns.

@FrankChen021 (Member) left a comment

I have reviewed the code for correctness, edge cases, concurrency, and integration risks; no issues found.

Reviewed 13 of 13 changed files.


This is an automated review by Codex GPT-5.5

@capistrant (Contributor) left a comment

cool stuff. The single clustered group optimization is a nice win

this and the vectorized sibling are neat, thanks for adding

@capistrant (Contributor)

I think UnnestCursorFactoryTest#testUnnestValueMatcherValueDoesntExist is failing because of the DataGenerator changes to the seed values. So I think all that is needed is updating the assertion to match the new output data from the generator: 569 instead of 618.

@clintropolis merged commit 2c634fc into apache:master Jun 24, 2026
39 checks passed
@clintropolis deleted the clustered-segment-benchmark branch June 24, 2026 19:25
@github-actions bot added this to the 38.0.0 milestone Jun 24, 2026