feat: clustered segment benchmark and fix #19623
Merged
clintropolis merged 2 commits on Jun 24, 2026
FrankChen021
left a comment
Member
I have reviewed the code for correctness, edge cases, concurrency, and integration risks; no issues found.
Reviewed 13 of 13 changed files.
This is an automated review by Codex GPT-5.5
capistrant
approved these changes
Jun 24, 2026
capistrant
left a comment
Contributor
cool stuff. The single clustered group optimization is a nice win
Contributor
this and the vectorized sibling are neat, thanks for adding
Contributor
|
I think |
Description
This PR adds a benchmark for clustered segments and fixes a bug with clustering columns that is similar in nature to the problem fixed in #19599 for delegate columns. The bug was caused by the constant selectors of clustering columns advertising an incorrect cardinality (1, since each group has its own constant selector).
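To illustrate why a per-group cardinality of 1 is a problem once several groups are read back to back, here is a minimal sketch. It is not Druid code: `Group`, `countRows`, and the `CARDINALITY_UNKNOWN` sentinel are hypothetical names for this example, and the consumer is a deliberately simplified stand-in for an engine that trusts dictionary ids whenever a cardinality is advertised. Advertising an unknown cardinality is shown only as one plausible way to avoid the problem, not necessarily what this PR does.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model; none of these names are Druid's actual classes.
final class ClusteringCardinalitySketch {
    // Sentinel meaning "dictionary ids cannot be trusted across rows" (assumed for this sketch).
    static final int CARDINALITY_UNKNOWN = -1;

    /** One cluster group: its constant clustering value and how many rows it holds. */
    static final class Group {
        final String clusteringValue;
        final int rowCount;

        Group(String clusteringValue, int rowCount) {
            this.clusteringValue = clusteringValue;
            this.rowCount = rowCount;
        }
    }

    /**
     * Counts rows per clustering value across concatenated groups.
     * If a cardinality is advertised, the consumer buckets by dictionary id;
     * otherwise it falls back to bucketing by the value itself.
     */
    static Map<String, Integer> countRows(List<Group> groups, int advertisedCardinality) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        if (advertisedCardinality != CARDINALITY_UNKNOWN) {
            // Id-based path: the name for each id is resolved once and then reused.
            String[] namesById = new String[advertisedCardinality];
            for (Group g : groups) {
                int id = 0; // a constant selector only ever reports id 0
                if (namesById[id] == null) {
                    namesById[id] = g.clusteringValue;
                }
                counts.merge(namesById[id], g.rowCount, Integer::sum);
            }
        } else {
            // Value-based path: ids are ignored, so groups stay distinct.
            for (Group g : groups) {
                counts.merge(g.clusteringValue, g.rowCount, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Group> groups = Arrays.asList(new Group("a", 3), new Group("b", 2));
        System.out.println(countRows(groups, 1));                   // {a=5}: groups collapsed
        System.out.println(countRows(groups, CARDINALITY_UNKNOWN)); // {a=3, b=2}
    }
}
```

With cardinality 1 advertised, both groups land in the bucket for id 0 and the rows of `b` are attributed to `a`. With the cardinality reported as unknown, the sketch's consumer groups by value and keeps the two groups apart.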
This PR also adds an optimized path for the single-group case that delegates directly to the underlying group, which is correct when only one group is involved.
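The single-group shortcut can be pictured roughly as follows. This is a sketch using plain Java iterators in place of cursors; `cursorFor` and `ConcatenatingIterator` are hypothetical names, not the classes touched by this PR.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical sketch: iterators stand in for cursors, lists of rows for cluster groups.
final class SingleGroupDelegationSketch {

    /** Picks the row source for a query that touches the given groups. */
    static <T> Iterator<T> cursorFor(List<List<T>> groups) {
        if (groups.size() == 1) {
            // Single group: hand back the group's own iterator, with no wrapper in between.
            return groups.get(0).iterator();
        }
        return new ConcatenatingIterator<>(groups);
    }

    /** Walks each group in turn, presenting them as one continuous sequence of rows. */
    static final class ConcatenatingIterator<T> implements Iterator<T> {
        private final Iterator<List<T>> outer;
        private Iterator<T> current = Collections.emptyIterator();

        ConcatenatingIterator(List<List<T>> groups) {
            this.outer = groups.iterator();
        }

        @Override
        public boolean hasNext() {
            // Skip past exhausted (or empty) groups until a row is available.
            while (!current.hasNext() && outer.hasNext()) {
                current = outer.next().iterator();
            }
            return current.hasNext();
        }

        @Override
        public T next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            return current.next();
        }
    }

    public static void main(String[] args) {
        Iterator<String> single = cursorFor(Arrays.asList(Arrays.asList("r1", "r2")));
        System.out.println(single instanceof ConcatenatingIterator); // false

        Iterator<String> multi = cursorFor(Arrays.asList(Arrays.asList("r1"), Arrays.asList("r2", "r3")));
        while (multi.hasNext()) {
            System.out.println(multi.next());
        }
    }
}
```

In the single-group case the caller works with the group's own row source, so whatever that source offers on its own is still available. The wrapper is only paid for when rows from several groups really have to be stitched together.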
The added benchmarks are measuring segments with 3 layouts:
CLUSTERED: clustered segment
UNCLUSTERED: unclustered v10 segment NOT ordered by time and so effectively the same ordering as the clustered segment, to have an apples to apples comparison of clustered vs not
TIME_ORDERED: v10 segment ordered by __time then clustering columns then non-clustering, to provide comparison with more typical segments.
Benchmark results:
Results are in line with expectations: queries 0 and 2 (queries that touch a single cluster group) are a bit faster with clustered segments, especially when compared to __time ordered segments, while cross cluster group queries are slower.
I'll be working on improving queries 1 and 3 in a follow-up PR, so that we can still use dictionary encoding when using the concatenating cursors. I also plan to follow up with further optimizations for the single group case, such as being able to treat them as __time ordered as long as __time is the next ordering column after the clustering columns.