Skip to content

minor: add projectionTrace query context for diagnosing projection match failures#19613

Open
cecemei wants to merge 4 commits into
apache:masterfrom
cecemei:trace
Open

minor: add projectionTrace query context for diagnosing projection match failures#19613
cecemei wants to merge 4 commits into
apache:masterfrom
cecemei:trace

Conversation

@cecemei

@cecemei cecemei commented Jun 22, 2026

Copy link
Copy Markdown
Contributor

Description

Debugging why matchAggregateProjection returns null is difficult today because there are 14 distinct rejection sites spread across the matching pipeline, and there's no visibility into which one fired without a debugger or global DEBUG logging.

This PR adds a projectionTrace query context flag. When set to true, rejection reasons are logged at INFO level so they're visible without changing log4j config. When unset, the same messages are emitted at DEBUG level.

Usage:

  {                                                                                                                                                     
    "context": { "projectionTrace": true }                                                                                                              
  }                                                                                                                                                   

Rejection messages include the projection name and the specific reason (e.g., incompatible ordering, missing column, granularity mismatch), making it straightforward to identify which projection was rejected and why.


This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@cecemei cecemei marked this pull request as ready for review June 22, 2026 18:26
@cecemei cecemei requested a review from clintropolis June 22, 2026 18:26
public static final String NO_PROJECTIONS = "noProjections";
public static final String FORCE_PROJECTION = "forceProjections";
public static final String USE_PROJECTION = "useProjection";
public static final String PROJECTION_TRACE = "projectionTrace";

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i wonder if instead of adding a new flag if this should just check ENABLE_DEBUG context flag

@cecemei cecemei Jun 22, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i've thought about that as well, but decided not because debug=true already triggers a lot of things at once, e.x SQL planner rule, CalciteRulesManager in particular is very noisy when it logs planner rules. It'd be very noisy if someone is just trying to understand why a projection isn't matching.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea, i considered the noise briefly too, but then said to myself it was ok though because the noisy SQL logs are all on the broker, while these logs are only going to be on the historicals and realtime servers, per segment.

If you really want to add a new flag, i still think we should also do this logging if debug is set, so that way someone wanting "all the logs" doesn't have to set a bunch of different flags, so the logTrace method could check either projectionTrace or debug

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point, i added the logging if debug is set

@jtuglu1 jtuglu1 Jun 23, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i wonder if instead of adding a new flag if this should just check ENABLE_DEBUG context flag

+1 to this. IMO, we should minimize the # of possible query context configuration combinations (there are frankly so many, some of which are documented and others not). My opinion: a user seeking debug information about a particular query should not need to go enable multiple query context flags to get the information they are looking for, especially if they do not know what they are looking for (debugging). For debug, I'd rather get the kitchen sink and then filter out the noise myself manually.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess if we filter by Projections class we could always reduce the noise. Updated this to use debug flag.

Comment thread processing/src/main/java/org/apache/druid/segment/projections/Projections.java Outdated
{
private static final Logger log = new Logger(Projections.class);

private static void logTrace(QueryContext context, String format, Object... args)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not just pass the CursorBuildSpec into this method? it looks like all of the callers are getting it from there anyway

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a narrower input param would be more declarative? yes all callers are getting it from CursorBuildSpec.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i guess it was just jarring to see logTrace(queryCursorBuildSpec.getQueryContext()... like 20 times in the same file with all the stuff sourced from a CursorBuildSpec.

// match virtual columns first, which will populate the 'remapColumns' of the match builder
matchBuilder = matchQueryVirtualColumns(projection, queryCursorBuildSpec, physicalColumnChecker, matchBuilder);
if (matchBuilder == null) {
logTrace(queryCursorBuildSpec.getQueryContext(), "matchAggregateProjection: projection [%s] rejected — virtual column mismatch", projection.getName());

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this and some of these other top level logs seem kind of redundant since the methods they call also log when returning nulls, but i guess we sort of need these here though because otherwise i'm not sure its obvious which caller of which match method was the failing one.

This is probably ok for now since its just logs, but in the future we might consider collecting these failure reasons instead of logging them when they happen so that we can produce more concise messaging.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could consider collecting failure reasons when forceProjection or useProjection is set in the future, and maybe add it as a system flag.

even though many messages are added here, only 1-2 lines would be logged so it should not be too verbose.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea, i think its fine for now, though 'only 1-2 lines' is per projection, per segment of the query, so its going to be pretty verbose anywhere that is touching more than a few segments if there are any projections at all.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea it would be suboptimal if logging for a lot of segments, maybe could be helpful in the case when only several segments out of all are missing proper projections, it gives some amount of visibility and operatibility to projection at least.

@jtuglu1 jtuglu1 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I like the idea, left a comment why I think we should just bake this behavior into the debug query context param.

@FrankChen021 FrankChen021 left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have reviewed the code for correctness, edge cases, concurrency, and integration risks; no issues found.

Reviewed 2 of 2 changed files.


This is an automated review by Codex GPT-5.5

@FrankChen021 FrankChen021 left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Severity Findings
P0 0
P1 0
P2 0
P3 1
Total 1
Severity Findings
P0 0
P1 0
P2 0
P3 1
Total 1

Reviewed 1 of 1 changed files.


This is an automated review by Codex GPT-5.5

Findings that could not be attached inline:

  • processing/src/main/java/org/apache/druid/segment/projections/Projections.java:548 - [P3] Keep the existing public helper overload. This changes the public static matchQueryPhysicalColumn(...) helper from four arguments to five just to pass logging context. Downstream extensions or tests compiled against Projections.matchQueryPhysicalColumn(column, projection, checker, builder) will now fail even though the old behavior can be preserved by retaining the existing overload and delegating with QueryContext.empty() or similar. Please keep the old overload to avoid an avoidable source-compatibility break.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants