Skip to content

fix: Fix invalid time comparison#19603

Open
vivek807 wants to merge 11 commits into
apache:masterfrom
deep-bi:deep/feature/DRUID-125-Fix-invalid-time-comparison
Open

fix: Fix invalid time comparison#19603
vivek807 wants to merge 11 commits into
apache:masterfrom
deep-bi:deep/feature/DRUID-125-Fix-invalid-time-comparison

Conversation

@vivek807

Copy link
Copy Markdown
Contributor

Fixes #19602.

Description

forceOrWaitOngoingDatabasePoll in SqlSegmentsMetadataManager checks whether a concurrent OnDemandDatabasePoll that completed while the calling thread waited for the write lock makes a new poll unnecessary. This check was always false due to a clock mismatch, causing an extra database poll on every contended call.

Fixed invalid clock comparison in forceOrWaitOngoingDatabasePoll

OnDemandDatabasePoll.initiationTimeNanos is assigned via System.nanoTime(), which is a monotonic clock with an arbitrary JVM-internal reference point.
The comparison value was constructed as:

long checkStartTimeNanos = TimeUnit.MILLISECONDS.toNanos(checkStartTime);
// checkStartTime = System.currentTimeMillis()

System.currentTimeMillis() returns milliseconds since the Unix epoch (~1.7 × 10¹² ms). Converting to nanoseconds yields ~1.7 × 10²¹ ns, which is orders of magnitude larger than any System.nanoTime() value. The condition initiationTimeNanos > checkStartTimeNanos was therefore always false, so an in-progress on-demand poll was never recognised as sufficient.

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@github-actions github-actions Bot added the GHA label Jun 19, 2026
@vivek807 vivek807 force-pushed the deep/feature/DRUID-125-Fix-invalid-time-comparison branch from 534f417 to 81c9773 Compare June 19, 2026 11:25

@FrankChen021 FrankChen021 left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Severity Findings
P0 0
P1 0
P2 1
P3 0
Total 1
Severity Findings
P0 0
P1 0
P2 1
P3 0
Total 1

Reviewed 1 of 1 changed files.


This is an automated review by Codex GPT-5.5

Add Unit test for same
@vivek807 vivek807 requested a review from FrankChen021 June 22, 2026 09:07

@FrankChen021 FrankChen021 left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Severity Findings
P0 0
P1 0
P2 0
P3 1
Total 1

Reviewed 2 of 2 changed files.


This is an automated review by Codex GPT-5.5

try {
caller.start();
// Allow the caller to capture checkStartTimeNanos and block on the write lock.
Thread.sleep(100);

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P3] Replace timing sleep with deterministic thread coordination

The new regression test depends on the caller thread being scheduled within this fixed 100 ms window so it captures checkStartTimeNanos before concurrentPoll is created. On a slow or overloaded CI worker, the caller can start after the sleep, making concurrentPoll.initiationTimeNanos <= checkStartTimeNanos; then the production branch under test is skipped and the caller may run its own poll or return early, causing a flaky failure or a false pass. Please coordinate on the lock queue/state, e.g. wait until the caller is queued on the write lock before creating concurrentPoll, instead of relying on sleep timing.

@maytasm

maytasm commented Jun 22, 2026

Copy link
Copy Markdown
Contributor

@vivek807
Thanks for flagging this issue. You are right, the values are not comparable.
I'm thinking that maybe instead we should standardize the timestamps in OnDemandDatabasePoll and PeriodicDatabasePoll to use the same format (i.e. both use System.nanoTime() or both use System.currentTimeMillis()). This can avoid similar confusion / bug like this in the future. Wdyt? Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

4 participants