feat: introduce processing rate to max ratio as poll-idle-ratio replacement #19622

Open
Fly-Style wants to merge 3 commits into apache:master from Fly-Style:feat/poll-idle-ratio-replacement

Fly-Style (Contributor) commented Jun 23, 2026

Summary

Kafka's poll-idle-ratio is not an ideal metric for representing idle cost in the autoscaler: it reflects time spent polling, not whether the task has spare processing capacity. It is also unavailable for Kinesis, which is still used in real-world production deployments.

This patch adds an alternative idle signal, 1 - (avgProcessingRate / maxObservedRate), gated behind a new opt-in flag useUtilizationRatio (default false, so existing deployments are unaffected).

  • CostBasedAutoScaler: tracks a bounded watermark of the task's best-observed processing rate (maxObservedRate), feeding CostMetrics.
  • WeightedCostFunction: when the flag is on, derives idle ratio from that utilization ratio instead of pollIdleRatio; falls back to IDEAL_IDLE_RATIO until a watermark sample exists (cold start).
  • CostBasedAutoScalerConfig: new useUtilizationRatio boolean, wired through builder/serde/equals/hashCode.
  • CostMetrics: new nullable maxObservedRate field; old constructor kept (delegates with null) since three test call sites still use it.
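
For illustration only, the idle signal described above could be sketched roughly as follows. This is not the code from the patch: the class name, method names, and the IDEAL_IDLE_RATIO value are placeholders, and because the description does not spell out how the watermark is bounded, the sketch uses a plain running maximum.

```java
// Hypothetical sketch; names and constants are assumptions, not the PR's code.
final class UtilizationIdleRatioSketch {
    // Placeholder value: the real IDEAL_IDLE_RATIO constant is not shown in the description.
    static final double IDEAL_IDLE_RATIO = 0.5;

    // Best processing rate seen so far; null until the first sample (cold start).
    private Double maxObservedRate = null;

    // Record a new average processing rate and raise the watermark if it is higher.
    void observe(double avgProcessingRate) {
        if (avgProcessingRate > 0
            && (maxObservedRate == null || avgProcessingRate > maxObservedRate)) {
            maxObservedRate = avgProcessingRate;
        }
    }

    // Idle ratio = 1 - (avgProcessingRate / maxObservedRate), clamped to [0, 1].
    // Falls back to IDEAL_IDLE_RATIO while no watermark sample exists.
    double idleRatio(double avgProcessingRate) {
        if (maxObservedRate == null) {
            return IDEAL_IDLE_RATIO;
        }
        double utilization = avgProcessingRate / maxObservedRate;
        return Math.max(0.0, Math.min(1.0, 1.0 - utilization));
    }
}
```

Under these assumptions, a task whose watermark is 200 records/s and whose current average is 50 records/s would report an idle ratio of 0.75, while a task running at its watermark would report 0.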

This PR has:

  • been self-reviewed.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added comments explaining the "why" and the intent of the code wherever it would not be obvious to an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.

Fly-Style changed the title from "Introduce processing rate to max ratio as poll-idle-ratio replacement" to "feat: introduce processing rate to max ratio as poll-idle-ratio replacement" on Jun 23, 2026
Fly-Style requested a review from kfaraz on June 24, 2026 08:45

FrankChen021 (Member) left a comment

Severity Findings
P0 0
P1 0
P2 1
P3 0
Total 1

Reviewed 8 of 8 changed files.


This is an automated review by Codex GPT-5.5


final int lowInitialTaskCount = 1;
// This ensures tasks are busy processing (low idle ratio)
Executors.newSingleThreadExecutor().submit(() -> {

[P2] Shut down the background publisher executor

The new test creates a single-thread ExecutorService and immediately drops it. That executor uses a non-daemon worker and is never shut down, so the embedded-test JVM can stay alive after the test returns; on timeout or failure it can also keep publishing into the topic while cleanup is running. Keep a reference and shut it down/cancel the Future in a finally block or use an existing managed executor.
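
One possible shape for that fix, as a minimal sketch rather than the test's actual code: keep references to both the executor and the Future, and release them in a finally block. The class name, method name, and the stand-in publishing loop are hypothetical.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch; not the test code from the PR.
final class ManagedPublisherSketch {
    // Runs testBody while a background publisher is active, then always stops the publisher.
    // Returns true if the executor terminated within the timeout.
    static boolean runWithBackgroundPublisher(Runnable testBody) {
        AtomicInteger published = new AtomicInteger();
        ExecutorService publisher = Executors.newSingleThreadExecutor();
        Future<?> task = publisher.submit(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                published.incrementAndGet(); // stand-in for publishing a record to the topic
                try {
                    Thread.sleep(10);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // restore the flag so the loop exits
                }
            }
        });
        boolean terminated = false;
        try {
            testBody.run();
        } finally {
            task.cancel(true);       // interrupt the publishing loop
            publisher.shutdownNow(); // release the non-daemon worker thread
            try {
                terminated = publisher.awaitTermination(5, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return terminated;
    }
}
```

Because the cleanup sits in finally, the publisher stops even when the test body throws or times out, so it cannot keep writing to the topic during teardown.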
