Skip to content

DPTP-2938: fix pod-scaler quantile int64 overflow at digest#5227

Open
deepsm007 wants to merge 1 commit into
openshift:mainfrom
deepsm007:fix/pod-scaler-invalid-recommendation
Open

DPTP-2938: fix pod-scaler quantile int64 overflow at digest#5227
deepsm007 wants to merge 1 commit into
openshift:mainfrom
deepsm007:fix/pod-scaler-invalid-recommendation

Conversation

@deepsm007
Copy link
Copy Markdown
Contributor

@deepsm007 deepsm007 commented Jun 2, 2026

/cc @openshift/test-platform

Pod-scaler: skip invalid negative cache recommendations

This PR hardens the pod-scaler component (the controller that recommends container CPU/memory requests and limits for CI workloads based on Prometheus metrics) by validating quantile-derived values and skipping any invalid recommendations so they are not cached or propagated to containers.

What changed (practical effect)

  • pod-scaler now validates quantile values derived from metrics and will not write invalid conversions into the recommendation cache. Invalid inputs include +/-Inf, NaN, negative values, and extreme/overflow-like values.
  • When an invalid quantile is encountered for a given metadata/resource pair, the code logs a debug message and skips updating that recommendation instead of storing an unusable quantity.
  • Tests were consolidated and expanded into a table-driven test that verifies correct conversions for normal CPU and memory inputs and confirms invalid inputs are rejected.

Components affected

  • cmd/pod-scaler (recommender/cache logic): runtime behavior changed to avoid caching invalid resource.Quantity values.
  • pod-scaler tests: updated to cover the new validation logic.

Impact for CI users/operators

Pod-scaler will be more robust against anomalous Prometheus data and will no longer produce or cache unusable resource requests/limits derived from invalid metric quantiles. This reduces the risk that CI pods receive invalid resource recommendations due to metric corruption or edge-case values.

@openshift-merge-bot
Copy link
Copy Markdown
Contributor

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will utilize /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To trigger manually all jobs from second stage use /pipeline required command.

This repository is configured in: automatic mode

@openshift-ci openshift-ci Bot requested a review from a team June 2, 2026 16:46
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Jun 2, 2026
@openshift-ci-robot
Copy link
Copy Markdown
Contributor

openshift-ci-robot commented Jun 2, 2026

@deepsm007: This pull request references DPTP-2938 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "5.0.0" version, but no target version was set.

Details

In response to this:

/cc @openshift/test-platform

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Jun 2, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

This PR consolidates quantile-to-resource conversion logic by introducing centralized validation and conversion helpers (quantileValueUsable, cpuQuantityFromQuantile, memoryQuantityFromQuantile) that replace per-metric formatter callbacks. The digestData function now switches on resource type to apply the correct helper and skips invalid quantile values with debug logging.

Changes

Resource quantity conversion refactoring

Layer / File(s) Summary
Quantile validation and conversion helpers
cmd/pod-scaler/resources.go
Adds math import and three new helpers: quantileValueUsable validates bounds, cpuQuantityFromQuantile converts to millicores, and memoryQuantityFromQuantile converts to bytes. Simplifies digest entry points by removing formatter callback plumbing.
digestData integration and storage update
cmd/pod-scaler/resources.go
Computes quantile once, switches on resource type to call the appropriate helper, logs debug when conversion fails, and stores the computed quantity directly in byMetaData.
Conversion helper unit tests
cmd/pod-scaler/resources_test.go
Table-driven tests validate successful conversion and proper rejection of infinity, NaN, negative, and overflow-like values for both CPU and memory helpers.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Suggested labels

lgtm

Suggested reviewers

  • psalajova
  • jmguzik
🚥 Pre-merge checks | ✅ 16 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 14.29% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (16 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately reflects the main change: refactoring quantile-to-quantity conversion logic to fix int64 overflow handling by adding validation and skipping invalid quantile values.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Go Error Handling ✅ Passed PR implements proper Go error handling: no ignored errors, no panics, safe pointer dereferences via non-nil resource.New* functions, and proper (value, ok) pattern for validation.
Test Coverage For New Features ✅ Passed Three new pure functions have comprehensive table-driven test coverage with regression tests for invalid inputs (infinity, NaN, negative, overflow).
Stable And Deterministic Test Names ✅ Passed PR contains standard Go tests (not Ginkgo), and all test names are static, deterministic, and descriptive with no dynamic values.
Test Structure And Quality ✅ Passed The custom check reviews Ginkgo test code for quality requirements. The PR contains standard Go unit tests (testing.T), not Ginkgo tests, so the check is not applicable.
Microshift Test Compatibility ✅ Passed PR contains only standard Go unit tests in cmd/pod-scaler/, not Ginkgo e2e tests. The custom check for MicroShift test compatibility is not applicable.
Single Node Openshift (Sno) Test Compatibility ✅ Passed PR adds unit tests only (no Ginkgo e2e tests). SNO compatibility check applies only to new Ginkgo e2e tests, which are not present in this PR.
Topology-Aware Scheduling Compatibility ✅ Passed PR modifies only internal pod-scaler quantile processing logic. No deployment manifests, operator code, or controllers are affected. No topology-aware scheduling constraints are introduced.
Ote Binary Stdout Contract ✅ Passed Pod-scaler is not an OTE binary. The PR changes to resources.go contain no stdout writes—only logrus logging via stderr.
Ipv6 And Disconnected Network Test Compatibility ✅ Passed The PR adds standard Go unit tests, not Ginkgo e2e tests. The custom check applies only to Ginkgo e2e tests with IPv4/connectivity concerns; these unit tests contain none.
No-Weak-Crypto ✅ Passed PR modifies pod-scaler resource calculation logic only; contains no weak crypto, custom crypto, or non-constant-time secret comparisons.
Container-Privileges ✅ Passed PR changes only modify Go source/test files (cmd/pod-scaler/resources.go, resources_test.go). No Kubernetes manifests, container configs, or security context configurations are affected.
No-Sensitive-Data-In-Logs ✅ Passed Logging only contains non-sensitive data: CI metadata (org, repo, branch, target, step, container), numeric counts, and resource metrics. No secrets, credentials, or PII exposed.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 2, 2026
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
cmd/pod-scaler/admission_test.go (1)

872-891: ⚡ Quick win

This regression doesn't cover the new CPU-specific guard.

This case is rejected by q.Sign() < 0 before recommendationQuantityUsable() reaches the CPU-only q.MilliValue() < 0 branch, so the new helper's extra logic can still regress unnoticed. Please add a small table-driven TestRecommendationQuantityUsable that covers zero, negative memory, and the CPU-specific path directly.

As per coding guidelines, "New or modified functionality should include test coverage. New Go functions and methods should have corresponding unit tests. Bug fixes should include a regression test that fails without the fix. Pure functions should always be tested. Prefer table-driven tests."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cmd/pod-scaler/admission_test.go` around lines 872 - 891, Add a new
table-driven unit test named TestRecommendationQuantityUsable that directly
exercises recommendationQuantityUsable for the three edge cases: (1) zero
quantity, (2) negative memory quantity, and (3) CPU-specific negative
milli-value path (use resource.NewMilliQuantity with a negative milli to trigger
q.MilliValue() < 0). For each case construct a resource.Quantity for the "ours"
value and an appropriate "theirs" value, call recommendationQuantityUsable and
assert the returned boolean matches the expected result; ensure the CPU case
specifically verifies the CPU-only branch is executed (i.e., a negative milli
quantity is rejected even if q.Sign() handling might differ). Use table entries
with descriptive names and iterate them in the test to produce clear fail
messages if the behavior regresses.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@cmd/pod-scaler/admission_test.go`:
- Around line 872-891: Add a new table-driven unit test named
TestRecommendationQuantityUsable that directly exercises
recommendationQuantityUsable for the three edge cases: (1) zero quantity, (2)
negative memory quantity, and (3) CPU-specific negative milli-value path (use
resource.NewMilliQuantity with a negative milli to trigger q.MilliValue() < 0).
For each case construct a resource.Quantity for the "ours" value and an
appropriate "theirs" value, call recommendationQuantityUsable and assert the
returned boolean matches the expected result; ensure the CPU case specifically
verifies the CPU-only branch is executed (i.e., a negative milli quantity is
rejected even if q.Sign() handling might differ). Use table entries with
descriptive names and iterate them in the test to produce clear fail messages if
the behavior regresses.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 961543f2-8224-413f-b973-7f39951346f0

📥 Commits

Reviewing files that changed from the base of the PR and between f488302 and bfc421e.

📒 Files selected for processing (2)
  • cmd/pod-scaler/admission.go
  • cmd/pod-scaler/admission_test.go

@deepsm007 deepsm007 force-pushed the fix/pod-scaler-invalid-recommendation branch 2 times, most recently from 9c3f1fd to a812de4 Compare June 2, 2026 16:56
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
cmd/pod-scaler/admission_test.go (1)

924-958: ⚡ Quick win

Add at least one usable-quantity case.

This only asserts the rejection path. A regression that makes recommendationQuantityUsable return false for every input would still pass this test. Please add a positive CPU and/or memory case that expects true.

As per coding guidelines, "New or modified functionality should include test coverage. New Go functions and methods should have corresponding unit tests. Bug fixes should include a regression test that fails without the fix. Pure functions should always be tested. Prefer table-driven tests."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cmd/pod-scaler/admission_test.go` around lines 924 - 958, The test
TestRecommendationQuantityUsable only covers negative/zero cases; add at least
one positive usable-quantity case so the acceptance path is asserted. Inside the
tests slice in TestRecommendationQuantityUsable add a case (e.g., name "positive
memory quantity" and/or "positive cpu milli quantity") that uses a non-zero
positive resource.Quantity (for memory use resource.NewQuantity(1024,
resource.BinarySI); for CPU use resource.NewMilliQuantity(500,
resource.DecimalSI)) with the appropriate corev1.ResourceMemory or
corev1.ResourceCPU field and set want: true so recommendationQuantityUsable(...)
is expected to return true.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@cmd/pod-scaler/admission_test.go`:
- Around line 924-958: The test TestRecommendationQuantityUsable only covers
negative/zero cases; add at least one positive usable-quantity case so the
acceptance path is asserted. Inside the tests slice in
TestRecommendationQuantityUsable add a case (e.g., name "positive memory
quantity" and/or "positive cpu milli quantity") that uses a non-zero positive
resource.Quantity (for memory use resource.NewQuantity(1024, resource.BinarySI);
for CPU use resource.NewMilliQuantity(500, resource.DecimalSI)) with the
appropriate corev1.ResourceMemory or corev1.ResourceCPU field and set want: true
so recommendationQuantityUsable(...) is expected to return true.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 2e152c2a-f1d2-49d2-b347-6b22e5ef4d7f

📥 Commits

Reviewing files that changed from the base of the PR and between bfc421e and 9c3f1fd.

📒 Files selected for processing (2)
  • cmd/pod-scaler/admission.go
  • cmd/pod-scaler/admission_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • cmd/pod-scaler/admission.go

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@cmd/pod-scaler/admission_test.go`:
- Around line 872-951: Add a new table-driven test case (next to the "negative
cpu recommendation ignored" cases in admission_test.go) that sets
authoritativeCPU: true and supplies ours.Requests[corev1.ResourceCPU] =
*resource.NewMilliQuantity(0, resource.DecimalSI) (zero CPU), with
theirs.Requests[corev1.ResourceCPU] = resource.MustParse("500m") and
expected.Requests[corev1.ResourceCPU] = resource.MustParse("500m"); this ensures
the zero-CPU authoritative recommendation is ignored (like the negative cases)
and guards the CPU 10m-floor logic in the merge behavior.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 0d6bea20-38c0-4668-8a8b-bb3d98c6558b

📥 Commits

Reviewing files that changed from the base of the PR and between 9c3f1fd and a812de4.

📒 Files selected for processing (2)
  • cmd/pod-scaler/admission.go
  • cmd/pod-scaler/admission_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • cmd/pod-scaler/admission.go

Comment thread cmd/pod-scaler/admission_test.go Outdated
@deepsm007
Copy link
Copy Markdown
Contributor Author

/hold

@openshift-ci openshift-ci Bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 2, 2026
@deepsm007 deepsm007 force-pushed the fix/pod-scaler-invalid-recommendation branch from a812de4 to 347fcf1 Compare June 2, 2026 17:07
@psalajova
Copy link
Copy Markdown
Contributor

/lgtm

@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented Jun 2, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: deepsm007, psalajova

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:
  • OWNERS [deepsm007,psalajova]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@deepsm007 deepsm007 force-pushed the fix/pod-scaler-invalid-recommendation branch from 347fcf1 to da90081 Compare June 2, 2026 17:09
@deepsm007 deepsm007 changed the title DPTP-2938: skip invalid negative cache recommendations in pod-scaler DPTP-2938: fix pod-scaler quantile int64 overflow at digest Jun 2, 2026
@openshift-ci openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label Jun 2, 2026
@deepsm007
Copy link
Copy Markdown
Contributor Author

/unhold

@openshift-ci openshift-ci Bot removed lgtm Indicates that a PR is ready to be merged. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. labels Jun 2, 2026
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (4)
cmd/pod-scaler/resources.go (4)

49-58: ⚡ Quick win

Add documentation for cpuQuantityFromQuantile.

This function performs quantile validation and unit conversion (cores → millicores). A comment documenting its purpose, parameters, and the bool return value would improve code clarity. As per coding guidelines, Go documentation on functions should be written properly.

📝 Suggested documentation
+// cpuQuantityFromQuantile converts a CPU quantile value (in cores) to a resource.Quantity
+// in millicores. Returns false if the value is invalid (NaN, Inf, negative, or would overflow).
 func cpuQuantityFromQuantile(cores float64) (resource.Quantity, bool) {
 	if !quantileValueUsable(cores) {
 		return resource.Quantity{}, false
 	}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cmd/pod-scaler/resources.go` around lines 49 - 58, Add a proper Go doc
comment above the cpuQuantityFromQuantile function that states its purpose
(validate a quantile CPU value and convert CPU cores to a resource.Quantity in
millicores), documents the parameter (cores float64) and explains the boolean
return (true if conversion succeeded and value is usable, false on
invalid/overflow), and mention edge conditions handled (uses quantileValueUsable
and checks against math.MaxInt64). Ensure the comment follows godoc style
(starts with the function name).

70-78: ⚡ Quick win

Add documentation for memoryQuantityFromQuantile.

This function performs quantile validation and overflow checking for memory quantities. A comment documenting its purpose, parameters, and the bool return value would improve code clarity. As per coding guidelines, Go documentation on functions should be written properly.

📝 Suggested documentation
+// memoryQuantityFromQuantile converts a memory quantile value (in bytes) to a resource.Quantity.
+// Returns false if the value is invalid (NaN, Inf, negative, or would overflow).
 func memoryQuantityFromQuantile(bytes float64) (resource.Quantity, bool) {
 	if !quantileValueUsable(bytes) {
 		return resource.Quantity{}, false
 	}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cmd/pod-scaler/resources.go` around lines 70 - 78, Add a proper Go doc
comment above memoryQuantityFromQuantile explaining its purpose (validate a
quantile-derived memory value and convert to resource.Quantity), describing the
parameter (bytes: float64 bytes from a quantile), and clarifying the bool return
(true when conversion succeeded, false for unusable quantile or overflow).
Mention the function behavior: it calls quantileValueUsable, checks overflow
against math.MaxInt64, and returns a resource.NewQuantity(int64(bytes),
resource.BinarySI) on success; place the comment directly above the
memoryQuantityFromQuantile declaration.

45-47: ⚡ Quick win

Add documentation for quantileValueUsable.

While the function name is descriptive, a brief comment explaining why these conditions (NaN, Inf, negative) make a quantile value unusable would improve maintainability. As per coding guidelines, Go documentation on functions should be written properly.

📝 Suggested documentation
+// quantileValueUsable returns true if the quantile value can be safely converted
+// to a resource.Quantity for CPU or memory recommendations.
 func quantileValueUsable(v float64) bool {
 	return !math.IsNaN(v) && !math.IsInf(v, 0) && v >= 0
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cmd/pod-scaler/resources.go` around lines 45 - 47, Add a Go doc comment for
the function quantileValueUsable explaining its purpose and why it rejects NaN,
infinite, or negative values (e.g., because quantiles must be finite
non-negative numbers for resource scaling decisions); place the comment directly
above the quantileValueUsable function declaration and follow Go comment style
starting with the function name.

46-46: Update: circonusllhist quantile edge cases (NaN/Inf/negative)

  • circonusllhist.Histogram.ValueAtQuantile returns math.NaN() when ApproxQuantile fails (empty histogram / out-of-bound / out-of-order); the non-error path returns finite computed quantiles and there’s no ±Inf return path in these functions.
  • Negative quantiles are possible when the histogram records negative values (bin.left()/bin.value() propagate sign), so the v >= 0 check in quantileValueUsable is meaningful for pod-scaler resources.
  • The math.IsInf(v, 0) portion of quantileValueUsable is therefore largely defensive extra (won’t come from circonusllhist in these code paths), but it’s harmless.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cmd/pod-scaler/resources.go` at line 46, quantileValueUsable should
explicitly document why it checks for NaN, Inf and non‑negative values: add a
short comment above the quantileValueUsable function referencing
circonusllhist.Histogram.ValueAtQuantile returning math.NaN() on failure, noting
that negative quantiles can occur when histogram bins contain negative values
and that math.IsInf(v, 0) is kept as a defensive check even though
circonusllhist won’t produce ±Inf in these code paths; keep the existing return
logic (!math.IsNaN(v) && !math.IsInf(v, 0) && v >= 0) unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@cmd/pod-scaler/resources.go`:
- Around line 49-58: Add a proper Go doc comment above the
cpuQuantityFromQuantile function that states its purpose (validate a quantile
CPU value and convert CPU cores to a resource.Quantity in millicores), documents
the parameter (cores float64) and explains the boolean return (true if
conversion succeeded and value is usable, false on invalid/overflow), and
mention edge conditions handled (uses quantileValueUsable and checks against
math.MaxInt64). Ensure the comment follows godoc style (starts with the function
name).
- Around line 70-78: Add a proper Go doc comment above
memoryQuantityFromQuantile explaining its purpose (validate a quantile-derived
memory value and convert to resource.Quantity), describing the parameter (bytes:
float64 bytes from a quantile), and clarifying the bool return (true when
conversion succeeded, false for unusable quantile or overflow). Mention the
function behavior: it calls quantileValueUsable, checks overflow against
math.MaxInt64, and returns a resource.NewQuantity(int64(bytes),
resource.BinarySI) on success; place the comment directly above the
memoryQuantityFromQuantile declaration.
- Around line 45-47: Add a Go doc comment for the function quantileValueUsable
explaining its purpose and why it rejects NaN, infinite, or negative values
(e.g., because quantiles must be finite non-negative numbers for resource
scaling decisions); place the comment directly above the quantileValueUsable
function declaration and follow Go comment style starting with the function
name.
- Line 46: quantileValueUsable should explicitly document why it checks for NaN,
Inf and non‑negative values: add a short comment above the quantileValueUsable
function referencing circonusllhist.Histogram.ValueAtQuantile returning
math.NaN() on failure, noting that negative quantiles can occur when histogram
bins contain negative values and that math.IsInf(v, 0) is kept as a defensive
check even though circonusllhist won’t produce ±Inf in these code paths; keep
the existing return logic (!math.IsNaN(v) && !math.IsInf(v, 0) && v >= 0)
unchanged.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: b4b754fb-0972-4ad5-a963-2cfd5cea3acd

📥 Commits

Reviewing files that changed from the base of the PR and between a812de4 and 347fcf1.

📒 Files selected for processing (2)
  • cmd/pod-scaler/resources.go
  • cmd/pod-scaler/resources_test.go
✅ Files skipped from review due to trivial changes (1)
  • cmd/pod-scaler/resources_test.go

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
cmd/pod-scaler/resources_test.go (1)

59-77: ⚡ Quick win

Add memory negative infinity test case for symmetry.

The CPU tests include a negative infinity case (lines 32-35), but the memory tests skip it. Adding a "memory negative infinity" case ensures both conversion paths handle -inf consistently.

✅ Suggested addition

Insert after line 62:

 		{
 			name:     "memory positive infinity",
 			quantile: math.Inf(1),
 			cpu:      false,
 		},
+		{
+			name:     "memory negative infinity",
+			quantile: math.Inf(-1),
+			cpu:      false,
+		},
 		{
 			name:     "memory nan",
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cmd/pod-scaler/resources_test.go` around lines 59 - 77, Add a symmetric
"memory negative infinity" test case to the memory tests by inserting an entry
with name "memory negative infinity", quantile math.Inf(-1), and cpu: false into
the same test table used in resources_test.go (the slice that already contains
"memory positive infinity", "memory nan", etc.); place it adjacent to the other
memory-edge cases so the conversion logic exercised by the test functions (the
table used by the test for quantile-to-resource conversion) handles -Inf the
same way CPU tests do.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@cmd/pod-scaler/resources_test.go`:
- Around line 59-77: Add a symmetric "memory negative infinity" test case to the
memory tests by inserting an entry with name "memory negative infinity",
quantile math.Inf(-1), and cpu: false into the same test table used in
resources_test.go (the slice that already contains "memory positive infinity",
"memory nan", etc.); place it adjacent to the other memory-edge cases so the
conversion logic exercised by the test functions (the table used by the test for
quantile-to-resource conversion) handles -Inf the same way CPU tests do.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: ad821efd-4f55-4cd4-b68f-9ac84265e1b2

📥 Commits

Reviewing files that changed from the base of the PR and between 347fcf1 and da90081.

📒 Files selected for processing (2)
  • cmd/pod-scaler/resources.go
  • cmd/pod-scaler/resources_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • cmd/pod-scaler/resources.go

@deepsm007
Copy link
Copy Markdown
Contributor Author

/hold this for furthur investigation/improvements

@openshift-ci openshift-ci Bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 2, 2026
@deepsm007 deepsm007 force-pushed the fix/pod-scaler-invalid-recommendation branch 3 times, most recently from fae9965 to 34be547 Compare June 2, 2026 20:33
Reject NaN/Inf/negative quantiles before caching recommendations. Use
hist.Max for sparse samples; cap CPU/memory at digest to match admission.
Skip invalid Prometheus samples at ingest.
@deepsm007 deepsm007 force-pushed the fix/pod-scaler-invalid-recommendation branch from 34be547 to 50f74fe Compare June 2, 2026 20:42
@deepsm007
Copy link
Copy Markdown
Contributor Author

/unhold
/e2e

@openshift-ci openshift-ci Bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 2, 2026
@psalajova
Copy link
Copy Markdown
Contributor

/test images

@openshift-merge-bot
Copy link
Copy Markdown
Contributor

Scheduling tests matching the pipeline_run_if_changed or not excluded by pipeline_skip_if_only_changed parameters:
/test e2e

@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented Jun 3, 2026

@deepsm007: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants