DPTP-2938: fix pod-scaler quantile int64 overflow at digest#5227
DPTP-2938: fix pod-scaler quantile int64 overflow at digest#5227deepsm007 wants to merge 1 commit into
Conversation
|
Pipeline controller notification For optional jobs, comment This repository is configured in: automatic mode |
|
@deepsm007: This pull request references DPTP-2938 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "5.0.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughThis PR consolidates quantile-to-resource conversion logic by introducing centralized validation and conversion helpers ( ChangesResource quantity conversion refactoring
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes Suggested labels
Suggested reviewers
🚥 Pre-merge checks | ✅ 16 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (16 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
cmd/pod-scaler/admission_test.go (1)
872-891: ⚡ Quick winThis regression doesn't cover the new CPU-specific guard.
This case is rejected by
q.Sign() < 0beforerecommendationQuantityUsable()reaches the CPU-onlyq.MilliValue() < 0branch, so the new helper's extra logic can still regress unnoticed. Please add a small table-drivenTestRecommendationQuantityUsablethat covers zero, negative memory, and the CPU-specific path directly.As per coding guidelines, "New or modified functionality should include test coverage. New Go functions and methods should have corresponding unit tests. Bug fixes should include a regression test that fails without the fix. Pure functions should always be tested. Prefer table-driven tests."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cmd/pod-scaler/admission_test.go` around lines 872 - 891, Add a new table-driven unit test named TestRecommendationQuantityUsable that directly exercises recommendationQuantityUsable for the three edge cases: (1) zero quantity, (2) negative memory quantity, and (3) CPU-specific negative milli-value path (use resource.NewMilliQuantity with a negative milli to trigger q.MilliValue() < 0). For each case construct a resource.Quantity for the "ours" value and an appropriate "theirs" value, call recommendationQuantityUsable and assert the returned boolean matches the expected result; ensure the CPU case specifically verifies the CPU-only branch is executed (i.e., a negative milli quantity is rejected even if q.Sign() handling might differ). Use table entries with descriptive names and iterate them in the test to produce clear fail messages if the behavior regresses.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@cmd/pod-scaler/admission_test.go`:
- Around line 872-891: Add a new table-driven unit test named
TestRecommendationQuantityUsable that directly exercises
recommendationQuantityUsable for the three edge cases: (1) zero quantity, (2)
negative memory quantity, and (3) CPU-specific negative milli-value path (use
resource.NewMilliQuantity with a negative milli to trigger q.MilliValue() < 0).
For each case construct a resource.Quantity for the "ours" value and an
appropriate "theirs" value, call recommendationQuantityUsable and assert the
returned boolean matches the expected result; ensure the CPU case specifically
verifies the CPU-only branch is executed (i.e., a negative milli quantity is
rejected even if q.Sign() handling might differ). Use table entries with
descriptive names and iterate them in the test to produce clear fail messages if
the behavior regresses.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 961543f2-8224-413f-b973-7f39951346f0
📒 Files selected for processing (2)
cmd/pod-scaler/admission.gocmd/pod-scaler/admission_test.go
9c3f1fd to
a812de4
Compare
There was a problem hiding this comment.
🧹 Nitpick comments (1)
cmd/pod-scaler/admission_test.go (1)
924-958: ⚡ Quick winAdd at least one usable-quantity case.
This only asserts the rejection path. A regression that makes
recommendationQuantityUsablereturnfalsefor every input would still pass this test. Please add a positive CPU and/or memory case that expectstrue.As per coding guidelines, "New or modified functionality should include test coverage. New Go functions and methods should have corresponding unit tests. Bug fixes should include a regression test that fails without the fix. Pure functions should always be tested. Prefer table-driven tests."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cmd/pod-scaler/admission_test.go` around lines 924 - 958, The test TestRecommendationQuantityUsable only covers negative/zero cases; add at least one positive usable-quantity case so the acceptance path is asserted. Inside the tests slice in TestRecommendationQuantityUsable add a case (e.g., name "positive memory quantity" and/or "positive cpu milli quantity") that uses a non-zero positive resource.Quantity (for memory use resource.NewQuantity(1024, resource.BinarySI); for CPU use resource.NewMilliQuantity(500, resource.DecimalSI)) with the appropriate corev1.ResourceMemory or corev1.ResourceCPU field and set want: true so recommendationQuantityUsable(...) is expected to return true.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@cmd/pod-scaler/admission_test.go`:
- Around line 924-958: The test TestRecommendationQuantityUsable only covers
negative/zero cases; add at least one positive usable-quantity case so the
acceptance path is asserted. Inside the tests slice in
TestRecommendationQuantityUsable add a case (e.g., name "positive memory
quantity" and/or "positive cpu milli quantity") that uses a non-zero positive
resource.Quantity (for memory use resource.NewQuantity(1024, resource.BinarySI);
for CPU use resource.NewMilliQuantity(500, resource.DecimalSI)) with the
appropriate corev1.ResourceMemory or corev1.ResourceCPU field and set want: true
so recommendationQuantityUsable(...) is expected to return true.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 2e152c2a-f1d2-49d2-b347-6b22e5ef4d7f
📒 Files selected for processing (2)
cmd/pod-scaler/admission.gocmd/pod-scaler/admission_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
- cmd/pod-scaler/admission.go
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@cmd/pod-scaler/admission_test.go`:
- Around line 872-951: Add a new table-driven test case (next to the "negative
cpu recommendation ignored" cases in admission_test.go) that sets
authoritativeCPU: true and supplies ours.Requests[corev1.ResourceCPU] =
*resource.NewMilliQuantity(0, resource.DecimalSI) (zero CPU), with
theirs.Requests[corev1.ResourceCPU] = resource.MustParse("500m") and
expected.Requests[corev1.ResourceCPU] = resource.MustParse("500m"); this ensures
the zero-CPU authoritative recommendation is ignored (like the negative cases)
and guards the CPU 10m-floor logic in the merge behavior.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 0d6bea20-38c0-4668-8a8b-bb3d98c6558b
📒 Files selected for processing (2)
cmd/pod-scaler/admission.gocmd/pod-scaler/admission_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
- cmd/pod-scaler/admission.go
|
/hold |
a812de4 to
347fcf1
Compare
|
/lgtm |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: deepsm007, psalajova The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
347fcf1 to
da90081
Compare
|
/unhold |
There was a problem hiding this comment.
🧹 Nitpick comments (4)
cmd/pod-scaler/resources.go (4)
49-58: ⚡ Quick winAdd documentation for cpuQuantityFromQuantile.
This function performs quantile validation and unit conversion (cores → millicores). A comment documenting its purpose, parameters, and the bool return value would improve code clarity. As per coding guidelines, Go documentation on functions should be written properly.
📝 Suggested documentation
+// cpuQuantityFromQuantile converts a CPU quantile value (in cores) to a resource.Quantity +// in millicores. Returns false if the value is invalid (NaN, Inf, negative, or would overflow). func cpuQuantityFromQuantile(cores float64) (resource.Quantity, bool) { if !quantileValueUsable(cores) { return resource.Quantity{}, false }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cmd/pod-scaler/resources.go` around lines 49 - 58, Add a proper Go doc comment above the cpuQuantityFromQuantile function that states its purpose (validate a quantile CPU value and convert CPU cores to a resource.Quantity in millicores), documents the parameter (cores float64) and explains the boolean return (true if conversion succeeded and value is usable, false on invalid/overflow), and mention edge conditions handled (uses quantileValueUsable and checks against math.MaxInt64). Ensure the comment follows godoc style (starts with the function name).
70-78: ⚡ Quick winAdd documentation for memoryQuantityFromQuantile.
This function performs quantile validation and overflow checking for memory quantities. A comment documenting its purpose, parameters, and the bool return value would improve code clarity. As per coding guidelines, Go documentation on functions should be written properly.
📝 Suggested documentation
+// memoryQuantityFromQuantile converts a memory quantile value (in bytes) to a resource.Quantity. +// Returns false if the value is invalid (NaN, Inf, negative, or would overflow). func memoryQuantityFromQuantile(bytes float64) (resource.Quantity, bool) { if !quantileValueUsable(bytes) { return resource.Quantity{}, false }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cmd/pod-scaler/resources.go` around lines 70 - 78, Add a proper Go doc comment above memoryQuantityFromQuantile explaining its purpose (validate a quantile-derived memory value and convert to resource.Quantity), describing the parameter (bytes: float64 bytes from a quantile), and clarifying the bool return (true when conversion succeeded, false for unusable quantile or overflow). Mention the function behavior: it calls quantileValueUsable, checks overflow against math.MaxInt64, and returns a resource.NewQuantity(int64(bytes), resource.BinarySI) on success; place the comment directly above the memoryQuantityFromQuantile declaration.
45-47: ⚡ Quick winAdd documentation for quantileValueUsable.
While the function name is descriptive, a brief comment explaining why these conditions (NaN, Inf, negative) make a quantile value unusable would improve maintainability. As per coding guidelines, Go documentation on functions should be written properly.
📝 Suggested documentation
+// quantileValueUsable returns true if the quantile value can be safely converted +// to a resource.Quantity for CPU or memory recommendations. func quantileValueUsable(v float64) bool { return !math.IsNaN(v) && !math.IsInf(v, 0) && v >= 0 }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cmd/pod-scaler/resources.go` around lines 45 - 47, Add a Go doc comment for the function quantileValueUsable explaining its purpose and why it rejects NaN, infinite, or negative values (e.g., because quantiles must be finite non-negative numbers for resource scaling decisions); place the comment directly above the quantileValueUsable function declaration and follow Go comment style starting with the function name.
46-46: Update: circonusllhist quantile edge cases (NaN/Inf/negative)
circonusllhist.Histogram.ValueAtQuantilereturnsmath.NaN()whenApproxQuantilefails (empty histogram / out-of-bound / out-of-order); the non-error path returns finite computed quantiles and there’s no±Infreturn path in these functions.- Negative quantiles are possible when the histogram records negative values (
bin.left()/bin.value()propagate sign), so thev >= 0check inquantileValueUsableis meaningful for pod-scaler resources.- The
math.IsInf(v, 0)portion ofquantileValueUsableis therefore largely defensive extra (won’t come from circonusllhist in these code paths), but it’s harmless.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cmd/pod-scaler/resources.go` at line 46, quantileValueUsable should explicitly document why it checks for NaN, Inf and non‑negative values: add a short comment above the quantileValueUsable function referencing circonusllhist.Histogram.ValueAtQuantile returning math.NaN() on failure, noting that negative quantiles can occur when histogram bins contain negative values and that math.IsInf(v, 0) is kept as a defensive check even though circonusllhist won’t produce ±Inf in these code paths; keep the existing return logic (!math.IsNaN(v) && !math.IsInf(v, 0) && v >= 0) unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@cmd/pod-scaler/resources.go`:
- Around line 49-58: Add a proper Go doc comment above the
cpuQuantityFromQuantile function that states its purpose (validate a quantile
CPU value and convert CPU cores to a resource.Quantity in millicores), documents
the parameter (cores float64) and explains the boolean return (true if
conversion succeeded and value is usable, false on invalid/overflow), and
mention edge conditions handled (uses quantileValueUsable and checks against
math.MaxInt64). Ensure the comment follows godoc style (starts with the function
name).
- Around line 70-78: Add a proper Go doc comment above
memoryQuantityFromQuantile explaining its purpose (validate a quantile-derived
memory value and convert to resource.Quantity), describing the parameter (bytes:
float64 bytes from a quantile), and clarifying the bool return (true when
conversion succeeded, false for unusable quantile or overflow). Mention the
function behavior: it calls quantileValueUsable, checks overflow against
math.MaxInt64, and returns a resource.NewQuantity(int64(bytes),
resource.BinarySI) on success; place the comment directly above the
memoryQuantityFromQuantile declaration.
- Around line 45-47: Add a Go doc comment for the function quantileValueUsable
explaining its purpose and why it rejects NaN, infinite, or negative values
(e.g., because quantiles must be finite non-negative numbers for resource
scaling decisions); place the comment directly above the quantileValueUsable
function declaration and follow Go comment style starting with the function
name.
- Line 46: quantileValueUsable should explicitly document why it checks for NaN,
Inf and non‑negative values: add a short comment above the quantileValueUsable
function referencing circonusllhist.Histogram.ValueAtQuantile returning
math.NaN() on failure, noting that negative quantiles can occur when histogram
bins contain negative values and that math.IsInf(v, 0) is kept as a defensive
check even though circonusllhist won’t produce ±Inf in these code paths; keep
the existing return logic (!math.IsNaN(v) && !math.IsInf(v, 0) && v >= 0)
unchanged.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: b4b754fb-0972-4ad5-a963-2cfd5cea3acd
📒 Files selected for processing (2)
cmd/pod-scaler/resources.gocmd/pod-scaler/resources_test.go
✅ Files skipped from review due to trivial changes (1)
- cmd/pod-scaler/resources_test.go
There was a problem hiding this comment.
🧹 Nitpick comments (1)
cmd/pod-scaler/resources_test.go (1)
59-77: ⚡ Quick winAdd memory negative infinity test case for symmetry.
The CPU tests include a negative infinity case (lines 32-35), but the memory tests skip it. Adding a "memory negative infinity" case ensures both conversion paths handle
-infconsistently.✅ Suggested addition
Insert after line 62:
{ name: "memory positive infinity", quantile: math.Inf(1), cpu: false, }, + { + name: "memory negative infinity", + quantile: math.Inf(-1), + cpu: false, + }, { name: "memory nan",🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cmd/pod-scaler/resources_test.go` around lines 59 - 77, Add a symmetric "memory negative infinity" test case to the memory tests by inserting an entry with name "memory negative infinity", quantile math.Inf(-1), and cpu: false into the same test table used in resources_test.go (the slice that already contains "memory positive infinity", "memory nan", etc.); place it adjacent to the other memory-edge cases so the conversion logic exercised by the test functions (the table used by the test for quantile-to-resource conversion) handles -Inf the same way CPU tests do.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@cmd/pod-scaler/resources_test.go`:
- Around line 59-77: Add a symmetric "memory negative infinity" test case to the
memory tests by inserting an entry with name "memory negative infinity",
quantile math.Inf(-1), and cpu: false into the same test table used in
resources_test.go (the slice that already contains "memory positive infinity",
"memory nan", etc.); place it adjacent to the other memory-edge cases so the
conversion logic exercised by the test functions (the table used by the test for
quantile-to-resource conversion) handles -Inf the same way CPU tests do.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: ad821efd-4f55-4cd4-b68f-9ac84265e1b2
📒 Files selected for processing (2)
cmd/pod-scaler/resources.gocmd/pod-scaler/resources_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
- cmd/pod-scaler/resources.go
|
/hold this for furthur investigation/improvements |
fae9965 to
34be547
Compare
Reject NaN/Inf/negative quantiles before caching recommendations. Use hist.Max for sparse samples; cap CPU/memory at digest to match admission. Skip invalid Prometheus samples at ingest.
34be547 to
50f74fe
Compare
|
/unhold |
|
/test images |
|
Scheduling tests matching the |
|
@deepsm007: all tests passed! Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
/cc @openshift/test-platform
Pod-scaler: skip invalid negative cache recommendations
This PR hardens the pod-scaler component (the controller that recommends container CPU/memory requests and limits for CI workloads based on Prometheus metrics) by validating quantile-derived values and skipping any invalid recommendations so they are not cached or propagated to containers.
What changed (practical effect)
Components affected
Impact for CI users/operators
Pod-scaler will be more robust against anomalous Prometheus data and will no longer produce or cache unusable resource requests/limits derived from invalid metric quantiles. This reduces the risk that CI pods receive invalid resource recommendations due to metric corruption or edge-case values.