Fixes from 0.3.0 rc0#46
Open
tkatila wants to merge 6 commits into
Open
Conversation
The deviceTaint option was added due to confusion. It does not enable/disable device taints. The underlying cmdline argument was meant for testing only. Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
Previous namespace creation caused a helm uninstall failure which was not detected with e2e. Also replace "--wait" with a namespace removal wait. Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
As the chart is now handling the namespace creation, NFD's post delete cleanup caused the helm uninstall to return non zero. Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
memory "ecc_disabled" state was converted into a warning use "critical" limit instead of "warning" for temps Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
There was a problem hiding this comment.
Pull request overview
This PR applies a set of release-candidate fixes for v0.3.0-rc0, focusing on removing an invalid DRA option, adjusting XPUMD health handling (critical thresholds + ECC state filtering), and improving Helm/e2e uninstall reliability, along with corresponding chart/CRD/doc updates.
Changes:
- Remove
deviceTaintsfrom the ClusterPolicy API/CRDs, samples, and Helm charts/templates. - Update XPUM OTel status rules to use critical temperature limits and filter out ECC states.
- Harden Helm/e2e uninstall flows and extend workflow validation to
release-*branches.
Reviewed changes
Copilot reviewed 17 out of 17 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
test/e2e/e2e_utils.go |
Adds namespace deletion waiter used by Helm e2e cleanup. |
test/e2e/e2e_test.go |
Adjusts Helm uninstall cleanup and adds namespace deletion wait. |
README.md |
Updates DRA/XPUM docs (remove deviceTaints, adjust defaults/fields). |
internal/controller/xpumanager_controller.go |
Applies health thresholds to the critical state instead of warning. |
internal/controller/dra_controller.go |
Drops args derived from removed deviceTaints option. |
internal/controller/dra_controller_test.go |
Updates DRA args test expectations after removing deviceTaints. |
config/samples/dra/clusterpolicy.yaml |
Removes deviceTaints from sample. |
config/deployments/xpum/otel-config.yaml |
Switches temperature states to critical + filters ECC states. |
config/crd/bases/intel.com_clusterpolicies.yaml |
Removes deviceTaints from generated CRD schema. |
charts/gpu-base-operator/values.yaml |
Adds nfd.postDeleteCleanup value. |
charts/gpu-base-operator/crds/clusterpolicies.yaml |
Removes deviceTaints from Helm CRD copy. |
charts/gpu-base-operator/Chart.yaml |
Adds alias nfd for the NFD dependency to match values/condition usage. |
charts/gpu-base-operator-policy/values.yaml |
Removes deviceTaints from policy chart values. |
charts/gpu-base-operator-policy/templates/clusterpolicy.yaml |
Removes deviceTaints rendering into ClusterPolicy. |
charts/gpu-base-operator-policy/README.md |
Updates policy chart docs for removed/changed values. |
api/v1alpha1/clusterpolicy_types.go |
Removes DeviceTaints field from API type. |
.github/workflows/validate-public.yaml |
Runs validation workflow on release-* branches too. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Comment on lines
+227
to
+236
| func waitUntilNamespaceGone(g Gomega) { | ||
| cmd := exec.Command("kubectl", "get", "namespace", "intel-gpu-base-operator", "--no-headers") | ||
| output, err := utils.Run(cmd) | ||
| if err != nil { | ||
| return | ||
| } | ||
|
|
||
| lines := utils.GetNonEmptyLines(output) | ||
| g.Expect(lines).To(ContainElement("No resources found"), "Expected namespace intel-gpu-base-operator to be gone") | ||
| } |
Comment on lines
+522
to
+526
| cmd = exec.Command("helm", "uninstall", "-n", namespace, helmOperatorName) | ||
| _, err = utils.Run(cmd) | ||
| Expect(err).NotTo(HaveOccurred(), "Failed to uninstall operator") | ||
|
|
||
| Eventually(waitUntilNamespaceGone).Should(Succeed()) |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Important ones: