Skip to content

CNTRLPLANE-2157: Migrate tests of Prometheus, Audit and TLS to OTE #31360

Open
YamunadeviShanmugam wants to merge 4 commits into
openshift:mainfrom
YamunadeviShanmugam:OTE_migration_audit_tests
Open

CNTRLPLANE-2157: Migrate tests of Prometheus, Audit and TLS to OTE #31360
YamunadeviShanmugam wants to merge 4 commits into
openshift:mainfrom
YamunadeviShanmugam:OTE_migration_audit_tests

Conversation

@YamunadeviShanmugam

@YamunadeviShanmugam YamunadeviShanmugam commented Jul 1, 2026

Copy link
Copy Markdown
Contributor

Results_apiserver_migration.txt
Migrate tests to OTE for the following tests

Note for the audit tests added:
The added audit tests verify server-side validation using --dry-run=server, which exercises the complete API server admission chain including CRD schemas, validating webhooks, CEL rules, and built-in validation logic to catch integration issues like misconfigured webhooks, incorrect CRD deployments, or schema bugs in production by testing the deployed API contract rather than code implementation, providing essential regression protection for multi-component validation logic spread across openshift/api, operators, and admission controllers.
Commands :
TLS
./openshift-tests run-test "[sig-api-machinery] [Jira:apiserver-auth] Operators / Certs [OTP] should have valid TLS certificates for authentication and encryption between API server components [apigroup:config.openshift.io] [Suite:openshift/conformance/parallel]"
./openshift-tests run-test "[sig-api-machinery] [Jira:apiserver-auth] Operators / Certs [OTP] should support adding a custom TLS certificate for the cluster API [Disruptive][Slow][apigroup:config.openshift.io] [Serial]"
./openshift-tests run-test "[sig-api-machinery] [Jira:apiserver-auth] Operators / Certs [OTP] should force etcd encryption key rotation and verify resources are re-encrypted [Disruptive][Slow][apigroup:config.openshift.io] [Serial]"
./openshift-tests run-test "[sig-api-machinery] [Jira:apiserver-auth] Operators / Certs [OTP] should self-recover when etcd encryption configuration secrets are deleted [Disruptive][Slow][apigroup:config.openshift.io] [Serial]"

Prometheus
./openshift-tests run-test "[sig-instrumentation] [Jira:apiserver-auth] Prometheus Alerts [OTP] should have ExtremelyHighIndividualControlPlaneCPU alert rule defined with required annotations [apigroup:config.openshift.io] [Suite:openshift/conformance/parallel]"
./openshift-tests run-test "[sig-instrumentation] [Jira:apiserver-auth] Prometheus Alerts [OTP] should have KubeAPIErrorBudgetBurn alert rule defined with a severity label [apigroup:config.openshift.io] [Suite:openshift/conformance/parallel]"
./openshift-tests run-test "[sig-instrumentation] [Jira:apiserver-auth] Prometheus Alerts [OTP] should have AuditLogError alert rule defined and referencing audit metrics [apigroup:config.openshift.io] [Suite:openshift/conformance/parallel]"
./openshift-tests run-test "[sig-instrumentation] [Jira:apiserver-auth] Prometheus Alerts [OTP] should have KubeAggregatedAPIErrors alert rule defined with severity label [apigroup:config.openshift.io] [Suite:openshift/conformance/parallel]"
./openshift-tests run-test "[sig-instrumentation] [Jira:apiserver-auth] Prometheus Alerts [OTP] should have KubeAPIDown alert rule defined and not currently firing [apigroup:config.openshift.io] [Suite:openshift/conformance/parallel]"

Audit
./openshift-tests run-test "[sig-api-machinery] [Jira:apiserver-auth] Audit [OTP] should report the current audit profile as a known valid value [apigroup:config.openshift.io] [Suite:openshift/conformance/parallel]"
./openshift-tests run-test "[sig-api-machinery] [Jira:apiserver-auth] Audit [OTP] should reject an invalid audit profile name in the APIServer configuration [apigroup:config.openshift.io] [Suite:openshift/conformance/parallel]"
./openshift-tests run-test "[sig-api-machinery] [Jira:apiserver-auth] Audit [OTP] should reject a custom audit rule with an empty group name [apigroup:config.openshift.io] [Suite:openshift/conformance/parallel]"
./openshift-tests run-test "[sig-api-machinery] [Jira:apiserver-auth] Audit [OTP] should reject a custom audit rule with an invalid profile name [apigroup:config.openshift.io] [Suite:openshift/conformance/parallel]"
./openshift-tests run-test "[sig-api-machinery] [Jira:apiserver-auth] Audit [OTP] should accept all valid audit profile names via server-side dry-run [apigroup:config.openshift.io] [Suite:openshift/conformance/parallel]"
./openshift-tests run-test "[sig-api-machinery] [Jira:apiserver-auth] Audit [OTP] should have audit log files present on master nodes [apigroup:config.openshift.io] [Suite:openshift/conformance/parallel]"
./openshift-tests run-test "[sig-api-machinery] [Jira:apiserver-auth] Audit [OTP] should write audit log entries in valid JSON format with required fields [apigroup:config.openshift.io] [Suite:openshift/conformance/parallel]"
./openshift-tests run-test "[sig-network][Feature:Network Policy Audit logging] when using openshift ovn-kubernetes should ensure acl logs are created and correct [apigroup:project.openshift.io][apigroup:network.openshift.io] [Suite:openshift/conformance/parallel]"
./openshift-tests run-test "[sig-instrumentation] [Jira:apiserver-auth] Prometheus Alerts [OTP] should have AuditLogError alert rule defined and referencing audit metrics [apigroup:config.openshift.io] [Suite:openshift/conformance/parallel]"

Summary by CodeRabbit

  • Tests
    • Extended operator TLS coverage to validate kubernetes.io/tls certificates and CA bundle ConfigMaps.
    • Added disruptive coverage for replacing the API server serving certificate, including trust updates and confirmation the original CA no longer validates the new leaf.
    • Added disruptive coverage for etcd encryption key rotation and encryption self-recovery, verifying migration completion after secret recreation.
    • Added API server audit configuration validation (valid/invalid profiles and custom rule edge cases) plus JSON audit log correctness checks.
    • Added Prometheus alert rule validation, including required annotations/labels and that the API-down alert is not firing when expected.
  • Chores
    • Reduced port-forward output logging to limit exposure of sensitive cluster metadata.

   Migrate audit tests to OTE
@openshift-merge-bot

Copy link
Copy Markdown
Contributor

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will utilize /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To trigger manually all jobs from second stage use /pipeline required command.

This repository is configured in: automatic mode

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Jul 1, 2026
@openshift-ci-robot

openshift-ci-robot commented Jul 1, 2026

Copy link
Copy Markdown

@YamunadeviShanmugam: This pull request references CNTRLPLANE-2157 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the epic to target the "5.0.0" version, but no target version was set.

Details

In response to this:

Migrate tests to OTE for the following tests

Note for the audit tests added:
The added audit tests verify server-side validation using --dry-run=server, which exercises the complete API server admission chain including CRD schemas, validating webhooks, CEL rules, and built-in validation logic to catch integration issues like misconfigured webhooks, incorrect CRD deployments, or schema bugs in production by testing the deployed API contract rather than code implementation, providing essential regression protection for multi-component validation logic spread across openshift/api, operators, and admission controllers.
Commands :
TLS
./openshift-tests run-test "[sig-api-machinery] [Jira:apiserver-auth] Operators / Certs [OTP] should have valid TLS certificates for authentication and encryption between API server components [apigroup:config.openshift.io] [Suite:openshift/conformance/parallel]"
./openshift-tests run-test "[sig-api-machinery] [Jira:apiserver-auth] Operators / Certs [OTP] should support adding a custom TLS certificate for the cluster API [Disruptive][Slow][apigroup:config.openshift.io] [Serial]"
./openshift-tests run-test "[sig-api-machinery] [Jira:apiserver-auth] Operators / Certs [OTP] should force etcd encryption key rotation and verify resources are re-encrypted [Disruptive][Slow][apigroup:config.openshift.io] [Serial]"
./openshift-tests run-test "[sig-api-machinery] [Jira:apiserver-auth] Operators / Certs [OTP] should self-recover when etcd encryption configuration secrets are deleted [Disruptive][Slow][apigroup:config.openshift.io] [Serial]"

Prometheus
./openshift-tests run-test "[sig-instrumentation] [Jira:apiserver-auth] Prometheus Alerts [OTP] should have ExtremelyHighIndividualControlPlaneCPU alert rule defined with required annotations [apigroup:config.openshift.io] [Suite:openshift/conformance/parallel]"
./openshift-tests run-test "[sig-instrumentation] [Jira:apiserver-auth] Prometheus Alerts [OTP] should have KubeAPIErrorBudgetBurn alert rule defined with a severity label [apigroup:config.openshift.io] [Suite:openshift/conformance/parallel]"
./openshift-tests run-test "[sig-instrumentation] [Jira:apiserver-auth] Prometheus Alerts [OTP] should have AuditLogError alert rule defined and referencing audit metrics [apigroup:config.openshift.io] [Suite:openshift/conformance/parallel]"
./openshift-tests run-test "[sig-instrumentation] [Jira:apiserver-auth] Prometheus Alerts [OTP] should have KubeAggregatedAPIErrors alert rule defined with severity label [apigroup:config.openshift.io] [Suite:openshift/conformance/parallel]"
./openshift-tests run-test "[sig-instrumentation] [Jira:apiserver-auth] Prometheus Alerts [OTP] should have KubeAPIDown alert rule defined and not currently firing [apigroup:config.openshift.io] [Suite:openshift/conformance/parallel]"

Audit
./openshift-tests run-test "[sig-api-machinery] [Jira:apiserver-auth] Audit [OTP] should report the current audit profile as a known valid value [apigroup:config.openshift.io] [Suite:openshift/conformance/parallel]"
./openshift-tests run-test "[sig-api-machinery] [Jira:apiserver-auth] Audit [OTP] should reject an invalid audit profile name in the APIServer configuration [apigroup:config.openshift.io] [Suite:openshift/conformance/parallel]"
./openshift-tests run-test "[sig-api-machinery] [Jira:apiserver-auth] Audit [OTP] should reject a custom audit rule with an empty group name [apigroup:config.openshift.io] [Suite:openshift/conformance/parallel]"
./openshift-tests run-test "[sig-api-machinery] [Jira:apiserver-auth] Audit [OTP] should reject a custom audit rule with an invalid profile name [apigroup:config.openshift.io] [Suite:openshift/conformance/parallel]"
./openshift-tests run-test "[sig-api-machinery] [Jira:apiserver-auth] Audit [OTP] should accept all valid audit profile names via server-side dry-run [apigroup:config.openshift.io] [Suite:openshift/conformance/parallel]"
./openshift-tests run-test "[sig-api-machinery] [Jira:apiserver-auth] Audit [OTP] should have audit log files present on master nodes [apigroup:config.openshift.io] [Suite:openshift/conformance/parallel]"
./openshift-tests run-test "[sig-api-machinery] [Jira:apiserver-auth] Audit [OTP] should write audit log entries in valid JSON format with required fields [apigroup:config.openshift.io] [Suite:openshift/conformance/parallel]"
./openshift-tests run-test "[sig-network][Feature:Network Policy Audit logging] when using openshift ovn-kubernetes should ensure acl logs are created and correct [apigroup:project.openshift.io][apigroup:network.openshift.io] [Suite:openshift/conformance/parallel]"
./openshift-tests run-test "[sig-instrumentation] [Jira:apiserver-auth] Prometheus Alerts [OTP] should have AuditLogError alert rule defined and referencing audit metrics [apigroup:config.openshift.io] [Suite:openshift/conformance/parallel]"

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 1, 2026
@openshift-ci

openshift-ci Bot commented Jul 1, 2026

Copy link
Copy Markdown
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-ci

openshift-ci Bot commented Jul 1, 2026

Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: YamunadeviShanmugam
Once this PR has been reviewed and has the lgtm label, please assign jogeo for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@coderabbitai

coderabbitai Bot commented Jul 1, 2026

Copy link
Copy Markdown

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

This PR adds extended test coverage for apiserver TLS and encryption behavior, audit profile and log validation, and Prometheus alert-rule checks.

Changes

Apiserver TLS and encryption

Layer / File(s) Summary
Shared TLS and polling helpers
test/extended/apiserver/tls.go
Adds certificate parsing, endpoint inspection, operator polling, and related test support helpers.
TLS secret and CA bundle validation
test/extended/apiserver/tls.go
Adds tests that validate TLS secret certificate dates and CA bundle configmap contents.
Custom serving certificate patch
test/extended/apiserver/tls.go
Adds a disruptive test that swaps the API server serving certificate to a generated CA and server cert, verifies the leaf cert, and restores prior state.
Encryption key rotation
test/extended/apiserver/tls.go
Adds a disruptive test that enables encryption when needed, rotates keys, and waits for migration completion.
Encryption self-recovery
test/extended/apiserver/tls.go
Adds a disruptive test that deletes encryption secrets, waits for recreation, and confirms migration resumes.

Audit profile and log validation

Layer / File(s) Summary
Audit profile data
test/extended/cluster/audit.go
Adds accepted audit profile values used by the new audit checks.
Audit admission and log checks
test/extended/cluster/audit.go
Adds tests for audit profile admission, custom rule rejection, and kube-apiserver audit log JSON and field validation.

Prometheus alert rules

Layer / File(s) Summary
Alert rule suite
test/extended/prometheus/prometheus.go
Adds helpers and tests that verify required alert rule metadata and that KubeAPIDown is not firing.

Estimated code review effort: 4 (Complex) | ~60 minutes

🚥 Pre-merge checks | ✅ 14 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 75.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (14 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the main change: migrating Prometheus, Audit, and TLS tests to OTE.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names ✅ Passed All new Ginkgo titles are literal static strings; dynamic values appear only in bodies/logs, not in test names.
Test Structure And Quality ✅ Passed New Ginkgo tests are single-purpose, use explicit poll timeouts and defer-based cleanup, and include contextual failure messages in repo-consistent style.
Microshift Test Compatibility ✅ Passed All new It() tests carry [apigroup:config.openshift.io], and the APIServer suite also skips MicroShift/HyperShift; no unguarded MicroShift-unsafe tests found.
Single Node Openshift (Sno) Test Compatibility ✅ Passed New TLS, Audit, and Prometheus tests use cluster APIs/monitoring only; audit node-logs checks require at least one master, not multiple nodes.
Topology-Aware Scheduling Compatibility ✅ Passed Only test files changed; no manifests, operator code, or controllers add scheduling constraints. Topology references are test skips/queries, not pod placement.
Ote Binary Stdout Contract ✅ Passed No process-level stdout writes were added; the only fmt.Printf is in a helper called from an It body, and no init/main/TestMain/BeforeSuite hooks exist in the touched files.
Ipv6 And Disconnected Network Test Compatibility ✅ Passed No hardcoded IPv4 literals or public internet calls were found; the new tests use cluster APIs/routes only.
No-Weak-Crypto ✅ Passed No banned weak-crypto primitives, custom crypto, or non-constant-time secret/token comparisons found in the touched files; TLS code uses standard x509/tls/OpenSSL cert tooling.
Container-Privileges ✅ Passed No added privileged settings were found; the only new PodSpec is a plain pause pod, and the diff contains none of the flagged fields.
No-Sensitive-Data-In-Logs ✅ Passed PASS: the diff removes port-forward output logging and adds no new logs of tokens, kubeconfig contents, or secret values; only harmless resource names/local ports are logged.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

   Migrate prometheus tests to OTE
@YamunadeviShanmugam YamunadeviShanmugam force-pushed the OTE_migration_audit_tests branch from 1664313 to 56a5da4 Compare July 1, 2026 06:56

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

🧹 Nitpick comments (8)
.work/prow-job-analyze-test-failure/2071867594297577472/logs/prowjob.json (1)

1-470: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value

Consider if raw ProwJob JSONs should be committed to the repository.

This 470-line raw ProwJob JSON is being added as a committed file under .work/. These artifacts are typically generated during CI and may be better suited as:

  • Generated artifacts stored elsewhere (GCS/S3 as configured by gcs_configuration)
  • Local debugging files that should be .gitignored

If these are intended as permanent documentation of the failure analysis, consider whether a summarized excerpt or a script to fetch the ProwJob would be more maintainable than committing the full JSON. The pkg/riskanalysis/types.go struct only captures Name, so any tooling consuming this JSON would need to use a different type or unstructured parsing.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.work/prow-job-analyze-test-failure/2071867594297577472/logs/prowjob.json
around lines 1 - 470, This raw ProwJob JSON should not be committed under .work/
as a permanent tracked artifact. Remove the committed JSON and either ignore
generated ProwJob logs via .gitignore or move them to external CI storage; if
the intent is to keep failure references, add a small summary or fetch script
instead, and make sure any code in pkg/riskanalysis/types.go or related tooling
does not assume this full JSON shape.
.work/prow-job-analyze-test-failure/2071867594297577472/analysis.md (1)

18-18: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value

Add language specifier to fenced code block.

The fenced code block following the ### Error heading is missing a language tag. Per markdownlint MD040, specify a language for syntax highlighting.

-```
+```text
 the server could not find the requested resource (get clusterversions.config.openshift.io version)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.work/prow-job-analyze-test-failure/2071867594297577472/analysis.md at line
18, Add a language specifier to the fenced code block under the Error section so
it satisfies markdownlint MD040; update the markdown snippet that contains the
“the server could not find the requested resource” message to use a tagged fence
in the analysis.md content. Use the existing fenced block near the “### Error”
heading as the target and keep the text unchanged.
test/extended/cluster/audit.go (2)

250-320: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Duplicate master-node listing/HyperShift-skip logic across both tests.

Lines 255-262 and 283-290 repeat the same HyperShift skip check plus master-node listing and empty-check. Consider extracting a shared helper (e.g. getMasterNodesOrSkip(ctx, oc)) to reduce duplication and centralize the master-node selection logic for future maintenance.

♻️ Example helper extraction
func getMasterNodesOrSkip(ctx g.SpecContext, oc *exutil.CLI) []corev1.Node {
	if ok, _ := exutil.IsHypershift(ctx, oc.AdminConfigClient()); ok {
		g.Skip("HyperShift hosts the control plane externally; master node logs are not accessible via node-logs")
	}
	masters, err := oc.AsAdmin().KubeClient().CoreV1().Nodes().List(
		ctx, metav1.ListOptions{LabelSelector: "node-role.kubernetes.io/master"})
	o.Expect(err).NotTo(o.HaveOccurred())
	o.Expect(masters.Items).NotTo(o.BeEmpty(), "expected at least one master node")
	return masters.Items
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/extended/cluster/audit.go` around lines 250 - 320, The two audit tests
duplicate the same HyperShift skip and master-node lookup logic, so centralize
it in a shared helper such as getMasterNodesOrSkip used by both g.It blocks.
Move the exutil.IsHypershift check, Nodes().List call, and empty-list
expectation into that helper, returning the master nodes for the audit-file and
audit-JSON checks to consume. Keep the test bodies focused on their specific
assertions and reference the existing oc, ctx, and master node selection logic
when wiring the helper in.

186-225: 🎯 Functional Correctness | 🔵 Trivial | ⚡ Quick win

Assert rejection reason, not just presence of an error.

The three admission-rejection tests only check err).To(o.HaveOccurred()). A transient CLI/network failure (not an actual server-side validation rejection) would also satisfy this assertion, masking a regression where the API server incorrectly accepts the invalid value. Consider asserting the error/output contains an expected substring (e.g. field name or "Invalid value") to confirm the rejection is due to the intended validation rule.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/extended/cluster/audit.go` around lines 186 - 225, The three audit
admission tests in the cluster audit spec currently only assert that an error
occurred, which can pass for unrelated CLI or network failures. Update the
checks in the three g.It cases that patch apiserver cluster so they also verify
the returned error/output contains the expected validation message or
field-specific substring, using the same
oc.AsAdmin().WithoutNamespace().Run("patch") flow and the existing o.Expect
assertions to confirm the rejection is from server-side validation.
test/extended/apiserver/tls.go (1)

966-983: 🎯 Functional Correctness | 🔵 Trivial | ⚡ Quick win

Anchor the encryption-key name matcher.

The current unanchored patterns can match unintended names before extracting the suffix. Use anchored patterns like ^encryption-key-openshift-apiserver-(\d+)$ and parse the captured group.

As per path instructions, regexes should be anchored.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/extended/apiserver/tls.go` around lines 966 - 983, The encryption-key
name matcher is using an unanchored regex that can match partial names before
splitting the suffix. Update the matching logic around regexp.Compile and the
loop over strings.Fields(out) to use an anchored pattern such as
^encryption-key-openshift-apiserver-(\d+)$, then extract and parse the captured
numeric group directly instead of splitting the full name. Keep the existing
maxNum selection behavior, but ensure only exact key names are accepted.

Source: Path instructions

test/extended/prometheus/prometheus.go (3)

1268-1268: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value

Redundant type conversion.

promv1.AlertingRule.Query is already typed as string in client_golang's v1 package, so string(rule.Query) is a no-op conversion.

-			o.Expect(string(rule.Query)).To(o.MatchRegexp(`audit`),
+			o.Expect(rule.Query).To(o.MatchRegexp(`audit`),
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/extended/prometheus/prometheus.go` at line 1268, The test in the
Prometheus extended suite is doing a redundant string conversion on
AlertingRule.Query, since that field is already a string. Update the assertion
that references rule.Query to use it directly without wrapping it in string(),
keeping the existing regexp match logic unchanged.

1296-1329: 🩺 Stability & Availability | 🔵 Trivial

Consider run-isolation for the "not firing" assertion.

Asserting KubeAPIDown is not firing could be flaky if this suite runs alongside/after disruptive tests in the same conformance run (e.g., the etcd encryption key rotation/secret deletion and custom-cert tests added elsewhere in this PR stack, which can transiently affect apiserver availability). Worth confirming test ordering/isolation guarantees or documenting the assumption.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/extended/prometheus/prometheus.go` around lines 1296 - 1329, The
KubeAPIDown “not currently firing” check in the Prometheus alert test may be
flaky when run alongside disruptive tests that can transiently affect apiserver
availability. Update the test around lookupAlertingRule,
wait.PollUntilContextTimeout, and the ALERTS query to either enforce stronger
isolation/order assumptions or gate the assertion so it only runs when the run
environment guarantees no concurrent disruptive operations. If no isolation
guarantee exists, add a clear skip/documentation path in this spec to avoid
false failures.

1203-1330: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Extract duplicated rule-lookup/severity-check logic into helpers.

The lookup-and-nil-check sequence is repeated identically across all five It blocks, and the severity-label presence/non-empty check is repeated verbatim three times. Consolidating reduces duplication and centralizes future changes to messages/behavior.

♻️ Example consolidation
func requireAlertingRule(ctx g.SpecContext, oc *exutil.CLI, alertName string) *promv1.AlertingRule {
	prometheusURL, bearerToken := mustGetPrometheusURLAndToken(ctx, oc)
	g.By(fmt.Sprintf("looking for alert rule %q", alertName))
	rule := lookupAlertingRule(prometheusURL, bearerToken, alertName)
	o.Expect(rule).NotTo(o.BeNil(), "alert rule %q was not found in Prometheus", alertName)
	return rule
}

func expectSeverityLabel(rule *promv1.AlertingRule, alertName string) string {
	severity, ok := rule.Labels["severity"]
	o.Expect(ok).To(o.BeTrue(), "alert rule %q is missing 'severity' label", alertName)
	o.Expect(string(severity)).NotTo(o.BeEmpty(), "alert rule %q has an empty 'severity' label", alertName)
	return string(severity)
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/extended/prometheus/prometheus.go` around lines 1203 - 1330, The alert
tests repeat the same rule lookup, nil check, and severity-label validation
across multiple `g.It` cases in `Prometheus Alerts`, so extract that shared
behavior into helpers. Create a reusable helper around
`mustGetPrometheusURLAndToken`/`lookupAlertingRule` that returns a verified rule
for a given alert name, and another helper for the repeated severity
presence/non-empty check using `rule.Labels["severity"]`. Update the existing
`It` blocks to call those helpers instead of duplicating the logic, keeping the
per-alert assertions like `summary`, `description`, `Query`, and `KubeAPIDown`’s
`critical` check in place.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test/extended/apiserver/tls.go`:
- Around line 701-744: The recreated secret UID check in the polling block is
order-dependent because the jsonpath output loses secret names and the
comparison in the wait.PollUntilContextTimeout callback relies on slice
positions. Update the logic around the secret retrieval and UID comparison to
parse the returned data into a name-to-UID mapping, then compare the new UIDs
against the originals by secret name instead of index. Keep the fix localized to
the recreated encryption-config secret validation in this test helper so the
stability check no longer flakes on output ordering.
- Around line 398-408: Cleanup in the deferred restore path is ignoring failures
and still depends on the spec context, so restore may be skipped on
cancellation. Update the deferred cleanup around the cluster restore logic in
the test to run under a detached context created with context.WithoutCancel(ctx)
plus an explicit timeout, and make the cleanup operations fail loudly by
logging/asserting errors from the patch, kubeconfig copy, and restoreCluster
steps instead of discarding them. Apply the same cleanup pattern to the other
deferred restore blocks referenced in this test file, using the existing
restoreCluster and oc.AsAdmin().WithoutNamespace().Run("patch") flow as the
anchor points.
- Around line 823-839: The certificate validation helper currently only logs
parse failures and still succeeds when no certificates were parsed. Update the
logic around x509.ParseCertificate and the final count check so the routine
fails if count remains zero after scanning all PEM blocks, using the existing
e2e.Logf and o.Expect flow in this helper to report malformed or empty
certificate data.
- Around line 622-627: Wait for both encryption migrations before asserting the
route prefix: the current test only calls waitEncryptionKeyMigration for
newKASEncSecretName, so the route prefix check can still race
openshift-apiserver. Update the test flow in tls.go to also wait on the
OpenShift API server migration using the relevant migration helper for
openshift-apiserver, and keep the existing kube-apiserver wait plus the
completed checks before proceeding to the route prefix assertion.
- Around line 854-858: Make getServerCertInfo use the provided test context for
the TLS probe so a stalled handshake can be canceled with the spec timeout.
Update the tls.Dial call to use a tls.Dialer with DialContext and thread ctx
through the connection setup, and make sure the helper’s call site still passes
the context. Also verify the return value from pool.AppendCertsFromPEM in
getServerCertInfo and fail fast if the CA bundle cannot be parsed.
- Around line 404-405: The test setup in apiserver/tls.go is shelling out
through bash -c for copy and rewrite operations, which is unsafe and brittle
when using originKubeconfig from KUBECONFIG. Update the logic around the cp/sed
flow to use Go file APIs for copying and editing the kubeconfig, and where a
command is still needed, pass explicit args directly via exec.Command instead of
constructing a shell string. Keep the change localized to the setup helpers that
manipulate originKubeconfig and originKubeconfigBkp.

---

Nitpick comments:
In @.work/prow-job-analyze-test-failure/2071867594297577472/analysis.md:
- Line 18: Add a language specifier to the fenced code block under the Error
section so it satisfies markdownlint MD040; update the markdown snippet that
contains the “the server could not find the requested resource” message to use a
tagged fence in the analysis.md content. Use the existing fenced block near the
“### Error” heading as the target and keep the text unchanged.

In @.work/prow-job-analyze-test-failure/2071867594297577472/logs/prowjob.json:
- Around line 1-470: This raw ProwJob JSON should not be committed under .work/
as a permanent tracked artifact. Remove the committed JSON and either ignore
generated ProwJob logs via .gitignore or move them to external CI storage; if
the intent is to keep failure references, add a small summary or fetch script
instead, and make sure any code in pkg/riskanalysis/types.go or related tooling
does not assume this full JSON shape.

In `@test/extended/apiserver/tls.go`:
- Around line 966-983: The encryption-key name matcher is using an unanchored
regex that can match partial names before splitting the suffix. Update the
matching logic around regexp.Compile and the loop over strings.Fields(out) to
use an anchored pattern such as ^encryption-key-openshift-apiserver-(\d+)$, then
extract and parse the captured numeric group directly instead of splitting the
full name. Keep the existing maxNum selection behavior, but ensure only exact
key names are accepted.

In `@test/extended/cluster/audit.go`:
- Around line 250-320: The two audit tests duplicate the same HyperShift skip
and master-node lookup logic, so centralize it in a shared helper such as
getMasterNodesOrSkip used by both g.It blocks. Move the exutil.IsHypershift
check, Nodes().List call, and empty-list expectation into that helper, returning
the master nodes for the audit-file and audit-JSON checks to consume. Keep the
test bodies focused on their specific assertions and reference the existing oc,
ctx, and master node selection logic when wiring the helper in.
- Around line 186-225: The three audit admission tests in the cluster audit spec
currently only assert that an error occurred, which can pass for unrelated CLI
or network failures. Update the checks in the three g.It cases that patch
apiserver cluster so they also verify the returned error/output contains the
expected validation message or field-specific substring, using the same
oc.AsAdmin().WithoutNamespace().Run("patch") flow and the existing o.Expect
assertions to confirm the rejection is from server-side validation.

In `@test/extended/prometheus/prometheus.go`:
- Line 1268: The test in the Prometheus extended suite is doing a redundant
string conversion on AlertingRule.Query, since that field is already a string.
Update the assertion that references rule.Query to use it directly without
wrapping it in string(), keeping the existing regexp match logic unchanged.
- Around line 1296-1329: The KubeAPIDown “not currently firing” check in the
Prometheus alert test may be flaky when run alongside disruptive tests that can
transiently affect apiserver availability. Update the test around
lookupAlertingRule, wait.PollUntilContextTimeout, and the ALERTS query to either
enforce stronger isolation/order assumptions or gate the assertion so it only
runs when the run environment guarantees no concurrent disruptive operations. If
no isolation guarantee exists, add a clear skip/documentation path in this spec
to avoid false failures.
- Around line 1203-1330: The alert tests repeat the same rule lookup, nil check,
and severity-label validation across multiple `g.It` cases in `Prometheus
Alerts`, so extract that shared behavior into helpers. Create a reusable helper
around `mustGetPrometheusURLAndToken`/`lookupAlertingRule` that returns a
verified rule for a given alert name, and another helper for the repeated
severity presence/non-empty check using `rule.Labels["severity"]`. Update the
existing `It` blocks to call those helpers instead of duplicating the logic,
keeping the per-alert assertions like `summary`, `description`, `Query`, and
`KubeAPIDown`’s `critical` check in place.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: dac4f419-66f8-4232-ab51-a4d61629f023

📥 Commits

Reviewing files that changed from the base of the PR and between 6bf37da and 1664313.

📒 Files selected for processing (7)
  • .work/prow-job-analyze-test-failure/2071867594297577472/analysis.md
  • .work/prow-job-analyze-test-failure/2071867594297577472/logs/build-log.txt
  • .work/prow-job-analyze-test-failure/2071867594297577472/logs/junit_e2e.xml
  • .work/prow-job-analyze-test-failure/2071867594297577472/logs/prowjob.json
  • test/extended/apiserver/tls.go
  • test/extended/cluster/audit.go
  • test/extended/prometheus/prometheus.go

Comment thread test/extended/apiserver/tls.go Outdated
Comment on lines +398 to +408
defer oc.AsAdmin().WithoutNamespace().Run("delete").Args(
"secret", "custom-api-cert", "-n", "openshift-config", "--ignore-not-found").Execute()
defer func() {
g.By("restoring cluster to original state")
_, _ = oc.AsAdmin().WithoutNamespace().Run("patch").Args(
"apiserver/cluster", "--type=merge", "-p", patchToRecover).Output()
if _, cpErr := exec.Command("bash", "-c",
fmt.Sprintf("cp %s %s", originKubeconfigBkp, originKubeconfig)).Output(); cpErr != nil {
e2e.Logf("warning: failed to restore kubeconfig: %v", cpErr)
}
restoreCluster()

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

Handle cleanup failures and bound deferred cleanup.

These disruptive tests restore cluster-wide API server/encryption state, but cleanup errors are ignored. Please log/assert cleanup failures and use a context.WithoutCancel(ctx) + timeout context for deferred cleanup paths so cancellation does not skip restoration.

As per path instructions, Go code should never ignore error returns. Based on learnings, deferred cleanup in test/extended/ should detach cancellation from the spec context and apply an explicit timeout.

Also applies to: 553-559, 588-592, 690-696

🧰 Tools
🪛 ast-grep (0.44.0)

[error] 403-404: A shell (sh/bash) is invoked with -c and a dynamically built command string (string concatenation, fmt.Sprintf, or a variable holding the command) passed to exec.Command / exec.CommandContext. Untrusted input embedded in the command lets an attacker inject arbitrary shell commands. Avoid the shell: call the target binary directly with exec.Command(name, arg1, arg2, ...) so each argument is passed as a separate, non-interpreted token, and never build a shell command string from external input.
Context: exec.Command("bash", "-c",
fmt.Sprintf("cp %s %s", originKubeconfigBkp, originKubeconfig))
Note: [CWE-78] Improper Neutralization of Special Elements used in an OS Command ('OS Command Injection').

(command-injection-exec-sh-c-go)


[error] 403-404: An argument to exec.Command/exec.CommandContext is built with fmt.Sprintf that interpolates a non-constant value. When the command is run through a shell (e.g. sh -c) this allows command injection. Pass untrusted input as a separate, explicit argument to exec.Command (which does not invoke a shell) instead of formatting it into the command string, and validate or allowlist any values that must be embedded.
Context: exec.Command("bash", "-c",
fmt.Sprintf("cp %s %s", originKubeconfigBkp, originKubeconfig))
Note: [CWE-78] Improper Neutralization of Special Elements used in an OS Command ('OS Command Injection').

(command-injection-exec-sprintf-arg-go)

🪛 OpenGrep (1.23.0)

[ERROR] 404-405: Dynamic command passed to exec.Command with a shell invocation. Pass arguments directly to exec.Command without a shell wrapper.

(coderabbit.command-injection.go-exec-command)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/extended/apiserver/tls.go` around lines 398 - 408, Cleanup in the
deferred restore path is ignoring failures and still depends on the spec
context, so restore may be skipped on cancellation. Update the deferred cleanup
around the cluster restore logic in the test to run under a detached context
created with context.WithoutCancel(ctx) plus an explicit timeout, and make the
cleanup operations fail loudly by logging/asserting errors from the patch,
kubeconfig copy, and restoreCluster steps instead of discarding them. Apply the
same cleanup pattern to the other deferred restore blocks referenced in this
test file, using the existing restoreCluster and
oc.AsAdmin().WithoutNamespace().Run("patch") flow as the anchor points.

Sources: Path instructions, Learnings

Comment on lines +404 to +405
if _, cpErr := exec.Command("bash", "-c",
fmt.Sprintf("cp %s %s", originKubeconfigBkp, originKubeconfig)).Output(); cpErr != nil {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔒 Security & Privacy | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify shell-wrapped commands remain in the changed TLS test.
rg -n 'exec\.Command\("bash",\s*"-c"' test/extended/apiserver/tls.go

Repository: openshift/origin

Length of output: 478


🏁 Script executed:

#!/bin/bash
sed -n '380,490p' test/extended/apiserver/tls.go

Repository: openshift/origin

Length of output: 5602


🏁 Script executed:

python3 - <<'PY'
import base64, subprocess, tempfile, os, textwrap

data = b"-----BEGIN CERTIFICATE-----\n" + bytes(range(1, 64)) * 4 + b"\n-----END CERTIFICATE-----\n"
b64 = base64.b64encode(data).decode()

print("base64 contains '/':", '/' in b64)
cmd = f'sed -i "s/certificate-authority-data: .*/certificate-authority-data: {b64}/" /tmp/nonexistent'
print("sample sed command:", cmd[:120] + "...")
p = subprocess.run(["bash", "-lc", cmd], capture_output=True, text=True)
print("exit:", p.returncode)
print("stderr:", p.stderr.strip())
PY

Repository: openshift/origin

Length of output: 500


🏁 Script executed:

rg -n 'originKubeconfig|originKubeconfigBkp|updateKubeconfCmd|caCmd|bash", "-c"|cp %s %s|sed -i' test/extended/apiserver/tls.go

Repository: openshift/origin

Length of output: 1081


Avoid bash -c for these test commands. originKubeconfig comes from KUBECONFIG, so the cp/sed command strings depend on external input, and the sed rewrite is fragile for base64 payloads. Use direct Go file APIs and explicit command args instead.

🧰 Tools
🪛 OpenGrep (1.23.0)

[ERROR] 404-405: Dynamic command passed to exec.Command with a shell invocation. Pass arguments directly to exec.Command without a shell wrapper.

(coderabbit.command-injection.go-exec-command)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/extended/apiserver/tls.go` around lines 404 - 405, The test setup in
apiserver/tls.go is shelling out through bash -c for copy and rewrite
operations, which is unsafe and brittle when using originKubeconfig from
KUBECONFIG. Update the logic around the cp/sed flow to use Go file APIs for
copying and editing the kubeconfig, and where a command is still needed, pass
explicit args directly via exec.Command instead of constructing a shell string.
Keep the change localized to the setup helpers that manipulate originKubeconfig
and originKubeconfigBkp.

Sources: Path instructions, Linters/SAST tools

Comment thread test/extended/apiserver/tls.go
Comment thread test/extended/apiserver/tls.go Outdated
Comment on lines +701 to +744
uidsOld, err := oc.WithoutNamespace().Run("get").Args(
"secret",
"encryption-config-openshift-apiserver",
"encryption-config-openshift-kube-apiserver",
"-n", "openshift-config-managed",
"-o=jsonpath={.items[*].metadata.uid}",
).Output()
o.Expect(err).NotTo(o.HaveOccurred())

g.By("2. delete encryption-config-* secrets from openshift-config-managed")
for _, item := range []string{
"encryption-config-openshift-apiserver",
"encryption-config-openshift-kube-apiserver",
} {
e2e.Logf("removing finalizers from secret %s", item)
err := oc.WithoutNamespace().Run("patch").Args(
"secret", item, "-n", "openshift-config-managed",
`-p={"metadata":{"finalizers":null}}`).Execute()
o.Expect(err).NotTo(o.HaveOccurred())

e2e.Logf("deleting secret %s", item)
err = oc.WithoutNamespace().Run("delete").Args(
"secret", item, "-n", "openshift-config-managed").Execute()
o.Expect(err).NotTo(o.HaveOccurred())
}

uidsOldSlice := strings.Split(uidsOld, " ")
e2e.Logf("original secret UIDs: %v", uidsOldSlice)
errSecret := wait.PollUntilContextTimeout(ctx, 3*time.Second, 60*time.Second, false,
func(pollCtx context.Context) (bool, error) {
uidsNew, err := oc.WithoutNamespace().Run("get").Args(
"secret",
"encryption-config-openshift-apiserver",
"encryption-config-openshift-kube-apiserver",
"-n", "openshift-config-managed",
"-o=jsonpath={.items[*].metadata.uid}",
).Output()
if err != nil {
e2e.Logf("encryption-config-* secrets not yet recreated: %v — retrying", err)
return false, nil
}
uidsNewSlice := strings.Split(uidsNew, " ")
e2e.Logf("new secret UIDs: %v", uidsNewSlice)
if len(uidsNewSlice) >= 2 && uidsNewSlice[0] != uidsOldSlice[0] && uidsNewSlice[1] != uidsOldSlice[1] {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Compare recreated secret UIDs by name, not slice position.

The jsonpath output drops secret names, then Line 744 assumes positional ordering is stable. Parse JSON and compare metadata.uid keyed by metadata.name to avoid order-dependent flakes.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/extended/apiserver/tls.go` around lines 701 - 744, The recreated secret
UID check in the polling block is order-dependent because the jsonpath output
loses secret names and the comparison in the wait.PollUntilContextTimeout
callback relies on slice positions. Update the logic around the secret retrieval
and UID comparison to parse the returned data into a name-to-UID mapping, then
compare the new UIDs against the originals by secret name instead of index. Keep
the fix localized to the recreated encryption-config secret validation in this
test helper so the stability check no longer flakes on output ordering.

Comment on lines +823 to +839
cert, err := x509.ParseCertificate(block.Bytes)
if err != nil {
e2e.Logf(" [%s] failed to parse certificate block: %v", source, err)
continue
}
count++
now := time.Now()
e2e.Logf(" [%s] cert #%d CN=%q NotBefore=%s NotAfter=%s",
source, count, cert.Subject.CommonName,
cert.NotBefore.Format(time.RFC3339), cert.NotAfter.Format(time.RFC3339))
o.Expect(cert.NotAfter.After(now)).To(o.BeTrue(),
"certificate %q from %s has expired (NotAfter=%s)", cert.Subject.CommonName, source, cert.NotAfter)
o.Expect(cert.NotBefore.Before(now)).To(o.BeTrue(),
"certificate %q from %s is not yet valid (NotBefore=%s)", cert.Subject.CommonName, source, cert.NotBefore)
}
e2e.Logf(" [%s] checked %d certificates", source, count)
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Fail when certificate parsing finds no valid certs.

Line 825 only logs parse failures, and Line 838 allows count == 0, so malformed or empty certificate data can pass this validation.

Proposed fix
 		cert, err := x509.ParseCertificate(block.Bytes)
-		if err != nil {
-			e2e.Logf("  [%s] failed to parse certificate block: %v", source, err)
-			continue
-		}
+		o.Expect(err).NotTo(o.HaveOccurred(), "failed to parse certificate block from %s", source)
 		count++
@@
 	}
+	o.Expect(count).To(o.BeNumerically(">", 0), "no PEM certificates found in %s", source)
 	e2e.Logf("  [%s] checked %d certificates", source, count)
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
cert, err := x509.ParseCertificate(block.Bytes)
if err != nil {
e2e.Logf(" [%s] failed to parse certificate block: %v", source, err)
continue
}
count++
now := time.Now()
e2e.Logf(" [%s] cert #%d CN=%q NotBefore=%s NotAfter=%s",
source, count, cert.Subject.CommonName,
cert.NotBefore.Format(time.RFC3339), cert.NotAfter.Format(time.RFC3339))
o.Expect(cert.NotAfter.After(now)).To(o.BeTrue(),
"certificate %q from %s has expired (NotAfter=%s)", cert.Subject.CommonName, source, cert.NotAfter)
o.Expect(cert.NotBefore.Before(now)).To(o.BeTrue(),
"certificate %q from %s is not yet valid (NotBefore=%s)", cert.Subject.CommonName, source, cert.NotBefore)
}
e2e.Logf(" [%s] checked %d certificates", source, count)
}
cert, err := x509.ParseCertificate(block.Bytes)
o.Expect(err).NotTo(o.HaveOccurred(), "failed to parse certificate block from %s", source)
count++
now := time.Now()
e2e.Logf(" [%s] cert #%d CN=%q NotBefore=%s NotAfter=%s",
source, count, cert.Subject.CommonName,
cert.NotBefore.Format(time.RFC3339), cert.NotAfter.Format(time.RFC3339))
o.Expect(cert.NotAfter.After(now)).To(o.BeTrue(),
"certificate %q from %s has expired (NotAfter=%s)", cert.Subject.CommonName, source, cert.NotAfter)
o.Expect(cert.NotBefore.Before(now)).To(o.BeTrue(),
"certificate %q from %s is not yet valid (NotBefore=%s)", cert.Subject.CommonName, source, cert.NotBefore)
}
o.Expect(count).To(o.BeNumerically(">", 0), "no PEM certificates found in %s", source)
e2e.Logf(" [%s] checked %d certificates", source, count)
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/extended/apiserver/tls.go` around lines 823 - 839, The certificate
validation helper currently only logs parse failures and still succeeds when no
certificates were parsed. Update the logic around x509.ParseCertificate and the
final count check so the routine fails if count remains zero after scanning all
PEM blocks, using the existing e2e.Logf and o.Expect flow in this helper to
report malformed or empty certificate data.

Comment on lines +854 to +858
pool := x509.NewCertPool()
pool.AppendCertsFromPEM(caData)

conn, err := tls.Dial("tcp", fmt.Sprintf("%s:%s", fqdn, port),
&tls.Config{RootCAs: pool, ServerName: fqdn})

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Inspect the helper around the cited lines
sed -n '820,900p' test/extended/apiserver/tls.go

# Find the helper signature and its callers
rg -n "func .*getServerCertInfo|getServerCertInfo\(" test/extended/apiserver/tls.go test/extended/apiserver -S

# Check whether this file already uses context-aware dialing elsewhere
rg -n "DialContext|tls\.Dialer|context\.Context|WithTimeout|Deadline" test/extended/apiserver/tls.go test/extended/apiserver -S

Repository: openshift/origin

Length of output: 7087


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Show the call sites with surrounding context
sed -n '470,515p' test/extended/apiserver/tls.go

Repository: openshift/origin

Length of output: 2749


Make the TLS probe context-aware. getServerCertInfo ignores the available test context, so a stuck TCP/TLS handshake can outlive the spec timeout. Pass ctx through and use tls.Dialer.DialContext; also check the return from pool.AppendCertsFromPEM so a bad CA bundle fails fast.

🧰 Tools
🪛 ast-grep (0.44.0)

[warning] 857-857: MinVersionis missing from this TLS configuration. By default, TLS 1.2 is currently used as the minimum when acting as a client, and TLS 1.0 when acting as a server. General purpose web applications should default to TLS 1.3 with all other protocols disabled. Only where it is known that a web server must support legacy clients with unsupported an insecure browsers (such as Internet Explorer 10), it may be necessary to enable TLS 1.0 to provide support. AddMinVersion: tls.VersionTLS13' to the TLS configuration to bump the minimum version to TLS 1.3.
Context: tls.Config{RootCAs: pool, ServerName: fqdn}
Note: [CWE-327]: Use of a Broken or Risky Cryptographic Algorithm [OWASP A03:2017]: Sensitive Data Exposure [OWASP A02:2021]: Cryptographic Failures

(missing-ssl-minversion-go)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/extended/apiserver/tls.go` around lines 854 - 858, Make
getServerCertInfo use the provided test context for the TLS probe so a stalled
handshake can be canceled with the spec timeout. Update the tls.Dial call to use
a tls.Dialer with DialContext and thread ctx through the connection setup, and
make sure the helper’s call site still passes the context. Also verify the
return value from pool.AppendCertsFromPEM in getServerCertInfo and fail fast if
the CA bundle cannot be parsed.

Source: Path instructions

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test/extended/apiserver/tls.go`:
- Around line 296-355: The TLS/CA validation in tls.go is currently only logging
and continuing on decode/read/parse failures, which lets broken inputs pass
unnoticed. In the secret loop and the ConfigMap handling path, replace the
silent Logf-then-continue behavior with assertions that fail the test when
base64 decode, JSON unmarshal, or configmap retrieval fails, and use
parseAndCheckPEMs only after successful validation. Also update the fallback key
scan logic to assert that a matching CA/crt key was actually found before
proceeding, using the existing secret/configmap iteration code and
parseAndCheckPEMs as the main validation entry point.
- Around line 419-423: The kubeconfig/CA extraction in the TLS test still shells
out through bash -c using caCmd, which is brittle and injection-prone. Update
the apiserver TLS helper logic in tls.go to parse the kubeconfig directly in Go
instead of using grep/base64, and replace the later OpenSSL command construction
with explicit exec.CommandContext arguments or crypto/x509-based certificate
handling. Keep the changes localized around the kubeconfig/OpenSSL helper paths
so the test setup no longer depends on shell pipelines.
- Around line 907-922: The condition check in the ClusterOperator polling logic
is incorrectly treating missing requested conditions as success. In the
condition loop that builds on co.Status.Conditions, add an observed-condition
map in the relevant check function and verify every requested key from
conditions is present before returning true; if any requested condition is
absent or mismatched, keep returning false and preserve the existing throttled
logging behavior. Focus the fix around the condition validation path that
currently iterates over co.Status.Conditions and the conditions lookup.
- Around line 971-985: The key-scanning logic in the function that parses `out`
and returns `maxNum` currently falls back to `0, nil` when no matching
encryption key is found, which masks setup failures. Update this function to
detect the no-match case after iterating over `strings.Fields(out)` and return a
non-nil error instead of zero; keep the existing parsing behavior for valid
matches, but make sure callers can distinguish “no key found” from a real key
numbered zero.
- Around line 615-618: Do not ignore the diagnostic output from the two oc get
secret calls in tls.go; instead of assigning both Execute() results to
underscore, capture and log any errors/output so failures are visible when the
wait path is hit. Update the diagnostic helper around
WithoutNamespace().Run("get").Args(...) to surface the results for both the
openshift-apiserver and openshift-kube-apiserver secret lookups, preserving the
existing namespace and label selectors.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: ff3e2265-e1e3-49bd-b2d9-e784e5b3dcf9

📥 Commits

Reviewing files that changed from the base of the PR and between 1664313 and 56a5da4.

📒 Files selected for processing (2)
  • test/extended/apiserver/tls.go
  • test/extended/prometheus/prometheus.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • test/extended/prometheus/prometheus.go

Comment thread test/extended/apiserver/tls.go
Comment thread test/extended/apiserver/tls.go Outdated
Comment thread test/extended/apiserver/tls.go Outdated
Comment thread test/extended/apiserver/tls.go
Comment thread test/extended/apiserver/tls.go
@YamunadeviShanmugam YamunadeviShanmugam force-pushed the OTE_migration_audit_tests branch from 56a5da4 to 9f22025 Compare July 1, 2026 10:56

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (1)
test/extended/apiserver/tls.go (1)

443-480: 🔒 Security & Privacy | 🔵 Trivial | 💤 Low value

Optional: switch the test certs to a short-lived P-256 key. openssl genrsa plus -days 100000 is unnecessary for this temporary secret, and a shorter-lived cert keeps the fixture aligned with current crypto guidance if this path accepts it.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/extended/apiserver/tls.go` around lines 443 - 480, The TLS test
certificate generation in the openssl setup still uses long-lived RSA keys and
an excessively large validity window; update the certificate fixture in the
openssl command sequence to use a short-lived P-256 key/cert path if supported.
Keep the changes localized to the certificate creation flow that builds the CA,
server key, CSR, and signed server cert, and preserve the existing SAN/subject
handling in the same helper.

Source: Path instructions

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test/extended/apiserver/tls.go`:
- Around line 630-632: The polling in the TLS spec should stay bound to the
current spec context instead of starting from context.Background(). Update the
polling setup around the context.WithTimeout calls in the test helpers to derive
from the existing ctx so Ginkgo cancellation still stops the waits, while
preserving the explicit timeout; apply the same change in the related poll
blocks referenced in the TLS test flow.
- Around line 374-405: The cleanup in restoreCluster and the deferred patching
around apiserver/cluster currently wipes serving cert configuration by always
restoring namedCertificates to null and deleting a fixed custom-api-cert secret.
Update the test setup to snapshot the original apiserver/cluster
namedCertificates state before modifying it, then restore that exact value in
the deferred patch instead of hardcoding null. Also change the secret handling
to use a per-test secret name derived from this test’s context and only delete
that secret, so existing cluster config and pre-existing secrets are not
touched.
- Around line 601-604: The regexes passed to getEncryptionKeyNumber are too
loose and can match unintended substrings or similarly named secrets; update the
encryption-key pattern matching in tls.go to use anchored numeric-only regexes
so only the exact expected secret names are selected. Apply the same anchored
pattern fix at each call site that uses getEncryptionKeyNumber, including the
other matching blocks referenced in this test file, and keep the change
localized to the regex arguments rather than the helper itself.

---

Nitpick comments:
In `@test/extended/apiserver/tls.go`:
- Around line 443-480: The TLS test certificate generation in the openssl setup
still uses long-lived RSA keys and an excessively large validity window; update
the certificate fixture in the openssl command sequence to use a short-lived
P-256 key/cert path if supported. Keep the changes localized to the certificate
creation flow that builds the CA, server key, CSR, and signed server cert, and
preserve the existing SAN/subject handling in the same helper.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 1dab033a-d54d-451c-90a5-de3f069fe630

📥 Commits

Reviewing files that changed from the base of the PR and between 56a5da4 and 9f22025.

📒 Files selected for processing (1)
  • test/extended/apiserver/tls.go

Comment thread test/extended/apiserver/tls.go Outdated
Comment thread test/extended/apiserver/tls.go Outdated
Comment on lines +601 to +604
oasEncNumberBefore, err := getEncryptionKeyNumber(oc, `encryption-key-openshift-apiserver-[^ ]*`)
o.Expect(err).NotTo(o.HaveOccurred())
kasEncNumberBefore, err := getEncryptionKeyNumber(oc, `encryption-key-openshift-kube-apiserver-[^ ]*`)
o.Expect(err).NotTo(o.HaveOccurred())

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Anchor encryption-key name matching.

The regexes currently match substrings and arbitrary suffixes. Use anchored numeric patterns so similarly named secrets cannot be selected accidentally.

Suggested fix
-			oasEncNumberBefore, err := getEncryptionKeyNumber(oc, `encryption-key-openshift-apiserver-[^ ]*`)
+			oasEncNumberBefore, err := getEncryptionKeyNumber(oc, `^encryption-key-openshift-apiserver-[0-9]+$`)
...
-			kasEncNumberBefore, err := getEncryptionKeyNumber(oc, `encryption-key-openshift-kube-apiserver-[^ ]*`)
+			kasEncNumberBefore, err := getEncryptionKeyNumber(oc, `^encryption-key-openshift-kube-apiserver-[0-9]+$`)
...
-			oasEncNumber, err := getEncryptionKeyNumber(oc, `encryption-key-openshift-apiserver-[^ ]*`)
+			oasEncNumber, err := getEncryptionKeyNumber(oc, `^encryption-key-openshift-apiserver-[0-9]+$`)
...
-			kasEncNumber, err := getEncryptionKeyNumber(oc, `encryption-key-openshift-kube-apiserver-[^ ]*`)
+			kasEncNumber, err := getEncryptionKeyNumber(oc, `^encryption-key-openshift-kube-apiserver-[0-9]+$`)

As per path instructions, regexes should be anchored.

Also applies to: 826-829, 1057-1065

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/extended/apiserver/tls.go` around lines 601 - 604, The regexes passed to
getEncryptionKeyNumber are too loose and can match unintended substrings or
similarly named secrets; update the encryption-key pattern matching in tls.go to
use anchored numeric-only regexes so only the exact expected secret names are
selected. Apply the same anchored pattern fix at each call site that uses
getEncryptionKeyNumber, including the other matching blocks referenced in this
test file, and keep the change localized to the regex arguments rather than the
helper itself.

Source: Path instructions

Comment thread test/extended/apiserver/tls.go Outdated
@YamunadeviShanmugam YamunadeviShanmugam force-pushed the OTE_migration_audit_tests branch from 9f22025 to 91037b8 Compare July 2, 2026 06:53

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
test/extended/apiserver/tls.go (1)

374-374: 🩺 Stability & Availability | 🟡 Minor | ⚡ Quick win

Use nanosecond entropy for the temporary secret name.

time.Now().Unix() can collide when disruptive tests are retried or run in parallel within the same second, causing oc create secret failures or cross-test cleanup interference.

Suggested fix
-			testSecretName := fmt.Sprintf("custom-api-cert-test-%d", time.Now().Unix())
+			testSecretName := fmt.Sprintf("custom-api-cert-test-%d", time.Now().UnixNano())
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/extended/apiserver/tls.go` at line 374, The temporary secret name in the
tls test setup uses second-level timestamps and can collide across parallel or
retried runs; update the name generation in the code path that assigns
testSecretName to use nanosecond-level entropy instead of time.Now().Unix().
Keep the change local to the temporary secret naming logic so the secret created
for oc create secret is unique per test invocation and does not interfere with
cleanup.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Duplicate comments:
In `@test/extended/apiserver/tls.go`:
- Line 374: The temporary secret name in the tls test setup uses second-level
timestamps and can collide across parallel or retried runs; update the name
generation in the code path that assigns testSecretName to use nanosecond-level
entropy instead of time.Now().Unix(). Keep the change local to the temporary
secret naming logic so the secret created for oc create secret is unique per
test invocation and does not interfere with cleanup.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: e86831f6-ace6-41b6-a54b-c8af61691e6c

📥 Commits

Reviewing files that changed from the base of the PR and between 9f22025 and 91037b8.

📒 Files selected for processing (1)
  • test/extended/apiserver/tls.go

   Migrate etcd encryption tests to OTE
@YamunadeviShanmugam YamunadeviShanmugam force-pushed the OTE_migration_audit_tests branch from 91037b8 to 5943904 Compare July 2, 2026 07:26

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (8)
test/extended/apiserver/tls.go (8)

913-928: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Fail when certificate parsing finds no valid certs.

ParseCertificate failures are still only logged, and count == 0 still succeeds. Assert parse success for certificate blocks and assert count > 0. This was already flagged in the previous review.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/extended/apiserver/tls.go` around lines 913 - 928, The
certificate-checking logic in the helper that iterates parsed PEM blocks still
lets all parse failures pass and succeeds when no valid certificates are found.
Update the flow around x509.ParseCertificate so parsing failures are asserted
instead of only logged, and add an explicit expectation that the certificate
counter is greater than zero before returning; use the existing loop and count
handling in the tls.go certificate verification helper to locate the fix.

620-622: 🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Anchor encryption-key name matching.

These patterns still match substrings and non-numeric suffixes. Use ^encryption-key-openshift-apiserver-[0-9]+$ and ^encryption-key-openshift-kube-apiserver-[0-9]+$. This was already flagged in the previous review. As per path instructions, regexes should be anchored.

Also applies to: 654-655, 834-837

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/extended/apiserver/tls.go` around lines 620 - 622, The regexes used in
getEncryptionKeyNumber matching are still too loose because they can match
substrings and non-numeric suffixes; update the encryption-key name patterns in
the tls.go test helpers to use anchored numeric-only matches. Specifically,
wherever the test calls getEncryptionKeyNumber for openshift-apiserver and
openshift-kube-apiserver keys, replace the current patterns with anchored forms
like ^encryption-key-openshift-apiserver-[0-9]+$ and
^encryption-key-openshift-kube-apiserver-[0-9]+$ so the matches are exact.

Source: Path instructions


311-315: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Fail on malformed TLS secrets instead of skipping them.

A kubernetes.io/tls secret without tls.crt is a broken input for this certificate-validation test, so this should assert rather than Logf and continue. This is the same validation gap already flagged in the earlier review.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/extended/apiserver/tls.go` around lines 311 - 315, The TLS secret
validation in the certificate test currently skips over malformed
kubernetes.io/tls secrets when tls.crt is missing, but this should fail the test
instead. Update the logic in the secret-processing path around rawcrt handling
to assert or require tls.crt rather than using e2e.Logf and continue, so broken
inputs are reported immediately. Keep the existing validation flow in the tls.go
test helper, but make malformed TLS secrets a hard failure instead of a skipped
case.

420-421: 🔒 Security & Privacy | 🟠 Major | ⚡ Quick win

Avoid shelling out for kubeconfig copy/rewrite.

originKubeconfig is environment-derived, and the sed replacement is brittle for base64 payloads. Use Go file APIs to copy and rewrite the kubeconfig instead. This was already flagged in the previous review.

Also applies to: 524-528

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/extended/apiserver/tls.go` around lines 420 - 421, The kubeconfig
backup/copy logic in the tls test helpers still shells out via
exec.Command("bash", "-c", ...) and relies on brittle text replacement, which
should be removed. Update the affected helper code around originKubeconfig
handling to use Go file APIs for copying and rewriting the kubeconfig contents
directly, and apply the same change in the later duplicate block referenced in
the comment. Keep the fix localized to the kubeconfig copy/rewrite flow so the
behavior stays the same without using bash, cp, or sed-like replacement.

Sources: Path instructions, Linters/SAST tools


416-430: 🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

Handle cleanup failures and bound cleanup contexts.

These disruptive cleanup paths still discard restore errors and are not using a detached timeout context, so failed cleanup can leave cluster-wide API server/encryption state modified. This was already flagged in the previous review.

Also applies to: 599-605, 636-640, 762-768, 1183-1188

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/extended/apiserver/tls.go` around lines 416 - 430, Cleanup in the
TLS/apiserver test teardown is still ignoring restore failures and using the
wrong context lifetime, which can leave cluster state partially modified. Update
the deferred cleanup in the apiserver TLS test helpers around the restore logic
in the same cleanup block to capture and surface patch/restore errors instead of
discarding them, and run the cleanup operations under a detached timeout context
rather than the parent test context. Apply the same fix to the other matching
cleanup blocks referenced by the same teardown pattern so all restore paths in
the TLS test suite handle failures consistently.

Sources: Path instructions, Learnings


693-699: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Wait for the OpenShift API server migration too.

The test rotates both components but still waits only for newKASEncSecretName; the route-prefix assertion can race the openshift-apiserver migration. This was already flagged in the previous review.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/extended/apiserver/tls.go` around lines 693 - 699, The migration wait in
the TLS test only covers the kube-apiserver secret, so the route-prefix check
can run before the openshift-apiserver migration finishes. Update the logic
around waitEncryptionKeyMigration in the tls.go test to also wait for the
OpenShift API server encryption migration using the corresponding
openshift-apiserver secret name before continuing to the assertion.

939-948: 🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

Make the TLS probe context-aware and validate the CA pool.

getServerCertInfo still ignores the spec context and does not check AppendCertsFromPEM, so a stalled handshake can outlive cancellation and bad CA data is not failed fast. This was already flagged in the previous review.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/extended/apiserver/tls.go` around lines 939 - 948, The TLS probe in
getServerCertInfo is still not using the spec context and it does not validate
the result of AppendCertsFromPEM. Update getServerCertInfo to accept and use the
context from the caller when performing the tls.Dial handshake so cancellation
can stop a stalled probe, and explicitly check the cert pool append result after
reading caPath so invalid CA PEM fails fast with an error.

Source: Path instructions


773-825: 🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Compare recreated secret UIDs by name, not slice position.

The jsonpath output still drops secret names, so Line 825 assumes stable ordering. Parse names and UIDs together, then compare metadata.uid keyed by metadata.name. This was already flagged in the previous review.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/extended/apiserver/tls.go` around lines 773 - 825, The UID verification
in the recreated secret check is still matching by slice position from the
jsonpath output, so the comparison can be wrong if the order changes. Update the
logic in the secret polling block that uses `uidsOld`, `uidsNew`, and
`wait.PollUntilContextTimeout` to fetch and compare each secret’s `metadata.uid`
keyed by `metadata.name` instead of relying on array order. Keep the existing
secret names `encryption-config-openshift-apiserver` and
`encryption-config-openshift-kube-apiserver`, but parse the name/UID pairs and
compare them by name before deciding the secrets were recreated.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test/extended/apiserver/tls.go`:
- Around line 462-498: The certificate-generation flow in the TLS test helper
still shells out with plain exec.Command and uses RSA-2048 keys, which conflicts
with the cancellation and crypto guidance. Update the openssl invocations in the
certificate setup path to use exec.CommandContext with the existing ctx so child
processes stop on timeout/cancel, and switch the key/certificate generation in
this helper to Ed25519 or ECDSA P-256+ instead of RSA-2048. Keep the changes
localized around the openssl genrsa/req/x509 sequence in the TLS generation
logic so the helper remains consistent with the rest of the certificate setup.

---

Duplicate comments:
In `@test/extended/apiserver/tls.go`:
- Around line 913-928: The certificate-checking logic in the helper that
iterates parsed PEM blocks still lets all parse failures pass and succeeds when
no valid certificates are found. Update the flow around x509.ParseCertificate so
parsing failures are asserted instead of only logged, and add an explicit
expectation that the certificate counter is greater than zero before returning;
use the existing loop and count handling in the tls.go certificate verification
helper to locate the fix.
- Around line 620-622: The regexes used in getEncryptionKeyNumber matching are
still too loose because they can match substrings and non-numeric suffixes;
update the encryption-key name patterns in the tls.go test helpers to use
anchored numeric-only matches. Specifically, wherever the test calls
getEncryptionKeyNumber for openshift-apiserver and openshift-kube-apiserver
keys, replace the current patterns with anchored forms like
^encryption-key-openshift-apiserver-[0-9]+$ and
^encryption-key-openshift-kube-apiserver-[0-9]+$ so the matches are exact.
- Around line 311-315: The TLS secret validation in the certificate test
currently skips over malformed kubernetes.io/tls secrets when tls.crt is
missing, but this should fail the test instead. Update the logic in the
secret-processing path around rawcrt handling to assert or require tls.crt
rather than using e2e.Logf and continue, so broken inputs are reported
immediately. Keep the existing validation flow in the tls.go test helper, but
make malformed TLS secrets a hard failure instead of a skipped case.
- Around line 420-421: The kubeconfig backup/copy logic in the tls test helpers
still shells out via exec.Command("bash", "-c", ...) and relies on brittle text
replacement, which should be removed. Update the affected helper code around
originKubeconfig handling to use Go file APIs for copying and rewriting the
kubeconfig contents directly, and apply the same change in the later duplicate
block referenced in the comment. Keep the fix localized to the kubeconfig
copy/rewrite flow so the behavior stays the same without using bash, cp, or
sed-like replacement.
- Around line 416-430: Cleanup in the TLS/apiserver test teardown is still
ignoring restore failures and using the wrong context lifetime, which can leave
cluster state partially modified. Update the deferred cleanup in the apiserver
TLS test helpers around the restore logic in the same cleanup block to capture
and surface patch/restore errors instead of discarding them, and run the cleanup
operations under a detached timeout context rather than the parent test context.
Apply the same fix to the other matching cleanup blocks referenced by the same
teardown pattern so all restore paths in the TLS test suite handle failures
consistently.
- Around line 693-699: The migration wait in the TLS test only covers the
kube-apiserver secret, so the route-prefix check can run before the
openshift-apiserver migration finishes. Update the logic around
waitEncryptionKeyMigration in the tls.go test to also wait for the OpenShift API
server encryption migration using the corresponding openshift-apiserver secret
name before continuing to the assertion.
- Around line 939-948: The TLS probe in getServerCertInfo is still not using the
spec context and it does not validate the result of AppendCertsFromPEM. Update
getServerCertInfo to accept and use the context from the caller when performing
the tls.Dial handshake so cancellation can stop a stalled probe, and explicitly
check the cert pool append result after reading caPath so invalid CA PEM fails
fast with an error.
- Around line 773-825: The UID verification in the recreated secret check is
still matching by slice position from the jsonpath output, so the comparison can
be wrong if the order changes. Update the logic in the secret polling block that
uses `uidsOld`, `uidsNew`, and `wait.PollUntilContextTimeout` to fetch and
compare each secret’s `metadata.uid` keyed by `metadata.name` instead of relying
on array order. Keep the existing secret names
`encryption-config-openshift-apiserver` and
`encryption-config-openshift-kube-apiserver`, but parse the name/UID pairs and
compare them by name before deciding the secrets were recreated.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 5bd7a7a2-46e2-40e2-81cd-4d3a78b4c6b3

📥 Commits

Reviewing files that changed from the base of the PR and between 91037b8 and 5943904.

📒 Files selected for processing (1)
  • test/extended/apiserver/tls.go

Comment thread test/extended/apiserver/tls.go Outdated
@YamunadeviShanmugam YamunadeviShanmugam marked this pull request as ready for review July 2, 2026 07:53
@openshift-ci openshift-ci Bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 2, 2026
@openshift-ci openshift-ci Bot requested review from deads2k and jan--f July 2, 2026 07:55
   Extract helpers, fix security, prevent panics
@YamunadeviShanmugam YamunadeviShanmugam force-pushed the OTE_migration_audit_tests branch from f28c1e4 to dcaf11e Compare July 2, 2026 08:41
@YamunadeviShanmugam

Copy link
Copy Markdown
Contributor Author

/test verify

@openshift-ci

openshift-ci Bot commented Jul 3, 2026

Copy link
Copy Markdown
Contributor

@YamunadeviShanmugam: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/verify dcaf11e link true /test verify

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants