no-jira: test/monitoring: increase load balancer readiness and curl connection timeout#30994
tthvo wants to merge 2 commits into openshift:main
@tthvo: This pull request explicitly references no jira issue.
Walkthrough
Increased the initial HTTP reachability check timeout from 10 to 20 minutes in the load-balancer disruption test.
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: tthvo
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/monitor/backenddisruption/disruption_backend_sampler.go`:
- Around line 235-237: Validate the timeout in BackendSampler.WithTimeout and
refuse non-positive values: check if timeout <= 0 and if so do not set b.timeout
(return b unchanged) so callers cannot accidentally disable timeouts; otherwise
set b.timeout = &timeout and return b. This ensures BackendSampler.WithTimeout
enforces a positive time.Duration (use the existing BackendSampler and
WithTimeout identifiers and the b.timeout field).
📒 Files selected for processing (2)
- pkg/monitor/backenddisruption/disruption_backend_sampler.go
- pkg/monitortests/network/disruptionserviceloadbalancer/monitortest.go
✅ Files skipped from review due to trivial changes (1)
- pkg/monitortests/network/disruptionserviceloadbalancer/monitortest.go
/test e2e-aws-ovn-fips
The service-type-load-balancer-availability monitor test was timing out during preparation when waiting for the load balancer to become reachable. DNS SOA records for AWS ELB in EUSC region show a 900s (15-minute) retry interval, which can cause the load balancer hostname resolution to take longer than the previous 10-minute timeout. Increase the timeout from 10 to 20 minutes to accommodate slow DNS propagation in regions with high TTL values.
/retitle no-jira: test/monitoring: increase load balancer readiness and curl connection timeout
The service-load-balancer-with-pdb disruption test was experiencing DNS resolution failures when connecting to the load balancer during test execution. AWS ELB DNS zones have an SOA TTL of 900 seconds, which means DNS propagation can take up to 15 minutes for newly created load balancers. This change adds a WithTimeout() method to BackendSampler and sets a 120-second timeout for the service load balancer disruption samplers, allowing sufficient time for DNS retries during the propagation period.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
A couple of
I think fixing the slow DNS propagation can be left to AWS as they improve their infrastructure in the near future (this EUSC cloud is brand new, Jan 2026). However, I am taking a proactive approach here, which should help avoid flakes 🤔
/test e2e-gcp-csi
@tthvo: The following test failed, say
This PR increases the load balancer readiness timeout and the curl connection timeout to cope with slow DNS propagation on some cloud providers, for example AWS EU Sovereign Cloud (EUSC).
Background
The DNS SOA records for AWS ELB in the EUSC region show a 900s (15-minute) retry interval, which can cause load balancer hostname resolution (if the first lookup hits NXDOMAIN) to take longer than the previous 10-minute timeout. Additionally, it can take a few more minutes for DNS to fully propagate to its nameserver.
References
We are seeing a similar issue in CCM hairpin tests: openshift/cluster-cloud-controller-manager-operator#449.