fix(validate-go): bound govulncheck memory so it stops OOM-killing the runner #270
Merged
…e runner

govulncheck's default symbol scan builds a whole-program call graph whose peak memory grows with the module's dependency graph. On large modules (e.g. ksail, which imports Kubernetes/Flux/Talos/Omni clients) the scan exhausts the hosted runner's 16 GiB and the host kills the runner mid-scan (exit 143 / "runner has received a shutdown signal") with no govulncheck output: an opaque, retry-resistant gate failure that blocks every large-repo Go PR. Re-running just reproduces it.

Cap the Go runtime heap with GOMEMLIMIT=12GiB so the GC reclaims aggressively and stays under the host ceiling instead of OOM-killing, and add timeout-minutes: 15 so any remaining worst case fails fast and legibly rather than hanging. Symbol-scan reachability semantics are unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Contributor
Pull request overview
Mitigates GitHub-hosted runner OOM terminations during the govulncheck vulnerability scan in the validate-go-project reusable workflow by bounding Go runtime heap usage and limiting maximum job runtime.
Changes:
- Add a job-level `GOMEMLIMIT` to cap Go heap usage during `govulncheck` runs.
- Add a `timeout-minutes` limit to ensure the scan fails fast and visibly instead of hanging or being OOM-killed.
- Document the rationale inline in the workflow for future maintainers.
Contributor
🎉 This PR is included in version 5.3.2 🎉
The release is available on GitHub release.
Your semantic-release bot 📦🚀
This was referenced Jun 1, 2026
botantler Bot pushed a commit that referenced this pull request Jun 1, 2026
…#272)

* feat(validate-go): risk-acceptance allowlist for the govulncheck gate

  The hard `govulncheck ./...` gate (introduced in #266, un-OOMed in #270/v5.3.2) is unsatisfiable for large consumers: it fails on reachable advisories that have no upstream fix (`Fixed in: N/A`), wedging every Go PR through no fault of the PR. Scan in JSON mode and fail only on reachable findings whose ID is not in an optional consumer-owned `.govulncheck-allow.txt`. With no allowlist file the behaviour is unchanged (strict).

  Fixes #271

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* ci: re-trigger CI (transient GitHub-API 500 in delete-workflow-runs dry-run test)

  The `[Test] Delete Workflow Runs - All Workflows` job hit a GitHub-API HTTP 500 ("other side closed") while paginating runs in dry-run mode; its sibling Minimal/Specific variants passed. Unrelated to this diff (validate-go jobs skip on this repo). Empty re-trigger commit (same tree) to re-run CI - Required Checks.

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
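The allowlist gate described in that commit message could be sketched as a single workflow step like the one below. This is an illustration under assumptions, not the implementation that shipped: the step name, the scratch file names (`findings.json`, `reachable.txt`, `allowed.txt`, `blocking.txt`), the one-ID-per-line allowlist format, and the use of `jq` are all hypothetical. It also assumes that a finding counts as reachable when the first frame of its trace names a function.

```yaml
- name: Vulnerability scan with allowlist # hypothetical step name
  shell: bash
  run: |
    # In JSON mode govulncheck exits 0 even when it reports findings,
    # so the pass/fail decision is made explicitly below.
    govulncheck -json ./... > findings.json

    # Assumption: symbol-level (reachable) findings carry a function
    # name in the first frame of their trace.
    jq -r 'select(.finding.trace[0].function != null) | .finding.osv' findings.json \
      | sort -u > reachable.txt

    # Assumption: the allowlist holds one advisory ID per line.
    # A missing file means strict mode, i.e. an empty allowlist.
    if [ -f .govulncheck-allow.txt ]; then
      sort -u .govulncheck-allow.txt > allowed.txt
    else
      : > allowed.txt
    fi

    # Keep only the IDs that are reachable and not allowlisted.
    comm -23 reachable.txt allowed.txt > blocking.txt
    if [ -s blocking.txt ]; then
      echo "Reachable vulnerabilities not covered by the allowlist:"
      cat blocking.txt
      exit 1
    fi
```

With no `.govulncheck-allow.txt` in the consumer repository, every reachable finding ends up in `blocking.txt`, which matches the strict behaviour the commit message describes.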
Fixes #269
Problem
The `🛡️ Vulnerability Scan` job (`govulncheck ./...`, added in #266) kills the runner mid-scan on large Go modules. govulncheck's default `-scan symbol` builds a whole-program call graph whose peak memory grows with the dependency graph; on a k8s-scale module (ksail imports Kubernetes/Flux/Talos/Omni clients) it exhausts the hosted runner's 16 GiB and the host terminates the runner: `exit 143` / "The runner has received a shutdown signal" / "The operation was canceled", with no govulncheck output. It's an effective gate (org `code_quality` ruleset on consumers), so it blocks the PR, and re-running just reproduces it.

Two consecutive ksail#4982 runs both died ~2 min into the scan with zero output; see #269 for the run-by-run evidence. This regression (mine, from #266 ~5h ago) blocks every large-repo Go PR, not just ksail.
Change
     permissions:
       contents: read
    +timeout-minutes: 15
    +env:
    +  GOMEMLIMIT: 12GiB

- `GOMEMLIMIT=12GiB`: caps the Go runtime heap so the GC reclaims aggressively and stays under the host ceiling instead of OOM-killing. The standard Go remedy for CI OOMs. (ubuntu-latest = 16 GiB; 12 GiB leaves headroom for the OS, Go toolchain subprocesses and harden-runner.)
- `timeout-minutes: 15`: bounds any remaining worst case to a fast, legible failure instead of a hung runner.

Reachability semantics are unchanged: still `-scan symbol`, still exit-3-on-reachable-vuln only.

Validation
- … `code-quality: write` scope, unrelated to this change).
- `GOMEMLIMIT=12GiB` is valid Go syntax.
- … `if: github.repository != 'devantler-tech/reusable-workflows'`), so the real proof is re-running ksail#4982's Vulnerability Scan against this branch's reusable workflow once it ships. I'll verify the rollout after merge.

Trade-off / fallback (maintainer decision)
If a module's live call-graph genuinely exceeds the GOMEMLIMIT headroom, this converts the OOM into a clean timeout but the gate still can't pass. The robust fallback is switching the gate to `govulncheck -scan module ./...` (deterministic, low-memory; catches every known-vulnerable dependency version), at the cost of losing reachability filtering. That's a deliberate semantic change, so it's documented in #269 rather than applied here.
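For reference, the fallback could be sketched as a one-step change like the fragment below. This is not part of this PR: the step name is hypothetical, and the fragment assumes govulncheck is already installed earlier in the job.

```yaml
# Hypothetical fallback step; the name is illustrative only.
- name: Vulnerability scan (module mode)
  # Module mode checks the module's required dependency versions against
  # the vulnerability database without building a call graph, so
  # reachability filtering is lost.
  # Assumption to verify: module-level scans may reject package patterns
  # such as ./..., so none is passed here.
  run: govulncheck -scan module
```

Because module mode reports every known-vulnerable dependency version regardless of whether it is called, switching to it would likely surface more findings than the symbol scan, which is the semantic change the paragraph above refers to.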