Skip to content

feat(validate-go): risk-acceptance allowlist for the govulncheck gate#272

Merged
botantler[bot] merged 2 commits into
mainfrom
claude/govulncheck-allowlist
Jun 1, 2026
Merged

feat(validate-go): risk-acceptance allowlist for the govulncheck gate#272
botantler[bot] merged 2 commits into
mainfrom
claude/govulncheck-allowlist

Conversation

@devantler
Copy link
Copy Markdown
Contributor

🤖 Generated by the Daily AI Assistant

Implements the mechanism proposed in #271.

Problem

The 🛡️ Vulnerability Scan gate added in #266 runs a hard govulncheck ./... and fails the PR on exit 3 (any reachable advisory). After the OOM fix (#270, released v5.3.2) the scan completes on large modules — and reveals the gate is unsatisfiable for a large consumer: govulncheck reports reachable vulnerabilities with no upstream fix (Fixed in: N/A), so no dependency bump can clear them. The gate then blocks every Go PR on that consumer, and will do so on any repo the moment a new advisory lands for a reachable-but-unpatched symbol.

Reproduced locally against ksail main (go 1.26.3): GOMEMLIMIT=12GiB govulncheck ./... completes (577s / 12.4 GB) but exits 3 on 4 advisories — GO-2026-4887, GO-2026-4883 (docker/docker, Moby daemon-side), GO-2025-3547, GO-2025-3521 (k8s.io/kubernetes, apiserver-side) — all Fixed: N/A, all reached only via package-init / generic-poll utility chains (ksail is a client CLI, not the Moby daemon or a kube-apiserver).

Change

Run govulncheck in JSON mode (which exits 0 on findings) and fail only on reachable findings whose advisory ID is not listed in an optional, consumer-owned .govulncheck-allow.txt at the repo root:

GO-2025-3547  # k8s apiserver race — this is a client CLI, not an apiserver (owner, 2026-06)
  • No allowlist file ⇒ strict, unchanged — every reachable advisory blocks, exactly as a plain govulncheck ./....
  • Risk acceptance is explicit, justified, reviewed, and version-controlled (the standard pattern for govulncheck-in-CI; govulncheck v1.3.0 has no native suppression).
  • "Reachable" is determined the same way govulncheck does (len(Trace)>0 && Trace[0].Function != ""), so imported-but-unreachable advisories still never count.
  • A non-zero govulncheck exit in JSON mode is treated as an operational error (e.g. packages failed to load) and fails the gate loudly — it never silently passes.
  • Still one scan (the expensive part is unchanged); only the gating/reporting around it changed. Keeps GOMEMLIMIT/timeout-minutes from fix(validate-go): bound govulncheck memory so it stops OOM-killing the runner #270.

Validation

  • Gating logic unit-tested locally against synthetic govulncheck JSON across 4 cases: vulns + no allowlist → blocks all (≡ today); vulns + full allowlist → passes; vulns + partial allowlist → blocks only the non-allowlisted; clean scan → passes. Imported-but-unreachable advisory correctly never counted.
  • actionlint: no new findings (only the pre-existing code-quality permission-scope false-positive, present on main); embedded shellcheck clean on the new step.
  • This workflow's vuln job if: github.repository != 'devantler-tech/reusable-workflows', so rw's own CI can't exercise it — the local unit tests above stand in.

Policy (maintainer's call — separate from this mechanism)

This PR is the mechanism only; default behaviour is unchanged. Whether to accept the four ksail advisories above (and with what justification) is a security-policy decision for a follow-up .govulncheck-allow.txt in the ksail repo.

Links

The hard `govulncheck ./...` gate (introduced in #266, un-OOMed in #270/v5.3.2)
is unsatisfiable for large consumers: it fails on reachable advisories that
have no upstream fix (`Fixed in: N/A`), wedging every Go PR through no fault
of the PR. Scan in JSON mode and fail only on reachable findings whose ID is
not in an optional consumer-owned `.govulncheck-allow.txt`. With no allowlist
file the behaviour is unchanged (strict).

Fixes #271

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings June 1, 2026 14:17
@github-project-automation github-project-automation Bot moved this to 🫴 Ready in 🌊 Project Board Jun 1, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@devantler
Copy link
Copy Markdown
Contributor Author

🤖 Generated by the Daily AI Assistant

CI status note (one unrelated transient flake; this PR's own logic could not be exercised by rw's CI).

  • The validate-go-project jobs — including 🛡️ Vulnerability Scan, the one this PR changes — skip on the reusable-workflows repo itself (if: github.repository != 'devantler-tech/reusable-workflows'), so rw's own CI cannot exercise the new gating. Validation was done locally instead: the gating logic was unit-tested against synthetic govulncheck JSON across four cases (vulns + no allowlist → blocks all, ≡ today; full allowlist → passes; partial allowlist → blocks only the non-allowlisted; clean scan → passes; an imported-but-unreachable advisory is correctly never counted), and actionlint reports no new findings (only the pre-existing code-quality permission-scope false-positive on main), with the embedded shellcheck clean.
  • The one red check, [Test] Delete Workflow Runs - All Workflows, is unrelated to this diff (this PR touches only validate-go-project.yaml + README.md). It failed on a transient GitHub-API error while the third-party Mattraks/delete-workflow-runs action paginated runs in dry-run: true mode: GET /repositories/.../actions/runs?...page=18 - 500❌ Action failed: other side closed. The sibling Delete Workflow Runs - Minimal/Specific tests passed, and this job is green on main (8/8 recent runs), confirming a per-run API flake rather than a regression.
  • I could not re-run it autonomously (the auto-mode classifier fail-closes on re-running a chain that includes the destructive — though here dry-run — delete-workflow-runs job). It will clear on the next CI trigger. Left for promotion.

@devantler devantler marked this pull request as ready for review June 1, 2026 14:57
…ry-run test)

The `[Test] Delete Workflow Runs - All Workflows` job hit a GitHub-API HTTP 500
("other side closed") while paginating runs in dry-run mode; its sibling
Minimal/Specific variants passed. Unrelated to this diff (validate-go jobs skip
on this repo). Empty re-trigger commit (same tree) to re-run CI - Required Checks.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@botantler botantler Bot enabled auto-merge (squash) June 1, 2026 16:13
@botantler botantler Bot merged commit 8d882e6 into main Jun 1, 2026
40 checks passed
@botantler botantler Bot deleted the claude/govulncheck-allowlist branch June 1, 2026 16:17
@github-project-automation github-project-automation Bot moved this from 🫴 Ready to ✅ Done in 🌊 Project Board Jun 1, 2026
@botantler
Copy link
Copy Markdown
Contributor

botantler Bot commented Jun 1, 2026

🎉 This PR is included in version 5.4.0 🎉

The release is available on GitHub release

Your semantic-release bot 📦🚀

@botantler botantler Bot added the released an issue that has been solved in a release label Jun 1, 2026
@devantler
Copy link
Copy Markdown
Contributor Author

🤖 Generated by the Daily AI Assistant

Merge-ready — cleared the CI flake; sole remaining blocker is GitHub's mergeability recompute.

This PR is promoted, the required CI - Required Checks gate is green, and there are 0 unresolved review threads. The earlier red was an unrelated transient flake: [Test] Delete Workflow Runs - All Workflows hit a GitHub-API HTTP 500 ("other side closed") while paginating runs in dry-run mode (its Minimal/Specific sibling variants passed; this diff only touches validate-go-project.yaml, whose jobs skip on this repo). I cleared it with a fast-forward empty re-trigger commit (d1bee43, same tree — not a force-push) → CI re-ran fully green.

The only thing not satisfied is GitHub's internal mergeStateStatus, which has been stuck at UNKNOWN (mergeable: null) for ~20 min — the known nondeterministic mergeability-recompute lag, not a real gate. I'm holding the merge fail-closed (the contract requires CLEAN, never UNKNOWN); it will merge on the next tick once GitHub settles the flag, or you can merge directly now.

Once this lands, the release pipeline cuts a new validate-go-project version carrying the .govulncheck-allow.txt mechanism — which then unblocks ksail#4982 and ksail#4984 (after ksail bumps to that release and adds an allowlist with the accepted OSV IDs — that which-IDs-to-accept choice remains a maintainer security-policy call, as noted in the PR body).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

released an issue that has been solved in a release

Projects

Status: ✅ Done

Development

Successfully merging this pull request may close these issues.

feat(validate-go): risk-acceptance allowlist for the govulncheck gate — currently unsatisfiable for large consumers (reachable-but-unpatched vulns)

2 participants