
ci: cap cargo build parallelism to fit the 5Gi cachekit runner#51

Merged
27Bslash6 merged 1 commit into main from ci/cap-cargo-build-jobs
Jun 17, 2026

27Bslash6 (Contributor) commented Jun 17, 2026

What

Cap cargo build parallelism on the self-hosted CI workflows by setting CARGO_BUILD_JOBS: "4", plus add fail-fast timeout-minutes backstops.

  • ci.yml — top-level env cap; timeout-minutes: 20 on test and security
  • security.yml — append cap to the existing env block (covers the cargo-fuzz ASAN builds)
  • codeql.yml — env cap; timeout-minutes: 30 on analyze (cold cargo build --release)
  • release.yml: unchanged, since its publish job already runs on ubuntu-latest
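
For illustration, the ci.yml part of this change might look roughly like the sketch below. Only CARGO_BUILD_JOBS: "4", its top-level env placement, and timeout-minutes: 20 on the test and security jobs come from this PR. The workflow name, triggers, runner labels, and steps are placeholders and are not copied from the repository.

```yaml
# Sketch only: name, triggers, runs-on labels, and steps are assumed.
name: CI
on: [push, pull_request]

env:
  # Cap cargo's parallel rustc/linker jobs so a cold build fits the 5Gi runner.
  CARGO_BUILD_JOBS: "4"

jobs:
  test:
    runs-on: [self-hosted, cachekit]  # hypothetical runner labels
    timeout-minutes: 20               # fail fast instead of hanging
    steps:
      - uses: actions/checkout@v4
      - run: cargo test
  security:
    runs-on: [self-hosted, cachekit]  # hypothetical runner labels
    timeout-minutes: 20
    steps:
      - uses: actions/checkout@v4
      - run: cargo check              # placeholder for the real security steps
```

Setting the variable at workflow level means every job and step inherits it, so individual cargo invocations do not each need a -j flag.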

Why

cargo's available_parallelism() reads the node's 32 threads (the 5950X), not the pod's cgroup CPU quota, so a cold build fans out ~22-way rustc/ld inside the 5Gi cachekit runner cgroup. At that fan-out, peak link-time RSS can exceed 5Gi → the kernel OOM-kills the linker → the runner "loses communication with the server" and the job dies at the ~10-min heartbeat with no logs.
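
One way to check this mismatch on a given runner is a throwaway diagnostic step like the one below. It is a hypothetical sketch, not part of this PR, and it assumes a Linux runner with cgroup v2 mounted at /sys/fs/cgroup. On other setups the files may be missing, in which case the step prints "unavailable".

```yaml
# Hypothetical diagnostic step; assumes Linux with cgroup v2 at /sys/fs/cgroup.
- name: Show CPU and memory limits seen by the job
  run: |
    echo "nproc: $(nproc)"
    echo "cpu.max: $(cat /sys/fs/cgroup/cpu.max 2>/dev/null || echo unavailable)"
    echo "memory.max: $(cat /sys/fs/cgroup/memory.max 2>/dev/null || echo unavailable)"
    echo "CARGO_BUILD_JOBS: ${CARGO_BUILD_JOBS:-unset}"
```

Comparing the CPU count against memory.max gives a rough sense of how much memory each parallel rustc or linker process can use before the cgroup limit is reached.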

This is the same root cause diagnosed and fixed in cachekit-rs #25/#26. cachekit-core has been spared so far only because:

  1. Swatinem/rust-cache keeps the dep graph warm, so the full cold compile only happens on a cache miss (Cargo.lock bump, toolchain rotation, cache eviction).
  2. The core crate's dependency graph is smaller than cachekit-rs's async/TLS/HTTP stack.

Capping -j removes the latent OOM regardless of cache state and is effectively free on a warm cache (4 jobs is plenty when most crates are already built).

Note: for the cargo-fuzz ASAN builds in security.yml, -j4 bounds parallel codegen but not ASAN's absolute footprint — a strong mitigation there, a full fix for the normal cargo test/build/check jobs.

Summary by CodeRabbit

Chores

  • Configured build job timeout safeguards across CI workflows to prevent extended hangs.
  • Optimised Rust build parallelism settings across CI and security scanning workflows for improved stability.

cargo's available_parallelism() reads the node's 32 threads, not the pod's cgroup CPU quota, so a cold (cache-miss) build fans out ~22-way and can OOM-kill the linker in the 5Gi cachekit runner — the same failure diagnosed and fixed in cachekit-rs (#25/#26). cachekit-core has been spared so far only by Swatinem/rust-cache keeping builds warm and a smaller dep graph.

Add CARGO_BUILD_JOBS=4 to ci.yml, security.yml, and codeql.yml (the workflows that build cargo on the self-hosted 5Gi pool) plus fail-fast timeout-minutes backstops. release.yml is unaffected — its publish job runs on ubuntu-latest.

coderabbitai Bot commented Jun 17, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: d8c2dcb1-ef7a-4f1e-a257-1e77da000fff

📥 Commits

Reviewing files that changed from the base of the PR and between 7f5ebc1 and 8976173.

📒 Files selected for processing (3)
  • .github/workflows/ci.yml
  • .github/workflows/codeql.yml
  • .github/workflows/security.yml

Walkthrough

Three GitHub Actions workflow files (ci.yml, codeql.yml, security.yml) receive a new top-level CARGO_BUILD_JOBS: "4" environment variable to cap Rust build parallelism. Additionally, ci.yml adds timeout-minutes: 20 to both the test and security jobs.

Changes

CI OOM mitigation and timeout guardrails

| Layer / File(s) | Summary |
| --- | --- |
| CARGO_BUILD_JOBS=4 env cap across all workflows: .github/workflows/ci.yml, .github/workflows/codeql.yml, .github/workflows/security.yml | Adds a workflow-level env block setting CARGO_BUILD_JOBS: "4" with inline comments explaining it limits Rust codegen parallelism to prevent linker OOM kills during cold and ASAN fuzz builds on the cachekit runner. |
| Job-level timeouts in ci.yml: .github/workflows/ci.yml | Adds timeout-minutes: 20 to the test job and the security job to bound maximum execution time and prevent indefinite hangs. |

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately captures the main change: capping Cargo build parallelism to address OOM issues on the memory-constrained cachekit runner. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

27Bslash6 merged commit 98f85f1 into main on Jun 17, 2026
30 checks passed
27Bslash6 deleted the ci/cap-cargo-build-jobs branch on June 17, 2026 23:44