ci: move mutants-cli off lean-mem to rust-cpu (#523) by avrabe · Pull Request #526 · pulseengine/rivet

avrabe · 2026-06-10T16:31:57Z

Closes #523 — implements the "Recommended fix (cheapest first)" option 1 from the issue body (author's stated "strongly preferred" choice).

Why

mutants-cli was running per-PR + per-push pinned to the 4-runner lean-mem pool. The 14-day audit in #523 measured it as the single largest consumer of that class (488 instances over 14 days against 4 runners), and the operator-visible consequence was that Miri (~17 h median wait, 43% fail rate) and Verus (~18 h median wait, 94% fail rate) were starving — queueing for a runner against the per-PR mutation churn while rust-cpu sat 86% idle.
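To put the audit figures in proportion, here is a back-of-the-envelope calculation. It assumes the 488 instances were spread evenly across the 14 days and the 4 runners, which is a simplification and not something the audit states:

```python
# Figures quoted from the 14-day audit in #523.
instances = 488        # mutants-cli job instances in the audit window
days = 14              # length of the audit window
lean_mem_runners = 4   # size of the lean-mem pool

# Even-spread assumption: real load is bursty (PR pushes), so peaks are higher.
per_day = instances / days
per_runner_per_day = per_day / lean_mem_runners

print(f"{per_day:.1f} mutants-cli jobs/day, "
      f"{per_runner_per_day:.1f} per lean-mem runner per day")
```

Even under this flattering assumption, each lean-mem runner picks up several mutation jobs a day before any Miri or Verus job gets a turn.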

This PR makes the one-line runner-pool change the issue body recommends and expands the surrounding comment block so the rationale survives drift.

Acceptance criteria (from #523)

mutants-cli no longer runs on lean-mem. runs-on: on .github/workflows/ci.yml:597 now resolves to [self-hosted, linux, x64, rust-cpu]. Grep confirms mutants-cli is the only lean-mem user removed; Miri (line 416), nightly mutants-core (line 496), and the other comment-only references to lean-mem are untouched.
lean-mem median job wait drops back under a few minutes; Miri/Verus stop hitting multi-hour queues. This is environmental: it can only be verified by operator observation against the runner pool after this lands. Once merged, the same query that produced the audit table in #523 (gh api ... jobs?status=completed, filter by runner_labels) should show the lean-mem median fall back from ~64 min to single-digit minutes within a day or two.
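The post-merge check could be scripted roughly as follows. This is a minimal sketch, not the query used for the audit: it assumes the job records have already been fetched and saved as a list of dictionaries, and that each record carries `labels`, `created_at`, and `started_at` fields. Those field names are assumptions and should be checked against the actual API response. The sample records are made up.

```python
from datetime import datetime
from statistics import median

def parse_ts(ts: str) -> datetime:
    # Assumes timestamps of the form "2026-06-10T16:31:57Z".
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ")

def median_wait_minutes(jobs, pool):
    """Median queue wait (created -> started) in minutes for jobs on `pool`."""
    waits = [
        (parse_ts(j["started_at"]) - parse_ts(j["created_at"])).total_seconds() / 60
        for j in jobs
        if pool in j.get("labels", [])
        and j.get("started_at")
        and j.get("created_at")
    ]
    return median(waits) if waits else None

# Hypothetical records, for illustration only.
sample_jobs = [
    {"labels": ["self-hosted", "lean-mem"],
     "created_at": "2026-06-11T10:00:00Z", "started_at": "2026-06-11T10:02:00Z"},
    {"labels": ["self-hosted", "lean-mem"],
     "created_at": "2026-06-11T11:00:00Z", "started_at": "2026-06-11T11:04:00Z"},
    {"labels": ["self-hosted", "lean-mem"],
     "created_at": "2026-06-11T12:00:00Z", "started_at": "2026-06-11T12:06:00Z"},
    {"labels": ["self-hosted", "rust-cpu"],
     "created_at": "2026-06-11T12:00:00Z", "started_at": "2026-06-11T12:01:00Z"},
]

print(median_wait_minutes(sample_jobs, "lean-mem"))
```

Running the same function over a window before and after the merge gives a direct comparison against the ~64 min figure.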

Why this is a draft

Same reason as #525 (and as carried across the recent triage threads): the hard triage rule requires consulting https://pulseengine.eu/blog/ before opening a PR, and the blog has been HTTP 503 throughout this run. The fix itself matches the author's "strongly preferred" option in the issue body verbatim, so the draft state is purely about clearing the workflow-guidance hard rule once the blog is reachable.

The change is small, comment-heavy, and reversible:

   mutants-cli:
     name: Mutation Testing (rivet-cli)
     needs: [test]
-    runs-on: [self-hosted, linux, x64, lean-mem]
+    runs-on: [self-hosted, linux, x64, rust-cpu]

(plus a 6-line comment block above documenting why this pool, so drift doesn't quietly re-pin.)
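To guard against the pool being quietly re-pinned, a check along these lines could list which jobs target a given pool. It is a sketch under stated assumptions: it scans the workflow text line by line instead of using a real YAML parser, it only understands single-line `runs-on:` values at the usual two-space job indentation, and the `SAMPLE` workflow is a made-up, simplified stand-in for `.github/workflows/ci.yml`.

```python
import re

def jobs_on_pool(workflow_text: str, pool: str) -> list[str]:
    """Return job ids whose single-line `runs-on:` lists `pool` as a label."""
    found = []
    in_jobs = False
    current_job = None
    for line in workflow_text.splitlines():
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # skip blank and comment-only lines
        if re.match(r"^\S", line):
            # A top-level key: we are inside the job map only under `jobs:`.
            in_jobs = line.startswith("jobs:")
            current_job = None
            continue
        if not in_jobs:
            continue
        job = re.match(r"^  ([A-Za-z0-9_-]+):\s*$", line)
        if job:
            current_job = job.group(1)
            continue
        runs_on = re.match(r"^    runs-on:\s*(.+)$", line)
        if runs_on and current_job:
            value = runs_on.group(1).split("#")[0]  # drop trailing comment
            labels = re.findall(r"[A-Za-z0-9_.-]+", value)
            if pool in labels:
                found.append(current_job)
    return found

# Hypothetical, simplified workflow for illustration.
SAMPLE = """\
name: CI
jobs:
  miri:
    runs-on: [self-hosted, linux, x64, lean-mem]
  mutants-cli:
    # moved off lean-mem, see #523
    name: Mutation Testing (rivet-cli)
    needs: [test]
    runs-on: [self-hosted, linux, x64, rust-cpu]
"""

print(jobs_on_pool(SAMPLE, "lean-mem"))
print(jobs_on_pool(SAMPLE, "rust-cpu"))
```

Because comment-only lines are skipped, a mention of `lean-mem` in the rationale comment does not count as a pool assignment.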

Related

CI has a single point of failure: when the self-hosted runner pool goes offline, every gate queues forever with no fallback and no liveness alert (#509): self-hosted pool fragility / single-point-of-failure theme.
ci(miri): parallelize with cargo-nextest so Miri uses the cores (not 45 min on one) (#521): Miri parallelization via nextest, also targeting lean-mem starvation from the consumer side; complementary to this change (this removes a starver; #521 makes Miri use its share of lean-mem more efficiently).

Generated by Claude Code — issue-triage agent run 2026-06-10.

`mutants-cli` (`Mutation Testing (rivet-cli)`) was running on every PR and push pinned to the 4-runner `lean-mem` pool, the one runner class with no spare capacity. A 14-day audit of the self-hosted fleet showed it as the single largest consumer of that pool (488 instances), and as a direct consequence Miri (~17 h median wait, 43% fail rate) and Verus (~18 h median wait, 94% fail rate) were starving against `cancel-in-progress` PR-push churn while `rust-cpu` sat 86% idle.

The fix is a one-line runner-pool change: `rivet-cli` is the small crate running `--jobs 2` with `--timeout 30`; the `rust-cpu` class (16 G `MemoryHigh`, 7 runners) handles it without contention. Per-PR mutation coverage is preserved, no cadence change is needed, and `lean-mem` is freed up for the genuinely RAM-bound gating jobs (Miri, Verus) plus the nightly `mutants-core` fan-out.

Also extends the surrounding comment block to document why this pool choice matters so future drift doesn't quietly re-pin to `lean-mem`.

The post-merge bullet of the issue's Acceptance ("lean-mem median job wait drops back under a few minutes") can only be confirmed by operator observation against the runner pool after this lands; the in-repo bullet ("mutants-cli no longer runs on lean-mem") is the diff itself.

Note: the pulseengine.eu/blog/ workflow guidance was HTTP 503 throughout this triage run (same symptom carried across #420 / #516 / #522 / …), so this PR ships as a draft for maintainer review against the authoritative process posts once the blog is reachable.

Refs: #523, #509

github-actions

⚠️ Performance Alert ⚠️

Possible performance regression was detected for benchmark 'Rivet Criterion Benchmarks'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.20.

| Benchmark suite | Current: `4cc7823` | Previous: `e60a3a9` | Ratio |
| --- | --- | --- | --- |
| `traceability_matrix/1000` | `58318` ns/iter (`± 645`) | `43193` ns/iter (`± 499`) | `1.35` |
| `query/10000` | `333406` ns/iter (`± 1298`) | `236806` ns/iter (`± 4501`) | `1.41` |
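The reported ratios can be reproduced from the raw figures. The sketch below assumes the ratio is simply current divided by previous and that an alert fires when the ratio strictly exceeds the threshold; both are assumptions about how the benchmark action computes its alert, not something this comment states.

```python
THRESHOLD = 1.20  # alert threshold quoted in the comment above

# (benchmark, current ns/iter, previous ns/iter), copied from the alert.
results = [
    ("traceability_matrix/1000", 58318, 43193),
    ("query/10000", 333406, 236806),
]

def regressions(rows, threshold=THRESHOLD):
    """Return (name, ratio) for rows whose current/previous ratio exceeds threshold."""
    flagged = []
    for name, current, previous in rows:
        ratio = current / previous
        if ratio > threshold:
            flagged.append((name, round(ratio, 2)))
    return flagged

print(regressions(results))
```

Both benchmarks exceed the threshold by a wide margin, so the alert is not a rounding artifact. Since this PR only changes a runner label and comments in the workflow file, the slowdown is more plausibly noise from the benchmark host than an effect of the diff, though that is an inference and not something the alert confirms.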

This comment was automatically generated by workflow using github-action-benchmark.

codecov · 2026-06-10T20:53:00Z

Codecov Report

✅ All modified and coverable lines are covered by tests.

avrabe mentioned this pull request Jun 10, 2026

ci: mutants-cli runs per-PR on the scarce lean-mem pool, starving Miri/Verus (14-day audit) #523

Open

github-actions Bot reviewed Jun 10, 2026

avrabe wants to merge 1 commit into main from ci/issue-523-mutants-cli-runner-pool
