test(assets): move cache cold-vs-hit timing out of the gate into a bench #21
Merged
The M0.6 cache_diff test asserted wall-clock ratios (hit < 10 ms, plus miss > hit*20) inside `zig build test`. A single cache-hit sample can spike on a page fault, an AV scan, or a cold directory (~3 ms observed even on an idle dev box). Such a spike red-fails on slower or Windows CI runners and also inflates the relative threshold (3 ms hit * 20 = 60 ms > the cold cook). Flagged as pre-existing M0.6 debt in the M0.7 brief (Acted deviations -> "Known debt left untouched").

The correctness gate now asserts only deterministic, cross-host facts: the miss -> hit transition and a byte-identical cached artifact. The host- and load-dependent cold-vs-hit differential moves to a new `zig build bench-asset-cache` (bench/asset_cache.zig), archived and non-blocking like every other perf number, measured under the opposable protocol on the reference machine.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
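As a rough illustration of what a timing-free gate looks like, the sketch below checks only the miss -> hit transition and byte-identity of the cached artifact. It is not the code in tests/assets/cache_diff.zig: `ToyCache`, `Outcome`, and `cook` are hypothetical names invented here, the "cook" is a plain byte copy, and the two-argument `@memcpy` assumes Zig 0.11 or later.

```zig
const std = @import("std");

const Outcome = enum { miss, hit };

// Hypothetical single-slot cache standing in for the real asset cache.
const ToyCache = struct {
    key: ?u64 = null,
    buf: [32]u8 = undefined,
    len: usize = 0,

    const Result = struct { outcome: Outcome, bytes: []const u8 };

    fn cook(self: *ToyCache, key: u64, source: []const u8) Result {
        if (self.key) |cached_key| {
            if (cached_key == key) {
                return .{ .outcome = .hit, .bytes = self.buf[0..self.len] };
            }
        }
        // "Cooking" here is just a copy; the real cook is far more expensive.
        std.debug.assert(source.len <= self.buf.len);
        @memcpy(self.buf[0..source.len], source);
        self.len = source.len;
        self.key = key;
        return .{ .outcome = .miss, .bytes = self.buf[0..self.len] };
    }
};

test "miss -> hit transition with a byte-identical artifact" {
    var cache = ToyCache{};
    const source = "pretend asset bytes";

    const first = cache.cook(42, source);
    try std.testing.expectEqual(Outcome.miss, first.outcome);

    // Snapshot the first artifact, since both results alias the cache buffer.
    var snapshot: [32]u8 = undefined;
    const n = first.bytes.len;
    @memcpy(snapshot[0..n], first.bytes);

    const second = cache.cook(42, source);
    try std.testing.expectEqual(Outcome.hit, second.outcome);
    try std.testing.expectEqualSlices(u8, snapshot[0..n], second.bytes);
}
```

Saved to a file, this runs with `zig test <file>.zig`. Nothing in it reads a clock, so the outcome does not depend on host speed or load.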
Why
`tests/assets/cache_diff.zig` (M0.6 / E4) asserted wall-clock ratios inside the `zig build test` correctness gate: `hit < 10 ms` (absolute) plus `miss > hit*20` (relative). Both are fragile across hosts and CPU load: a single cache-hit sample can spike on a page fault, an AV scan, or a cold directory, which red-fails on slower or Windows CI runners. The spike also inflates the relative threshold (e.g. a 3 ms hit ⇒ needs miss > 60 ms, but a ReleaseSafe cold cook is only ~13 ms ⇒ fail).

This was flagged as pre-existing M0.6 debt during M0.7 / E3 Windows validation (see `briefs/M0.7-ipc-scm-rights-windows-fuzz.md` § Acted deviations → "Known debt left untouched"), to be resolved separately by making the assertion tolerant or moving it out of the gate.

Measured evidence (idle dev Mac, ReleaseSafe, 16 MiB asset): cold p50 12.7 ms, hit p50 184 µs (69× speedup), but hit max 2.98 ms. On a loaded Windows runner that single-sample spike breaks both the old absolute and relative checks.
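The threshold arithmetic above can be checked with a small sketch. It restates the two old conditions as a hypothetical `oldGatePasses` helper (not a function from the repository) and plugs in the rounded figures quoted in this description: a ~13 ms cold cook, the 184 µs hit p50, and a ~3 ms hit spike.

```zig
const std = @import("std");

const ns_per_us: u64 = 1_000;
const ns_per_ms: u64 = 1_000_000;

// Hypothetical restatement of the old gate:
// hit < 10 ms (absolute) and miss > hit * 20 (relative).
fn oldGatePasses(hit_ns: u64, miss_ns: u64) bool {
    return hit_ns < 10 * ns_per_ms and miss_ns > hit_ns * 20;
}

test "typical hit passes, a single spiked hit fails" {
    const cold_ns = 13 * ns_per_ms; // ~13 ms ReleaseSafe cold cook
    const typical_hit_ns = 184 * ns_per_us; // reported hit p50
    const spiked_hit_ns = 3 * ns_per_ms; // ~3 ms spike (reported max 2.98 ms)

    // 184 us * 20 = 3.68 ms, below the 13 ms cold cook: the gate passes.
    try std.testing.expect(oldGatePasses(typical_hit_ns, cold_ns));

    // 3 ms * 20 = 60 ms, above the 13 ms cold cook: the relative check fails.
    try std.testing.expect(!oldGatePasses(spiked_hit_ns, cold_ns));

    // The absolute check alone would still have passed for this spike.
    try std.testing.expect(spiked_hit_ns < 10 * ns_per_ms);
}
```

The same cache and the same code give opposite verdicts depending on which single sample happens to be taken. That is the flakiness the gate should not be exposed to.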
What changed
Picked option 2: move the timing to the bench suite, keep a deterministic functional assertion in the gate. This is the only option that makes the gate fully deterministic and cross-host, and it matches the repo's established methodology: every perf number lives in a separate `bench-*` step (ECS, IPC RTT, adler32, paeth, etch), archived and non-blocking, measured under the opposable protocol on the reference machine, never inside `zig build test`.

- `tests/assets/cache_diff.zig`: stripped all wall-clock timing; the gate now asserts only deterministic facts: the miss → hit transition and a byte-identical cached artifact. Asset shrunk 16 MiB → 256 KiB (the large size only existed to make the cold cook expensive for timing, which is now the bench's job), keeping the gate fast on every host.
- `bench/asset_cache.zig` (new): `zig build bench-asset-cache`, multi-sample cold-vs-hit differential, `--smoke` for CI, writes the gitignored `bench/out/asset_cache_<os>.md`. Mirrors the adler32 / paeth / render bench conventions.
- `build.zig`: registered the `bench-asset-cache` step alongside the other M0.6 benches.

Verification
- `zig build test` (Debug)
- `zig build test -Doptimize=ReleaseSafe`
- `zig build lint` / `zig build`
- `zig fmt --check`
- `pre-push` hook (build + test + test-release)

(The macOS full-suite `failed command` lines for events/plugin_loader are pre-existing spurious output; the suites exit 0.)

Scope
No IPC / M0.7 code touched. Branched off `main` (this is an independent M0.6-debt fix, not stacked on the unmerged M0.7 work).

🤖 Generated with Claude Code