Sub-issue of #744. Before reading further, read #744 in full, including all of its comments, where the benchmarking tool's design is worked out in detail. This issue is one slice of that design and assumes the decisions recorded in those comments.
This is step 5 of 5. It depends on #783 and builds on the gating from #784.
Scope
The premise (detailed in the #744 comments): an application developer runs fedify bench in their own CI, where runners are too noisy to gate on precise latency, so the gate leans on robust signals and on same-runner comparison.
fedify bench compare
fedify bench compare --base <ref> --head <ref> runs the base and head revisions of the application on the same runner and reports the delta. Running both halves on one machine cancels out the runner's absolute speed, which makes this the only trustworthy way to detect a latency regression in noisy CI.
--max-regression sets the tolerance. A regression fails the build only when the delta also exceeds the measured inter-run noise band, and that band is reported alongside the result so the gate stays interpretable.
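A minimal sketch of the noise-aware gate described above, assuming each revision is run several times on the same runner and summarized per run by a latency figure where higher is worse. The noise band here is taken as the larger relative spread (max minus min, over the median) of the two revisions' runs; that definition, and the names compare, RunSamples, and maxRegressionPct, are illustrative assumptions, not the design settled in #744.

```typescript
// Hypothetical sketch: none of these names are part of Fedify's actual API.
interface RunSamples {
  /** One latency figure (ms) per repeated run of a revision; higher is worse. */
  readonly runs: readonly number[];
}

interface CompareResult {
  readonly deltaPct: number; // head relative to base; positive means slower
  readonly noiseBandPct: number; // observed inter-run spread, in percent
  readonly failed: boolean;
}

function median(values: readonly number[]): number {
  if (values.length === 0) throw new Error("median of an empty list");
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Relative spread of one revision's runs: (max - min) / median, in percent.
function spreadPct(values: readonly number[]): number {
  return ((Math.max(...values) - Math.min(...values)) / median(values)) * 100;
}

function compare(
  base: RunSamples,
  head: RunSamples,
  maxRegressionPct: number,
): CompareResult {
  const baseMedian = median(base.runs);
  const headMedian = median(head.runs);
  const deltaPct = ((headMedian - baseMedian) / baseMedian) * 100;
  // Use the noisier revision's spread as the band the delta must clear.
  const noiseBandPct = Math.max(spreadPct(base.runs), spreadPct(head.runs));
  // Fail only when the delta is beyond both the tolerance and the noise.
  const failed = deltaPct > maxRegressionPct && deltaPct > noiseBandPct;
  return { deltaPct, noiseBandPct, failed };
}
```

For example, base runs of [100, 104, 98] ms against head runs of [112, 110, 115] ms give a 12% delta over a 6% noise band, so a 10% tolerance fails. Requiring the delta to clear both thresholds keeps a loose --max-regression from being undercut by a noisy runner, and reporting noiseBandPct lets a reader see why a given delta did or did not fail.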
Gate robustness in the run engine
Median-of-N aggregation (set by runs, default 3) for latency and throughput gates, so one unlucky run does not fail the build; correctness gates (success rate, errors) need only a single run.
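A minimal sketch of how median-of-N aggregation might separate performance gates from correctness gates. The metric names, the Limits shape, and the functions requiredRuns and evaluateGates are hypothetical. This sketch also assumes that when several runs were collected anyway, a correctness failure in any of them counts, since such failures are not runner noise.

```typescript
// Hypothetical sketch: metric names, limits, and functions are illustrative,
// not part of Fedify's actual API.
interface RunMetrics {
  readonly p95LatencyMs: number;
  readonly throughputRps: number;
  readonly successRate: number; // fraction in [0, 1]
  readonly errors: number;
}

interface Limits {
  readonly maxP95LatencyMs?: number; // latency gate (performance)
  readonly minThroughputRps?: number; // throughput gate (performance)
  readonly minSuccessRate?: number; // correctness gate
}

interface GateResult {
  readonly gate: string;
  readonly passed: boolean;
}

function median(values: readonly number[]): number {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Performance gates need N runs; a correctness-only configuration needs one.
function requiredRuns(limits: Limits, runs = 3): number {
  const hasPerfGate = limits.maxP95LatencyMs !== undefined ||
    limits.minThroughputRps !== undefined;
  return hasPerfGate ? runs : 1;
}

function evaluateGates(
  results: readonly RunMetrics[],
  limits: Limits,
): GateResult[] {
  if (results.length === 0) throw new Error("at least one run is required");
  const gates: GateResult[] = [];
  // Latency and throughput: judge the median across runs, so a single
  // unlucky run cannot fail the build by itself.
  if (limits.maxP95LatencyMs !== undefined) {
    const p95 = median(results.map((r) => r.p95LatencyMs));
    gates.push({ gate: "latency", passed: p95 <= limits.maxP95LatencyMs });
  }
  if (limits.minThroughputRps !== undefined) {
    const rps = median(results.map((r) => r.throughputRps));
    gates.push({ gate: "throughput", passed: rps >= limits.minThroughputRps });
  }
  // Correctness: every collected run must pass.
  if (limits.minSuccessRate !== undefined) {
    const min = limits.minSuccessRate;
    gates.push({
      gate: "success-rate",
      passed: results.every((r) => r.successRate >= min),
    });
  }
  gates.push({ gate: "errors", passed: results.every((r) => r.errors === 0) });
  return gates;
}
```

With three runs whose p95 latencies are 120, 95, and 100 ms, a 110 ms latency gate passes on the 100 ms median even though one run exceeded it.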
Example profiles
A CI-safe profile (correctness plus gross bounds) and a perf-lab profile (tight latency gates plus compare, run in a controlled environment). Together they make the boundary concrete and steer teams away from brittle tight-latency gates in shared CI.
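A minimal sketch of how the two example profiles might differ, written as TypeScript data so the contrast is explicit. The Profile and Gate shapes, the failingGates helper, and every number are hypothetical placeholders, not Fedify's configuration format or recommended thresholds; only the warn/fail severity levels come from #784.

```typescript
// Hypothetical sketch: the profile shape and all numbers are illustrative
// placeholders, not Fedify's configuration format or recommended values.
type Severity = "warn" | "fail";
type Metric = "successRate" | "errors" | "p95LatencyMs" | "throughputRps";

interface Gate {
  readonly metric: Metric;
  readonly op: "<=" | ">=";
  readonly value: number;
  readonly severity: Severity;
}

interface Profile {
  readonly name: string;
  readonly runs: number; // median-of-N for performance gates
  readonly compare: boolean; // base vs. head on the same machine
  readonly maxRegressionPct?: number; // only meaningful with compare
  readonly gates: readonly Gate[];
}

// CI-safe: correctness plus a gross bound that only catches blowups.
const ciSafe: Profile = {
  name: "ci-safe",
  runs: 3,
  compare: false,
  gates: [
    { metric: "successRate", op: ">=", value: 0.99, severity: "fail" },
    { metric: "errors", op: "<=", value: 0, severity: "fail" },
    { metric: "p95LatencyMs", op: "<=", value: 2000, severity: "fail" },
  ],
};

// Perf-lab: tight latency plus compare, meant for a controlled environment.
const perfLab: Profile = {
  name: "perf-lab",
  runs: 5,
  compare: true,
  maxRegressionPct: 5,
  gates: [
    { metric: "successRate", op: ">=", value: 0.99, severity: "fail" },
    { metric: "p95LatencyMs", op: "<=", value: 150, severity: "fail" },
  ],
};

// Returns the gates that the observed (already aggregated) metrics violate.
function failingGates(profile: Profile, observed: Record<Metric, number>): Gate[] {
  return profile.gates.filter((g) =>
    g.op === "<=" ? observed[g.metric] > g.value : observed[g.metric] < g.value
  );
}
```

The same observation of a 300 ms p95 passes the CI-safe profile, whose latency bound only catches order-of-magnitude blowups, but fails the perf-lab profile, which is why the tight gate belongs on a quiet machine.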
Dependencies
Depends on #783 (the engine); builds on the expect gating and severity from #784.
Acceptance criteria
fedify bench compare runs base and head on the same runner and reports a noise-aware delta gated by --max-regression.
Median-of-N (runs) aggregation applies to latency and throughput gates.
CI-safe and perf-lab example profiles are provided.
Documentation
Add compare and the CI-safe versus perf-lab guidance to docs/manual/benchmarking.md, drawing the line between what CI should gate and what belongs in a controlled environment.